
Filters a genotype matrix by applying per-marker constraints on the frequency of the most frequent genotype, the frequency of the least frequent genotype, and the heterozygote frequency. Genotype frequencies are computed internally with the freq() function.

Usage

filter_geno_by_freq(
  geno_matrix,
  max_geno_freq = NULL,
  het_freq_range = NULL,
  min_geno_freq = NULL,
  input_format = "numeric"
)

Arguments

geno_matrix

A genotype matrix or data frame in which:

  • Rows represent genetic markers.

  • Columns represent individuals.

  • Values are either 0, 1, 2 (numeric format) or "A", "H", "B" (genotype format).

max_geno_freq

Numeric. If provided, removes markers whose most frequent genotype has a frequency above this threshold.

het_freq_range

Numeric vector of length 2, c(min, max). If provided, retains only markers whose heterozygote frequency lies within this range.

min_geno_freq

Numeric. If provided, removes markers whose least frequent genotype has a frequency below this threshold.

input_format

Character. Specifies whether the genotype matrix is in "numeric" (0, 1, 2) or "genotype" ("A", "H", "B") format. Default is "numeric".
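To make the three thresholds concrete, the sketch below computes genotype frequencies for a single hypothetical marker scored on ten individuals. It assumes that "A", "H" and "B" correspond to 0, 1 and 2 (the order in which the two formats are listed above), that frequencies are proportions of non-missing calls, and that the "least frequent genotype" is taken over the genotype classes actually observed. None of this is confirmed by this page, and the code does not call freq().

```r
# Hypothetical marker scored on ten individuals in "genotype" format.
marker_chr <- c("A", "H", "H", "B", "B", "B", "B", "B", "B", "B")

# Assumed recoding: A = 0 (homozygous), H = 1 (heterozygous), B = 2 (homozygous).
codes <- c(A = 0, H = 1, B = 2)
marker_num <- unname(codes[marker_chr])

# Proportion of non-missing calls in each genotype class 0, 1, 2.
calls <- marker_num[!is.na(marker_num)]
geno_freq <- c("0" = mean(calls == 0), "1" = mean(calls == 1), "2" = mean(calls == 2))

max(geno_freq)                 # 0.7: compared with max_geno_freq
geno_freq[["1"]]               # 0.2: heterozygote frequency, compared with het_freq_range
min(geno_freq[geno_freq > 0])  # 0.1: least frequent observed genotype, compared with min_geno_freq
```

With the thresholds used in the example below (max_geno_freq = 0.95, het_freq_range = c(0.1, 0.8), min_geno_freq = 0.05), this marker would pass all three filters under these assumptions.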

Value

A filtered genotype matrix containing only the markers (rows) that meet all of the specified frequency criteria.

Details

  • Computes genotype frequencies using the freq() function.

  • Retains only markers that meet the specified frequency constraints.

  • If all filtering parameters (max_geno_freq, het_freq_range, min_geno_freq) are NULL, the function returns the original matrix.
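The rules above can be sketched in base R as follows. This is a hypothetical re-implementation (the name filter_by_freq_sketch is invented here), not the package's code: it does not call freq(), it accepts only numeric 0/1/2 input, and it assumes that missing calls are ignored, that thresholds are inclusive, and that the "least frequent genotype" is taken over the genotype classes actually observed at a marker.

```r
# Hypothetical base-R sketch of the filtering rules; not the package's implementation.
# Assumptions: numeric 0/1/2 input, NA calls ignored, inclusive thresholds,
# and "least frequent genotype" taken over genotype classes actually observed.
filter_by_freq_sketch <- function(geno_matrix, max_geno_freq = NULL,
                                  het_freq_range = NULL, min_geno_freq = NULL) {
  # With no thresholds supplied, return the input unchanged (as in Details).
  if (is.null(max_geno_freq) && is.null(het_freq_range) && is.null(min_geno_freq)) {
    return(geno_matrix)
  }
  m <- as.matrix(geno_matrix)

  # Per-marker (row) proportions of genotypes 0, 1, 2 among non-missing calls.
  freqs <- t(apply(m, 1, function(x) {
    x <- x[!is.na(x)]
    c(mean(x == 0), mean(x == 1), mean(x == 2))
  }))

  keep <- rep(TRUE, nrow(m))
  if (!is.null(max_geno_freq)) {
    keep <- keep & apply(freqs, 1, max) <= max_geno_freq
  }
  if (!is.null(het_freq_range)) {
    het <- freqs[, 2]
    keep <- keep & het >= het_freq_range[1] & het <= het_freq_range[2]
  }
  if (!is.null(min_geno_freq)) {
    min_observed <- apply(freqs, 1, function(p) min(p[p > 0]))
    keep <- keep & min_observed >= min_geno_freq
  }
  keep[is.na(keep)] <- FALSE  # markers with no non-missing calls are dropped
  geno_matrix[keep, , drop = FALSE]
}

# Usage on three hypothetical markers scored on ten individuals.
geno <- rbind(
  m1 = c(0, 1, 1, 2, 2, 2, 2, 2, 2, 2),  # frequencies 0.1 / 0.2 / 0.7
  m2 = c(2, 2, 2, 2, 2, 2, 2, 2, 2, 2),  # monomorphic
  m3 = c(0, 0, 0, 0, 0, 2, 2, 2, 2, 2)   # no heterozygotes
)
kept <- filter_by_freq_sketch(geno, max_geno_freq = 0.95,
                              het_freq_range = c(0.1, 0.8), min_geno_freq = 0.05)
rownames(kept)  # "m1"
```

Under these assumptions only m1 survives: m2 fails both the maximum-frequency and heterozygosity filters, and m3 has no heterozygotes. Treating unobserved genotype classes as having frequency 0 instead would make min_geno_freq drop every marker that lacks one of the three genotypes, so that choice matters for real data.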

Examples

# Example genotype matrix: 10 markers (rows) x 3 individuals (columns)
set.seed(123)  # make the random example reproducible
geno_data <- matrix(sample(0:2, 30, replace = TRUE),
                    nrow = 10, ncol = 3,
                    dimnames = list(paste0("Marker", 1:10), paste0("Ind", 1:3)))

# Remove markers whose most frequent genotype exceeds 0.95, keep markers with
# heterozygote frequency between 0.1 and 0.8, and remove markers whose least
# frequent genotype falls below 0.05
filtered_data <- filter_geno_by_freq(geno_data, max_geno_freq = 0.95,
                                     het_freq_range = c(0.1, 0.80), min_geno_freq = 0.05)