Filters a genotype matrix by applying per-marker constraints on the maximum
and minimum genotype frequency, as well as on the heterozygote frequency.
Frequencies are computed internally with the freq() function.
Usage
filter_geno_by_freq(
  geno_matrix,
  max_geno_freq = NULL,
  het_freq_range = NULL,
  min_geno_freq = NULL,
  input_format = "numeric"
)
Arguments
- geno_matrix
A genotype matrix or data frame where:
Rows represent genetic markers.
Columns represent individuals.
Values are either 0, 1, 2 (numeric format) or "A", "H", "B" (genotype format).
- max_geno_freq
Numeric. If provided, removes markers where the frequency of the most common genotype exceeds this threshold.
- het_freq_range
Numeric vector of length 2. If provided, retains markers whose heterozygote frequency is within the specified range (c(min, max)).
- min_geno_freq
Numeric. If provided, removes markers where the frequency of the least common genotype falls below this threshold.
- input_format
Character. Specifies whether the genotype matrix is in "numeric" (0, 1, 2) or "genotype" ("A", "H", "B") format. Default is "numeric".
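To illustrate how the two input formats relate, here is a minimal base R sketch that recodes a "genotype"-format matrix into "numeric" format. The mapping A = 0, H = 1, B = 2 is an assumption made for illustration only; this page does not state which homozygote corresponds to 0 or 2, and the object names (geno_chars, code_map, geno_numeric) are hypothetical.

```r
# Toy "genotype"-format matrix: 2 markers x 4 individuals
geno_chars <- matrix(c("A", "H", "B", "A",
                       "B", "B", "H", "A"),
                     nrow = 2, byrow = TRUE,
                     dimnames = list(paste0("Marker", 1:2), paste0("Ind", 1:4)))

# ASSUMED mapping (not confirmed by this page): A -> 0, H -> 1, B -> 2
code_map <- c(A = 0, H = 1, B = 2)

# Look up each call in the mapping and restore the matrix shape and names
geno_numeric <- matrix(unname(code_map[geno_chars]),
                       nrow = nrow(geno_chars),
                       dimnames = dimnames(geno_chars))
```

In practice, passing input_format = "genotype" should make this manual recoding unnecessary.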
Details
Computes genotype frequencies using the freq() function.
Retains only markers that meet the specified frequency constraints.
If all filtering parameters (max_geno_freq, het_freq_range, min_geno_freq) are NULL, the function returns the original matrix.
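The steps above can be sketched in base R as follows. This is not the package's implementation: sketch_geno_freq() and sketch_filter() are hypothetical helpers that stand in for freq() and filter_geno_by_freq(). The sketch assumes numeric input (0, 1, 2), inclusive threshold comparisons, and that a genotype class absent from a marker counts as frequency 0. The actual function may differ on each of these points.

```r
# Hypothetical stand-in for freq(): per-marker frequencies of genotypes 0, 1, 2
sketch_geno_freq <- function(geno_matrix) {
  freqs <- t(apply(geno_matrix, 1, function(g) {
    g <- g[!is.na(g)]                                   # ignore missing calls
    as.numeric(table(factor(g, levels = 0:2))) / length(g)
  }))
  colnames(freqs) <- c("0", "1", "2")
  freqs
}

# Hypothetical stand-in for filter_geno_by_freq() (numeric input only)
sketch_filter <- function(geno_matrix, max_geno_freq = NULL,
                          het_freq_range = NULL, min_geno_freq = NULL) {
  # No constraints given: return the input unchanged
  if (is.null(max_geno_freq) && is.null(het_freq_range) && is.null(min_geno_freq)) {
    return(geno_matrix)
  }
  freqs <- sketch_geno_freq(geno_matrix)
  keep <- rep(TRUE, nrow(geno_matrix))
  if (!is.null(max_geno_freq)) {
    keep <- keep & apply(freqs, 1, max) <= max_geno_freq
  }
  if (!is.null(het_freq_range)) {
    keep <- keep & freqs[, "1"] >= het_freq_range[1] & freqs[, "1"] <= het_freq_range[2]
  }
  if (!is.null(min_geno_freq)) {
    keep <- keep & apply(freqs, 1, min) >= min_geno_freq
  }
  geno_matrix[keep, , drop = FALSE]
}

# Toy data: Marker1 is monomorphic, Marker2 has all three genotypes,
# Marker3 has no heterozygotes
toy <- matrix(c(0, 0, 0, 0,
                0, 1, 2, 1,
                0, 0, 2, 2),
              nrow = 3, byrow = TRUE,
              dimnames = list(paste0("Marker", 1:3), paste0("Ind", 1:4)))

kept <- sketch_filter(toy, max_geno_freq = 0.95,
                      het_freq_range = c(0.1, 0.8), min_geno_freq = 0.05)
```

Under these assumptions only Marker2 is retained: Marker1 fails the maximum-frequency constraint, and Marker3 fails both the heterozygote range and the minimum-frequency constraint.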
Examples
# Example genotype matrix (10 markers x 3 individuals)
set.seed(123)  # make the random example reproducible
geno_data <- matrix(sample(0:2, 30, replace = TRUE),
                    nrow = 10, ncol = 3,
                    dimnames = list(paste0("Marker", 1:10), paste0("Ind", 1:3)))

# Keep markers with max genotype frequency <= 0.95, heterozygote frequency
# between 0.1 and 0.8, and minimum genotype frequency >= 0.05
filtered_data <- filter_geno_by_freq(geno_data, max_geno_freq = 0.95,
                                     het_freq_range = c(0.1, 0.80), min_geno_freq = 0.05)