
Filter Genotype Matrix by Dosage Frequency (Polyploid-Compatible)
Source: R/filter_geno_by_freq_poly.R
Filters a polyploid genotype matrix by per-marker dosage frequencies. Users can define which dosages count as heterozygous and can constrain the maximum and minimum genotype frequencies and the heterozygous frequency range.
Usage
filter_geno_by_freq_poly(
  geno_matrix,
  max_geno_freq = NULL,
  het_freq_range = NULL,
  min_geno_freq = NULL,
  het_dosages = c(1, 2, 3)
)
Arguments
- geno_matrix
A numeric genotype matrix or data frame where:
Rows are markers
Columns are individuals
Values are dosage values (e.g., 0 to 4 for tetraploids)
- max_geno_freq
Numeric. If provided, removes markers where the frequency of the most frequent dosage exceeds this value.
- het_freq_range
Numeric vector of length 2, c(lower, upper). If provided, keeps only markers whose heterozygous frequency (the share of calls with a dosage listed in het_dosages) falls within this range.
- min_geno_freq
Numeric. If provided, removes markers where the frequency of the least frequent dosage falls below this value.
- het_dosages
Integer vector of dosage values to treat as "heterozygous". Defaults to c(1, 2, 3), which covers the intermediate dosages of a tetraploid.
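To illustrate how the three optional thresholds could combine, here is a minimal base R sketch. The helper sketch_keep_mask() is hypothetical and is not the package's implementation. It assumes that comparisons are inclusive at the thresholds, that NULL arguments are skipped, and that only dosage classes actually observed at a marker count toward the minimum frequency. The package itself may differ on each of these points.

```r
# Hypothetical helper (not part of the package): returns a logical keep-mask per marker.
sketch_keep_mask <- function(geno, max_geno_freq = NULL, het_freq_range = NULL,
                             min_geno_freq = NULL, het_dosages = c(1, 2, 3)) {
  geno <- as.matrix(geno)
  keep <- rep(TRUE, nrow(geno))

  # Per-marker frequencies of the observed dosage classes (table() drops NA by default)
  freqs <- lapply(seq_len(nrow(geno)), function(i) prop.table(table(geno[i, ])))

  if (!is.null(max_geno_freq)) {
    # Assumption: a marker is kept when its most frequent dosage is <= the threshold
    keep <- keep & vapply(freqs, max, numeric(1)) <= max_geno_freq
  }
  if (!is.null(min_geno_freq)) {
    # Assumption: only observed dosage classes are considered here
    keep <- keep & vapply(freqs, min, numeric(1)) >= min_geno_freq
  }
  if (!is.null(het_freq_range)) {
    # Heterozygous frequency: summed frequency of the dosages listed in het_dosages
    hf <- vapply(
      freqs,
      function(f) sum(f[names(f) %in% as.character(het_dosages)]),
      numeric(1)
    )
    keep <- keep & hf >= het_freq_range[1] & hf <= het_freq_range[2]
  }
  stats::setNames(keep, rownames(geno))
}

# Toy tetraploid matrix: 3 markers (rows) x 5 individuals (columns)
toy_geno <- matrix(
  c(0, 0, 0, 0, 4,
    1, 2, 2, 3, 4,
    2, 2, 2, 2, 2),
  nrow = 3, byrow = TRUE,
  dimnames = list(paste0("Marker", 1:3), paste0("Ind", 1:5))
)

mask <- sketch_keep_mask(toy_geno, max_geno_freq = 0.7,
                         het_freq_range = c(0.1, 0.9), min_geno_freq = 0.1)
toy_filtered <- toy_geno[mask, , drop = FALSE]
```

In this toy matrix only Marker2 passes: Marker1 and Marker3 are each dominated by a single dosage class, so the max_geno_freq criterion removes them.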
Details
Designed for polyploid dosage matrices (not raw genotype strings).
Computes dosage frequencies per marker (row-wise).
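As a rough illustration of what row-wise dosage frequencies look like, the following base R sketch tabulates each dosage class per marker. The toy matrix and the names ploidy, dosage_freq, and max_freq are assumptions made for this example. The function's internal computation is not shown on this page and may differ.

```r
# Toy tetraploid matrix: 3 markers (rows) x 5 individuals (columns)
geno <- matrix(
  c(0, 0, 0, 0, 4,
    1, 2, 2, 3, 4,
    2, 2, 2, 2, 2),
  nrow = 3, byrow = TRUE,
  dimnames = list(paste0("Marker", 1:3), paste0("Ind", 1:5))
)

ploidy <- 4  # assumption: tetraploid, so dosage classes are 0..4

# One column per dosage class; each entry is the share of non-missing
# calls at that marker that carry the given dosage
dosage_freq <- vapply(
  0:ploidy,
  function(d) rowMeans(geno == d, na.rm = TRUE),
  numeric(nrow(geno))
)
dimnames(dosage_freq) <- list(rownames(geno), as.character(0:ploidy))

# Frequency of the most frequent dosage per marker (the quantity max_geno_freq refers to)
max_freq <- apply(dosage_freq, 1, max)

print(round(dosage_freq, 2))
```

Each row of dosage_freq sums to 1, and a monomorphic marker such as Marker3 shows a single class at frequency 1.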
het_dosages is used to define heterozygosity in a generalizable way.
In diploids, this is simple:
0 = homozygous reference (e.g., "AA")
1 = heterozygous (e.g., "AB")
2 = homozygous alternate (e.g., "BB")
So dosage 1 is the only heterozygous state.
Polyploids (e.g., tetraploids, hexaploids) have several intermediate states. For example, in a tetraploid:
0 = "AAAA" → homozygous ref
1 = "AAAB"
2 = "AABB"
3 = "ABBB"
4 = "BBBB" → homozygous alt
Here dosages 1, 2, and 3 are all heterozygous, which is why het_dosages defaults to c(1, 2, 3). For a diploid matrix, set het_dosages = 1.
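The effect of het_dosages across ploidy levels can be sketched in base R as follows. The helper het_freq() and both toy matrices are hypothetical, written only for this illustration. It assumes that missing calls are excluded from the denominator, which the page does not confirm for the package function.

```r
# Hypothetical helper: per-marker share of non-missing calls whose dosage is in het_dosages
het_freq <- function(geno, het_dosages) {
  geno <- as.matrix(geno)
  # %in% drops dimensions, so rebuild the matrix shape explicitly
  is_het <- matrix(geno %in% het_dosages, nrow = nrow(geno), dimnames = dimnames(geno))
  is_het[is.na(geno)] <- NA  # assumption: missing calls do not count as homozygous
  rowMeans(is_het, na.rm = TRUE)
}

# Tetraploid toy data: dosages 1, 2, and 3 are heterozygous
tetra <- matrix(
  c(0, 0, 0, 0, 4,
    1, 2, 2, 3, 4,
    2, 2, 2, 2, 2),
  nrow = 3, byrow = TRUE,
  dimnames = list(paste0("Marker", 1:3), NULL)
)
tetra_het <- het_freq(tetra, het_dosages = c(1, 2, 3))

# Diploid toy data: only dosage 1 is heterozygous
dip <- matrix(
  c(0, 1, 2, 1,
    0, 0, 2, 2),
  nrow = 2, byrow = TRUE,
  dimnames = list(paste0("Marker", 1:2), NULL)
)
dip_het <- het_freq(dip, het_dosages = 1)

print(tetra_het)
print(dip_het)
```

Passing the tetraploid default c(1, 2, 3) to a diploid matrix would count the homozygous alternate dosage 2 as heterozygous, so the vector should match the ploidy of the data.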
Examples
if (FALSE) { # \dontrun{
set.seed(123)
test_geno <- matrix(sample(0:4, 1000, replace = TRUE), nrow = 100, ncol = 10)
rownames(test_geno) <- paste0("Marker", 1:100)
colnames(test_geno) <- paste0("Ind", 1:10)
# Keep markers whose most frequent dosage has frequency no greater than 0.7,
# whose least frequent dosage has frequency no less than 0.05,
# and whose heterozygote frequency lies between 0.2 and 0.8
filtered <- filter_geno_by_freq_poly(
  test_geno,
  max_geno_freq = 0.7,
  het_freq_range = c(0.2, 0.8),
  min_geno_freq = 0.05,
  het_dosages = c(1, 2, 3)
)
dim(filtered) # Number of markers retained
} # }