
Filters a polyploid genotype matrix based on per-marker dosage frequencies. Users can define which dosages count as heterozygous and apply constraints on the maximum and minimum genotype frequencies and on the heterozygous frequency range.

Usage

filter_geno_by_freq_poly(
  geno_matrix,
  max_geno_freq = NULL,
  het_freq_range = NULL,
  min_geno_freq = NULL,
  het_dosages = c(1, 2, 3)
)

Arguments

geno_matrix

A numeric genotype matrix or data frame where:

  • Rows are markers

  • Columns are individuals

  • Values are dosage values (e.g., 0 to 4 for tetraploids)

max_geno_freq

Numeric. If provided, removes markers where the most frequent dosage exceeds this value.

het_freq_range

Numeric vector of length 2 giving the lower and upper bounds. Keeps markers whose heterozygous frequency, as defined by het_dosages, falls within this range.

min_geno_freq

Numeric. If provided, removes markers where the least frequent dosage falls below this value.

het_dosages

Integer vector of dosage values to treat as "heterozygous" (e.g., c(1, 2, 3)).

Value

A filtered genotype matrix containing only markers that meet all criteria.
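
As a rough illustration of what "meet all criteria" could look like, the base R sketch below combines the three checks into one row filter. It is not the package's implementation: `sketch_filter` and the toy data are hypothetical, and it assumes that frequencies are computed over non-missing calls, that only observed dosages count toward the minimum, and that the range bounds are inclusive.

```r
# Hypothetical stand-in for the filtering logic; NOT the package's code.
sketch_filter <- function(geno_matrix, max_geno_freq = NULL,
                          het_freq_range = NULL, min_geno_freq = NULL,
                          het_dosages = c(1, 2, 3)) {
  geno_matrix <- as.matrix(geno_matrix)
  keep <- rep(TRUE, nrow(geno_matrix))
  for (i in seq_len(nrow(geno_matrix))) {
    x <- geno_matrix[i, ]
    x <- x[!is.na(x)]
    if (length(x) == 0) {          # assumption: drop markers with no calls
      keep[i] <- FALSE
      next
    }
    freqs <- as.numeric(table(x)) / length(x)  # observed dosages only
    het <- mean(x %in% het_dosages)            # heterozygous frequency
    if (!is.null(max_geno_freq) && max(freqs) > max_geno_freq) keep[i] <- FALSE
    if (!is.null(min_geno_freq) && min(freqs) < min_geno_freq) keep[i] <- FALSE
    if (!is.null(het_freq_range) &&
        (het < het_freq_range[1] || het > het_freq_range[2])) keep[i] <- FALSE
  }
  geno_matrix[keep, , drop = FALSE]
}

# Toy tetraploid data (hypothetical): 3 markers x 10 individuals
toy <- rbind(
  M1 = c(0, 1, 2, 3, 4, 0, 1, 2, 3, 4),  # balanced dosages
  M2 = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 4),  # one dosage dominates
  M3 = c(2, 2, 2, 2, 2, 1, 1, 1, 3, 3)   # every call is heterozygous
)

kept <- sketch_filter(toy, max_geno_freq = 0.7,
                      het_freq_range = c(0.2, 0.8), min_geno_freq = 0.05)
rownames(kept)
```

Under these assumptions only M1 is retained: M2 fails the maximum-frequency check and M3 falls outside the heterozygous range.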

Details

  • Designed for polyploid dosage matrices (not raw genotype strings).

  • Computes dosage frequencies per marker (row-wise).

  • het_dosages is used to define heterozygosity in a generalizable way.

In diploids, it's simple:

  • 0 = homozygous reference (e.g., "AA")

  • 1 = heterozygous (e.g., "AB")

  • 2 = homozygous alternate (e.g., "BB")

So "1" is always the heterozygous state.

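But in polyploids (e.g., tetraploids, hexaploids), there are more intermediate states. For example, in a tetraploid:

  • 0 = "AAAA" → homozygous ref

  • 1 = "AAAB"

  • 2 = "AABB"

  • 3 = "ABBB"

  • 4 = "BBBB" → homozygous alt

Dosages 1, 2, and 3 all carry both alleles, which is why the default het_dosages = c(1, 2, 3) treats them as heterozygous.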
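
To make the role of het_dosages concrete, here is a minimal base R sketch of per-marker dosage and heterozygous frequencies. It is not taken from the package: the helper `het_freq` and the toy matrix are hypothetical, and it assumes frequencies are computed over non-missing calls.

```r
# Toy tetraploid matrix (hypothetical data): 2 markers x 5 individuals
geno <- matrix(
  c(0, 1, 2, 3, 4,
    0, 0, 0, 4, 4),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("M1", "M2"), paste0("Ind", 1:5))
)

# Hypothetical helper: share of non-missing calls in each row (marker)
# whose dosage is listed in het_dosages
het_freq <- function(geno_matrix, het_dosages) {
  apply(geno_matrix, 1, function(x) {
    x <- x[!is.na(x)]
    mean(x %in% het_dosages)
  })
}

# Row-wise dosage frequencies for a single marker
dosage_freq_m1 <- table(factor(geno["M1", ], levels = 0:4)) / ncol(geno)

# Tetraploid definition: dosages 1, 2, and 3 are heterozygous
het_tetra <- het_freq(geno, het_dosages = c(1, 2, 3))

# Diploid-style definition: only dosage 1 is heterozygous
het_dip <- het_freq(geno, het_dosages = 1)
```

With this toy data, marker M1 has a heterozygous frequency of 0.6 under the tetraploid definition but 0.2 when only dosage 1 counts, while M2 is 0 under both.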

Note

This function has been tested, but not rigorously. Please contact the author with any issues.

Examples

if (FALSE) { # \dontrun{
set.seed(123)
test_geno <- matrix(sample(0:4, 1000, replace = TRUE), nrow = 100, ncol = 10)
rownames(test_geno) <- paste0("Marker", 1:100)
colnames(test_geno) <- paste0("Ind", 1:10)

# Keep markers whose most frequent dosage has frequency <= 0.7, whose least
# frequent dosage has frequency >= 0.05, and whose heterozygote frequency
# is between 0.2 and 0.8
filtered <- filter_geno_by_freq_poly(
  test_geno,
  max_geno_freq = 0.7,
  het_freq_range = c(0.2, 0.8),
  min_geno_freq = 0.05,
  het_dosages = c(1, 2, 3)
)

dim(filtered)  # Number of markers retained
} # }