
Flexible Genotype Format Converter (Supports Polyploid Dosage)
Source: R/formater_flex.R
formater_flex.Rd
Converts genotype data between character string formats (e.g., "AABB", "ABBB") and numeric dosage values
(e.g., 0, 1, 2, 3) based on user-defined reference and alternative alleles. This function is designed to handle
diploid and polyploid genotypes by interpreting the number of occurrences of the alternative allele.
This is especially useful for genotype encodings that represent dosage via string repetition (e.g., "AAAB" → 1 alt allele),
and for reconstructing genotype strings from numeric dosage values (e.g., 0 → "AAAA" for a tetraploid).
Arguments
- geno
A data frame containing genotype values. Values must be either character strings representing genotypes (e.g., "AAAA", "AAAB", "BBBB") or numeric dosage values (e.g., 0 to ploidy).
- to_numeric
Logical. If TRUE, converts character genotypes to numeric dosage by counting the number of alt_allele occurrences. If FALSE, reconstructs genotype strings from numeric dosage values using the specified ref_allele and alt_allele.
- ref_allele
Character. The reference allele. This is used to define the "zero dosage" baseline and for reconstructing character genotypes from numeric dosage (e.g., "A" in "AAAB").
- alt_allele
Character. The alternative allele. Dosage is computed as the count of this allele in the genotype string (e.g., "B" in "AAAB" → dosage = 1).
- ploidy
Integer. Ploidy level of the organism. Only used when to_numeric = FALSE, to determine how many reference and alternative alleles to paste together when reconstructing genotype strings.
Value
A data frame of the same structure as geno, with genotypes converted according to the to_numeric setting.
Details
When to_numeric = TRUE, all values in geno must be character strings, and the function will count the number of occurrences of the alt_allele using stringr::str_count(). Missing values (NA) are preserved.
When to_numeric = FALSE, all values in geno must be numeric dosages between 0 and ploidy, inclusive. The function will reconstruct character genotype strings by repeating the ref_allele and alt_allele the appropriate number of times. For example, with ploidy = 4, a dosage of 2 becomes "AABB".
Values outside the expected range (e.g., dosage > ploidy or unknown characters) are converted to NA.
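The two conversions described above can be sketched in base R as follows. This is an illustration only, not the package's implementation: the helper names count_alt() and build_genotype() are hypothetical, the counting uses strsplit() rather than stringr::str_count(), and treating non-integer dosages as NA is an assumption that the documentation does not state.

```r
# Hypothetical helper: count occurrences of alt_allele in each genotype string.
# NA inputs stay NA, matching the documented behaviour for missing values.
count_alt <- function(x, alt_allele) {
  x <- as.character(x)
  vapply(x, function(g) {
    if (is.na(g)) return(NA_integer_)
    # Split the string into single characters and count matches.
    sum(strsplit(g, "", fixed = TRUE)[[1]] == alt_allele)
  }, integer(1), USE.NAMES = FALSE)
}

# Hypothetical helper: rebuild a genotype string from a dosage value.
# Reference alleles come first, then alternative alleles (dosage 2 -> "AABB").
build_genotype <- function(dosage, ref_allele, alt_allele, ploidy) {
  vapply(dosage, function(d) {
    # Out-of-range dosages become NA; the integer check is an assumption.
    if (is.na(d) || d < 0 || d > ploidy || d != round(d)) {
      return(NA_character_)
    }
    paste0(strrep(ref_allele, ploidy - d), strrep(alt_allele, d))
  }, character(1), USE.NAMES = FALSE)
}

# Apply column-wise while keeping the data frame structure.
geno <- data.frame(
  Marker1 = c("AAAA", "AAAB", "AABB"),
  Marker2 = c("ABBB", "BBBB", NA)
)
dosage <- geno
dosage[] <- lapply(geno, count_alt, alt_allele = "B")

restored <- dosage
restored[] <- lapply(dosage, build_genotype,
                     ref_allele = "A", alt_allele = "B", ploidy = 4)
```

Assigning into dosage[] and restored[] keeps the row and column names of the input, which mirrors the documented return value (a data frame of the same structure as geno).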
Examples
# Example 1: Character to numeric (tetraploid)
if (FALSE) { # \dontrun{
  geno <- data.frame(
    Marker1 = c("AAAA", "AAAB", "AABB", "ABBB", "BBBB"),
    Marker2 = c("AAAB", "AABB", "ABBB", "BBBB", NA)
  )
  formater_flex(geno, to_numeric = TRUE, ref_allele = "A", alt_allele = "B")
} # }
# Example 2: Numeric to character (tetraploid)
if (FALSE) { # \dontrun{
  dosage <- data.frame(
    Marker1 = c(0, 1, 2, 3, 4),
    Marker2 = c(1, 2, 3, 4, NA)
  )
  formater_flex(dosage, to_numeric = FALSE, ref_allele = "A", alt_allele = "B", ploidy = 4)
} # }