
Flexible Genotype Format Converter (Supports Polyploid Dosage)
Source: R/formater_flex.R
formater_flex.Rd
Converts genotype data between character string formats (e.g., "AABB", "ABBB") and numeric dosage values (e.g., 0, 1, 2, 3) based on user-defined reference and alternative alleles. The function handles diploid and polyploid genotypes by counting the occurrences of the alternative allele.
It is especially useful for genotype encodings that represent dosage via string repetition (e.g., "AAAB" → 1 alt allele), or for reconstructing genotype strings from numeric dosage values (e.g., 0 → "AAAA").
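As an illustration of the counting idea described above, the following is a minimal base R sketch, not the package's implementation. The helper name count_alt is hypothetical, and it uses gsub() and nchar() as a stand-in for stringr::str_count(). It keeps NA as NA but does not validate unknown characters.

```r
# Hypothetical helper: count occurrences of alt_allele in each genotype string.
count_alt <- function(x, alt_allele) {
  x <- as.character(x)
  # Remove every alt allele, then compare string lengths before and after.
  stripped <- gsub(alt_allele, "", x, fixed = TRUE)
  n <- (nchar(x) - nchar(stripped)) / nchar(alt_allele)
  n[is.na(x)] <- NA  # keep missing genotypes missing
  as.integer(n)
}

geno <- data.frame(
  Marker1 = c("AAAA", "AAAB", "AABB"),
  Marker2 = c("ABBB", "BBBB", NA),
  stringsAsFactors = FALSE
)

# Apply column by column while keeping the data frame structure.
dosage <- geno
dosage[] <- lapply(geno, count_alt, alt_allele = "B")
```

Assigning into dosage[] rather than dosage keeps the column names and data frame class, which matches the "same structure as geno" return value described under Value.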
Arguments
- geno
A data frame containing genotype values. Values must be either character strings representing genotypes (e.g., "AAAA", "AAAB", "BBBB") or numeric dosage values (e.g., 0 to ploidy).
- to_numeric
Logical. If TRUE, converts character genotypes to numeric dosage by counting the number of alt_allele occurrences. If FALSE, reconstructs genotype strings from numeric dosage values using the specified ref_allele and alt_allele.
- ref_allele
Character. The reference allele. This is used to define the "zero dosage" baseline and for reconstructing character genotypes from numeric dosage (e.g., "A" in "AAAB").
- alt_allele
Character. The alternative allele. Dosage is computed as the count of this allele in the genotype string (e.g., "B" in "AAAB" → dosage = 1).
- ploidy
Integer. Ploidy level of the organism. Only used when to_numeric = FALSE, to determine how many reference and alternative alleles to paste together when reconstructing genotype strings.
Value
A data frame of the same structure as geno, with genotypes converted according to the to_numeric setting.
Details
When to_numeric = TRUE, all values in geno must be character strings, and the function counts the occurrences of alt_allele using stringr::str_count(). Missing values (NA) are preserved.
When to_numeric = FALSE, all values in geno must be numeric dosages between 0 and ploidy, inclusive. The function reconstructs character genotype strings by repeating ref_allele and alt_allele the appropriate number of times. For example, with ploidy = 4, a dosage of 2 becomes "AABB".
Values outside the expected range (e.g., dosage > ploidy or unknown characters) are converted to NA.
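The reconstruction rule for to_numeric = FALSE can be sketched in base R as follows. This is an assumption-laden illustration, not the function's actual code: the helper name dosage_to_string is hypothetical, the reference-first ordering is inferred from the "AABB" example, and treating non-integer dosages as invalid is an added assumption.

```r
# Hypothetical helper: rebuild genotype strings from numeric dosages.
dosage_to_string <- function(d, ref_allele, alt_allele, ploidy) {
  # Valid dosages are non-missing whole numbers in [0, ploidy].
  # (Rejecting non-integer values is an assumption of this sketch.)
  valid <- !is.na(d) & d >= 0 & d <= ploidy & d == round(d)
  out <- rep(NA_character_, length(d))
  # Reference alleles first, then alternative alleles, e.g. 2 -> "AABB".
  out[valid] <- paste0(
    strrep(ref_allele, ploidy - d[valid]),
    strrep(alt_allele, d[valid])
  )
  out
}

tetra <- dosage_to_string(c(0, 2, 4, 5, NA), "A", "B", ploidy = 4)
diplo <- dosage_to_string(c(0, 1, 2), "A", "B", ploidy = 2)
```

Here a dosage of 5 exceeds ploidy = 4 and therefore maps to NA, in line with the out-of-range behaviour described above.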
Examples
# Example 1: Character to numeric (tetraploid)
if (FALSE) { # \dontrun{
  geno <- data.frame(
    Marker1 = c("AAAA", "AAAB", "AABB", "ABBB", "BBBB"),
    Marker2 = c("AAAB", "AABB", "ABBB", "BBBB", NA)
  )
  formater_flex(geno, to_numeric = TRUE, ref_allele = "A", alt_allele = "B")
} # }
# Example 2: Numeric to character (tetraploid)
if (FALSE) { # \dontrun{
  dosage <- data.frame(
    Marker1 = c(0, 1, 2, 3, 4),
    Marker2 = c(1, 2, 3, 4, NA)
  )
  formater_flex(
    dosage,
    to_numeric = FALSE,
    ref_allele = "A",
    alt_allele = "B",
    ploidy = 4
  )
} # }