
Convert VCF Genotype Format to Dosage (Supports Polyploids and Missing Alleles)
Source: R/convert_to_dosage_flex.R
Converts a VCF genotype matrix (in GT format, e.g. "0/0", or for polyploids "0/0/1/1") into numeric dosage values representing the count of alternate alleles. Works with any ploidy (tested only from haploid to hexaploid) and allows control over how missing alleles (".") are handled.
Value
A numeric matrix with the same dimensions and names as GT, where each value is the dosage of the alternate allele (assumed to be "1").
Details
This function is flexible for any ploidy level: it simply counts how many "1" alleles exist in each genotype.
Alleles are split using either / or |, so phased or unphased VCF data are supported.
Missing alleles (".") are handled based on the strict_missing argument:
TRUE: If any allele is missing in a genotype, the entire dosage is returned as NA.
FALSE: Missing alleles are ignored and dosage is calculated from known alleles.
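The counting rule described above can be sketched in a few lines of base R. This is only an illustration of the described behaviour, not the package's actual source: the helper name dosage_sketch is hypothetical, and returning NA when every allele in a genotype is missing (for example "./.") is an assumption made here, since the documentation does not state what happens in that case.

```r
# Hypothetical helper illustrating the documented counting rule.
# It is NOT the implementation of convert_to_dosage_flex().
dosage_sketch <- function(GT, strict_missing = TRUE) {
  count_alt <- function(gt) {
    # A cell that is NA as a whole stays NA
    if (is.na(gt)) return(NA_real_)
    # Split on "/" (unphased) or "|" (phased)
    alleles <- strsplit(gt, "[/|]")[[1]]
    is_missing <- alleles == "."
    # Strict mode: any missing allele makes the whole genotype NA.
    # Assumption: a genotype with no known alleles is NA in both modes.
    if (all(is_missing) || (strict_missing && any(is_missing))) {
      return(NA_real_)
    }
    # Dosage = number of known alleles equal to "1"
    as.numeric(sum(alleles[!is_missing] == "1"))
  }
  out <- vapply(GT, count_alt, numeric(1), USE.NAMES = FALSE)
  # Restore the matrix shape and the row/column names of the input
  dim(out) <- dim(GT)
  dimnames(out) <- dimnames(GT)
  out
}
```

Applied to a genotype such as "0/1/./1", this sketch yields NA in strict mode and 2 in permissive mode, which matches the behaviour shown in the Examples section.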
Examples
poly_vcf <- matrix(c("0/0/0/0", "1/1/1/1", "0/1/1/1",
                     "1/1/1/1", NA, "0/1/./1"),
                   nrow = 2, byrow = TRUE,
                   dimnames = list(c("Marker1", "Marker2"),
                                   c("Ind1", "Ind2", "Ind3")))
# Strict handling: missing allele causes full NA
convert_to_dosage_flex(poly_vcf, strict_missing = TRUE) # Set whole cell to NA if "." present
#>         Ind1 Ind2 Ind3
#> Marker1    0    4    3
#> Marker2    4   NA   NA
# Permissive handling: ignore missing and sum known alleles
convert_to_dosage_flex(poly_vcf, strict_missing = FALSE) # Ignore "." and sum what’s there
#>         Ind1 Ind2 Ind3
#> Marker1    0    4    3
#> Marker2    4   NA    2