
Convert VCF Genotype Format to Dosage (Advanced, Polyploid-Compatible)
Source: R/convert_to_dosage_advanced.R
convert_to_dosage_advanced.Rd
Converts a matrix of genotype calls from VCF-style format (e.g., "0/1", "1/1", "0/0/1/1") into numeric dosage values. Supports variable ploidy, multi-allelic variants (e.g., "2", "3"), and includes optional outputs for ploidy level, usable allele counts, and normalized dosage.
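As a rough illustration of the conversion described above, the sketch below counts alternate alleles in a single genotype string. `count_alt_sketch` is a hypothetical helper written for this illustration and is not the package's implementation; accepting both "/" and "|" as separators is an assumption.

```r
# Hypothetical helper (not part of the package): counts how many alleles in
# one VCF-style genotype string belong to the set of alternate alleles.
count_alt_sketch <- function(gt, alt_alleles = c("1")) {
  if (is.na(gt)) return(NA_integer_)
  # Assumption: alleles are separated by "/" (unphased) or "|" (phased)
  alleles <- strsplit(gt, "[/|]")[[1]]
  sum(alleles %in% alt_alleles)
}

count_alt_sketch("0/1")                                 # 1 (diploid)
count_alt_sketch("0/0/1/1")                             # 2 (tetraploid)
count_alt_sketch("0/1/2/2", alt_alleles = c("1", "2"))  # 3 (multi-allelic)
```

Because the count is taken over however many alleles the string contains, the same logic applies to any ploidy level.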
Usage
convert_to_dosage_advanced(
GT,
alt_alleles = c("1"),
strict_missing = TRUE,
normalize = FALSE
)
Arguments
- GT
A character matrix with genotypes in VCF format (e.g., "0/1/1/1", "1/1/1/1").
- alt_alleles
A character vector indicating which allele values should be counted as alternate (e.g., c("1") or c("1", "2")).
- strict_missing
Logical. If TRUE (default), any missing allele (i.e., ".") causes the entire dosage value to be set as NA.
- normalize
Logical. If TRUE, dosage values are normalized by ploidy level (i.e., scaled to 0-1). Default is FALSE.
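The interaction between `strict_missing` and `normalize` can be sketched for a single genotype as follows. `dosage_one_sketch` is a hypothetical helper, not the package's code. Two behaviors here are assumptions that the argument descriptions do not confirm: with `strict_missing = FALSE`, missing alleles are simply dropped before counting, and normalization divides by the number of non-missing alleles.

```r
# Hypothetical helper (not part of the package): dosage for one genotype string.
dosage_one_sketch <- function(gt, alt_alleles = c("1"),
                              strict_missing = TRUE, normalize = FALSE) {
  if (is.na(gt)) return(NA_real_)
  alleles <- strsplit(gt, "[/|]")[[1]]
  missing <- alleles == "."
  # strict mode: a single missing allele invalidates the whole genotype
  if (all(missing) || (strict_missing && any(missing))) return(NA_real_)
  # Assumption: in non-strict mode, missing alleles are ignored
  usable <- alleles[!missing]
  dosage <- sum(usable %in% alt_alleles)
  # Assumption: normalization divides by the count of non-missing alleles
  if (normalize) dosage <- dosage / length(usable)
  dosage
}

dosage_one_sketch("0/1/./1")                          # NA
dosage_one_sketch("0/1/./1", strict_missing = FALSE)  # 2
dosage_one_sketch("0/1/./1", strict_missing = FALSE,
                  normalize = TRUE)                   # 0.6666667
```

If the real function instead divides by the full ploidy (4 in this case), the last value would be 0.5, so it is worth checking the actual output on a genotype with missing alleles before relying on normalized values.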
Value
A list with the following elements:
dosage: Matrix of alternate allele dosage values.
ploidy: Matrix of the number of alleles per genotype (excluding NAs).
usable_alleles: Matrix of the number of non-missing alleles used in dosage calculation.
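To make the shape of the returned list concrete, the sketch below builds three matrices with the same dimensions and dimnames as a small input matrix. This is a stand-in, not the function's actual code. It assumes that `ploidy` counts all allele slots in the genotype string and that `usable_alleles` counts only the non-missing ones; the descriptions above do not fully pin down how the two differ, so treat this as one possible reading. Dosage is counted here in the non-strict way (missing alleles ignored).

```r
# Small hypothetical input: 2 markers x 2 individuals, tetraploid
gt <- matrix(c("0/0/0/0", "0/1/1/1",
               "1/1/1/1", "0/1/./1"),
             nrow = 2, byrow = TRUE,
             dimnames = list(c("Marker1", "Marker2"), c("Ind1", "Ind2")))

# Split every cell into its alleles (list elements follow column-major order)
split_gt <- strsplit(gt, "[/|]")

# Apply a per-genotype summary and restore the matrix shape and names
count_matrix <- function(f) {
  matrix(vapply(split_gt, f, numeric(1)),
         nrow = nrow(gt), dimnames = dimnames(gt))
}

result_sketch <- list(
  dosage         = count_matrix(function(a) sum(a %in% "1")),
  ploidy         = count_matrix(length),                    # assumption: all slots
  usable_alleles = count_matrix(function(a) sum(a != "."))  # assumption: non-missing
)

str(result_sketch)
```

Each element can then be indexed by marker and individual name, for example `result_sketch$usable_alleles["Marker2", "Ind2"]`.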
Note
This function generalizes earlier versions of the function to handle more complex situations. It has been tested, but not rigorously. Contact the author with any issues.
Examples
if (FALSE) { # \dontrun{
vcf <- matrix(c("0/0/0/0", "1/1/1/1", "0/1/1/1",
                "1/1/1/1", NA, "0/1/./1"),
              nrow = 2, byrow = TRUE,
              dimnames = list(c("Marker1", "Marker2"),
                              c("Ind1", "Ind2", "Ind3")))
convert_to_dosage_advanced(vcf, alt_alleles = c("1"))
convert_to_dosage_advanced(vcf, alt_alleles = c("1"), strict_missing = FALSE, normalize = TRUE)
set.seed(123) # for reproducibility
# Function to generate one genotype (VCF-style)
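generate_genotype <- function(ploidy = 4, missing_rate = 0.05) {
  # Alleles are given as strings so that "." does not silently coerce the numbers.
  # sample() rescales prob to sum to 1, so missing_rate is only approximate.
  alleles <- sample(c("0", "1", "2", "."), size = ploidy,
                    replace = TRUE,
                    prob = c(0.40, 0.40, 0.10, missing_rate))
  paste(alleles, collapse = "/")
}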
# Parameters
n_markers <- 100
n_individuals <- 10
ploidy <- 4 # tetraploid
genotype_matrix <- matrix(
  data = replicate(n_markers * n_individuals, generate_genotype(ploidy = ploidy)),
  nrow = n_markers,
  ncol = n_individuals,
  dimnames = list(
    paste0("Marker", seq_len(n_markers)),
    paste0("Ind", seq_len(n_individuals))
  )
)
# Preview
head(genotype_matrix)
result <- convert_to_dosage_advanced(genotype_matrix,
                                     alt_alleles = c("1", "2"),
                                     strict_missing = TRUE,
                                     normalize = FALSE)
# View dosage matrix
head(result$dosage)
# View ploidy matrix
head(result$ploidy)
# View number of usable alleles
head(result$usable_alleles)
} # }