
recode_markers is the core function of the package. It phases genotype marker data against two parental references (parent1 and parent2).

Markers are phased according to parental allele inheritance, and the phased markers can be coded in numeric (0, 1, 2) or character ("A", "B", "H") format. Numeric coding is recommended for downstream analyses.

Usage

recode_markers(
  geno,
  parent1,
  parent2,
  numeric_output = TRUE,
  handle_het_markers = FALSE,
  het_marker_types = NULL
)

Arguments

geno

A genotype matrix or data frame where markers are rows and individuals are columns.

parent1

Character. The name of the column representing the first parent.

parent2

Character. The name of the column representing the second parent.

numeric_output

Logical. If TRUE, converts phased markers to numeric dosage values (A = 0, H = 1, B = 2). Default is TRUE.

handle_het_markers

Logical. If TRUE, allows heterozygous parent markers to be included. Default is FALSE.

het_marker_types

Character vector. Specifies which heterozygous markers to keep when handle_het_markers = TRUE. Options include "AxH", "HxB", "HxA", "BxH". Default is NULL, meaning all homozygous markers are kept. See details.
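To make the type labels more concrete, the sketch below tags each marker with its parental genotype pair. It assumes, without confirmation from the package, that a label is the parent1 code, then "x", then the parent2 code, with 0 = "A", 1 = "H" and 2 = "B". The helper classify_parent_pair is hypothetical. For the authoritative definitions of these types, see Braun et al. (2017).

```r
# Hypothetical helper: label markers by their parent1 x parent2 genotype pair.
# Assumes dosages 0/1/2 correspond to "A"/"H"/"B"; not the package's own code.
classify_parent_pair <- function(p1, p2) {
  codes <- c("0" = "A", "1" = "H", "2" = "B")
  lab <- paste0(codes[as.character(p1)], "x", codes[as.character(p2)])
  # A missing parental call gives NA instead of a label such as "NAxB"
  lab[is.na(p1) | is.na(p2)] <- NA_character_
  unname(lab)
}

p1 <- c(0, 1, 1, 2, 0)
p2 <- c(1, 2, 0, 1, 2)
types <- classify_parent_pair(p1, p2)
print(types)  # "AxH" "HxB" "HxA" "BxH" "AxB"

# Mimic het_marker_types = c("AxH", "HxB"): keep only those heterozygous types
keep_het <- types %in% c("AxH", "HxB")
print(keep_het)
```

Under this assumption, an opposite-homozygous marker such as 0 x 2 gets the label "AxB" and never matches a heterozygous type.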

Value

A data frame containing phased genotype markers where:

  • "0" represents alleles inherited from parent1.

  • "2" represents alleles inherited from parent2.

  • "1" represents heterozygous genotypes (one allele from each parent).

  • If numeric_output = FALSE, 0, 2, and 1 are replaced by "A", "B", and "H", respectively.

  • numeric_output = TRUE is recommended.
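The character coding is a fixed lookup from dosage to letter. Here is a minimal sketch of that conversion. It uses a hypothetical helper, dosage_to_abh, which is not part of the package, and assumes a data frame whose columns all hold numeric dosages.

```r
# Hypothetical helper mirroring the 0 -> "A", 1 -> "H", 2 -> "B" coding.
dosage_to_abh <- function(x) {
  lookup <- c("0" = "A", "1" = "H", "2" = "B")
  # Missing or unexpected dosages map to NA
  unname(lookup[as.character(x)])
}

phased <- data.frame(Ind1 = c(0, 1, 2), Ind2 = c(2, 2, NA))
abh <- phased
abh[] <- lapply(phased, dosage_to_abh)  # convert every column, keep data frame shape
print(abh)
```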

Details

  • Drops markers where either parent has NA.

  • Removes non-polymorphic markers (markers where both parents have the same genotype).

  • If handle_het_markers = FALSE, retains only markers where both parents are homozygous for opposite alleles (P1 = 0 and P2 = 2, or P1 = 2 and P2 = 0).

  • If handle_het_markers = TRUE, markers with a heterozygous parent can also be kept, as selected by het_marker_types.

  • Ensures that parent1 is always 0 and parent2 is always 2 for standardization.

  • Returns numeric dosages (0, 1, 2) if numeric_output = TRUE; otherwise returns phased "A", "B", "H" codes.

  • More details on the heterozygous F2 marker types ("AxH", "HxB", "HxA", "BxH") are in Braun et al. (2017).
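The homozygous-only path (handle_het_markers = FALSE) can be sketched in base R as follows. This illustration rests on a few assumptions and is not the package's implementation. The function phase_homozygous is hypothetical, all columns are assumed to be numeric 0/1/2 dosages, and standardization is assumed to flip every dosage as 2 - x at markers where parent1 is 2. With the data from Examples, it produces the same values as the documented default output.

```r
# Hypothetical sketch of the homozygous-only filter and phasing step.
# Assumes markers are rows and every column holds numeric 0/1/2 dosages.
phase_homozygous <- function(geno, parent1, parent2) {
  p1 <- geno[[parent1]]
  p2 <- geno[[parent2]]
  # Keep markers with no missing parent call where the parents are opposite
  # homozygotes (0 x 2 or 2 x 0); monomorphic markers fail this test too.
  keep <- !is.na(p1) & !is.na(p2) &
    ((p1 == 0 & p2 == 2) | (p1 == 2 & p2 == 0))
  out <- geno[keep, , drop = FALSE]
  # Flip markers where parent1 is 2 so parent1 becomes 0 and parent2 becomes 2
  flip <- out[[parent1]] == 2
  if (any(flip)) {
    out[flip, ] <- 2 - out[flip, , drop = FALSE]
  }
  out
}

# Same data as in Examples
geno_data <- data.frame(
  Marker1 = c(0, 1, 2, 0, 2),
  Marker2 = c(2, 0, 2, 1, 0),
  Parent1 = c(0, 2, 2, 0, 2),
  Parent2 = c(2, 0, 0, 2, 0)
)
phased <- phase_homozygous(geno_data, "Parent1", "Parent2")
print(phased)
```

Flipping with 2 - x swaps 0 and 2 and leaves heterozygous calls (1) unchanged. This matches the behavior of rows 2, 3 and 5 in the documented example.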

Examples

# Example genotype data: markers are rows, individuals (including both parents) are columns
geno_data <- data.frame(
  Marker1 = c(0, 1, 2, 0, 2),
  Marker2 = c(2, 0, 2, 1, 0),
  Parent1 = c(0, 2, 2, 0, 2),
  Parent2 = c(2, 0, 0, 2, 0)
)

# Recode genotype markers (default: numeric output)
phased_geno <- recode_markers(geno_data, "Parent1", "Parent2")
print(phased_geno)
#>   Marker1 Marker2 Parent1 Parent2
#> 1       0       2       0       2
#> 2       1       2       0       2
#> 3       0       0       0       2
#> 4       0       1       0       2
#> 5       0       2       0       2

# Recode genotype markers with heterozygous marker handling
# (both parents are homozygous at every marker here, so the output matches the default call)
phased_geno_het <- recode_markers(geno_data, "Parent1", "Parent2",
                          handle_het_markers = TRUE,
                          het_marker_types = c("AxH", "HxB"))
print(phased_geno_het)
#>   Marker1 Marker2 Parent1 Parent2
#> 1       0       2       0       2
#> 2       1       2       0       2
#> 3       0       0       0       2
#> 4       0       1       0       2
#> 5       0       2       0       2