
recode is the core function of the package. It phases genotype marker data against two parental references, parent1 and parent2.

Each marker is phased according to which parent its alleles were inherited from. Optionally, the phased markers can be coded as numeric (0, 1, 2) or character ("A", "B", "H") values. Numeric coding is recommended for downstream analyses.

Usage

recode(
  geno,
  parent1,
  parent2,
  numeric_output = TRUE,
  handle_het_markers = FALSE,
  het_marker_types = NULL
)

Arguments

geno

A genotype matrix or data frame where markers are rows and individuals are columns.

parent1

Character. The name of the column representing the first parent.

parent2

Character. The name of the column representing the second parent.

numeric_output

Logical. If TRUE, converts phased markers to numeric dosage values (A = 0, H = 1, B = 2). Default is TRUE.

handle_het_markers

Logical. If TRUE, allows heterozygous parent markers to be included. Default is FALSE.

het_marker_types

Character vector. Specifies which heterozygous marker types to keep when handle_het_markers = TRUE. Options are "AxH", "HxB", "HxA" and "BxH". Default is NULL, meaning all homozygous markers are kept. See Details.
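As a rough illustration of the marker-type labels, the hypothetical helper marker_type() below names a marker by its two parental dosages, written as parent1 x parent2. It assumes A = 0 (homozygous), H = 1 (heterozygous) and B = 2 (homozygous), matching the numeric coding used by recode; this reading of the labels is an assumption, and the authoritative definitions are in Braun et al. (2017).

```r
# Hypothetical helper, not part of the package. Assumes the labels are
# "<parent1>x<parent2>" with 0 = A, 1 = H, 2 = B.
marker_type <- function(p1, p2) {
  codes <- c("0" = "A", "1" = "H", "2" = "B")
  # Markers with a missing parent get NA instead of a label
  ifelse(is.na(p1) | is.na(p2), NA_character_,
         paste0(codes[as.character(p1)], "x", codes[as.character(p2)]))
}

marker_type(c(0, 0, 1, 1, 2, 2), c(2, 1, 2, 0, 1, 0))
#> [1] "AxB" "AxH" "HxB" "HxA" "BxH" "BxA"
```

Under this labelling, handle_het_markers = FALSE corresponds to keeping only the "AxB" and "BxA" pairs.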

Value

A data frame containing phased genotype markers where:

  • "0" represents alleles inherited from parent1.

  • "2" represents alleles inherited from parent2.

  • "1" represents heterozygous genotypes (one allele from each parent).

  • If numeric_output = FALSE, 0, 2 and 1 are replaced by "A", "B" and "H", respectively.

  • numeric_output = TRUE is recommended.
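The two output codings carry the same information. A minimal sketch of converting between them, assuming the mapping A = 0, H = 1, B = 2 given for numeric_output; dosage_to_letters() and letters_to_dosage() are hypothetical helpers, not package functions.

```r
# Hypothetical helpers, not part of the package.
# Assumes dosages are 0, 1, 2 or NA (A = 0, H = 1, B = 2).
dosage_to_letters <- function(x) {
  x <- as.matrix(x)
  # 0 -> "A", 1 -> "H", 2 -> "B"; NA dosages stay NA
  matrix(c("A", "H", "B")[as.vector(x) + 1],
         nrow = nrow(x), dimnames = dimnames(x))
}

letters_to_dosage <- function(x) {
  x <- as.matrix(x)
  # "A" -> 0, "H" -> 1, "B" -> 2; anything else becomes NA
  matrix(match(as.vector(x), c("A", "H", "B")) - 1L,
         nrow = nrow(x), dimnames = dimnames(x))
}

m <- matrix(c(0, 1, 2, NA), nrow = 2)
dosage_to_letters(m)
letters_to_dosage(dosage_to_letters(m))
```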

Details

  • Drops markers where either parent has NA.

  • Removes non-polymorphic markers (markers where both parents have the same genotype).

  • If handle_het_markers = FALSE, retains only markers where both parents are homozygous for opposite alleles (P1 = 0 and P2 = 2, or P1 = 2 and P2 = 0).

  • If handle_het_markers = TRUE, allows markers with a heterozygous parent to be kept; het_marker_types selects which of these marker types are retained.

  • Ensures that parent1 is always 0 and parent2 is always 2 for standardization.

  • Returns numeric dosage values if numeric_output = TRUE; otherwise returns phased "A", "B", "H" values.

  • More details on the heterozygous F2 marker types ("AxH", "HxB", "HxA", "BxH") are in Braun et al. (2017).
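The homozygous-only path (handle_het_markers = FALSE) described above can be sketched in base R as follows. phase_homozygous() is a hypothetical helper, not the package's implementation; it assumes every column of geno holds numeric dosages (0, 1, 2 or NA) and it does not handle heterozygous marker types.

```r
# Hypothetical sketch of the steps listed in Details; not recode's source.
phase_homozygous <- function(geno, parent1, parent2) {
  geno <- as.data.frame(geno)
  p1 <- geno[[parent1]]
  p2 <- geno[[parent2]]

  # Keep markers where both parents are scored and are opposite homozygotes.
  # This also drops NA-parent and non-polymorphic markers.
  keep <- !is.na(p1) & !is.na(p2) &
    ((p1 == 0 & p2 == 2) | (p1 == 2 & p2 == 0))
  geno <- geno[keep, , drop = FALSE]

  # Where parent1 carries dosage 2, flip the whole row (0 <-> 2, 1 stays 1)
  # so that parent1 is always 0 and parent2 is always 2.
  flip <- geno[[parent1]] == 2
  if (any(flip)) geno[flip, ] <- 2 - geno[flip, ]
  geno
}

geno_data <- data.frame(
  Marker1 = c(0, 1, 2, 0, 2),
  Marker2 = c(2, 0, 2, 1, 0),
  Parent1 = c(0, 2, 2, 0, 2),
  Parent2 = c(2, 0, 0, 2, 0)
)
phase_homozygous(geno_data, "Parent1", "Parent2")
```

On this toy data the sketch returns the same values as the default recode call shown in Examples.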

Note

This function was refined with assistance from ChatGPT to improve clarity, efficiency, and visualization formatting. Extensive testing was performed by the author to verify outputs.

Examples

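# Example genotype data: rows are markers, columns are individuals
# (two offspring, Marker1 and Marker2, plus the two parents)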
geno_data <- data.frame(
  Marker1 = c(0, 1, 2, 0, 2),
  Marker2 = c(2, 0, 2, 1, 0),
  Parent1 = c(0, 2, 2, 0, 2),
  Parent2 = c(2, 0, 0, 2, 0)
)

# Recode genotype markers (default: numeric output)
phased_geno <- recode(geno_data, "Parent1", "Parent2")
print(phased_geno)
#>   Marker1 Marker2 Parent1 Parent2
#> 1       0       2       0       2
#> 2       1       2       0       2
#> 3       0       0       0       2
#> 4       0       1       0       2
#> 5       0       2       0       2

# Recode genotype markers with heterozygous marker handling.
# No parent is heterozygous in this toy data, so the result matches the default call.
phased_geno_het <- recode(geno_data, "Parent1", "Parent2",
                          handle_het_markers = TRUE,
                          het_marker_types = c("AxH", "HxB"))
print(phased_geno_het)
#>   Marker1 Marker2 Parent1 Parent2
#> 1       0       2       0       2
#> 2       1       2       0       2
#> 3       0       0       0       2
#> 4       0       1       0       2
#> 5       0       2       0       2