Filters a genotype matrix by removing markers or individuals whose proportion of missing data exceeds a specified threshold. It returns the filtered genotype matrix, the missing data proportions of the remaining markers or individuals, and a log of the markers or individuals that were removed.

Usage

filter_missing_geno(
  geno_matrix,
  threshold = 0.1,
  filter_by = c("markers", "individuals")
)

Arguments

geno_matrix

A numeric matrix where:

  • Rows represent genetic markers.

  • Columns represent individuals.

  • NA values indicate missing genotype data.

threshold

Numeric. The maximum proportion of missing data allowed before a marker or individual is removed. Default is 0.10 (10% missing data).
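The threshold comparison can be illustrated with base R alone. This is a minimal sketch, not the package's actual implementation: it assumes the missing proportion is the share of NA entries per row or column and that removal uses a strict "greater than" comparison, as the word "exceed" in the description suggests. The variable names are hypothetical.

```r
# Hypothetical sketch: not the package's actual implementation.
geno <- matrix(c(0, 1, NA, 2, 0, 1, NA, 2, NA, NA, 0, 1),
               nrow = 4, ncol = 3,
               dimnames = list(paste0("Marker", 1:4), paste0("Ind", 1:3)))

threshold <- 0.10

# Proportion of missing genotypes per marker (row) and per individual (column)
marker_missing     <- rowMeans(is.na(geno))
individual_missing <- colMeans(is.na(geno))

# Assumption: a marker is removed when its missing proportion exceeds the threshold
markers_to_remove <- names(marker_missing)[marker_missing > threshold]
markers_to_remove
```

With this matrix, Marker1 and Marker2 each have one of three genotypes missing and Marker3 has two of three missing, so only Marker4 stays under a 0.10 threshold.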

filter_by

Character. Specifies whether to filter "markers" (rows) or "individuals" (columns). Must be either "markers" or "individuals". Default is "markers".
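A default written as a vector of choices, as in the Usage section, is the pattern that base R's match.arg() resolves. Whether filter_missing_geno() uses match.arg() internally is an assumption; the helper below (resolve_filter_by, a hypothetical name) only shows how that pattern behaves.

```r
# Hypothetical helper illustrating match.arg(); not part of the package.
resolve_filter_by <- function(filter_by = c("markers", "individuals")) {
  # Returns the first choice when the argument is left at its default,
  # and raises an error for values outside the allowed choices.
  match.arg(filter_by)
}

resolve_filter_by()               # first choice, "markers", is the default
resolve_filter_by("individuals")  # an explicit valid choice is returned as-is
# resolve_filter_by("samples")    # would raise an error: not an allowed choice
```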

Value

A list with the following objects:

  • "filtered_geno": The genotype matrix after filtering.

  • "pct_missing": A named numeric vector containing the missing data proportions for remaining markers or individuals.

  • "removed_individuals": A log of removed individuals (if filter_by = "individuals").

  • "removed_markers": A log of removed markers (if filter_by = "markers").

Details

  • Ensures the output retains matrix structure.

  • Prints a summary message showing how many markers or individuals were removed.
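Retaining matrix structure matters because base R subsetting drops a one-row or one-column result to a plain vector by default. The sketch below shows the difference using drop = FALSE; that the function relies on this exact mechanism is an assumption, and the variable names are hypothetical.

```r
# Hypothetical sketch of why matrix structure needs to be preserved explicitly.
geno <- matrix(c(0, 1, NA, 2, 0, 1, NA, 2, NA, NA, 0, 1),
               nrow = 4, ncol = 3,
               dimnames = list(paste0("Marker", 1:4), paste0("Ind", 1:3)))

# Keep markers whose missing proportion does not exceed 0.10 (only Marker4 here)
keep <- rowMeans(is.na(geno)) <= 0.10

dropped <- geno[keep, ]                # collapses to a named numeric vector
kept    <- geno[keep, , drop = FALSE]  # stays a 1 x 3 matrix with its row name

is.matrix(dropped)  # FALSE
is.matrix(kept)     # TRUE
dim(kept)           # 1 3
```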

Examples

# Example genotype matrix with missing values
geno_data <- matrix(c(0, 1, NA, 2, 0, 1, NA, 2, NA, NA, 0, 1),
                    nrow = 4, ncol = 3,
                    dimnames = list(c("Marker1", "Marker2", "Marker3", "Marker4"),
                                    c("Ind1", "Ind2", "Ind3")))

# Remove markers with more than 10% missing data
result_markers <- filter_missing_geno(geno_data, threshold = 0.10, filter_by = "markers")
#> 3 markers removed (Threshold: 0.1)
print(result_markers$filtered_geno)
#>         Ind1 Ind2 Ind3
#> Marker4    2    2    1

# Remove individuals with more than 10% missing data
result_individuals <- filter_missing_geno(geno_data, threshold = 0.10, filter_by = "individuals")
#> 3 individuals removed (Threshold: 0.1)
print(result_individuals$filtered_geno)
#>        
#> Marker1
#> Marker2
#> Marker3
#> Marker4

# Example use
# result <- filter_missing_geno(geno_data, threshold = 0.10, filter_by = "individuals")

# Access output (element names as listed under Value)
# filtered_geno <- result$filtered_geno
# missing_values <- result$pct_missing
# removed <- result$removed_individuals