
Filter Markers or Individuals Based on Missing Genotype Data
Source: R/filter_missing_geno.R
Filters a genotype matrix by removing markers or individuals whose proportion of missing data exceeds a specified threshold. It returns the filtered genotype matrix, the missing data proportions for the remaining markers or individuals, and a log of the markers or individuals that were removed.
Usage
filter_missing_geno(
  geno_matrix,
  threshold = 0.1,
  filter_by = c("markers", "individuals")
)
Arguments
- geno_matrix
A numeric matrix where rows represent genetic markers, columns represent individuals, and NA values indicate missing genotype data.
- threshold
Numeric. The maximum proportion of missing data allowed before a marker or individual is removed. Default is 0.10 (10% missing data).
- filter_by
Character. Specifies whether to filter "markers" (rows) or "individuals" (columns). Must be either "markers" or "individuals". Default is "markers".
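As a rough illustration of what threshold and filter_by refer to, the per-marker and per-individual missing proportions can be computed with base R. This is a minimal sketch, not the package's implementation: the object names (geno, miss_by_marker, miss_by_individual and so on) are made up for the illustration, and it assumes that a marker or individual is removed only when its proportion is strictly greater than the threshold.

```r
# Toy genotype matrix: rows are markers, columns are individuals
geno <- matrix(c(0, 1, NA, 2,
                 0, 1, NA, 2,
                 NA, NA, 0, 1),
               nrow = 4, ncol = 3,
               dimnames = list(paste0("Marker", 1:4),
                               paste0("Ind", 1:3)))

threshold <- 0.10

# Proportion of NA values per marker (row) and per individual (column)
miss_by_marker <- rowMeans(is.na(geno))
miss_by_individual <- colMeans(is.na(geno))

# Assumption: entries strictly above the threshold are the ones removed
markers_to_drop <- names(miss_by_marker)[miss_by_marker > threshold]
individuals_to_drop <- names(miss_by_individual)[miss_by_individual > threshold]
```

With this toy matrix only Marker4 is complete, so the other three markers fall above a 10% threshold, and every individual has at least one missing genotype out of four.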
Value
A list with the following objects:
- "filtered_geno": The genotype matrix after filtering.
- "pct_missing": A named numeric vector containing the missing data proportions for remaining markers or individuals.
- "removed_individuals": A log of removed individuals (if filter_by = "individuals").
- "removed_markers": A log of removed markers (if filter_by = "markers").
Details
The function ensures that the filtered output retains its matrix structure.
It prints a summary message showing how many markers or individuals were removed.
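One common way to keep matrix structure in base R when subsetting leaves a single row or column is the drop = FALSE argument. Whether filter_missing_geno does this internally is an assumption. The sketch below only shows the general technique, with hypothetical object names and a summary message modelled on the one in the examples.

```r
geno <- matrix(c(0, 1, NA, 2,
                 0, 1, NA, 2,
                 NA, NA, 0, 1),
               nrow = 4, ncol = 3,
               dimnames = list(paste0("Marker", 1:4),
                               paste0("Ind", 1:3)))

# Keep markers whose missing proportion does not exceed 10%
keep <- rowMeans(is.na(geno)) <= 0.10

# Default subsetting collapses a single remaining row to a named vector
as_vector <- geno[keep, ]

# drop = FALSE keeps a 1 x 3 matrix with its row and column names
as_matrix <- geno[keep, , drop = FALSE]

# Hypothetical summary message, modelled on the output shown in the examples
message(sum(!keep), " markers removed (Threshold: ", 0.10, ")")
```

Without drop = FALSE, downstream code that relies on nrow(), rownames(), or matrix indexing would fail once only one marker or individual survives filtering.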
Examples
# Example genotype matrix with missing values
geno_data <- matrix(c(0, 1, NA, 2, 0, 1, NA, 2, NA, NA, 0, 1),
                    nrow = 4, ncol = 3,
                    dimnames = list(c("Marker1", "Marker2", "Marker3", "Marker4"),
                                    c("Ind1", "Ind2", "Ind3")))
# Filter markers with more than 10% missing data
result_markers <- filter_missing_geno(geno_data, threshold = 0.10, filter_by = "markers")
#> 3 markers removed (Threshold: 0.1)
print(result_markers$filtered_geno)
#>         Ind1 Ind2 Ind3
#> Marker4    2    2    1
# Filter individuals with more than 10% missing data
result_individuals <- filter_missing_geno(geno_data, threshold = 0.10, filter_by = "individuals")
#> 3 individuals removed (Threshold: 0.1)
print(result_individuals$filtered_geno)
#>
#> Marker1
#> Marker2
#> Marker3
#> Marker4
# Example use
# result <- filter_missing_geno(geno_data, threshold = 0.10, filter_by = "individuals")
# Access output
# filtered_geno <- result$filtered_geno
# missing_values <- result$pct_missing
# removed <- result$removed_individuals