Classify Reads as Case or Control Based on Methylation Profiles
Source:R/classify_mod_reads.R
classify_mod_reads.RdThis function classifies reads in the `reads` table of a DuckDB database as either `"case"`, `"control"`, or `"unknown"` based on similarity to reference methylation fractions in a `key_table` (e.g., a collapsed windows file). Classification is based on how close the read's `mh_frac` is to the average case or control `mh_frac`, within a user-defined `meth_diff_threshold`.
Usage
classify_mod_reads(
mod_db,
table_name = "classified_reads",
reads_table = "reads",
key_table,
case,
control,
meth_diff_threshold = 0.1
)Arguments
- mod_db
Path to a DuckDB `.db` file created by this package (e.g., from `summarize_reads()`).
- table_name
Character. Name of the output table to store classified reads (default: "classified_reads").
- reads_table
Character. Name of the reads table in which you want to classify the reads (default: "reads").
- key_table
Path to a CSV, TSV, or BED file generated by `collapse_windows()`. Must include the columns: `chrom`, `start`, `end`, `avg_mh_frac_control`, `avg_mh_frac_case`, and `avg_meth_diff`.
- case
Character string used to label case reads (e.g., `"case"`).
- control
Character string used to label control reads (e.g., `"control"`).
- meth_diff_threshold
Numeric value specifying the maximum difference in `mh_frac` allowed to match either case or control averages. Must be less than half the minimum absolute value of `avg_meth_diff` to prevent ambiguous classifications.
Value
Invisibly returns the open database connection with a new table named `classified_reads` added to the database. This table includes:
sample_name– Sample identifierread_id– Unique read identifierfirst_cpg_pos– First CpG position of the readlast_cpg_pos– Last CpG position of the readmh_frac– Methylation fraction of the readclassification– `"case"`, `"control"`, or `"unknown"`