This function calculates differential methylation between specified case and control groups using one of several statistical methods, selected with calc_type. The results are stored in a DuckDB database for further analysis.
Usage
calc_mod_diff(
mod_db,
call_type = "positions",
output_table = NULL,
cases,
controls,
mod_type = "mh",
calc_type = NULL,
temp_dir = tempdir(),
threads = NULL,
memory_limit = NULL,
min_coverage = NULL,
overwrite = TRUE
)
Arguments
- mod_db
A list containing the database file path. This should be a valid "mod_db" class object.
- call_type
A string representing the name of the table in the database from which to pull the data. Default is "positions".
- output_table
Destination table name for results. If NULL, defaults to paste0("mod_diff_", call_type).
- cases
A character vector containing the sample names for the case group.
- controls
A character vector containing the sample names for the control group.
- mod_type
A string indicating the type of modification to analyze. Default is "mh" for methylation/hydroxymethylation. Other codes include "a" for 6mA, "17596" for inosine, and "17802" for pseudouridine. Bare numeric codes are automatically prefixed with "m_".
- calc_type
A string specifying the statistical method to use. Options: "wilcox", "beta_bin", "fast_fisher", "r_fisher", "log_reg". Default is NULL, in which case:
"wilcox" if both groups have >= 5 samples
"beta_bin" if both groups have >= 2 samples (accounts for overdispersion)
"fast_fisher" if either group has only 1 sample
- temp_dir
Directory for DuckDB temporary files (default tempdir()).
- threads
Integer DuckDB thread count. If NULL, an internal heuristic (typically all-but-one core) is used.
- memory_limit
DuckDB memory limit string (e.g. "16384MB"). If NULL, an internal heuristic (~80% of RAM) is used.
- min_coverage
Minimum fraction of positions within a window that must have modification calls for each sample (0 to 1). Computed as num_sites / (end - start + 1) per sample per window. Windows where any sample falls below this threshold are dropped before testing. For example, min_coverage = 0.5 on a 1kb window requires at least 500 sites covered per sample. Only applies when the input table contains num_sites, start, and end columns (i.e. windows). Default is NULL (no filtering).
- overwrite
If TRUE and output_table exists, it is dropped before writing.
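Two of the rules above, the calc_type default and the min_coverage filter, can be sketched in base R. This is an illustration of the documented behaviour, not the package's internal code: the helper names choose_calc_type and filter_windows are hypothetical, and the sample_name and chrom columns are assumed for the example (only num_sites, start, and end are named in the argument description).

```r
# Hypothetical helper: mirrors the documented default when calc_type = NULL.
choose_calc_type <- function(cases, controls) {
  n_min <- min(length(cases), length(controls))  # size of the smaller group
  stopifnot(n_min >= 1)                          # both groups need a sample
  if (n_min >= 5) {
    "wilcox"
  } else if (n_min >= 2) {
    "beta_bin"
  } else {
    "fast_fisher"
  }
}

# Hypothetical helper: keeps only windows in which every sample reaches
# min_coverage, where coverage = num_sites / (end - start + 1).
filter_windows <- function(windows, min_coverage) {
  frac <- windows$num_sites / (windows$end - windows$start + 1)
  window_id <- paste(windows$chrom, windows$start, windows$end)
  # TRUE for a window only if all of its samples meet the threshold
  keep_ids <- tapply(frac >= min_coverage, window_id, all)
  windows[as.logical(keep_ids[window_id]), , drop = FALSE]
}

# Toy input (assumed layout): two 1kb windows, two samples each.
windows <- data.frame(
  sample_name = c("s1", "s2", "s1", "s2"),
  chrom       = "chr1",
  start       = c(1, 1, 1001, 1001),
  end         = c(1000, 1000, 2000, 2000),
  num_sites   = c(600, 500, 700, 400)
)

# The second window is dropped because s2 covers only 400 of 1000 positions.
kept <- filter_windows(windows, min_coverage = 0.5)
```

In this sketch a window is removed for all samples as soon as one sample falls short, which matches the "any sample" wording above.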
Value
A list containing the updated "mod_db" object with the latest tables in the database, including the results table named by output_table (by default paste0("mod_diff_", call_type)).
Details
The function connects to the specified DuckDB database and retrieves methylation data from the table named by call_type.
It summarizes the data for cases and controls, calculates p-values based on the specified method, and stores the results in the
table named by output_table. Resource pragmas (temp_directory, threads,
memory_limit) are set via internal heuristics unless overridden.
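For readers who want to see what the resource pragmas look like at the DuckDB level, here is a minimal sketch using the DBI and duckdb packages. It is an assumption about how such settings are commonly applied, not the function's actual implementation: the in-memory connection, the all-but-one-core calculation, and the "2048MB" value are illustrative only.

```r
library(DBI)

# In-memory database for illustration; the real function works on the
# database file referenced by mod_db.
con <- dbConnect(duckdb::duckdb(), dbdir = ":memory:")

# Assumed version of the "all-but-one core" heuristic.
n_cores <- parallel::detectCores()
if (is.na(n_cores)) n_cores <- 2L
threads <- max(1L, n_cores - 1L)

# threads, memory_limit and temp_directory are standard DuckDB settings.
dbExecute(con, sprintf("SET threads = %d", threads))
dbExecute(con, "SET memory_limit = '2048MB'")
dbExecute(con, sprintf("SET temp_directory = '%s'", tempdir()))

# Read the thread setting back to confirm it was applied.
res <- dbGetQuery(con, "SELECT current_setting('threads') AS threads")

dbDisconnect(con, shutdown = TRUE)
```

Passing threads, memory_limit, or temp_dir to calc_mod_diff() replaces the corresponding heuristic, so explicit values are mainly useful on shared machines where the defaults would claim too many resources.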