
This function calculates differential methylation between specified case and control groups using various statistical methods. The results are stored in a DuckDB database for further analysis.

Usage

calc_mod_diff(
  mod_db,
  call_type = "positions",
  output_table = NULL,
  cases,
  controls,
  mod_type = "mh",
  calc_type = NULL,
  temp_dir = tempdir(),
  threads = NULL,
  memory_limit = NULL,
  min_coverage = NULL,
  overwrite = TRUE
)

Arguments

mod_db

A valid "mod_db" class object: a list that contains the path to the DuckDB database file.

call_type

A string representing the name of the table in the database from which to pull the data. Default is "positions".

output_table

Destination table name for results. If NULL, defaults to paste0("mod_diff_", call_type).
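
The default naming rule can be sketched in a few lines of R. This is only an illustration of the behavior described above; resolve_output_table() is a hypothetical helper, not a function exported by the package, and "windows" is a made-up table name.

```r
# Hypothetical helper mirroring the documented default for output_table.
resolve_output_table <- function(output_table = NULL, call_type = "positions") {
  # Fall back to "mod_diff_<call_type>" when no name is supplied.
  if (is.null(output_table)) paste0("mod_diff_", call_type) else output_table
}

resolve_output_table()                       # default call_type
resolve_output_table(call_type = "windows")  # "windows" is an assumed table name
resolve_output_table("my_results")           # an explicit name is used as given
```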

cases

A character vector containing the sample names for the case group.

controls

A character vector containing the sample names for the control group.

mod_type

A string indicating the type of modification to analyze. Default is "mh" for methylation/hydroxymethylation. Other codes include "a" for 6mA, "17596" for inosine, and "17802" for pseudouridine. Bare numeric codes are automatically prefixed with "m_".
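
As a rough sketch of the prefixing rule, assuming a "bare numeric code" means a string made up only of digits, the normalization could look like the following. normalize_mod_type() is a hypothetical name used for illustration; the package's internal logic may differ.

```r
# Hypothetical helper: prefix all-digit modification codes with "m_".
normalize_mod_type <- function(mod_type) {
  # Assumption: "bare numeric" means the whole string consists of digits.
  if (grepl("^[0-9]+$", mod_type)) paste0("m_", mod_type) else mod_type
}

normalize_mod_type("17596")  # numeric code gains the "m_" prefix
normalize_mod_type("mh")     # non-numeric codes pass through unchanged
```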

calc_type

A string specifying the statistical method to use. Options: "wilcox", "beta_bin", "fast_fisher", "r_fisher", "log_reg". Default is NULL, in which case:

  • "wilcox" if both groups have >= 5 samples

  • "beta_bin" if both groups have >= 2 samples (accounts for overdispersion)

  • "fast_fisher" if either group has only 1 sample
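
The default selection above can be sketched as a small decision function. This assumes the rules are evaluated in the order listed, so the first matching rule wins; choose_calc_type() is a hypothetical name, not part of the package.

```r
# Hypothetical helper reproducing the documented defaults for calc_type = NULL.
choose_calc_type <- function(n_cases, n_controls) {
  n_min <- min(n_cases, n_controls)  # size of the smaller group
  if (n_min >= 5) {
    "wilcox"       # both groups have at least 5 samples
  } else if (n_min >= 2) {
    "beta_bin"     # both groups have at least 2 samples
  } else {
    "fast_fisher"  # at least one group has a single sample
  }
}

choose_calc_type(6, 5)
choose_calc_type(5, 3)
choose_calc_type(1, 10)
```

Under this reading, the smaller group alone determines the default, so a 1-versus-10 comparison still falls back to "fast_fisher".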

temp_dir

Directory for DuckDB temporary files (default tempdir()).

threads

Integer DuckDB thread count. If NULL, an internal heuristic (typically all-but-one core) is used.

memory_limit

DuckDB memory limit string (e.g. "16384MB"). If NULL, an internal heuristic (~80% of RAM) is used.

min_coverage

Minimum fraction of positions within a window that must have modification calls for each sample (0 to 1). Computed as num_sites / (end - start + 1) per sample per window. Windows where any sample falls below this threshold are dropped before testing. For example, min_coverage = 0.5 on a 1kb window requires at least 500 sites covered per sample. Only applies when the input table contains num_sites, start, and end columns (i.e. windows). Default is NULL (no filtering).
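
To make the filter concrete, here is a minimal base-R sketch of the same computation on a toy table. The columns num_sites, start, and end come from the description above; sample_name, chrom, and the data values are assumptions made up for illustration, and the package performs the equivalent step inside DuckDB rather than in R.

```r
# Toy window table: two samples, two 1 kb windows (illustrative values only).
windows <- data.frame(
  sample_name = c("s1", "s2", "s1", "s2"),   # assumed column name
  chrom       = "chr1",                      # assumed column name
  start       = c(1L, 1L, 1001L, 1001L),
  end         = c(1000L, 1000L, 2000L, 2000L),
  num_sites   = c(600L, 520L, 700L, 300L)
)
min_coverage <- 0.5

# Per-sample, per-window coverage: num_sites / (end - start + 1).
windows$coverage <- windows$num_sites / (windows$end - windows$start + 1)
windows$window_id <- paste(windows$chrom, windows$start, windows$end, sep = ":")

# A window is kept only if every sample meets the threshold,
# i.e. the lowest per-sample coverage is at least min_coverage.
min_cov_by_window <- tapply(windows$coverage, windows$window_id, min)
kept <- names(min_cov_by_window)[min_cov_by_window >= min_coverage]
filtered <- windows[windows$window_id %in% kept, ]
```

In this toy data the second window is dropped because sample s2 covers only 300 of 1000 positions, even though s1 is well above the threshold.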

overwrite

If TRUE and output_table exists, it is dropped before writing.

Value

A list containing the updated "mod_db" object with the latest tables in the database, including the results table named by output_table (by default paste0("mod_diff_", call_type)).

Details

The function connects to the specified DuckDB database and retrieves modification data from the table named by call_type. It summarizes the data for the case and control samples, calculates p-values with the method given by calc_type, and writes the results to output_table. The DuckDB resource settings (temp_directory, threads, memory_limit) are chosen by internal heuristics unless overridden through the temp_dir, threads, and memory_limit arguments.

Examples