| Title: | Perform Preprocessing on Glycomics and Glycoproteomics Data |
|---|---|
| Description: | Provides comprehensive data preprocessing tools for glycomics and glycoproteomics experiments. The package offers both automated and manual preprocessing pipelines, including normalization (median, quantile, vsn, total area), missing value handling (filtering, imputation using various methods), batch effect correction (using ComBat and other methods), data aggregation (at peptide, glycan, or site level), and quality control functions. The automated pipeline intelligently selects appropriate methods based on data characteristics and sample size, making it easy to prepare clean, analysis-ready data for downstream statistical analysis within the glycoverse ecosystem. Works seamlessly with 'glyexp::experiment()' objects to ensure data consistency and interoperability. |
| Authors: | Bin Fu [aut, cre] (ORCID: <https://orcid.org/0000-0001-8567-2997>) |
| Maintainer: | Bin Fu <[email protected]> |
| License: | MIT + file LICENSE |
| Version: | 0.14.1 |
| Built: | 2026-05-18 07:01:38 UTC |
| Source: | https://github.com/glycoverse/glyclean |
This function adds a new "site_sequence" column to the variable information table, containing the protein sequence around the glycosylation site. If there are too few amino acids on either side of the site, the missing positions are filled with "X".
This function requires the following columns in the variable information tibble:
"protein": The protein uniprot accession.
"protein_site": The site on the protein sequence.
add_site_seq(exp, fasta = NULL, n_aa = 7, taxid = 9606)
## S3 method for class 'glyexp_experiment'
add_site_seq(exp, fasta = NULL, n_aa = 7, taxid = 9606)
## Default S3 method:
add_site_seq(exp, fasta = NULL, n_aa = 7, taxid = 9606)
exp |
A |
fasta |
Either a file path to a FASTA file, a named character vector
with protein IDs as names and sequences as value, or |
n_aa |
The number of amino acids to the left and right of the glycosylation site.
For example, if |
taxid |
NCBI taxonomy ID for UniProt lookup. Default: |
A glyexp::experiment() object with the new "site_sequence" column.
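The padding rule described for add_site_seq() can be illustrated with a small base-R sketch. The helper `site_window()` is hypothetical and is not part of glyclean; it assumes the protein sequence is already available as a single string and that the site is a 1-based position.

```r
# Hypothetical helper (not a glyclean function): extract the residues around
# a glycosylation site, padding with "X" where the protein sequence runs out.
site_window <- function(protein_seq, site, n_aa = 7) {
  pad <- strrep("X", n_aa)
  padded <- paste0(pad, protein_seq, pad)
  # After padding, the site sits at position site + n_aa, so the window
  # spans positions site .. site + 2 * n_aa of the padded string.
  substr(padded, site, site + 2 * n_aa)
}

site_window("MKNGTAS", site = 3, n_aa = 2)  # site in the middle of the sequence
site_window("MKNGTAS", site = 1, n_aa = 2)  # site at the N-terminus, padded on the left
site_window("MKNGTAS", site = 7, n_aa = 2)  # site at the C-terminus, padded on the right
```

The window always has length 2 * n_aa + 1, with the site in the centre.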
This function adjusts the glycopeptide expression by the protein expression. In other words, it "strips out" the protein expression from the glycopeptide expression, so that the expression reflects the glycosylation status only.
adjust_protein(exp, pro_expr_mat, method = c("ratio", "reg"))
## S3 method for class 'glyexp_experiment'
adjust_protein(exp, pro_expr_mat, method = c("ratio", "reg"))
## Default S3 method:
adjust_protein(exp, pro_expr_mat, method = c("ratio", "reg"))
exp |
A |
pro_expr_mat |
A matrix of protein expression. Columns are samples, rows are uniprot protein accessions. |
method |
The method to use for protein adjustment. Either "ratio" or "reg". Default is "ratio". |
For simplicity, glycopeptide expression is denoted as GP,
and protein expression is denoted as PRO.
With method = "ratio", the adjusted expression is:
GP-adj = (GP / PRO) / (median(GP) / median(PRO))
The first part adjusts glycopeptide expression by protein expression. The second part rescales the expression to the original scale.
With method = "reg", linear regression is used to remove the protein expression from the glycopeptide expression. Both glycopeptide and protein expression values are log2-transformed (with +1 to avoid log(0)) before fitting the linear model: log2(GP+1) ~ log2(PRO+1). The residuals represent the glycosylation-specific signal and are converted back to the original scale using 2^residuals, ensuring all adjusted values are positive.
In both methods, only glycoproteins identified in both exp and pro_expr_mat will be retained.
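The two adjustment methods can be sketched in base R for a single glycopeptide and its parent protein. This is an illustration of the formulas above, not the package's implementation; the toy values are made up, and taking the medians across samples for one glycopeptide/protein pair is an assumption.

```r
# Toy values for one glycopeptide (GP) and its parent protein (PRO)
# across six samples; the numbers are made up for illustration.
gp  <- c(120, 150, 90, 300, 280, 310)
pro <- c(100, 110, 95, 200, 210, 190)

# "ratio": divide by protein expression, then divide by the ratio of medians.
# Assumption: medians are taken across samples for this single pair.
gp_adj_ratio <- (gp / pro) / (median(gp) / median(pro))

# "reg": regress log2(GP + 1) on log2(PRO + 1) and back-transform the residuals.
fit <- lm(log2(gp + 1) ~ log2(pro + 1))
gp_adj_reg <- 2^residuals(fit)
```

Because the regression residuals are exponentiated, `gp_adj_reg` is strictly positive, matching the guarantee stated above.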
A glyexp::experiment() object with protein-adjusted glycopeptide expression.
Aggregate glycoproteomics data to different levels (glycoforms, glycopeptides, etc.). This function sums up quantitative values for each unique combination of specified variables. It is recommended to call this function after missing value imputation.
The following levels are available:
"gf": Aggregate to glycoforms, which is the unique combination of proteins, protein sites, and glycan compositions. This is the default level.
"gp": Aggregate to glycopeptides, which is the unique combination of peptides, proteins, protein sites, and glycan compositions.
"gfs": Like "gf", but differentiates structures with the same composition.
"gps": Like "gp", but differentiates structures with the same composition.
Different levels of aggregation require different columns in the variable information.
"gf": "protein", "glycan_composition", "protein_site"
"gp": "peptide", "protein", "glycan_composition", "peptide_site", "protein_site"
"gfs": "protein", "glycan_composition", "glycan_structure", "protein_site"
"gps": "peptide", "protein", "glycan_composition", "glycan_structure", "peptide_site", "protein_site"
Other columns in the variable information tibble with "many-to-one" relationship with the unique combination of columns listed above will be kept. A common example is the "gene" column. For each "glycoform" (unique combination of "protein", "protein_site", and "glycan_composition"), there should be only one "gene" value, therefore it is kept for "gf" level. On the other hand, the "peptide" column is removed for "gf" level, as one "glycoform" can contain multiple "peptides".
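The idea of summing quantitative values per unique combination of variables can be sketched with base R's rowsum(). The matrix and variable table below are made-up toy data, and the sketch covers only the "gf" level; how the package handles missing values and extra columns is not shown here.

```r
# Toy expression matrix: 3 glycopeptides (rows) x 3 samples (columns).
expr <- matrix(
  c(10, 20, 5,
     1,  2, 3,
     4,  5, 6),
  nrow = 3, byrow = TRUE,
  dimnames = list(NULL, c("S1", "S2", "S3"))
)

# Toy variable information: two peptides carry the same glycoform.
var_info <- data.frame(
  peptide            = c("PEP1", "PEP2", "PEP1"),
  protein            = c("P1", "P1", "P1"),
  protein_site       = c(10, 10, 10),
  glycan_composition = c("H5N2", "H5N2", "H5N4F1")
)

# "gf" level: one row per unique protein / protein_site / glycan_composition.
gf_key <- paste(
  var_info$protein, var_info$protein_site, var_info$glycan_composition,
  sep = "|"
)
gf_expr <- rowsum(expr, group = gf_key)
gf_expr
```

The "peptide" column no longer identifies a row after this step, which is why it is dropped at the "gf" level.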
aggregate(
  exp,
  to_level = c("gf", "gp", "gfs", "gps"),
  standardize_variable = TRUE
)
## S3 method for class 'glyexp_experiment'
glyclean_aggregate(
  exp,
  to_level = c("gf", "gp", "gfs", "gps"),
  standardize_variable = TRUE
)
## Default S3 method:
glyclean_aggregate(
  exp,
  to_level = c("gf", "gp", "gfs", "gps"),
  standardize_variable = TRUE
)
exp |
A |
to_level |
The aggregation level, one of: "gf" (glycoforms), "gp" (glycopeptides), "gfs" (glycoforms with structures), or "gps" (glycopeptides with structures). See Details for more information. |
standardize_variable |
Whether to call |
A modified glyexp::experiment() object with aggregated expression matrix and
updated variable information.
Aggregates glycoproteomics data to "gfs" (glycoforms with structures) level if the glycan structure column exists, otherwise to "gf" (glycoforms with compositions) level.
auto_aggregate(exp, standardize_variable = TRUE)
exp |
A |
standardize_variable |
Whether to call |
A modified glyexp::experiment() object with aggregated expression matrix and
updated variable information.
library(glyexp)
exp <- real_experiment
auto_aggregate(exp)
Perform automatic data preprocessing on glycoproteomics or glycomics data. This function applies an intelligent preprocessing pipeline that includes normalization, missing value handling, imputation, aggregation (for glycoproteomics data), and batch effect correction.
For glycomics data, this function calls these functions in sequence:
For glycoproteomics data, this function calls these functions in sequence:
auto_clean(
  exp,
  group_col = "group",
  batch_col = "batch",
  qc_name = "QC",
  normalize_to_try = NULL,
  impute_to_try = NULL,
  remove_preset = "discovery",
  batch_prop_threshold = 0.3,
  check_batch_confounding = TRUE,
  batch_confounding_threshold = 0.4,
  standardize_variable = TRUE
)
exp |
A |
group_col |
The column name in sample_info for groups. Default is "group". Can be NULL when no group information is available. |
batch_col |
The column name in sample_info for batches. Default is "batch". Can be NULL when no batch information is available. |
qc_name |
|
normalize_to_try |
|
impute_to_try |
|
remove_preset |
The preset for removing variables. Default is "discovery". Available presets:
|
batch_prop_threshold |
The proportion of variables that must show significant batch effects to perform batch correction. Default is 0.3 (30%). |
check_batch_confounding |
Whether to check for confounding between batch and group variables. Default to TRUE. |
batch_confounding_threshold |
The threshold for Cramer's V to consider batch and group variables highly confounded.
Only used when |
standardize_variable |
Whether to call |
A modified glyexp::experiment() object.
auto_normalize(), auto_remove(), auto_impute(), auto_aggregate(), auto_correct_batch_effect()
library(glyexp)
exp <- real_experiment
auto_clean(exp)
Calls transform_alr() if the data has more than 50 variables,
and transform_clr() otherwise,
following the same strategy as glycowork.
auto_coda(x, by = NULL, gamma = 0.1, group_scales = NULL)
x |
Either a |
by |
Either a column name in |
gamma |
Standard deviation of the scale-uncertainty model on the |
group_scales |
Optional informed group scales. For binary comparisons, this can be a single positive ratio for the second group relative to the first, or two positive scales from which that ratio is derived. For multi-group data, provide a positive vector with one scale per group. |
Returns the same type as the input. If x is a glyexp_experiment,
returns a glyexp_experiment with a CoDA-transformed expression matrix
(ALR if >50 variables, CLR otherwise).
The stochastic branch follows glycowork: when gamma > 0, CLR noise is
sampled per feature-sample entry rather than once per sample. Without
informed scales, this corresponds to jittering around the sample-specific
log2 geometric mean of the non-zero components and then returning to ratio
space.
The group_scales argument is interpreted in the same way as
glycowork's custom_scale. For exactly two groups, provide the total-signal
ratio of the second group relative to the first, either directly as a scalar
or implicitly through two per-group scales. For multi-group data, provide one
positive scale per group; these scales are used only in the stochastic branch.
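For orientation, the deterministic part of the ALR/CLR choice can be sketched in base R. The functions below are hypothetical stand-ins, not the package's transform_alr() and transform_clr(). The sketch assumes strictly positive values, a log2 scale, no stochastic noise (as if gamma were 0), and, arbitrarily, the last variable as the ALR reference.

```r
# Hypothetical CLR: log2 values centred on each sample's mean log2 value,
# i.e. the log2 of the sample's geometric mean.
clr_sketch <- function(mat) {
  log_mat <- log2(mat)
  sweep(log_mat, 2, colMeans(log_mat), "-")
}

# Hypothetical ALR: log2 ratio of every variable to one reference variable.
# Assumption: the last row is the reference; it is dropped from the output.
alr_sketch <- function(mat, ref = nrow(mat)) {
  log_mat <- log2(mat)
  sweep(log_mat[-ref, , drop = FALSE], 2, log_mat[ref, ], "-")
}

# Dispatch rule described above: ALR for more than 50 variables, else CLR.
coda_sketch <- function(mat) {
  if (nrow(mat) > 50) alr_sketch(mat) else clr_sketch(mat)
}

small <- matrix(seq(1, 12), nrow = 4)    # 4 variables x 3 samples: CLR
large <- matrix(seq(1, 180), nrow = 60)  # 60 variables x 3 samples: ALR
dim(coda_sketch(small))
dim(coda_sketch(large))
```

CLR keeps one row per variable and each sample sums to zero on the log scale, whereas ALR loses the reference row.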
If you intend to use ALR/CLR transformation with motif quantification, apply the transformation after motif quantification rather than before. In other words, the recommended workflow is:
Perform data preprocessing.
clean_exp <- auto_clean(exp)
Perform motif quantification on the cleaned experiment.
motif_exp <- glydet::quantify_motifs(clean_exp, motifs)
Perform ALR/CLR transformation on the motif-quantified experiment.
coda_motif_exp <- auto_coda(motif_exp, by = "group", gamma = 0.1)
Proceed with downstream analyses on the transformed motif experiment.
dea_res <- glystats::gly_ttest(coda_motif_exp)
Detects and corrects batch effects in the experiment.
If batch information is available,
this function performs ANOVA to detect batch effects.
If more than 30% (controlled by prop_threshold) of variables show significant batch effects (p < 0.05),
batch correction is performed using ComBat.
If group information exists,
it will be used as a covariate in both detection and correction
to preserve biological variation.
If no batch information is available,
the function will return the original experiment.
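The detection rule described above can be sketched in base R: fit a per-variable ANOVA with group as a covariate, then compare the proportion of significant batch terms with the threshold. This is a sketch under assumptions, not the package's code; the model formula, the sequential (type I) ANOVA, and the simulated data are all illustrative choices.

```r
n_var <- 20
batch <- factor(rep(c("A", "B"), each = 5))
group <- factor(rep(c("Ctrl", "Treat"), times = 5))

# Deterministic pseudo-noise so the example is reproducible without a seed.
mat <- matrix(sin(seq_len(n_var * 10)), nrow = n_var)
# Simulate a strong batch shift in the first 10 variables.
mat[1:10, batch == "B"] <- mat[1:10, batch == "B"] + 10

# One ANOVA per variable; keep the p-value of the batch term.
batch_p <- apply(mat, 1, function(y) {
  fit <- lm(y ~ group + batch)
  anova(fit)["batch", "Pr(>F)"]
})

prop_threshold <- 0.3
prop_significant <- mean(batch_p < 0.05)
needs_correction <- prop_significant > prop_threshold
needs_correction
```

Here at least half of the variables carry a batch shift, so the proportion exceeds 0.3 and correction would be triggered.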
auto_correct_batch_effect(
  exp,
  group_col = "group",
  batch_col = "batch",
  prop_threshold = 0.3,
  check_confounding = TRUE,
  confounding_threshold = 0.4
)
exp |
A |
group_col |
The column name in sample_info for groups. Default is "group". Can be NULL when no group information is available. |
batch_col |
The column name in sample_info for batches. Default is "batch". Can be NULL when no batch information is available. |
prop_threshold |
The proportion of variables that must show significant batch effects to perform batch correction. Default is 0.3 (30%). |
check_confounding |
Whether to check for confounding between batch and group variables. Default to TRUE. |
confounding_threshold |
The threshold for Cramer's V to consider batch and group variables highly confounded.
Only used when |
A glyexp::experiment() object with batch effects corrected.
exp <- glyexp::real_experiment
exp <- auto_correct_batch_effect(exp)
This function automatically selects and applies a deterministic default imputation method by sample count and experiment type. Quality Control (QC) samples are inspected for workflow consistency, but they are not used to benchmark or select the imputation method.
auto_impute(exp, group_col = "group", qc_name = "QC", to_try = NULL)
The automatic strategy uses these defaults:
n_samples < 30: impute_min_prob() for glycomics and glycoproteomics.
30 <= n_samples <= 100: impute_bpca() for glycomics and
impute_min_prob() for glycoproteomics.
n_samples > 100: impute_miss_forest() for glycomics and
impute_bpca() for glycoproteomics.
Other experiment types use the glycoproteomics defaults as a conservative
fallback. impute_sample_min() and impute_half_sample_min() remain
available for manual use, but they are not selected automatically.
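The defaults listed above amount to a small lookup by sample count and experiment type. The function below is a hypothetical restatement of that table in base R; it returns the name of the method that would be selected and does not call glyclean.

```r
# Hypothetical restatement of the documented defaults (not a glyclean function).
choose_impute_method <- function(n_samples, exp_type = "glycoproteomics") {
  if (exp_type == "glycomics") {
    if (n_samples < 30) {
      "impute_min_prob"
    } else if (n_samples <= 100) {
      "impute_bpca"
    } else {
      "impute_miss_forest"
    }
  } else {
    # Glycoproteomics, and the conservative fallback for other types.
    if (n_samples <= 100) "impute_min_prob" else "impute_bpca"
  }
}

choose_impute_method(20, "glycomics")
choose_impute_method(60, "glycomics")
choose_impute_method(60, "glycoproteomics")
choose_impute_method(150, "glycoproteomics")
```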
The imputed experiment.
library(glyexp)
exp_imputed <- auto_impute(real_experiment)
This function automatically applies a deterministic default normalization method based on experiment type. Quality Control (QC) samples are not used to benchmark, select, or report normalization behavior.
auto_normalize(exp, group_col = "group", qc_name = "QC", to_try = NULL)
The automatic strategy uses these defaults:
glycomics: normalize_total_area().
glycoproteomics: normalize_median().
missing or other experiment types: normalize_median().
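As a rough intuition for the two default methods, the base-R sketch below scales each sample (column) by its median or by its total signal. It is only an illustration on made-up data; normalize_median() and normalize_total_area() may apply additional rescaling that is not reproduced here.

```r
# Toy matrix: 3 variables x 2 samples; S2 is S1 measured at 5x the intensity.
mat <- matrix(
  c(2, 4, 6, 10, 20, 30),
  nrow = 3,
  dimnames = list(paste0("v", 1:3), c("S1", "S2"))
)

# Median scaling: divide each sample by its median intensity.
median_scaled <- sweep(mat, 2, apply(mat, 2, median, na.rm = TRUE), "/")

# Total-area scaling: divide each sample by its summed intensity,
# so values become relative abundances.
total_area_scaled <- sweep(mat, 2, colSums(mat, na.rm = TRUE), "/")

median_scaled
total_area_scaled
```

After either scaling the two samples become identical, because they differed only by a constant factor.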
The normalized experiment.
library(glyexp) exp_normed <- auto_normalize(real_experiment)library(glyexp) exp_normed <- auto_normalize(real_experiment)
This function uses preset rules to remove variables with low quality. Available presets:
"simple": remove variables with more than 50% missing values.
"discovery": more lenient, remove variables with more than 80% missing values, but ensure less than 50% of missing values in at least one group.
"biomarker": more strict, remove variables with more than 40% missing values, and ensure less than 60% of missing values in all groups.
auto_remove(exp, preset = "discovery", group_col = "group", qc_name = "QC")
A modified glyexp::experiment() object.
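The "simple" and "discovery" presets of auto_remove() can be approximated in base R from missing-value proportions. The sketch below uses made-up data and is not the package's implementation; in particular, how exact boundary values (for example exactly 50% missing) are treated is an assumption.

```r
# Toy matrix: 4 variables x 6 samples (3 Ctrl, then 3 Treat).
mat <- matrix(
  c( 1,  2,  3,  4,  5, 6,   # v1: complete
    NA, NA, NA, NA,  5, 6,   # v2: 67% missing overall, 33% missing in Treat
    NA, NA,  3, NA, NA, 6,   # v3: 67% missing overall and in each group
    NA, NA, NA, NA, NA, 6),  # v4: 83% missing overall
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("v", 1:4), NULL)
)
group <- rep(c("Ctrl", "Treat"), each = 3)

# Overall and per-group proportions of missing values for each variable.
miss_all <- rowMeans(is.na(mat))
miss_by_group <- sapply(
  split(seq_along(group), group),
  function(idx) rowMeans(is.na(mat[, idx, drop = FALSE]))
)

# "simple": drop variables with more than 50% missing values.
keep_simple <- miss_all <= 0.5

# "discovery": allow up to 80% missing overall, but require at least
# one group with less than 50% missing values.
keep_discovery <- miss_all <= 0.8 & apply(miss_by_group, 1, min) < 0.5

keep_simple
keep_discovery
```

Variable v2 shows the difference: it is dropped by "simple" but kept by "discovery" because it is mostly observed in one group.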
library(glyexp)
exp <- real_experiment
auto_remove(exp)
Correct batch effects in glycoproteomics/glycomics data using the ComBat algorithm from the sva package or removeBatchEffect() from the limma package.
correct_batch_effect(
  x,
  batch = "batch",
  group = NULL,
  check_confounding = TRUE,
  confounding_threshold = 0.4,
  method = c("combat", "limma")
)
## S3 method for class 'glyexp_experiment'
correct_batch_effect(
  x,
  batch = "batch",
  group = NULL,
  check_confounding = TRUE,
  confounding_threshold = 0.4,
  method = c("combat", "limma")
)
## S3 method for class 'matrix'
correct_batch_effect(
  x,
  batch = "batch",
  group = NULL,
  check_confounding = TRUE,
  confounding_threshold = 0.4,
  method = c("combat", "limma")
)
## Default S3 method:
correct_batch_effect(
  x,
  batch = "batch",
  group = NULL,
  check_confounding = TRUE,
  confounding_threshold = 0.4,
  method = c("combat", "limma")
)
x |
Either a |
batch |
Either a factor/character vector specifying batch assignments for each sample, or a string specifying the column name in sample_info (for experiment input only). Default to "batch" for experiment input. |
group |
Either a factor/character vector specifying group assignments for each sample, or a string specifying the column name in sample_info (for experiment input only). If provided, it will be used as a covariate in the batch correction model. This is useful when you have an unbalanced design. Default to NULL. |
check_confounding |
Whether to check for confounding between batch and group variables. Default to TRUE. |
confounding_threshold |
The threshold for Cramer's V to consider batch and group variables highly confounded.
Only used when |
method |
The batch correction method to use. Either "combat" (the default, uses sva::ComBat) or "limma" (uses limma::removeBatchEffect). |
This function performs batch effect correction using either the ComBat algorithm
or the limma removeBatchEffect function. It requires batch information provided
via the batch parameter. If no batch information is available, the function
will return the original data unchanged.
If group information is provided via group,
the function will check for confounding between batch and group variables.
If batch and group are highly confounded (|Cramer's V| > threshold),
the function will issue a warning and return the original data unchanged
to avoid over-correction.
When both batch and group information are available and not highly confounded, the group information will be included in the model to preserve biological variation while correcting for batch effects.
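The confounding check relies on Cramer's V between the batch and group labels. A minimal base-R sketch of the standard (uncorrected) statistic follows; `cramers_v()` is a hypothetical helper, and whether the package applies a bias correction is not known from this page.

```r
# Hypothetical helper: Cramer's V from the chi-squared statistic of the
# batch x group contingency table (no continuity or bias correction).
cramers_v <- function(x, y) {
  tab <- table(x, y)
  # Small expected counts trigger a warning that is irrelevant here.
  chi2 <- unname(suppressWarnings(chisq.test(tab, correct = FALSE))$statistic)
  sqrt(chi2 / (sum(tab) * (min(dim(tab)) - 1)))
}

batch <- factor(rep(c("A", "B"), each = 5))
group_mixed <- factor(rep(c("Ctrl", "Treat"), times = 5))
group_confounded <- factor(rep(c("Ctrl", "Treat"), each = 5))

cramers_v(batch, group_mixed)       # 0.2, below the default threshold of 0.4
cramers_v(batch, group_confounded)  # 1, batch and group are fully confounded
```

In the second design every batch contains a single group, so batch and biological effects cannot be separated and correction would be skipped.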
For glyexp_experiment input, returns a modified glyexp_experiment object.
For matrix input, returns a batch-corrected matrix.
# With glyexp_experiment and column names
exp <- glyexp::toy_experiment
exp$sample_info$batch <- c("A", "A", "A", "B", "B", "B")
exp$sample_info$group <- c("Ctrl", "Ctrl", "Treat", "Ctrl", "Treat", "Treat")
corrected_exp <- correct_batch_effect(exp, batch = "batch", group = "group")
# With matrix and factor vectors
mat <- matrix(abs(rnorm(200)), nrow = 20, ncol = 10)
batch_factor <- factor(rep(c("A", "B"), each = 5))
group_factor <- factor(rep(c("Ctrl", "Treat"), times = 5))
corrected_mat <- correct_batch_effect(mat, batch = batch_factor, group = group_factor)
# Using limma method
corrected_mat <- correct_batch_effect(
  mat,
  batch = batch_factor,
  group = group_factor,
  method = "limma"
)
Use ANOVA to detect if batch effect is present in the data.
If group is provided, it will be used as a covariate in the ANOVA model.
detect_batch_effect(x, batch = "batch", group = NULL)
## S3 method for class 'glyexp_experiment'
detect_batch_effect(x, batch = "batch", group = NULL)
## S3 method for class 'matrix'
detect_batch_effect(x, batch = "batch", group = NULL)
## Default S3 method:
detect_batch_effect(x, batch = "batch", group = NULL)
x |
Either a |
batch |
Either a factor/character vector specifying batch assignments for each sample, or a string specifying the column name in sample_info (for experiment input only). Default to "batch" for experiment input. |
group |
Either a factor/character vector specifying group assignments for each sample, or a string specifying the column name in sample_info (for experiment input only). If provided, it will be used as a covariate in the ANOVA model. This is useful when you have an unbalanced design. Default to NULL. |
A double vector of p-values for each variable,
i.e., the same length as nrow(x) (for matrix) or nrow(get_expr_mat(x)) (for experiment)
# With glyexp_experiment and column names
exp <- glyexp::toy_experiment
exp$sample_info$batch <- c("A", "A", "A", "B", "B", "B")
exp$sample_info$group <- c("Ctrl", "Ctrl", "Treat", "Ctrl", "Treat", "Treat")
p_values <- detect_batch_effect(exp, batch = "batch", group = "group")
# With matrix and factor vectors
mat <- matrix(rnorm(200), nrow = 20, ncol = 10)
batch_factor <- factor(rep(c("A", "B"), each = 5))
group_factor <- factor(rep(c("Ctrl", "Treat"), times = 5))
p_values <- detect_batch_effect(mat, batch = batch_factor, group = group_factor)
A wrapper around pcaMethods::pca().
Impute missing values using Bayesian principal component analysis (BPCA).
BPCA combines an EM approach for PCA with a Bayesian model.
In standard PCA, data far from the training set but close to the principal
subspace may have the same reconstruction error.
BPCA defines a likelihood function such that the likelihood for data far from
the training set is much lower, even if they are close to the principal subspace.
impute_bpca(x, by = NULL, ...)
## S3 method for class 'glyexp_experiment'
impute_bpca(x, by = NULL, ...)
## S3 method for class 'matrix'
impute_bpca(x, by = NULL, ...)
## Default S3 method:
impute_bpca(x, by = NULL, ...)
x |
Either a |
by |
Either a column name in |
... |
Additional arguments to pass to |
Returns the same type as the input. If x is a glyexp_experiment,
returns a glyexp_experiment with missing values imputed.
If x is a matrix, returns a matrix with missing values imputed.
A wrapper around the impute::impute.knn().
Impute missing values with values from the k-nearest neighbors of the
corresponding feature.
impute_fw_knn(x, k = 5, by = NULL, ...)
## S3 method for class 'glyexp_experiment'
impute_fw_knn(x, k = 5, by = NULL, ...)
## S3 method for class 'matrix'
impute_fw_knn(x, k = 5, by = NULL, ...)
## Default S3 method:
impute_fw_knn(x, k = 5, by = NULL, ...)
x |
Either a |
k |
The number of nearest neighbors to consider. |
by |
Either a column name in |
... |
Additional arguments to pass to |
Returns the same type as the input. If x is a glyexp_experiment,
returns a glyexp_experiment with missing values imputed.
If x is a matrix, returns a matrix with missing values imputed.
Impute missing values with half of the minimum value of the corresponding sample.
This method assumes that missing values are MNAR (missing not at random),
i.e. missing values are induced by an ion below the detection limit.
Compared to impute_sample_min(), this method is more conservative.
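Both sample-minimum strategies can be sketched in a few lines of base R. The helper `fill_with_sample_min()` is hypothetical and only mirrors the description above: each missing value is replaced by the minimum observed value of its sample, optionally halved.

```r
# Toy matrix: 3 variables x 2 samples with one missing value per sample.
mat <- matrix(
  c(10, NA, 30, 4, 8, NA),
  nrow = 3,
  dimnames = list(paste0("v", 1:3), c("S1", "S2"))
)

# Hypothetical helper: replace NAs by (scale x the sample's minimum).
fill_with_sample_min <- function(mat, scale = 1) {
  fill <- apply(mat, 2, min, na.rm = TRUE) * scale
  na_idx <- which(is.na(mat), arr.ind = TRUE)
  # Look up the fill value of the column each missing entry belongs to.
  mat[na_idx] <- fill[na_idx[, "col"]]
  mat
}

fill_with_sample_min(mat)               # sample minimum: 10 for S1, 4 for S2
fill_with_sample_min(mat, scale = 0.5)  # half sample minimum: 5 for S1, 2 for S2
```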
impute_half_sample_min(x)
## S3 method for class 'glyexp_experiment'
impute_half_sample_min(x)
## S3 method for class 'matrix'
impute_half_sample_min(x)
## Default S3 method:
impute_half_sample_min(x)
x |
Either a |
Returns the same type as the input. If x is a glyexp_experiment,
returns a glyexp_experiment with missing values imputed.
If x is a matrix, returns a matrix with missing values imputed.
Impute missing values using random draws from the left-censored gaussian distribution. Missing values are imputed on the log2 intensity scale and then transformed back to the original scale.
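A rough base-R sketch of this kind of imputation is shown below. It is modeled on the general "MinProb" idea and is not the package's implementation: using the q-th quantile of each sample as the centre and the median feature-wise standard deviation as the spread are assumptions, and `min_prob_sketch()` is a hypothetical name.

```r
set.seed(1)
# Toy intensities: 10 variables x 6 samples, with three values set to missing.
mat <- matrix(2^rnorm(60, mean = 10), nrow = 10)
mat[cbind(c(1, 4, 7), c(2, 3, 6))] <- NA

min_prob_sketch <- function(mat, q = 0.01, tune.sigma = 1) {
  log_mat <- log2(mat)
  # Assumed centre: a low quantile of each sample's observed log2 values.
  centers <- apply(log_mat, 2, quantile, probs = q, na.rm = TRUE)
  # Assumed spread: median of the per-variable standard deviations.
  spread <- median(apply(log_mat, 1, sd, na.rm = TRUE), na.rm = TRUE) * tune.sigma
  for (j in seq_len(ncol(log_mat))) {
    miss <- is.na(log_mat[, j])
    log_mat[miss, j] <- rnorm(sum(miss), mean = centers[j], sd = spread)
  }
  # Back-transform to the original intensity scale.
  2^log_mat
}

imputed <- min_prob_sketch(mat)
```

Drawing on the log2 scale and back-transforming keeps every imputed intensity positive.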
impute_min_prob(x, by = NULL, q = 0.01, tune.sigma = 1, ...)
## S3 method for class 'glyexp_experiment'
impute_min_prob(x, by = NULL, q = 0.01, tune.sigma = 1, ...)
## S3 method for class 'matrix'
impute_min_prob(x, by = NULL, q = 0.01, tune.sigma = 1, ...)
## Default S3 method:
impute_min_prob(x, by = NULL, q = 0.01, tune.sigma = 1, ...)
x |
Either a |
by |
Either a column name in |
q |
Quantile used to estimate the lower-intensity center for each
sample. Default is |
tune.sigma |
Non-negative multiplier for the standard deviation of the
left-censored draw distribution. Default is |
... |
Reserved for backward compatibility. Extra arguments are not supported. |
Returns the same type as the input. If x is a glyexp_experiment,
returns a glyexp_experiment with missing values imputed.
If x is a matrix, returns a matrix with missing values imputed.
A wrapper around the missForest::missForest().
Impute missing values using recursive running of random forests until convergence.
This is a non-parametric method and works for both MAR and MNAR missing data.
impute_miss_forest(x, by = NULL, seed = 123, ...)
## S3 method for class 'glyexp_experiment'
impute_miss_forest(x, by = NULL, seed = 123, ...)
## S3 method for class 'matrix'
impute_miss_forest(x, by = NULL, seed = 123, ...)
## Default S3 method:
impute_miss_forest(x, by = NULL, seed = 123, ...)
x |
Either a |
by |
Either a column name in |
seed |
Integer seed for random number generation. Default is 123. |
... |
Additional arguments to pass to |
Returns the same type as the input. If x is a glyexp_experiment,
returns a glyexp_experiment with missing values imputed.
If x is a matrix, returns a matrix with missing values imputed.
A wrapper around the pcaMethods::pca().
Impute missing values using probabilistic principal component analysis (PPCA).
PPCA allows PCA to be performed on incomplete data and may be used for missing value estimation.
impute_ppca(x, by = NULL, ...)
## S3 method for class 'glyexp_experiment'
impute_ppca(x, by = NULL, ...)
## S3 method for class 'matrix'
impute_ppca(x, by = NULL, ...)
## Default S3 method:
impute_ppca(x, by = NULL, ...)
x |
Either a |
by |
Either a column name in |
... |
Additional arguments to pass to |
Returns the same type as the input. If x is a glyexp_experiment,
returns a glyexp_experiment with missing values imputed.
If x is a matrix, returns a matrix with missing values imputed.
Impute missing values with the minimum value of the corresponding sample.
This method assumes that missing values are MNAR (missing not at random),
i.e. missing values are induced by an ion below the detection limit.
See also impute_half_sample_min().
impute_sample_min(x)
## S3 method for class 'glyexp_experiment'
impute_sample_min(x)
## S3 method for class 'matrix'
impute_sample_min(x)
## Default S3 method:
impute_sample_min(x)
x |
Either a |
Returns the same type as the input. If x is a glyexp_experiment,
returns a glyexp_experiment with missing values imputed.
If x is a matrix, returns a matrix with missing values imputed.
A wrapper around the pcaMethods::pca().
Impute missing values using singular value decomposition (SVD) imputation.
SVD is a matrix factorization technique that factors a matrix into three matrices:
U, Σ, and V. SVD is used to find the best lower rank approximation of the original matrix.
impute_svd(x, by = NULL, ...)
## S3 method for class 'glyexp_experiment'
impute_svd(x, by = NULL, ...)
## S3 method for class 'matrix'
impute_svd(x, by = NULL, ...)
## Default S3 method:
impute_svd(x, by = NULL, ...)
x |
Either a |
by |
Either a column name in |
... |
Additional arguments to pass to |
Returns the same type as the input. If x is a glyexp_experiment,
returns a glyexp_experiment with missing values imputed.
If x is a matrix, returns a matrix with missing values imputed.
A wrapper around the impute::impute.knn().
Impute missing values with values from the k-nearest neighbors of the
corresponding sample.
If there are strong patterns among the samples (such as group clustering
relationships or experimental conditions), this method can better utilize
the overall relationships among samples.
impute_sw_knn(x, k = 5, by = NULL, ...)
## S3 method for class 'glyexp_experiment'
impute_sw_knn(x, k = 5, by = NULL, ...)
## S3 method for class 'matrix'
impute_sw_knn(x, k = 5, by = NULL, ...)
## Default S3 method:
impute_sw_knn(x, k = 5, by = NULL, ...)
x |
Either a |
k |
The number of nearest neighbors to consider. |
by |
Either a column name in |
... |
Additional arguments to pass to |
Returns the same type as the input. If x is a glyexp_experiment,
returns a glyexp_experiment with missing values imputed.
If x is a matrix, returns a matrix with missing values imputed.
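To make the sample-wise neighbor idea concrete, here is a minimal base-R sketch. It does not reproduce impute::impute.knn(); the helper `sw_knn_sketch()` and the distance definition (root-mean-square difference over shared observed variables) are assumptions made for illustration.

```r
# Hypothetical helper: sample-wise KNN imputation (illustration only).
# Rows are variables, columns are samples.
sw_knn_sketch <- function(mat, k = 2) {
  out <- mat
  for (j in seq_len(ncol(mat))) {
    miss <- which(is.na(mat[, j]))
    if (length(miss) == 0) next
    # Distance from sample j to every other sample over shared observed variables.
    dists <- vapply(seq_len(ncol(mat)), function(o) {
      if (o == j) return(Inf)
      shared <- !is.na(mat[, j]) & !is.na(mat[, o])
      if (!any(shared)) return(Inf)
      sqrt(mean((mat[shared, j] - mat[shared, o])^2))
    }, numeric(1))
    neighbors <- order(dists)[seq_len(min(k, ncol(mat) - 1))]
    # Fill each missing variable with the mean of the neighbors' observed values.
    out[miss, j] <- rowMeans(mat[miss, neighbors, drop = FALSE], na.rm = TRUE)
  }
  out
}

mat <- cbind(
  s1 = c(1, 2, 3, NA),
  s2 = c(1.1, 2.1, 3.1, 4.1),
  s3 = c(0.9, 1.9, 2.9, 3.9),
  s4 = c(10, 20, 30, 40)
)
imputed <- sw_knn_sketch(mat, k = 2)
```

In this toy matrix the two nearest samples to `s1` are `s2` and `s3`, so the distant sample `s4` does not influence the imputed value.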
Impute missing values in an expression matrix by replacing them with zeros.
impute_zero(x)
## S3 method for class 'glyexp_experiment'
impute_zero(x)
## S3 method for class 'matrix'
impute_zero(x)
## Default S3 method:
impute_zero(x)
x |
Either a |
Returns the same type as the input. If x is a glyexp_experiment,
returns a glyexp_experiment with missing values imputed.
If x is a matrix, returns a matrix with missing values imputed.
This function is a wrapper around limma::normalizeCyclicLoess() with
method = "pairs".
Each pair of columns is normalized against each other.
See this paper for more information.
Also see limma::normalizeCyclicLoess().
normalize_loesscyc(x, by = NULL, ...)
## S3 method for class 'glyexp_experiment'
normalize_loesscyc(x, by = NULL, ...)
## S3 method for class 'matrix'
normalize_loesscyc(x, by = NULL, ...)
## Default S3 method:
normalize_loesscyc(x, by = NULL, ...)
x |
Either a |
by |
Either a column name in |
... |
Additional arguments to pass to |
Returns the same type as the input. If x is a glyexp_experiment,
returns a glyexp_experiment with normalized expression matrix.
If x is a matrix, returns a normalized matrix.
This function is a wrapper around limma::normalizeCyclicLoess() with
method = "fast".
Each column is simply normalized to a reference array,
the reference array being the average of all the arrays.
See this paper for more information.
Also see limma::normalizeCyclicLoess().
normalize_loessf(x, by = NULL, ...)
## S3 method for class 'glyexp_experiment'
normalize_loessf(x, by = NULL, ...)
## S3 method for class 'matrix'
normalize_loessf(x, by = NULL, ...)
## Default S3 method:
normalize_loessf(x, by = NULL, ...)
x |
Either a |
by |
Either a column name in |
... |
Additional arguments to pass to |
Returns the same type as the input. If x is a glyexp_experiment,
returns a glyexp_experiment with normalized expression matrix.
If x is a matrix, returns a normalized matrix.
Normalize the expression matrix by dividing each column (sample) by the median of that column (NA ignored), so that the median of each column is 1. This is the most common normalization method for proteomics data. It effectively and robustly removes the bias introduced by total protein abundance and partially removes batch effects.
normalize_median(x)
## S3 method for class 'glyexp_experiment'
normalize_median(x)
## S3 method for class 'matrix'
normalize_median(x)
## Default S3 method:
normalize_median(x)
x |
Either a |
Returns the same type as the input. If x is a glyexp_experiment,
returns a glyexp_experiment with normalized expression matrix.
If x is a matrix, returns a normalized matrix.
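The arithmetic of median normalization can be sketched in base R on a plain matrix. This is an illustration of the calculation described above, not the package's internal code; the toy matrix is made up.

```r
# Rows are variables, columns are samples.
mat <- cbind(s1 = c(1, 2, 3, NA), s2 = c(10, 20, 30, 40))

# Median of each sample, ignoring missing values.
col_medians <- apply(mat, 2, median, na.rm = TRUE)

# Divide every column by its own median.
normalized <- sweep(mat, 2, col_medians, "/")
```

After this step each column has a median of 1, and missing values stay missing.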
Normalize the expression matrix by dividing each column (sample) by the absolute median of that column (NA ignored), so that the absolute median of each column is 1.
normalize_median_abs(x)
## S3 method for class 'glyexp_experiment'
normalize_median_abs(x)
## S3 method for class 'matrix'
normalize_median_abs(x)
## Default S3 method:
normalize_median_abs(x)
x |
Either a |
Returns the same type as the input. If x is a glyexp_experiment,
returns a glyexp_experiment with normalized expression matrix.
If x is a matrix, returns a normalized matrix.
This approach is based on calculating the dilution factor of each sample with respect to a reference sample. The reference sample is calculated as the median value of each glycan's abundance across all measured samples. For each sample, a vector of quotients is obtained by dividing each glycan measure by the corresponding value in the reference sample. The median of these quotients is used as the sample's dilution factor, and the original sample values are divided by that value. The underlying assumption is that the different intensities observed across individuals are attributable to different amounts of biological material in the collected samples. See this paper for more information.
normalize_median_quotient(x, by = NULL)
## S3 method for class 'glyexp_experiment'
normalize_median_quotient(x, by = NULL)
## S3 method for class 'matrix'
normalize_median_quotient(x, by = NULL)
## Default S3 method:
normalize_median_quotient(x, by = NULL)
x |
Either a |
by |
Either a column name in |
Returns the same type as the input. If x is a glyexp_experiment,
returns a glyexp_experiment with normalized expression matrix.
If x is a matrix, returns a normalized matrix.
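The median quotient steps described above (reference sample, quotients, dilution factor) can be sketched in base R. The helper `median_quotient_sketch()` is hypothetical and ignores the `by` argument; it only illustrates the calculation on a plain matrix.

```r
# Hypothetical helper: median quotient normalization (illustration only).
# Rows are variables (glycans), columns are samples.
median_quotient_sketch <- function(mat) {
  # Reference sample: median of each variable across all samples.
  reference <- apply(mat, 1, median, na.rm = TRUE)
  # Quotient of every value against the reference value of its variable.
  quotients <- mat / reference
  # Dilution factor: median quotient within each sample.
  dilution <- apply(quotients, 2, median, na.rm = TRUE)
  sweep(mat, 2, dilution, "/")
}

base <- c(10, 20, 40, 80, 160)
mat <- cbind(s1 = base, s2 = base * 2, s3 = base * 0.5)
normalized <- median_quotient_sketch(mat)
```

The three toy samples differ only by a dilution factor, so all columns become identical after normalization.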
This function is a wrapper around limma::normalizeQuantiles().
It normalizes the expression matrix by quantile normalization.
This method is used to remove technical variation between samples.
This method is rarely used for proteomics data but is common for microarray data.
See Wikipedia for more information.
normalize_quantile(x, by = NULL, ...)
## S3 method for class 'glyexp_experiment'
normalize_quantile(x, by = NULL, ...)
## S3 method for class 'matrix'
normalize_quantile(x, by = NULL, ...)
## Default S3 method:
normalize_quantile(x, by = NULL, ...)
x |
Either a |
by |
Either a column name in |
... |
Additional arguments to pass to |
Returns the same type as the input. If x is a glyexp_experiment,
returns a glyexp_experiment with normalized expression matrix.
If x is a matrix, returns a normalized matrix.
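For readers unfamiliar with quantile normalization, the core idea fits in a few lines of base R. This sketch assumes a matrix with no missing values and no ties within a column; limma::normalizeQuantiles() handles both cases more carefully. The helper name `quantile_normalize_sketch()` is hypothetical.

```r
# Hypothetical helper: textbook quantile normalization (illustration only).
quantile_normalize_sketch <- function(mat) {
  # Rank of every value within its own sample.
  ranks <- apply(mat, 2, rank, ties.method = "first")
  # Target distribution: mean of the sorted columns at each rank.
  target <- rowMeans(apply(mat, 2, sort))
  # Replace every value by the target value at its rank.
  out <- apply(ranks, 2, function(r) target[r])
  dimnames(out) <- dimnames(mat)
  out
}

mat <- cbind(s1 = c(5, 2, 3, 4), s2 = c(4, 1, 4.5, 2), s3 = c(3, 4, 6, 8))
normalized <- quantile_normalize_sketch(mat)
```

After normalization every sample has exactly the same set of values, while the ordering of variables within each sample is preserved.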
This method is based on robust linear regression. The reference sample is calculated as the median value of each variable's abundance across all measured samples. For each sample, a robust linear regression model is fitted to the sample's abundance values against the reference sample's abundance values. The fitted model is then used to normalize the sample's abundance values. The underlying assumption is that the different intensities observed across individuals are attributable to different amounts of biological material in the collected samples.
normalize_rlr(x, by = NULL)
## S3 method for class 'glyexp_experiment'
normalize_rlr(x, by = NULL)
## S3 method for class 'matrix'
normalize_rlr(x, by = NULL)
## Default S3 method:
normalize_rlr(x, by = NULL)
x |
Either a |
by |
Either a column name in |
Returns the same type as the input. If x is a glyexp_experiment,
returns a glyexp_experiment with normalized expression matrix.
If x is a matrix, returns a normalized matrix.
This method is based on robust linear regression with median adjustment.
First, the median of each variable's abundance across all measured samples
is subtracted from the sample's abundance values. The remaining steps
follow normalize_rlr().
normalize_rlrma(x, by = NULL)
## S3 method for class 'glyexp_experiment'
normalize_rlrma(x, by = NULL)
## S3 method for class 'matrix'
normalize_rlrma(x, by = NULL)
## Default S3 method:
normalize_rlrma(x, by = NULL)
x |
Either a |
by |
Either a column name in |
Returns the same type as the input. If x is a glyexp_experiment,
returns a glyexp_experiment with normalized expression matrix.
If x is a matrix, returns a normalized matrix.
This method is based on robust linear regression with median adjustment and cyclic normalization. The method is applied iteratively to each pair of samples. For each pair of samples, the median of the differences between the two samples is calculated. A robust linear regression model is fitted to the differences against the averages of the two samples. The fitted model is then used to normalize the two samples. The process is repeated for a number of iterations.
normalize_rlrmacyc(x, n_iter = 3, by = NULL)
## S3 method for class 'glyexp_experiment'
normalize_rlrmacyc(x, n_iter = 3, by = NULL)
## S3 method for class 'matrix'
normalize_rlrmacyc(x, n_iter = 3, by = NULL)
## Default S3 method:
normalize_rlrmacyc(x, n_iter = 3, by = NULL)
x |
Either a |
n_iter |
The number of iterations to perform. Default is 3. |
by |
Either a column name in |
Returns the same type as the input. If x is a glyexp_experiment,
returns a glyexp_experiment with normalized expression matrix.
If x is a matrix, returns a normalized matrix.
Normalize the expression matrix by dividing each column (sample) by the sum of that column, so that the sum of each column is 1. This is the most common normalization method for glycomics data. It removes the bias introduced by total glycan abundance. However, it produces compositional data, which may lead to misleading downstream analysis results.
normalize_total_area(x)
## S3 method for class 'glyexp_experiment'
normalize_total_area(x)
## S3 method for class 'matrix'
normalize_total_area(x)
## Default S3 method:
normalize_total_area(x)
x |
Either a |
Returns the same type as the input. If x is a glyexp_experiment,
returns a glyexp_experiment with normalized expression matrix.
If x is a matrix, returns a normalized matrix.
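The total area calculation is a one-liner on a plain matrix. The base-R sketch below uses a made-up toy matrix and only illustrates the arithmetic described above.

```r
# Rows are glycans, columns are samples.
mat <- cbind(s1 = c(10, 30, 60), s2 = c(5, 5, 10))

# Divide every column by its total so that each sample sums to 1.
normalized <- sweep(mat, 2, colSums(mat, na.rm = TRUE), "/")
```

Because every column now sums to 1, an increase in one glycan's share forces a decrease in the others. This is the compositional constraint mentioned above.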
This function is a wrapper around limma::normalizeVSN().
Evidence suggests that this method performs well in reducing noise and
boosting differential expression detection.
Log-transformation is not needed for downstream statistical analysis,
because this normalization method performs a log-like transformation internally.
For the same reason, fold change estimates will be severely distorted.
Please use this method with caution.
See this paper for more information.
normalize_vsn(x, by = NULL, ...)
## S3 method for class 'glyexp_experiment'
normalize_vsn(x, by = NULL, ...)
## S3 method for class 'matrix'
normalize_vsn(x, by = NULL, ...)
## Default S3 method:
normalize_vsn(x, by = NULL, ...)
x |
Either a |
by |
Either a column name in |
... |
Additional arguments to pass to |
At least 42 variables should be provided for this method.
This follows the convention of the vsn package.
Returns the same type as the input. If x is a glyexp_experiment,
returns a glyexp_experiment with normalized expression matrix.
If x is a matrix, returns a normalized matrix.
Draw a PCA score plot for samples and color points by batch. PCA is computed on log2-transformed intensities after removing variables with missing values.
plot_batch_pca(exp, batch_col = "batch")
exp |
A |
batch_col |
Column name in |
A ggplot object of PCA scores.
exp <- glyexp::toy_experiment
exp$sample_info$batch <- rep(c("A", "B"), each = 3)
plot_batch_pca(exp, batch_col = "batch")
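The computation behind such a plot (drop variables with missing values, log2-transform, run PCA on samples) can be sketched in base R. The simulated matrix, the centering without scaling, and the variable names are assumptions for illustration; the package's own settings may differ.

```r
set.seed(1)
# Simulated intensities: 10 variables (rows) by 6 samples (columns).
mat <- matrix(2^rnorm(60, mean = 10), nrow = 10,
              dimnames = list(paste0("V", 1:10), paste0("S", 1:6)))
mat[1, 2] <- NA
batch <- rep(c("A", "B"), each = 3)

# Keep only variables observed in every sample.
complete <- mat[stats::complete.cases(mat), , drop = FALSE]

# PCA on samples: prcomp() expects observations (samples) in rows.
pca <- prcomp(t(log2(complete)), center = TRUE, scale. = FALSE)

# Scores for the first two components, ready to be colored by batch.
scores <- data.frame(pca$x[, 1:2], batch = batch)
```

The `scores` data frame has one row per sample and could be passed to any plotting function to inspect whether samples separate by batch.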
Compute coefficient of variation (CV) for each variable and plot its density.
When by is provided, CVs are computed within each group and densities are
shown with different fills.
plot_cv_dent(exp, by = NULL)
exp |
A |
by |
Grouping variable for samples. Can be a column name in |
A ggplot object of CV density.
plot_cv_dent(glyexp::toy_experiment)
exp <- glyexp::toy_experiment
exp$sample_info$group <- rep(c("A", "B"), length.out = ncol(exp$expr_mat))
plot_cv_dent(exp, by = "group")
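The quantity being plotted can be computed by hand. The base-R sketch below calculates the CV (standard deviation divided by mean) per variable, overall and within groups; the helper `cv_per_variable()` and the toy data are hypothetical.

```r
# Hypothetical helper: CV of each variable (row) across samples (columns).
cv_per_variable <- function(mat) {
  apply(mat, 1, function(v) sd(v, na.rm = TRUE) / mean(v, na.rm = TRUE))
}

mat <- rbind(V1 = c(10, 10, 10, 10), V2 = c(5, 10, 15, 10))
group <- c("A", "A", "B", "B")

# CV over all samples.
cvs <- cv_per_variable(mat)

# CV within each group: one column per group, one row per variable.
cv_by_group <- sapply(split(seq_along(group), group), function(idx) {
  cv_per_variable(mat[, idx, drop = FALSE])
})
```

A density estimate of `cvs`, or of each column of `cv_by_group`, corresponds to the kind of curve the plot shows.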
Draw boxplots of log2-transformed intensities for each sample. Optionally color and group samples by a metadata variable.
plot_int_boxplot(exp, by = NULL)
exp |
A |
by |
Grouping variable for samples. Can be a column name in |
A ggplot object of log-intensity boxplots.
plot_int_boxplot(glyexp::toy_experiment)
plot_int_boxplot(glyexp::toy_experiment, by = "group")
Draw a bar plot of missing value proportions for each sample or variable. Items are ordered from low to high missing proportion.
plot_missing_bar(exp, on = "sample")
exp |
A |
on |
Whether to plot missingness by |
A ggplot object of missing value proportions by item.
plot_missing_bar(glyexp::toy_experiment)
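The proportions shown in the bar plot are straightforward to compute on a plain matrix. This base-R sketch uses a made-up matrix and only illustrates the calculation and the low-to-high ordering.

```r
# Rows are variables, columns are samples.
mat <- rbind(
  V1 = c(1, 1, NA),
  V2 = c(NA, 2, 2),
  V3 = c(NA, 3, 3)
)
colnames(mat) <- c("S1", "S2", "S3")

# Proportion of missing values per sample and per variable, ordered low to high.
missing_by_sample <- sort(colMeans(is.na(mat)))
missing_by_variable <- sort(rowMeans(is.na(mat)))
```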
Draw a binary heatmap showing the missing value pattern in an experiment. Values are binarized: 1 (present) or 0 (missing). Rows (variables) are sorted by missing value proportion from low to high. Columns (samples) are clustered using hierarchical clustering.
plot_missing_heatmap(exp, ...)
exp |
A |
... |
Other arguments passed to |
A ggplot object of the missing value heatmap.
plot_missing_heatmap(glyexp::toy_experiment)
Draw a scatter plot of proteins ranked by mean log2 intensity. Proteins are ordered from high to low mean intensity along the x-axis.
plot_rank_abundance(exp)
exp |
A |
A ggplot object of protein rank abundance.
plot_rank_abundance(glyexp::toy_experiment)
Randomly draw replicate sample pairs and plot log2 intensity scatter plots. The plot title shows sample names, and the subtitle reports the R² value.
plot_rep_scatter(exp, rep_col, n_pairs = 9)
exp |
A |
rep_col |
Column name in |
n_pairs |
Number of replicate pairs to draw at random. |
A patchwork object containing replicate scatter plots.
exp <- glyexp::toy_experiment
exp$sample_info$replicate <- rep(c("A", "B"), length.out = ncol(exp$expr_mat))
plot_rep_scatter(exp, rep_col = "replicate", n_pairs = 4)
Draw boxplots of relative log expression (log2 intensity minus row median) for each sample. Optionally color and group samples by a metadata variable.
plot_rle(exp, by = NULL)
exp |
A |
by |
Grouping variable for samples. Can be a column name in |
A ggplot object of RLE boxplots.
plot_rle(glyexp::toy_experiment)
plot_rle(glyexp::toy_experiment, by = "group")
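Relative log expression, as defined above (log2 intensity minus the row median), can be computed in base R as follows. The toy matrix is made up for illustration.

```r
# Rows are variables, columns are samples.
mat <- rbind(V1 = c(8, 16, 32), V2 = c(4, 4, 16), V3 = c(2, 8, 8))
colnames(mat) <- c("S1", "S2", "S3")

log_mat <- log2(mat)

# Subtract each variable's median log2 intensity from its row.
row_medians <- apply(log_mat, 1, median, na.rm = TRUE)
rle_mat <- sweep(log_mat, 1, row_medians, "-")
```

Each column of `rle_mat` is what one box in the plot summarizes. In well-normalized data the boxes are centered near zero with similar spread.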
Draw a bar plot showing total intensity (TIC) for each sample. Samples are ordered from high to low TIC from left to right.
plot_tic_bar(exp)
exp |
A |
A ggplot object of total intensity by sample.
plot_tic_bar(glyexp::toy_experiment)
Constant variables are variables with the same value in all samples.
This function is equivalent to remove_low_var(x, var_cutoff = 0, by = by, strict = strict).
remove_constant(x, by = NULL, strict = FALSE)
x |
Either a |
by |
Either a column name in |
strict |
If |
For glyexp_experiment input, returns a modified glyexp_experiment object.
For matrix input, returns a filtered matrix.
Filters variables whose coefficient of variation falls below a threshold. Default behavior is to remove variables with zero coefficient of variation.
remove_low_cv(x, cv_cutoff = 0, by = NULL, strict = FALSE)
## S3 method for class 'glyexp_experiment'
remove_low_cv(x, cv_cutoff = 0, by = NULL, strict = FALSE)
## S3 method for class 'matrix'
remove_low_cv(x, cv_cutoff = 0, by = NULL, strict = FALSE)
## Default S3 method:
remove_low_cv(x, cv_cutoff = 0, by = NULL, strict = FALSE)
x |
Either a |
cv_cutoff |
The cutoff for coefficient of variation. Defaults to 0. |
by |
A factor specifying the groupings. Defaults to NULL. |
strict |
If |
For glyexp_experiment input, returns a modified glyexp_experiment object.
For matrix input, returns a filtered matrix.
Filters variables based on median expression values. Variables with median expression values below a certain percentile will be removed.
remove_low_expr(x, percentile = 0.05, by = NULL, strict = FALSE)
## S3 method for class 'glyexp_experiment'
remove_low_expr(x, percentile = 0.05, by = NULL, strict = FALSE)
## S3 method for class 'matrix'
remove_low_expr(x, percentile = 0.05, by = NULL, strict = FALSE)
## Default S3 method:
remove_low_expr(x, percentile = 0.05, by = NULL, strict = FALSE)
x |
Either a |
percentile |
The percentile for median expression values. Defaults to 0.05, i.e., variables whose median expression is in the lowest 5% are removed. |
by |
Either a column name in |
strict |
If |
For glyexp_experiment input, returns a modified glyexp_experiment object.
For matrix input, returns a filtered matrix.
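A percentile filter on median expression can be sketched in base R as below. The helper `low_expr_filter_sketch()` is hypothetical, ignores `by` and `strict`, and assumes that variables at or above the cutoff are kept; how the package treats the boundary is not stated here.

```r
# Hypothetical helper: drop variables with the lowest median expression.
low_expr_filter_sketch <- function(mat, percentile = 0.05) {
  med <- apply(mat, 1, median, na.rm = TRUE)
  # Cutoff: the requested percentile of the per-variable medians.
  cutoff <- quantile(med, probs = percentile, na.rm = TRUE)
  # Assumption: variables at or above the cutoff are kept.
  mat[med >= cutoff, , drop = FALSE]
}

# 20 variables whose median expression equals their index, 4 samples.
mat <- matrix(rep(1:20, times = 4), nrow = 20,
              dimnames = list(paste0("V", 1:20), paste0("S", 1:4)))
kept <- low_expr_filter_sketch(mat, percentile = 0.05)
```

With 20 variables and a 5% cutoff, only the single lowest-expressed variable is dropped in this toy example.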
Filters variables whose variance falls below a threshold. Default behavior is to remove variables with zero variance.
remove_low_var(x, var_cutoff = 0, by = NULL, strict = FALSE)
## S3 method for class 'glyexp_experiment'
remove_low_var(x, var_cutoff = 0, by = NULL, strict = FALSE)
## S3 method for class 'matrix'
remove_low_var(x, var_cutoff = 0, by = NULL, strict = FALSE)
## Default S3 method:
remove_low_var(x, var_cutoff = 0, by = NULL, strict = FALSE)
x |
Either a |
var_cutoff |
The cutoff for variance. Defaults to 0. |
by |
A factor specifying the groupings. Defaults to NULL. |
strict |
If |
For glyexp_experiment input, returns a modified glyexp_experiment object.
For matrix input, returns a filtered matrix.
remove_low_cv(), remove_constant()
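A variance filter of this kind can be sketched in base R. The helper `low_var_filter_sketch()` is hypothetical and ignores `by` and `strict`; it assumes that variables with variance at or below the cutoff are removed, which matches the stated default of dropping zero-variance variables.

```r
# Hypothetical helper: keep variables whose variance exceeds the cutoff.
low_var_filter_sketch <- function(mat, var_cutoff = 0) {
  vars <- apply(mat, 1, var, na.rm = TRUE)
  # Variables with undefined variance (too few observations) are dropped as well.
  mat[!is.na(vars) & vars > var_cutoff, , drop = FALSE]
}

mat <- rbind(
  V1 = c(5, 5, 5, 5),   # constant
  V2 = c(1, 2, 3, 4),   # varying
  V3 = c(2, 2, NA, 2)   # constant among observed values
)
kept <- low_var_filter_sketch(mat)
```

With the default cutoff of 0 this behaves like a constant-variable filter: only `V2` survives.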
Remove Rare Variables with Too Many Missing Values
remove_rare(x, prop = NULL, n = NULL, by = NULL, strict = FALSE, min_n = NULL)
## S3 method for class 'glyexp_experiment'
remove_rare(x, prop = NULL, n = NULL, by = NULL, strict = FALSE, min_n = NULL)
## S3 method for class 'matrix'
remove_rare(x, prop = NULL, n = NULL, by = NULL, strict = FALSE, min_n = NULL)
## Default S3 method:
remove_rare(x, prop = NULL, n = NULL, by = NULL, strict = FALSE, min_n = NULL)
x |
Either a |
prop |
The proportion of missing values to use as a threshold. Variables with missing values above this threshold will be removed. Defaults to 0.5. |
n |
The number of missing values to use as a threshold.
An alternative to |
by |
Either a column name in |
strict |
Works with |
min_n |
The minimum number of non-missing values required for a variable
to be kept. If
|
For glyexp_experiment input, returns a modified glyexp_experiment object.
For matrix input, returns a filtered matrix.
# With glyexp_experiment
exp <- glyexp::toy_experiment
exp$expr_mat[1, 1] <- NA   # V1: 1/6 missing
exp$expr_mat[2, 1:3] <- NA # V2: 3/6 missing
exp$expr_mat[3, 1:5] <- NA # V3: 5/6 missing
exp$expr_mat[4, 1:6] <- NA # V4: 6/6 missing
exp$expr_mat

# Remove variables with more than 50% missing values.
remove_rare(exp, prop = 0.5)$expr_mat

# Remove variables with more than 2 missing values.
remove_rare(exp, n = 2)$expr_mat

# Remove variables if they have more than 1 missing value in all groups.
# In other words, keep variables as long as they have 1 or 0 missing values
# in any group.
remove_rare(exp, by = "group", strict = FALSE)$expr_mat

# Keep only variables with no missing values.
remove_rare(exp, prop = 0)$expr_mat

# Use custom min_n to require at least 4 non-missing values.
remove_rare(exp, min_n = 4)$expr_mat

# With matrix
mat <- matrix(c(1, 2, NA, 4, 5, NA, 7, 8, 9), nrow = 3)
mat_filtered <- remove_rare(mat, prop = 0.5)
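For readers who want to see the proportion threshold spelled out, here is a base-R sketch that does not depend on glyexp or glyclean. The helper `rare_filter_sketch()` is hypothetical and covers only the `prop` case without grouping; it mirrors the "more than 50% missing values" wording of the example above.

```r
# Hypothetical helper: drop variables whose missing proportion exceeds `prop`.
rare_filter_sketch <- function(mat, prop = 0.5) {
  missing_prop <- rowMeans(is.na(mat))
  mat[missing_prop <= prop, , drop = FALSE]
}

mat <- rbind(
  V1 = c(NA, 2, 3, 4, 5, 6),      # 1/6 missing
  V2 = c(NA, NA, NA, 4, 5, 6),    # 3/6 missing
  V3 = c(NA, NA, NA, NA, NA, 6),  # 5/6 missing
  V4 = rep(NA_real_, 6)           # 6/6 missing
)
kept <- rare_filter_sketch(mat, prop = 0.5)
```

`V2` sits exactly at the threshold and is kept, because only variables with more than 50% missing values are removed in this sketch.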
This function implements a glycowork-compatible ALR preprocessing strategy. Internally, the data are transformed relative to an automatically selected reference glycan using glycowork-style ALR logic, then back-transformed to the original ratio space before returning the result.
transform_alr(x, by = NULL, gamma = 0.1, group_scales = NULL)
## S3 method for class 'glyexp_experiment'
transform_alr(x, by = NULL, gamma = 0.1, group_scales = NULL)
## S3 method for class 'matrix'
transform_alr(x, by = NULL, gamma = 0.1, group_scales = NULL)
## Default S3 method:
transform_alr(x, by = NULL, gamma = 0.1, group_scales = NULL)
x |
Either a |
by |
Either a column name in |
gamma |
Standard deviation of the scale-uncertainty model on the |
group_scales |
Optional informed group scales. For binary comparisons, this can be a single positive ratio for the second group relative to the first, or two positive scales from which that ratio is derived. For multi-group data, provide a positive vector with one scale per group. |
Returns the same type as the input. If x is a glyexp_experiment,
returns a glyexp_experiment with an ALR-transformed expression matrix.
If x is a matrix, returns an ALR-transformed matrix.
When ALR succeeds, the reference glycan is excluded from the result and the
output therefore has one fewer row than the input. When ALR falls back to
CLR, the returned object keeps the original dimensions. The returned values
are back-transformed to the original ratio space, corresponding to
x / x_ref.
The stochastic branch follows glycowork: when gamma > 0, CLR noise is
sampled per feature-sample entry rather than once per sample. Without
informed scales, this corresponds to jittering around the sample-specific
log2 geometric mean of the non-zero components and then returning to ratio
space.
The group_scales argument is interpreted in the same way as
glycowork's custom_scale. For exactly two groups, provide the total-signal
ratio of the second group relative to the first, either directly as a scalar
or implicitly through two per-group scales. For multi-group data, provide one
positive scale per group; these scales are used only in the stochastic branch.
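The deterministic part of the result, x / x_ref with the reference row dropped, can be sketched in base R. The helper `alr_ratio_sketch()` is hypothetical: it takes the reference as an explicit argument instead of selecting it automatically, and it covers neither the stochastic branch (`gamma > 0`) nor the CLR fallback.

```r
# Hypothetical helper: ratio of every variable to a chosen reference variable.
# Rows are glycans, columns are samples.
alr_ratio_sketch <- function(mat, ref) {
  # Divide each sample (column) by its own reference value.
  ratios <- sweep(mat, 2, mat[ref, ], "/")
  # The reference itself is excluded, so the output has one fewer row.
  ratios[rownames(ratios) != ref, , drop = FALSE]
}

mat <- rbind(G1 = c(10, 20), G2 = c(5, 40), G3 = c(2, 4))
colnames(mat) <- c("S1", "S2")
out <- alr_ratio_sketch(mat, ref = "G3")
```

Taking log2 of `out` would give the usual additive log-ratio values; the sketch stays in ratio space to match the back-transformed output described above.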
If you intend to use ALR/CLR transformation with motif quantification, you should apply the transformation after motif quantification rather than before. In other words, the recommended workflow is:
Perform data preprocessing.
clean_exp <- auto_clean(exp)
Perform motif quantification on the cleaned experiment.
motif_exp <- glydet::quantify_motifs(clean_exp, motifs)
Perform ALR/CLR transformation on the motif-quantified experiment.
coda_motif_exp <- auto_coda(motif_exp, by = "group", gamma = 0.1)
Proceed with downstream analyses on the transformed motif experiment.
dea_res <- glystats::gly_ttest(coda_motif_exp)
This function implements a glycowork-compatible CLR preprocessing strategy.
Internally, the data are transformed on the log2 scale following
glycowork::clr_transformation(), then back-transformed to the original
ratio space before returning the result.
transform_clr(x, by = NULL, gamma = 0.1, group_scales = NULL)
## S3 method for class 'glyexp_experiment'
transform_clr(x, by = NULL, gamma = 0.1, group_scales = NULL)
## S3 method for class 'matrix'
transform_clr(x, by = NULL, gamma = 0.1, group_scales = NULL)
## Default S3 method:
transform_clr(x, by = NULL, gamma = 0.1, group_scales = NULL)
x |
Either a |
by |
Either a column name in |
gamma |
Standard deviation of the scale-uncertainty model on the |
group_scales |
Optional informed group scales. For binary comparisons, this can be a single positive ratio for the second group relative to the first, or two positive scales from which that ratio is derived. For multi-group data, provide a positive vector with one scale per group. |
Returns the same type as the input. If x is a glyexp_experiment,
returns a glyexp_experiment with transformed expression matrix.
If x is a matrix, returns a transformed matrix.
The returned values are back-transformed to the original ratio space.
Zeros in the input therefore remain zeros in the output.
The stochastic branch follows glycowork: when gamma > 0, CLR noise is
sampled per feature-sample entry rather than once per sample. Without
informed scales, this corresponds to jittering around the sample-specific
log2 geometric mean of the non-zero components and then returning to ratio
space.
The group_scales argument is interpreted in the same way as
glycowork's custom_scale. For exactly two groups, provide the total-signal
ratio of the second group relative to the first, either directly as a scalar
or implicitly through two per-group scales. For multi-group data, provide one
positive scale per group; these scales are used only in the stochastic branch.
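Without noise and without informed scales, the description above reduces to dividing each sample by the geometric mean of its non-zero components. The base-R sketch below illustrates that deterministic case only; the helper `clr_ratio_sketch()` is hypothetical and is not a port of glycowork's code.

```r
# Hypothetical helper: deterministic CLR, returned in ratio space.
# Rows are glycans, columns are samples.
clr_ratio_sketch <- function(mat) {
  apply(mat, 2, function(v) {
    nonzero <- !is.na(v) & v > 0
    # log2 geometric mean of the non-zero components of this sample.
    log_gm <- mean(log2(v[nonzero]))
    # 2^(log2(v) - log_gm) equals v / 2^log_gm, so zeros stay zeros.
    v / 2^log_gm
  })
}

mat <- rbind(G1 = c(1, 0), G2 = c(4, 2), G3 = c(16, 8))
colnames(mat) <- c("S1", "S2")
out <- clr_ratio_sketch(mat)
```

In each sample the non-zero outputs have a geometric mean of 1, which is the ratio-space counterpart of CLR values summing to zero on the log scale.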
If you intend to use ALR/CLR transformation with motif quantification, you should apply the transformation after motif quantification rather than before. In other words, the recommended workflow is:
Perform data preprocessing.
clean_exp <- auto_clean(exp)
Perform motif quantification on the cleaned experiment.
motif_exp <- glydet::quantify_motifs(clean_exp, motifs)
Perform ALR/CLR transformation on the motif-quantified experiment.
coda_motif_exp <- auto_coda(motif_exp, by = "group", gamma = 0.1)
Proceed with downstream analyses on the transformed motif experiment.
dea_res <- glystats::gly_ttest(coda_motif_exp)