| Title: | Statistical Analysis of Glycoproteomics and Glycomics Data |
|---|---|
| Description: | Provides a unified toolbox for bioinformatics analyses of glycomics and glycoproteomics data. Implemented methods include differential testing (t-test, Wilcoxon rank-sum test, ANOVA, Kruskal–Wallis), multivariate modeling and visualization (PCA, t-SNE, UMAP), supervised learning (PLS-DA, OPLS-DA), clustering (k-means, hierarchical, consensus clustering), network analysis (weighted gene co-expression network analysis, WGCNA), survival modeling (Cox proportional hazards), enrichment analysis, correlation analysis, and ROC/AUC evaluation. All user-facing functions follow the gly_*() naming convention to facilitate auto-completion in RStudio. The package is designed to work seamlessly with 'glyexp' and tidy data workflows. |
| Authors: | Bin Fu [aut, cre] (ORCID: <https://orcid.org/0000-0001-8567-2997>) |
| Maintainer: | Bin Fu <[email protected]> |
| License: | MIT + file LICENSE |
| Version: | 0.10.1 |
| Built: | 2026-05-22 10:27:12 UTC |
| Source: | https://github.com/glycoverse/glystats |
Filtering the experiment to keep only significant variables is a common task.
This function provides a convenient way to do this.
It supports results from all glystats DEA functions including
gly_anova(), gly_ancova(), gly_kruskal(), gly_ttest(), gly_wilcox(), and gly_limma().
    filter_sig_vars(exp, res, p_adj_cutoff = 0.05, p_val_cutoff = NULL,
                    fc_cutoff = NULL, ...)

    ## S3 method for class 'glystats_anova_res'
    filter_sig_vars(exp, res, p_adj_cutoff = 0.05, p_val_cutoff = NULL,
                    fc_cutoff = NULL, on = "main_test", comparison = NULL, ...)

    ## S3 method for class 'glystats_ancova_res'
    filter_sig_vars(exp, res, p_adj_cutoff = 0.05, p_val_cutoff = NULL,
                    fc_cutoff = NULL, on = "main_test", comparison = NULL, ...)

    ## S3 method for class 'glystats_kruskal_res'
    filter_sig_vars(exp, res, p_adj_cutoff = 0.05, p_val_cutoff = NULL,
                    fc_cutoff = NULL, on = "main_test", comparison = NULL, ...)

    ## S3 method for class 'glystats_ttest_res'
    filter_sig_vars(exp, res, p_adj_cutoff = 0.05, p_val_cutoff = NULL,
                    fc_cutoff = NULL, ...)

    ## S3 method for class 'glystats_wilcox_res'
    filter_sig_vars(exp, res, p_adj_cutoff = 0.05, p_val_cutoff = NULL,
                    fc_cutoff = NULL, ...)

    ## S3 method for class 'glystats_limma_res'
    filter_sig_vars(exp, res, p_adj_cutoff = 0.05, p_val_cutoff = NULL,
                    fc_cutoff = NULL, comparison = NULL, ...)
exp |
An |
res |
A glystats result object from a glystats DEA function. |
p_adj_cutoff |
The threshold for p-adjusted values. Default is 0.05. |
p_val_cutoff |
The threshold for p-values. We don't recommend using this. Default is NULL.
If you insist on using it, please set |
fc_cutoff |
The threshold for fold changes. Only a positive value is needed.
For example, |
... |
Additional arguments passed to methods. See the method-specific documentation for details. |
on |
(For |
comparison |
(For |
A new glyexp::experiment() object.
    library(glystats)
    library(glyexp)
    library(glyclean)

    exp <- auto_clean(real_experiment) |>
      glyexp::slice_head_var(n = 10)
    res <- gly_anova(exp)
    sig_exp <- filter_sig_vars(exp, res)
Syntactic sugar for the $tidy_result and $raw_result elements of a glystats result object.
These accessors are convenient in pipes.
    get_tidy_result(res, which = NULL)
    get_raw_result(res, which = NULL)
res |
A glystats result object. |
which |
Used to specify which element to get, when the result is a list.
For glystats results with only one tidy result (e.g. |
A tibble.
    library(glystats)
    library(glyexp)
    library(glyclean)
    library(dplyr)

    exp <- auto_clean(real_experiment) |>
      glyexp::slice_head_var(n = 10)

    # Using a pipe
    sig_res <- exp |>
      gly_anova() |>
      get_tidy_result("main_test") |>
      filter(p_adj < 0.05)

    # Equivalent to
    anova_res <- gly_anova(exp)
    sig_res <- anova_res$tidy_result$main_test |>
      filter(p_adj < 0.05)
Perform ANCOVA for glycomics or glycoproteomics data with sample-level covariates.
The function supports parametric comparison of multiple groups while adjusting for covariates.
For significant results, emmeans post-hoc comparisons (Tukey adjustment) are automatically performed.
P-values are adjusted for multiple testing using the method specified by p_adj_method.
    gly_ancova(exp, group_col = "group", covariate_cols = NULL,
               p_adj_method = "BH", add_info = TRUE, ...)

    gly_ancova_(expr_mat, groups, covariates, p_adj_method = "BH", ...)
exp |
A |
group_col |
(Only for |
covariate_cols |
(Only for |
p_adj_method |
A character string specifying the method to adjust p-values.
See |
add_info |
A logical value. If TRUE (default), variable information from the experiment
will be added to the result tibble. If FALSE, only the statistical results are returned.
Only applicable to |
... |
Additional arguments passed to |
expr_mat |
(Only for |
groups |
(Only for |
covariates |
(Only for |
The function performs log2 transformation on the expression data (log2(x + 1e-6)) before statistical testing. At least 2 groups and at least 1 covariate are required.
gly_ancova() is the top-level API that works with glyexp::experiment() objects and supports
the add_info parameter for joining experiment metadata.
gly_ancova_() is the underlying API that works with matrices, factor vectors, and covariate
data directly, providing more flexibility for users who don't use the glyexp package.
For any variable for which a stats::aov() model cannot be fitted,
NA is assigned to its results in both the main test and the post-hoc test.
Post-hoc Test:
emmeans pairwise comparisons with Tukey adjustment (emmeans::emmeans()) are performed
for variables with significant main effects (p_adj < 0.05).
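The main-test step described above can be sketched with base R alone. This is a minimal illustration, not the package's implementation: the toy data, the covariate name `age`, the helper `ancova_p()`, and the term order `age + groups` are all assumptions, and the emmeans post-hoc step is omitted.

```r
# Illustrative per-variable ANCOVA main test (base R only; not glystats code).
set.seed(1)
n <- 30
groups <- factor(rep(c("A", "B", "C"), each = 10))
age <- rnorm(n, mean = 50, sd = 10)  # hypothetical sample-level covariate

# Toy expression matrix: variables in rows, samples in columns.
expr_mat <- matrix(rexp(5 * n, rate = 1 / 1000), nrow = 5,
                   dimnames = list(paste0("V", 1:5), paste0("S", 1:n)))

# Same transformation as documented above.
log_mat <- log2(expr_mat + 1e-6)

# Hypothetical helper: fit one model and return the p-value of the group term.
ancova_p <- function(y) {
  dat <- data.frame(y = y, age = age, groups = groups)
  fit <- tryCatch(stats::aov(y ~ age + groups, data = dat),
                  error = function(e) NULL)
  if (is.null(fit)) return(NA_real_)  # failed fits yield NA
  tab <- summary(fit)[[1]]
  tab[trimws(rownames(tab)) == "groups", "Pr(>F)"]
}

p_val <- apply(log_mat, 1, ancova_p)
ancova_res <- data.frame(
  variable = rownames(log_mat),
  p_val = p_val,
  p_adj = stats::p.adjust(p_val, method = "BH")
)
```

Because stats::aov() uses sequential sums of squares, placing the covariate before the group term tests the group effect after adjusting for the covariate. Whether gly_ancova() orders its terms this way is not stated here.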
A list containing two elements:
tidy_result: A list containing:
main_test: A tibble with ANCOVA results containing the following columns:
variable: Variable name
term: ANCOVA term (usually "group")
df: Degrees of freedom
sumsq: Sum of squares
meansq: Mean squares
statistic: F-statistic
p_val: Raw p-value from ANCOVA
p_adj: Adjusted p-value (if p_adj_method is not NULL)
post_hoc: Significant group pairs from post-hoc test, in the format of "ref_vs_test".
post_hoc_test: A tibble with pairwise comparison results containing the following columns:
variable: Variable name
ref_group: Reference group
test_group: Test/treatment/case group
p_val: Adjusted p-value from emmeans pairwise test
p_adj: Adjusted p-value from emmeans pairwise test
raw_result: A list containing:
main_test: A list of raw aov model objects.
post_hoc_test: A list of raw emmeans results.
This function requires the emmeans package for post-hoc comparisons.
stats::aov(), emmeans::emmeans(), emmeans::contrast()
Perform one-way ANOVA for glycomics or glycoproteomics data.
The function supports parametric comparison of multiple groups.
For significant results, Tukey's HSD post-hoc test is automatically performed.
P-values are adjusted for multiple testing using the method specified by p_adj_method.
    gly_anova(exp, group_col = "group", p_adj_method = "BH", add_info = TRUE, ...)

    gly_anova_(expr_mat, groups, p_adj_method = "BH", ...)
exp |
A |
group_col |
(Only for |
p_adj_method |
A character string specifying the method to adjust p-values.
See |
add_info |
A logical value. If TRUE (default), variable information from the experiment
will be added to the result tibble. If FALSE, only the statistical results are returned.
Only applicable to |
... |
Additional arguments passed to |
expr_mat |
(Only for |
groups |
(Only for |
The function performs log2 transformation on the expression data (log2(x + 1e-6)) before statistical testing. At least 2 groups are required in the grouping variable.
gly_anova() is the top-level API that works with glyexp::experiment() objects and supports
the add_info parameter for joining experiment metadata.
gly_anova_() is the underlying API that works with matrices and factor vectors directly,
providing more flexibility for users who don't use the glyexp package.
For any variable for which a stats::aov() model cannot be fitted,
NA is assigned to its results in both the main test and the post-hoc test.
Post-hoc Test:
Tukey's HSD test for pairwise comparisons (stats::TukeyHSD()) is performed
for variables with significant main effects (p_adj < 0.05).
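The workflow described above (log2 transformation, one stats::aov() fit per variable, p-value adjustment, then stats::TukeyHSD() for significant variables) can be sketched as follows. The toy data and the helper `fit_one()` are assumptions for illustration; this is not the package's own code.

```r
# Illustrative one-way ANOVA with Tukey's HSD post-hoc test (base R only).
set.seed(42)
groups <- factor(rep(c("ctrl", "low", "high"), each = 8),
                 levels = c("ctrl", "low", "high"))
expr_mat <- matrix(2^rnorm(4 * 24, mean = 10), nrow = 4,
                   dimnames = list(paste0("V", 1:4), paste0("S", 1:24)))
# Plant a strong effect in V1 so that at least one variable is significant.
expr_mat["V1", groups == "high"] <- expr_mat["V1", groups == "high"] * 16

log_mat <- log2(expr_mat + 1e-6)

# Hypothetical helper: fit one ANOVA model for a single variable.
fit_one <- function(y) {
  stats::aov(y ~ groups, data = data.frame(y = y, groups = groups))
}
fits <- lapply(seq_len(nrow(log_mat)), function(i) fit_one(log_mat[i, ]))
names(fits) <- rownames(log_mat)

# Main test: raw and BH-adjusted p-values of the group term.
p_val <- vapply(fits, function(f) summary(f)[[1]][1, "Pr(>F)"], numeric(1))
p_adj <- stats::p.adjust(p_val, method = "BH")

# Post-hoc test only for variables with a significant main effect.
sig_vars <- names(fits)[p_adj < 0.05]
post_hoc <- lapply(fits[sig_vars], function(f) stats::TukeyHSD(f)$groups)
```

stats::TukeyHSD() labels each comparison as the later factor level minus the earlier one (for example `high-ctrl`), so placing the reference group first in the factor levels gives the test_group - ref_group direction.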
A list containing three elements:
tidy_result: A list containing:
main_test: A tibble with ANOVA results containing the following columns:
variable: Variable name
term: ANOVA term (usually "groups")
df: Degrees of freedom
sumsq: Sum of squares
meansq: Mean squares
statistic: F-statistic
p_val: Raw p-value from ANOVA
effect_size: Eta-squared
p_adj: Adjusted p-value (if p_adj_method is not NULL)
post_hoc: Significant group pairs from post-hoc test, in the format of "ref_vs_test".
post_hoc_test: A tibble with pairwise comparison results containing the following columns:
variable: Variable name
ref_group: Reference group
test_group: Test/treatment/case group
p_val: Raw p-value from Tukey's HSD test
p_adj: Adjusted p-value from Tukey's HSD test
raw_result: A list containing:
main_test: A list of raw aov model objects.
post_hoc_test: A list of raw TukeyHSD objects. Post-hoc
comparison labels follow the package direction, i.e.
test_group - ref_group.
meta_data (only for gly_anova()): A list containing metadata from the input experiment.
stats::aov(), stats::TukeyHSD()
Perform pairwise correlation analysis on variables or samples in the expression data. The function calculates correlation coefficients and p-values for all pairs, with optional multiple testing correction.
    gly_cor(exp, on = "variable", method = "pearson", p_adj_method = "BH", ...)

    gly_cor_(expr_mat, on = "variable", method = "pearson", p_adj_method = "BH", ...)
exp |
A |
on |
A character string specifying what to correlate. Either "variable" (default) to correlate variables/features, or "sample" to correlate samples/observations. |
method |
A character string indicating which correlation coefficient is to be computed. One of "pearson" (default) or "spearman". Note: "kendall" is not supported by Hmisc::rcorr. |
p_adj_method |
A character string specifying the method to adjust p-values.
See |
... |
Additional arguments passed to |
expr_mat |
(Only for |
The function performs log2 transformation on the expression data (log2(x + 1e-6)) before
correlation analysis. When on = "variable" (default), correlations are calculated between
variables across samples. When on = "sample", correlations are calculated between
samples across variables.
gly_cor() is the top-level API that works with glyexp::experiment() objects.
gly_cor_() is the underlying API that works with matrices directly,
providing more flexibility for users who don't use the glyexp package.
Correlation Calculation:
Correlation coefficients and p-values are calculated using Hmisc::rcorr() which is
more efficient than pairwise stats::cor.test() calls.
Multiple Testing Correction:
P-values are adjusted for multiple testing using the method specified by p_adj_method.
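To make the shape of the result concrete, here is a small sketch of pairwise correlation on variables. It deliberately swaps in pairwise stats::cor.test() calls, the slower approach mentioned above, so that it runs without Hmisc. Reporting each unordered pair once is an assumption, as is the toy data.

```r
# Illustrative pairwise correlation with adjusted p-values (base R only).
set.seed(7)
expr_mat <- matrix(2^rnorm(4 * 12, mean = 8), nrow = 4,
                   dimnames = list(paste0("V", 1:4), paste0("S", 1:12)))
log_mat <- log2(expr_mat + 1e-6)

# All unordered variable pairs (assumption: each pair is reported once).
var_pairs <- t(utils::combn(rownames(log_mat), 2))

tests <- lapply(seq_len(nrow(var_pairs)), function(i) {
  stats::cor.test(log_mat[var_pairs[i, 1], ], log_mat[var_pairs[i, 2], ],
                  method = "pearson")
})

cor_res <- data.frame(
  variable1 = var_pairs[, 1],
  variable2 = var_pairs[, 2],
  cor = vapply(tests, function(x) unname(x$estimate), numeric(1)),
  p_val = vapply(tests, function(x) x$p.value, numeric(1))
)
cor_res$p_adj <- stats::p.adjust(cor_res$p_val, method = "BH")
```

For `on = "sample"`, the same sketch applies to the columns of `log_mat` instead of its rows.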
A list with three elements:
tidy_result: A tibble with correlation results containing the following columns:
variable1 or sample1: First element of the pair (depending on on parameter)
variable2 or sample2: Second element of the pair (depending on on parameter)
cor: Correlation coefficient
p_val: Raw p-value from correlation test
p_adj: Adjusted p-value (if p_adj_method is not NULL)
raw_result: The raw rcorr object from Hmisc::rcorr()
meta_data (only for gly_cor()): A list containing metadata from the input experiment
The list has classes glystats_cor_res and glystats_res.
This function requires the Hmisc package for efficient correlation calculation.
Fit a Cox proportional hazards model for each variable in the expression data, and extract p-values and hazard ratios from it.
    gly_cox(exp, time_col = "time", event_col = "event", p_adj_method = "BH",
            add_info = TRUE, ...)

    gly_cox_(expr_mat, time, event, p_adj_method = "BH", ...)
exp |
A |
time_col |
A character string specifying the column name in sample information that contains survival time. Default is "time". |
event_col |
A character string specifying the column name in sample information that contains event indicator (1 for event, 0 for censoring). Default is "event". |
p_adj_method |
A character string specifying the method to adjust p-values.
See |
add_info |
A logical value. If TRUE (default), variable information from the experiment
will be added to the result tibble. If FALSE, only the Cox model results are returned.
Only applicable to |
... |
Additional arguments passed to |
expr_mat |
(Only for |
time |
A numeric vector specifying the survival time for each sample. |
event |
A numeric vector specifying the event indicator for each sample (1 for event, 0 for censoring). |
The function fits a Cox proportional hazards model for each variable using:
coxph(Surv(time, event) ~ expression_value)
P-values are adjusted using the Benjamini-Hochberg method by default.
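The per-variable fit above can be sketched with the survival package. The simulated data and the helper `cox_one()` are assumptions, and the expression values are passed to the model as they are, because no transformation is specified for this function.

```r
# Illustrative per-variable Cox model (requires the survival package).
library(survival)

set.seed(3)
n <- 60
expr_mat <- matrix(rnorm(3 * n), nrow = 3,
                   dimnames = list(paste0("V", 1:3), paste0("S", 1:n)))
time <- rexp(n, rate = 0.1)   # toy survival times
event <- rbinom(n, 1, 0.7)    # 1 = event, 0 = censored

# Hypothetical helper: fit one model and pull out the documented columns.
cox_one <- function(x) {
  dat <- data.frame(time = time, event = event, x = x)
  fit <- survival::coxph(survival::Surv(time, event) ~ x, data = dat)
  s <- summary(fit)$coefficients
  c(coefficient = s[1, "coef"],
    std.error = s[1, "se(coef)"],
    statistic = s[1, "z"],
    p_val = s[1, "Pr(>|z|)"],
    hr = s[1, "exp(coef)"])
}

cox_res <- as.data.frame(t(apply(expr_mat, 1, cox_one)))
cox_res$p_adj <- stats::p.adjust(cox_res$p_val, method = "BH")
cox_res <- cbind(variable = rownames(expr_mat), cox_res)
```

The `hr` column is simply `exp(coefficient)`, so a hazard ratio above 1 means higher expression is associated with higher hazard.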
A list with three elements:
tidy_result: A tibble with Cox model results containing the following columns:
variable: Variable name
coefficient: Regression coefficient (log hazard ratio)
std.error: Standard error of the coefficient
statistic: Wald test statistic
p_val: Raw p-value from Wald test
hr: Hazard ratio (exp(coefficient))
p_adj: Adjusted p-value (if p_adj_method is not NULL)
raw_result: A list of raw coxph model objects.
meta_data (only for gly_cox()): A list containing metadata from the input experiment
survival::coxph(), survival::Surv()
Calculate fold change for a given expression matrix and group information. It can only be used for two-group analysis. When you run this function, you will see a message about the "Ref Group" and the "Test Group". "Ref Group" is the reference group, and "Test Group" is the test/treatment/case group.
    gly_fold_change(exp, group_col = "group", add_info = TRUE)

    gly_fold_change_(expr_mat, groups)
exp |
A |
group_col |
(Only for |
add_info |
A logical value. If TRUE (default), variable information from the experiment
will be added to the result tibble. If FALSE, only the fold change results are returned.
Only applicable to |
expr_mat |
(Only for |
groups |
(Only for |
gly_fold_change() is the top-level API that works with glyexp::experiment() objects and supports
the add_info parameter for joining experiment metadata.
gly_fold_change_() is the underlying API that works with matrices and factor vectors directly,
providing more flexibility for users who don't use the glyexp package.
A list with three elements:
tidy_result: A tibble with fold change results containing the following columns:
variable: Variable name
log2fc: Log2 fold change (log2(group2_mean / group1_mean))
If there are more than two groups, two additional columns, ref_group and test_group, will be added.
raw_result: The raw result (same as tidy_result for this function)
meta_data (only for gly_fold_change()): A list containing metadata from the input experiment
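The log2fc column described above can be worked through by hand. This sketch assumes that the first factor level is the reference group (group1) and the second is the test group (group2). The toy values are chosen so that the result is easy to check.

```r
# Illustrative two-group log2 fold change (base R only).
groups <- factor(c("ctrl", "ctrl", "ctrl", "case", "case", "case"),
                 levels = c("ctrl", "case"))
expr_mat <- rbind(
  V1 = c(10, 12, 14, 40, 48, 56),  # means: 12 (ctrl), 48 (case)
  V2 = c(8, 8, 8, 2, 2, 2)         # means: 8 (ctrl), 2 (case)
)

ref_group <- levels(groups)[1]   # assumption: first level is the reference
test_group <- levels(groups)[2]

ref_mean <- rowMeans(expr_mat[, groups == ref_group, drop = FALSE])
test_mean <- rowMeans(expr_mat[, groups == test_group, drop = FALSE])

fc_res <- data.frame(
  variable = rownames(expr_mat),
  log2fc = log2(test_mean / ref_mean)
)
# V1: log2(48 / 12) = 2; V2: log2(2 / 8) = -2
```

A positive log2fc therefore means the variable is higher in the test group than in the reference group.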
Perform hierarchical clustering on the expression data.
The function uses stats::hclust() to perform clustering and provides
tidy results including cluster assignments, dendrogram data for plotting,
and merge heights.
    gly_hclust(exp, on = "variable", k_values = c(2, 3, 4, 5), scale = TRUE,
               add_info = TRUE, ...)

    gly_hclust_(expr_mat, on = "variable", k_values = c(2, 3, 4, 5), scale = TRUE, ...)
exp |
A |
on |
A character string specifying what to cluster. Either "variable" (default) to cluster variables/features, or "sample" to cluster samples/observations. |
k_values |
A numeric vector specifying the number of clusters to cut the tree into. Default is c(2, 3, 4, 5). If NULL, no cluster assignments are returned. |
scale |
A logical indicating whether to scale the data before clustering. Default is TRUE. |
add_info |
A logical value. If TRUE (default), sample information from the experiment
will be added to the result tibbles. If FALSE, only the clustering results are returned.
Only applicable to |
... |
Additional arguments passed to |
expr_mat |
(Only for |
The function performs log2 transformation on the expression data (log2(x + 1e-6)) before
clustering. When on = "variable" (default), variables are clustered based on their
expression patterns across samples. When on = "sample", samples are clustered based
on their expression profiles across variables.
gly_hclust() is the top-level API that works with glyexp::experiment() objects and supports
the add_info parameter for joining experiment metadata.
gly_hclust_() is the underlying API that works with matrices directly,
providing more flexibility for users who don't use the glyexp package.
Distance Calculation:
Distance is calculated using stats::dist() with the specified method.
Clustering Method:
Hierarchical clustering is performed using stats::hclust() with the specified method.
Cluster Assignment:
The dendrogram is cut at different heights to produce cluster assignments for
the specified k values using stats::cutree().
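The three steps above map onto base R calls as in the sketch below. The distance and linkage methods shown are the defaults of stats::dist() and stats::hclust(). Scaling each variable across samples is an assumption about how `scale = TRUE` is applied.

```r
# Illustrative hierarchical clustering of variables (base R only).
set.seed(11)
expr_mat <- matrix(2^rnorm(6 * 10, mean = 9), nrow = 6,
                   dimnames = list(paste0("V", 1:6), paste0("S", 1:10)))
log_mat <- log2(expr_mat + 1e-6)

# Assumption: each variable (row) is scaled across samples.
scaled <- t(scale(t(log_mat)))

d <- stats::dist(scaled, method = "euclidean")
hc <- stats::hclust(d, method = "complete")

# One column of cluster assignments per requested k.
k_values <- c(2, 3, 4, 5)
assignments <- stats::cutree(hc, k = k_values)
colnames(assignments) <- paste0("cluster_k", k_values)
clusters <- data.frame(variable = rownames(scaled), assignments,
                       row.names = NULL)

# Merge heights: n items need n - 1 merges, and each merge removes one cluster.
heights <- data.frame(
  merge_step = seq_along(hc$height),
  height = hc$height,
  n_clusters = nrow(scaled) - seq_along(hc$height)
)
```

To cluster samples instead, the same calls would be applied to the transposed matrix.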
A list containing:
tidy_result: A list of tibbles with clustering results:
clusters: Cluster assignments containing the following columns:
variable or sample: Variable or sample name (depending on on parameter)
cluster_k2, cluster_k3, etc.: Cluster assignments for different k values
dendrogram: Dendrogram segment data for plotting (if ggdendro is available) containing:
x, y, xend, yend: Segment coordinates for plotting
heights: Merge heights and steps containing the following columns:
merge_step: Step number in the clustering process
height: Height at which clusters are merged
n_clusters: Number of clusters remaining after this merge
labels: Labels and their positions (if ggdendro is available) containing:
x, y: Position coordinates
label: Label text
raw_result: The raw hclust object from stats::hclust()
meta_data (only for gly_hclust()): A list containing metadata from the input experiment
This function only uses base R packages and does not require additional dependencies.
For enhanced dendrogram plotting capabilities, the ggdendro package is recommended
but not required.
stats::hclust(), stats::dist(), stats::cutree()
Perform k-means clustering on the expression data.
The function uses stats::kmeans() to perform clustering and provides
tidy results with cluster assignments.
    gly_kmeans(exp, on = "variable", centers = 3, scale = TRUE, add_info = TRUE, ...)

    gly_kmeans_(expr_mat, on = "variable", centers = 3, scale = TRUE, ...)
exp |
A |
on |
A character string specifying what to cluster. Either "variable" (default) to cluster variables/features, or "sample" to cluster samples/observations. |
centers |
Either the number of clusters (integer) or a set of initial cluster centers. Default is 3. |
scale |
A logical indicating whether to scale the data before clustering. Default is TRUE. |
add_info |
A logical value. If TRUE (default), sample information from the experiment
will be added to the result tibbles. If FALSE, only the clustering results are returned.
Only applicable to |
... |
Additional arguments passed to |
expr_mat |
(Only for |
The function performs log2 transformation on the expression data (log2(x + 1e-6)) before
clustering. When on = "variable" (default), variables are clustered based on their
expression patterns across samples. When on = "sample", samples are clustered based
on their expression profiles across variables.
gly_kmeans() is the top-level API that works with glyexp::experiment() objects and supports
the add_info parameter for joining experiment metadata.
gly_kmeans_() is the underlying API that works with matrices directly,
providing more flexibility for users who don't use the glyexp package.
Data Preparation: Data is log2-transformed and optionally scaled before clustering.
Clustering Method:
K-means clustering is performed using stats::kmeans() with the specified parameters.
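A minimal sketch of the preparation and clustering steps above, using only base R. The toy data are made up, and scaling each variable across samples is an assumption about how `scale = TRUE` is applied.

```r
# Illustrative k-means clustering of variables (base R only).
set.seed(5)
expr_mat <- matrix(2^rnorm(12 * 8, mean = 9), nrow = 12,
                   dimnames = list(paste0("V", 1:12), paste0("S", 1:8)))

# Data preparation: log2 transform, then scale each variable (assumption).
log_mat <- log2(expr_mat + 1e-6)
scaled <- t(scale(t(log_mat)))

# Clustering: rows are the objects being clustered.
km <- stats::kmeans(scaled, centers = 3)

kmeans_res <- data.frame(
  variable = rownames(scaled),
  cluster = unname(km$cluster)
)
```

K-means starts from random centers, so fixing the seed (or passing `nstart` through `...`) makes the assignments reproducible.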
A list with three elements:
tidy_result: A tibble with cluster assignments containing the following columns:
variable or sample: Variable or sample name (depending on on parameter)
cluster: Cluster assignment
raw_result: The raw kmeans object from stats::kmeans().
meta_data (only for gly_kmeans()): A list containing metadata from the input experiment
This function only uses base R packages and does not require additional dependencies.
Perform Kruskal-Wallis test for glycomics or glycoproteomics data.
The function supports non-parametric comparison of multiple groups.
For significant results, Dunn's post-hoc test is automatically performed.
P-values are adjusted for multiple testing using the method specified by p_adj_method.
    gly_kruskal(exp, group_col = "group", p_adj_method = "BH", add_info = TRUE, ...)

    gly_kruskal_(expr_mat, groups, p_adj_method = "BH", ...)
exp |
A |
group_col |
(Only for |
p_adj_method |
A character string specifying the method to adjust p-values.
See |
add_info |
A logical value. If TRUE (default), variable information from the experiment
will be added to the result tibble. If FALSE, only the statistical results are returned.
Only applicable to |
... |
Additional arguments passed to |
expr_mat |
(Only for |
groups |
(Only for |
The function performs log2 transformation on the expression data (log2(x + 1e-6)) before statistical testing. At least 2 groups are required in the grouping variable.
gly_kruskal() is the top-level API that works with glyexp::experiment() objects and supports
the add_info parameter for joining experiment metadata.
gly_kruskal_() is the underlying API that works with matrices and factor vectors directly,
providing more flexibility for users who don't use the glyexp package.
For any variable for which stats::kruskal.test() fails,
NA is assigned to its results in both the main test and the post-hoc test.
Post-hoc Test:
Dunn's test with Holm correction for multiple comparisons (FSA::dunnTest()) is performed
for variables with significant main effects (p_adj < 0.05).
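The main-test step above can be sketched with base R. The toy data and the helper `kw_one()` are assumptions. Dunn's post-hoc test is left out so that the example does not depend on FSA, and only the variables that would proceed to it are selected.

```r
# Illustrative per-variable Kruskal-Wallis test (base R only).
set.seed(9)
groups <- factor(rep(c("A", "B", "C"), each = 8))
expr_mat <- matrix(2^rnorm(4 * 24, mean = 10), nrow = 4,
                   dimnames = list(paste0("V", 1:4), paste0("S", 1:24)))
# Plant a strong effect in V2 so that at least one variable is significant.
expr_mat["V2", groups == "B"] <- expr_mat["V2", groups == "B"] * 32

log_mat <- log2(expr_mat + 1e-6)

# Hypothetical helper: run one test, returning NA values if it fails.
kw_one <- function(y) {
  kt <- tryCatch(stats::kruskal.test(y, groups), error = function(e) NULL)
  if (is.null(kt)) {
    return(c(statistic = NA_real_, parameter = NA_real_, p_val = NA_real_))
  }
  c(statistic = unname(kt$statistic),
    parameter = unname(kt$parameter),
    p_val = kt$p.value)
}

kw_res <- as.data.frame(t(apply(log_mat, 1, kw_one)))
kw_res$p_adj <- stats::p.adjust(kw_res$p_val, method = "BH")

# Variables that would proceed to the post-hoc test.
sig_vars <- rownames(kw_res)[which(kw_res$p_adj < 0.05)]
```

Because the Kruskal-Wallis test is rank-based, the log2 transformation does not change its result. It is kept here only to mirror the documented steps.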
A list containing three elements:
tidy_result: A list containing:
main_test: A tibble with Kruskal-Wallis test results containing the following columns:
variable: Variable name
statistic: Kruskal-Wallis test statistic
p_val: Raw p-value from Kruskal-Wallis test
parameter: Degrees of freedom
method: Statistical method used
effect_size: Epsilon-squared
p_adj: Adjusted p-value (if p_adj_method is not NULL)
post_hoc: Significant group pairs from post-hoc test, in the format of "ref_vs_test".
post_hoc_test: A tibble with pairwise comparison results containing the following columns:
variable: Variable name
ref_group: Reference group
test_group: Test/treatment/case group
p_val: Raw p-value from Dunn's test
p_adj: Adjusted p-value from Dunn's test
raw_result: A list containing:
main_test: A list of raw kruskal.test objects.
post_hoc_test: A list of raw dunnTest objects. Post-hoc
comparison labels and direction-sensitive statistics follow the package
direction, i.e. test_group - ref_group.
meta_data: A list containing metadata from the input experiment.
This function requires the FSA package for Dunn's post-hoc test.
stats::kruskal.test(), FSA::dunnTest()
Perform differential expression analysis using linear models with empirical Bayes moderation from the limma package. Supports both two-group and multi-group comparisons.
    gly_limma(exp, group_col = "group", covariate_cols = NULL, subject_col = NULL,
              p_adj_method = "BH", ref_group = NULL, contrasts = NULL,
              add_info = TRUE, ...)

    gly_limma_(expr_mat, groups, p_adj_method = "BH", ref_group = NULL,
               contrasts = NULL, covariates = NULL, subjects = NULL, ...)
exp |
A |
group_col |
(Only for |
covariate_cols |
(Only for |
subject_col |
(Only for |
p_adj_method |
A character string specifying the method for multiple testing correction.
Must be one of the methods supported by |
ref_group |
A character string specifying the reference group. If NULL (default), the first level of the group factor is used as the reference. Only used for two-group comparisons. |
contrasts |
A character vector specifying custom contrasts. If NULL (default), all pairwise comparisons are automatically generated, and the levels coming first in the factor will be used as the reference group. Supports two formats: "group1-group2" or "group1_vs_group2". Use the second format if group names contain hyphens. "group1" will be used as the reference group. |
add_info |
A logical value. If TRUE (default), variable information from the experiment
will be added to the result tibble. If FALSE, only the statistical results are returned.
Only applicable to |
... |
Additional arguments passed to |
expr_mat |
(Only for |
groups |
(Only for |
covariates |
(Only for |
subjects |
(Only for |
The function performs log2 transformation on the expression data (log2(x + 1e-6)) before statistical testing. The analysis uses linear models with empirical Bayes moderation to improve statistical power, especially for small sample sizes.
gly_limma() is the top-level API that works with glyexp::experiment() objects and supports
the add_info parameter for joining experiment metadata.
gly_limma_() is the underlying API that works with matrices and factor vectors directly,
providing more flexibility for users who don't use the glyexp package.
For two or more groups, all pairwise comparisons are automatically generated (one contrast for two groups) and performed using contrast matrices unless custom contrasts are specified.
If covariates are provided, they are added as additional terms in the design matrix and are not part of the group contrasts.
If subjects are provided, they are included as a blocking factor in the design
matrix using the formula ~ 0 + group + subject (plus any covariates).
When specifying custom contrasts, use either "group1-group2" or "group1_vs_group2" format. If group names contain hyphens and you use the first format, an error will be raised suggesting to use the second format.
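To show what the blocking design `~ 0 + group + subject` looks like, the sketch below builds it with stats::model.matrix() for a small paired layout and evaluates one group contrast by ordinary least squares. It does not call limma and applies no empirical Bayes moderation. The data, the contrast vector, and the test-minus-reference direction are assumptions for illustration.

```r
# Illustrative paired design matrix and group contrast (base R only).
groups <- factor(c("ctrl", "case", "ctrl", "case", "ctrl", "case"),
                 levels = c("ctrl", "case"))
subjects <- factor(c("P1", "P1", "P2", "P2", "P3", "P3"))

# One column per group level, plus subject columns as a blocking factor.
design <- stats::model.matrix(~ 0 + groups + subjects)
colnames(design)
# "groupsctrl" "groupscase" "subjectsP2" "subjectsP3"

# Log2 expression of a single toy variable, in the sample order above.
y <- c(10, 11, 11, 12.5, 12, 13)

# Ordinary least squares fit of the design (no moderation).
fit <- stats::lm.fit(design, y)

# Contrast "case minus ctrl": weights apply to the group columns only.
contrast <- c(groupsctrl = -1, groupscase = 1, subjectsP2 = 0, subjectsP3 = 0)
log2fc <- sum(contrast * fit$coefficients)
# Equals the mean within-subject difference: (1 + 1.5 + 1) / 3
```

The subject columns absorb between-subject differences, so the contrast reflects the within-subject change. Covariates would enter as further columns with zero contrast weight.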
A list with three elements:
tidy_result: A tibble with limma results containing the following columns:
variable: Variable name
log2fc: Log2 fold change
AveExpr: Average expression level
t: t-statistic
p_val: Raw p-value
p_adj: Adjusted p-value (if p_adj_method is not NULL)
b: B-statistic (log-odds of differential expression)
ref_group: Reference group
test_group: Test group
raw_result: The raw limma fit object(s).
meta_data (only for gly_limma()): A list containing metadata from the input experiment.
This function requires the following packages to be installed:
limma for linear model fitting and empirical Bayes moderation
limma::lmFit(), limma::eBayes(), limma::makeContrasts()
Perform orthogonal partial least squares discriminant analysis on the expression data.
The function uses ropls::opls() to perform OPLS-DA and returns tidy results.
    gly_oplsda(exp, group_col = "group", pred_i = 1, ortho_i = NA, scale = TRUE,
               add_info = TRUE, ...)

    gly_oplsda_(expr_mat, groups, pred_i = 1, ortho_i = NA, scale = TRUE, ...)
exp |
A |
group_col |
(Only for |
pred_i |
An integer indicating the number of predictive components to include. Default is 1. |
ortho_i |
An integer indicating the number of orthogonal components to include. Default is NA (automatic). |
scale |
A logical indicating whether to scale the data. Default is TRUE. |
add_info |
A logical value. If TRUE (default), sample and variable information from the experiment
will be added to the result tibbles. If FALSE, only the OPLS-DA results are returned.
Only applicable to |
... |
Additional arguments passed to |
expr_mat |
(Only for |
groups |
(Only for |
gly_oplsda() is the top-level API that works with glyexp::experiment() objects and supports
the add_info parameter for joining experiment metadata.
gly_oplsda_() is the underlying API that works with matrices and factor vectors directly,
providing more flexibility for users who don't use the glyexp package.
A list containing:
tidy_result: A list of tibbles with OPLS-DA results:
samples: OPLS-DA scores for each sample containing the following columns:
sample: Sample name
group: Group assignment
p1, p2, etc.: Predictive component scores
o1, o2, etc.: Orthogonal component scores
variables: OPLS-DA loadings for each variable containing the following columns:
variable: Variable name
p1, p2, etc.: Predictive component loadings
o1, o2, etc.: Orthogonal component loadings
pcorr1, pcorr2, etc.: Correlation between each variable and the corresponding predictive component
variance: OPLS-DA explained variance containing the following columns:
component: Component name (p1, o1, etc.)
prop_var_explained: Proportion of variance explained by each component
cumulative_prop_var: Cumulative proportion of variance explained
vip: Variable Importance in Projection scores containing the following columns:
variable: Variable name
vip: VIP score
perm_test: Permutation test results containing the following columns:
model: Model type ("Original" for the original model, "Permutation" for permuted models)
perm_id: Permutation ID (0 for original model, 1+ for permutations)
Additional columns from the permutation test matrix (e.g., R2X, R2Y, Q2, etc.)
raw_result: The raw ropls opls object from ropls::opls()
meta_data (only for gly_oplsda()): A list containing metadata from the input experiment
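The perm_test tibble described above can be summarised into an empirical p-value. The following is a minimal sketch on a hypothetical, hand-made table (the Q2 values are invented for illustration and do not come from any real model). It uses one common convention for permutation p-values, which is not necessarily the exact formula applied by ropls.

```r
# Hypothetical perm_test table with the columns described above.
perm_test <- data.frame(
  model   = c("Original", rep("Permutation", 9)),
  perm_id = 0:9,
  Q2      = c(0.62, 0.10, -0.05, 0.21, 0.70, 0.02, -0.12, 0.15, 0.08, 0.30)
)

q2_orig <- perm_test$Q2[perm_test$perm_id == 0]
q2_perm <- perm_test$Q2[perm_test$perm_id > 0]

# Share of permuted models that reach or exceed the original Q2,
# with the usual +1 correction so the p-value is never exactly zero.
p_q2 <- (1 + sum(q2_perm >= q2_orig)) / (1 + length(q2_perm))
p_q2  # 0.2
```

A small p-value suggests that the predictive performance of the original model is unlikely to arise from random group labels.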
This function requires the following packages to be installed:
ropls for OPLS-DA analysis
Perform principal component analysis on the expression data.
The function uses prcomp() to perform PCA and broom::tidy() to tidy the results.
If scale = TRUE, constant variables (zero variance) will be removed before PCA.
gly_pca(exp, center = TRUE, scale = TRUE, add_info = TRUE, ...)
gly_pca_(expr_mat, center = TRUE, scale = TRUE, ...)
exp |
A |
center |
A logical indicating whether to center the data. Default is TRUE. |
scale |
A logical indicating whether to scale the data. Default is TRUE. |
add_info |
A logical value. If TRUE (default), sample and variable information from the experiment
will be added to the result tibbles. If FALSE, only the PCA results are returned.
Only applicable to |
... |
Additional arguments passed to |
expr_mat |
(Only for |
The function log-transforms the expression data (log(x + 1)) before running PCA.
gly_pca() is the top-level API that works with glyexp::experiment() objects and supports
the add_info parameter for joining experiment metadata.
gly_pca_() is the underlying API that works with matrices directly,
providing more flexibility for users who don't use the glyexp package.
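The preprocessing described above can be sketched in base R. This is an illustration only, not the package's implementation: the toy matrix, its layout (variables in rows, samples in columns), and the order of the steps are assumptions.

```r
# Toy expression matrix: variables in rows, samples in columns (assumed layout).
set.seed(1)
expr_mat <- matrix(
  rexp(40, rate = 0.01), nrow = 5,
  dimnames = list(paste0("V", 1:5), paste0("S", 1:8))
)
expr_mat["V5", ] <- 100  # a constant variable, to show the filtering step

# 1. Log-transform as described above: log(x + 1).
log_mat <- log(expr_mat + 1)

# 2. Drop zero-variance variables; prcomp() cannot rescale a constant column.
keep <- apply(log_mat, 1, stats::var) > 0
log_mat <- log_mat[keep, , drop = FALSE]

# 3. prcomp() expects samples in rows, so transpose before fitting.
pca <- stats::prcomp(t(log_mat), center = TRUE, scale. = TRUE)

dim(pca$x)         # samples x components
dim(pca$rotation)  # variables x components
```

Without step 2, prcomp() stops with an error when scale. = TRUE and a column is constant, which is why constant variables have to be removed first.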
A list containing:
tidy_result: A list of tibbles with PCA results:
samples: PCA scores for each sample containing the following columns:
sample: Sample name
PC: Principal component name (PC1, PC2, etc.)
value: Score value for the principal component
variables: PCA loadings for each variable containing the following columns:
variable: Variable name
PC: Principal component name (PC1, PC2, etc.)
value: Loading value for the principal component
eigenvalues: PCA eigenvalues containing the following columns:
PC: Principal component name (PC1, PC2, etc.)
std.dev: Standard deviation
percent: Percentage of variance explained
cumulative: Cumulative percentage of variance explained
raw_result: The raw prcomp object from stats::prcomp()
meta_data (only for gly_pca()): A list containing metadata from the input experiment
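The eigenvalues table can be reproduced from a raw prcomp object. The sketch below uses random toy data and reports the explained variance as a proportion between 0 and 1; whether gly_pca() reports a proportion or a value scaled to 100 is not stated in this section, so treat the scale as an assumption.

```r
set.seed(42)
x <- matrix(rnorm(60), nrow = 12)  # 12 samples x 5 variables (toy data)
pca <- stats::prcomp(x, center = TRUE, scale. = TRUE)

# Each component's variance is its squared standard deviation.
var_explained <- pca$sdev^2 / sum(pca$sdev^2)

eigenvalues <- data.frame(
  PC         = paste0("PC", seq_along(pca$sdev)),
  std.dev    = pca$sdev,
  percent    = var_explained,
  cumulative = cumsum(var_explained)
)
eigenvalues
```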
No optional packages need to be installed for this function: the PCA itself is computed with stats::prcomp() from base R.
Perform partial least squares discriminant analysis on the expression data.
The function uses ropls::opls() to perform PLS-DA and returns tidy results.
gly_plsda(exp, group_col = "group", ncomp = 2, scale = TRUE, add_info = TRUE, ...)
gly_plsda_(expr_mat, groups, ncomp = 2, scale = TRUE, ...)
exp |
A |
group_col |
(Only for |
ncomp |
An integer indicating the number of components to include. Default is 2. |
scale |
A logical indicating whether to scale the data. Default is TRUE. |
add_info |
A logical value. If TRUE (default), sample and variable information from the experiment
will be added to the result tibbles. If FALSE, only the PLS-DA results are returned.
Only applicable to |
... |
Additional arguments passed to |
expr_mat |
(Only for |
groups |
(Only for |
gly_plsda() is the top-level API that works with glyexp::experiment() objects and supports
the add_info parameter for joining experiment metadata.
gly_plsda_() is the underlying API that works with matrices and factor vectors directly,
providing more flexibility for users who don't use the glyexp package.
A list containing:
tidy_result: A list of tibbles with PLS-DA results:
samples: PLS-DA scores for each sample containing the following columns:
sample: Sample name
group: Group assignment
p1, p2, etc.: PLS-DA component scores
variables: PLS-DA loadings for each variable containing the following columns:
variable: Variable name
p1, p2, etc.: PLS-DA component loadings
pcorr1, pcorr2, etc.: Correlation between each variable and component
variance: PLS-DA explained variance containing the following columns:
component: Component name (p1, p2, etc.)
prop_var_explained: Proportion of variance explained by each component
cumulative_prop_var: Cumulative proportion of variance explained
vip: Variable Importance in Projection scores containing the following columns:
variable: Variable name
vip: VIP score
perm_test: Permutation test results containing the following columns:
model: Model type ("Original" for the original model, "Permutation" for permuted models)
perm_id: Permutation ID (0 for original model, 1+ for permutations)
Additional columns from the permutation test matrix (e.g., R2X, R2Y, Q2, etc.)
raw_result: The raw ropls opls object from ropls::opls()
meta_data (only for gly_plsda()): A list containing metadata from the input experiment
This function requires the following packages to be installed:
ropls for PLS-DA analysis
Perform Receiver Operating Characteristic (ROC) analysis for binary classification of glycomics or glycoproteomics data. The function calculates a ROC curve and the Area Under the Curve (AUC) for each variable to assess how well that variable discriminates between two groups.
gly_roc(exp, group_col = "group", pos_class = NULL, add_info = TRUE)
gly_roc_(expr_mat, groups, pos_class = NULL)
exp |
A |
group_col |
(Only for |
pos_class |
A character string specifying which group level should be treated as
the positive class. If |
add_info |
A logical value. If TRUE (default), variable information from the experiment
will be added to the result tibbles. If FALSE, only the ROC analysis results are returned.
Only applicable to |
expr_mat |
(Only for |
groups |
(Only for |
For each variable, a ROC curve is computed using the expression values as predictor and the binary group labels as response.
The function requires exactly 2 groups in the specified grouping variable. If more than 2 groups are present, an error will be thrown.
gly_roc() is the top-level API that works with glyexp::experiment() objects and supports
the add_info parameter for joining experiment metadata.
gly_roc_() is the underlying API that works with matrices and factor vectors directly,
providing more flexibility for users who don't use the glyexp package.
Underlying Function:
ROC analysis is performed using pROC::roc()
Coordinates are extracted using pROC::coords()
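To make the per-variable AUC concrete, the following base R sketch computes it from ranks instead of calling pROC. The helper auc_rank() is hypothetical and not part of glystats. It assumes that higher values indicate the positive class, whereas pROC::roc() can choose the direction automatically, so results may differ for variables that are lower in the positive class.

```r
# AUC for one variable via the rank-sum identity:
# AUC = P(value of a positive sample > value of a negative sample),
# with ties counted as one half.
auc_rank <- function(x, is_pos) {
  n_pos <- sum(is_pos)
  n_neg <- sum(!is_pos)
  r <- rank(x)  # average ranks handle ties
  u <- sum(r[is_pos]) - n_pos * (n_pos + 1) / 2
  u / (n_pos * n_neg)
}

groups <- factor(c("control", "control", "case", "case"))
pos_class <- "case"

# Toy matrix: variables in rows, samples in columns (assumed layout).
expr_mat <- rbind(V1 = c(1, 3, 2, 4), V2 = c(5, 6, 7, 8))
auc <- apply(expr_mat, 1, auc_rank, is_pos = groups == pos_class)
auc  # V1 = 0.75, V2 = 1
```

For V1, three of the four (case, control) pairs are ordered correctly, which gives 0.75. V2 separates the groups perfectly.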
A list with three elements:
tidy_result: A list containing two tibbles:
auc: A tibble containing AUC values for each variable with the following columns:
variable: Variable name
auc: Area Under the Curve value
auc_ci_low: Lower bound of 95% confidence interval for AUC
auc_ci_high: Upper bound of 95% confidence interval for AUC
coords: A tibble containing ROC curve coordinates with the following columns:
variable: Variable name
threshold: Threshold value for classification
specificity: Specificity (True Negative Rate)
sensitivity: Sensitivity (True Positive Rate)
raw_result: A list of pROC objects
meta_data (only for gly_roc()): A list containing metadata from the input experiment
The list has classes glystats_roc_res and glystats_res.
This function requires the pROC package to be installed for ROC curve computation.
Perform t-SNE dimensionality reduction on the expression data.
The function uses Rtsne::Rtsne() to perform t-SNE analysis.
gly_tsne(exp, dims = 2, perplexity = 30, add_info = TRUE, ...)
gly_tsne_(expr_mat, dims = 2, perplexity = 30, ...)
exp |
A |
dims |
Number of output dimensions. Default is 2. |
perplexity |
Perplexity parameter for t-SNE. Default is 30. |
add_info |
A logical value. If TRUE (default), sample information from the experiment
will be added to the result tibble. If FALSE, only the t-SNE coordinates are returned.
Only applicable to |
... |
Additional arguments passed to |
expr_mat |
(Only for |
gly_tsne() is the top-level API that works with glyexp::experiment() objects and supports
the add_info parameter for joining experiment metadata.
gly_tsne_() is the underlying API that works with matrices directly,
providing more flexibility for users who don't use the glyexp package.
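The default perplexity of 30 can be too large for small cohorts. To the best of our knowledge, Rtsne::Rtsne() rejects a perplexity for which 3 * perplexity exceeds the number of samples minus one. The helper below is hypothetical (not part of glystats) and simply caps the value. The Rtsne call is guarded so the sketch also runs when Rtsne is not installed.

```r
# Hypothetical helper: cap perplexity so that 3 * perplexity <= n_samples - 1.
safe_perplexity <- function(n_samples, requested = 30) {
  max(1, min(requested, floor((n_samples - 1) / 3)))
}

safe_perplexity(200)  # 30
safe_perplexity(20)   # 6

if (requireNamespace("Rtsne", quietly = TRUE)) {
  set.seed(1)
  x <- matrix(rnorm(20 * 10), nrow = 20)  # 20 samples x 10 variables (toy data)
  fit <- Rtsne::Rtsne(x, dims = 2, perplexity = safe_perplexity(nrow(x)))
  head(fit$Y)  # embedding coordinates, one row per sample
}
```

The capped value could then be passed as the perplexity argument of gly_tsne() or gly_tsne_().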
A list with three elements:
tidy_result: A tibble with t-SNE coordinates containing the following columns:
sample: Sample name
tsne1: First t-SNE dimension
tsne2: Second t-SNE dimension
raw_result: The raw Rtsne object
meta_data (only for gly_tsne()): A list containing metadata from the input experiment
The list has classes glystats_tsne_res and glystats_res.
This function requires the Rtsne package to be installed for t-SNE analysis.
Perform two-sample t-test for glycomics or glycoproteomics data.
The function supports Student's t-test for comparing two groups.
P-values are adjusted for multiple testing using the method specified by p_adj_method.
gly_ttest(exp, group_col = "group", p_adj_method = "BH", ref_group = NULL, add_info = TRUE, ...)
gly_ttest_(expr_mat, groups, p_adj_method = "BH", ref_group = NULL, ...)
exp |
A |
group_col |
(Only for |
p_adj_method |
A character string specifying the method to adjust p-values.
See |
ref_group |
A character string specifying the reference group. If NULL (default), the first level of the group factor is used as the reference. |
add_info |
A logical value. If TRUE (default), variable information from the experiment
will be added to the result tibble. If FALSE, only the statistical results are returned.
Only applicable to |
... |
Additional arguments passed to |
expr_mat |
(Only for |
groups |
(Only for |
The function performs log2 transformation on the expression data (log2(x + 1e-6)) before statistical testing. Exactly 2 groups are required in the grouping variable.
gly_ttest() is the top-level API that works with glyexp::experiment() objects and supports
the add_info parameter for joining experiment metadata.
gly_ttest_() is the underlying API that works with matrices and factor vectors directly,
providing more flexibility for users who don't use the glyexp package.
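The workflow described above (log2 transformation, one test per variable, multiple-testing adjustment) can be sketched in base R as follows. The toy data, the matrix layout (variables in rows), and the column subset are assumptions. The sketch calls stats::t.test() with its defaults, which gives Welch's variant; this section does not state whether glystats sets var.equal.

```r
set.seed(7)
groups <- factor(rep(c("control", "case"), each = 6),
                 levels = c("control", "case"))
expr_mat <- matrix(
  rlnorm(4 * 12, meanlog = 5), nrow = 4,
  dimnames = list(paste0("V", 1:4), paste0("S", 1:12))
)

log_mat <- log2(expr_mat + 1e-6)  # transformation described above
ref <- levels(groups)[1]          # reference group = first factor level
is_ref <- groups == ref

res <- do.call(rbind, lapply(rownames(log_mat), function(v) {
  # group2 (non-reference) is passed first, so the estimate is group2 - group1.
  fit <- stats::t.test(log_mat[v, !is_ref], log_mat[v, is_ref])
  data.frame(
    variable  = v,
    statistic = unname(fit$statistic),
    p_val     = fit$p.value,
    log2fc    = log2(mean(expr_mat[v, !is_ref]) / mean(expr_mat[v, is_ref]))
  )
}))

# Adjust across all variables at once, here with Benjamini-Hochberg.
res$p_adj <- stats::p.adjust(res$p_val, method = "BH")
res
```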
A list with three elements:
tidy_result: A tibble with t-test results containing the following columns:
variable: Variable name
estimate: Difference in group means (group2 - group1)
estimate1: Mean of group 1
estimate2: Mean of group 2
statistic: t-statistic
p_val: Raw p-value from t-test
parameter: Degrees of freedom
conf_low: Lower bound of 95% confidence interval
conf_high: Upper bound of 95% confidence interval
effect_size: Cohen's d
method: Statistical method used
alternative: Alternative hypothesis
p_adj: Adjusted p-value (if p_adj_method is not NULL)
log2fc: Log2 fold change (log2(group2_mean / group1_mean))
raw_result: A list of t.test model objects
meta_data (only for gly_ttest()): A list containing metadata from the input experiment
The list has classes glystats_ttest_res and glystats_res.
Perform UMAP dimensionality reduction on the expression data.
The function uses uwot::umap() to perform UMAP analysis.
gly_umap(exp, n_neighbors = 15, n_components = 2, add_info = TRUE, ...)
gly_umap_(expr_mat, n_neighbors = 15, n_components = 2, ...)
exp |
A |
n_neighbors |
Number of neighbors to consider for each point. Default is 15. |
n_components |
Number of output dimensions. Default is 2. |
add_info |
A logical value. If TRUE (default), sample information from the experiment
will be added to the result tibble. If FALSE, only the UMAP coordinates are returned.
Only applicable to |
... |
Additional arguments passed to |
expr_mat |
(Only for |
gly_umap() is the top-level API that works with glyexp::experiment() objects and supports
the add_info parameter for joining experiment metadata.
gly_umap_() is the underlying API that works with matrices directly,
providing more flexibility for users who don't use the glyexp package.
A list with three elements:
tidy_result: A tibble with UMAP coordinates containing the following columns:
sample: Sample name
umap1: First UMAP dimension
umap2: Second UMAP dimension
umap3, umap4, etc.: Additional UMAP dimensions (if n_components > 2)
raw_result: The raw UMAP result matrix
meta_data (only for gly_umap()): A list containing metadata from the input experiment
The list has classes glystats_umap_res and glystats_res.
This function requires the uwot package to be installed for UMAP analysis.
Perform Wilcoxon rank-sum test (Mann-Whitney U test) for glycomics or glycoproteomics data.
The function supports non-parametric comparison of two groups.
P-values are adjusted for multiple testing using the method specified by p_adj_method.
gly_wilcox(exp, group_col = "group", p_adj_method = "BH", ref_group = NULL, add_info = TRUE, ...)
gly_wilcox_(expr_mat, groups, p_adj_method = "BH", ref_group = NULL, ...)
exp |
A |
group_col |
(Only for |
p_adj_method |
A character string specifying the method to adjust p-values.
See |
ref_group |
A character string specifying the reference group. If NULL (default), the first level of the group factor is used as the reference. |
add_info |
A logical value. If TRUE (default), variable information from the experiment
will be added to the result tibble. If FALSE, only the statistical results are returned.
Only applicable to |
... |
Additional arguments passed to |
expr_mat |
(Only for |
groups |
(Only for |
The function performs log2 transformation on the expression data (log2(x + 1e-6)) before statistical testing. Exactly 2 groups are required in the grouping variable.
gly_wilcox() is the top-level API that works with glyexp::experiment() objects and supports
the add_info parameter for joining experiment metadata.
gly_wilcox_() is the underlying API that works with matrices and factor vectors directly,
providing more flexibility for users who don't use the glyexp package.
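For a single variable, the test and a rank-biserial effect size can be sketched in base R. The values are hypothetical, and the sign convention used here (positive when the second group tends to be larger) is an assumption; this section does not state which convention glystats follows.

```r
group1 <- c(1.2, 2.3, 3.1, 4.8)  # reference group (hypothetical values)
group2 <- c(5.4, 6.0, 7.7, 8.9)  # comparison group

# A rank-based test gives the same result with or without the log2 step,
# because log2 is strictly increasing on positive values.
fit_raw <- stats::wilcox.test(group2, group1)
fit_log <- stats::wilcox.test(log2(group2 + 1e-6), log2(group1 + 1e-6))

# W counts the (group2, group1) pairs in which the group2 value is larger.
w <- unname(fit_raw$statistic)
rank_biserial <- 2 * w / (length(group1) * length(group2)) - 1
rank_biserial  # 1
```

Here every group2 value exceeds every group1 value, so W equals 16 (all pairs) and the rank-biserial correlation reaches its maximum of 1.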
A list with three elements:
tidy_result: A tibble with Wilcoxon test results containing the following columns:
variable: Variable name
statistic: Wilcoxon test statistic
p_val: Raw p-value from Wilcoxon test
effect_size: Rank-biserial correlation
method: Statistical method used
alternative: Alternative hypothesis
p_adj: Adjusted p-value (if p_adj_method is not NULL)
log2fc: Log2 fold change (log2(group2_mean / group1_mean))
Additional columns from experiment metadata may be included if add_info = TRUE.
raw_result: A list of wilcox.test model objects
meta_data (only for gly_wilcox()): A list containing metadata from the input experiment
The list has classes glystats_wilcox_res and glystats_res.