---
title: "Getting Started with glystats"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Getting Started with glystats}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

When we talk about omics data analysis, we usually dive into exciting territory: differential expression analysis, PCA, survival analysis, and much more! 🧬 `glystats` brings all these powerful analyses together under one unified, user-friendly interface.

**🎯 Key Feature:** `glystats` offers two levels of interface to fit your workflow:

- `gly_xxx()`: works seamlessly with `glyexp::experiment()`, the beating heart of the `glycoverse` ecosystem 💓
- `gly_xxx_()`: flexible enough to work with general inputs such as matrices or data frames

This vignette focuses on the `gly_xxx()` interface. New to `glyexp::experiment()`? No worries! Check out its [introduction](https://glycoverse.github.io/glyexp/articles/glyexp.html) first to get up to speed.

```{r setup}
library(glystats)
library(glyexp)
library(glyread)
library(glyclean)
library(dplyr)
```

## 🔍 Meet Your Data

Let's start by exploring our demo dataset:

```{r}
# Read the data with glyread, then preprocess it with glyclean
exp <- read_pglyco3_pglycoquant("glycopeptides.list", sample_info = "sample_info.csv") |>
  auto_clean()
exp
```

Look at that! 🎉 We've got a `glyexp::experiment()` packed with 12 samples and 263 glycoforms. That's plenty of data to work with!

```{r}
get_var_info(exp)
```

The variable information tibble is like a detailed ID card for each glycoform 🆔: it contains everything you need to know, including the protein, the glycosylation site, and the glycan structure.

```{r}
get_sample_info(exp)
```

Our sample information tibble features a crucial "group" column 🏷️. Here's the key: "H" = healthy, "M" = hepatitis, "Y" = cirrhosis, and "C" = hepatocellular carcinoma.
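
The single-letter codes keep the table compact, but descriptive labels are easier to read in tables and plots. As a minimal base-R sketch, the codes can be recoded into a factor with an explicit level order (here following disease progression). It uses a hypothetical toy data frame in place of the real `get_sample_info(exp)` output, and the `group_label` column and the `"HCC"` abbreviation are illustrative choices, not part of `glyexp`. The original `group` column is left untouched.

```{r}
# Hypothetical toy sample information standing in for get_sample_info(exp)
toy_info <- data.frame(
  sample = paste0("S", 1:12),
  group = rep(c("H", "M", "Y", "C"), each = 3)
)

# Recode the single-letter codes into a labelled factor with a fixed order;
# "group_label" is an illustrative column name, not part of glyexp
toy_info$group_label <- factor(
  toy_info$group,
  levels = c("H", "M", "Y", "C"),
  labels = c("healthy", "hepatitis", "cirrhosis", "HCC")
)
table(toy_info$group_label)
```
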
This gives us a perfect setup for comparative analysis!

## 🚀 One Interface to Rule Them All

Here's where `glystats` really shines! ✨ Every function follows a simple, intuitive naming pattern: `gly_` + analysis name. Think `gly_ttest()` for t-tests, `gly_pca()` for PCA, and so on. **Pro tip:** type `gly_` and let RStudio's auto-completion show you all available functions! 💡

Let's jump into action with an ANOVA to identify glycoforms that are differentially expressed across the four groups:

```{r}
anova_res <- gly_anova(exp)
```

Boom! 💥 Analysis complete in just one line! Following the `glycoverse` naming conventions, `gly_anova()` automatically picks up the "group" column in your sample information and fits an ANOVA model for each glycoform.

## 📋 Understanding Your Results

All `glystats` functions return a consistent, well-structured list with two key components:

- `tidy_result`: clean, analysis-ready tibbles in tidy format 📊
- `raw_result`: the original result objects from the underlying statistical functions 🔧

For `gly_anova()`, `tidy_result` contains two tibbles: `main_test` and `post_hoc_test`. Use `get_tidy_result()` to extract one of them by name:

```{r}
get_tidy_result(anova_res, "main_test")
```

Notice something cool? 😎 `gly_anova()` adds back all the descriptive columns from your variable information tibble. Want to control this behavior? Just use the `add_info` parameter!

The `raw_result` holds two lists of models, one for the main test and one for the post hoc comparisons:

```{r}
names(get_raw_result(anova_res))
```

`get_tidy_result()` and `get_raw_result()` also fit naturally into pipes. For example, to keep only the glycoforms with an adjusted p-value below 0.05:

```{r}
exp |>
  gly_anova() |>
  get_tidy_result("main_test") |>
  filter(p_adj < 0.05)
```

## 🔧 Maximum Flexibility Mode

Feeling constrained by `glyexp::experiment()`? Fear not! 🦸‍♀️ Every `gly_xxx()` function comes with a flexible `gly_xxx_()` sibling that works with standard R objects.
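
To picture what "one ANOVA per glycoform" means on a plain matrix, here is a minimal base-R sketch on simulated data. It is a conceptual illustration only, not how `glystats` is implemented: it assumes variables in rows and samples in columns, fits `stats::aov()` to each row, and adjusts the p-values with the Benjamini-Hochberg method, which is our assumption rather than a documented `glystats` default. The helper `anova_p()` and all `toy_*` objects are hypothetical.

```{r}
# Conceptual sketch only (not the glystats implementation).
# Assumptions: rows = variables (glycoforms), columns = samples,
# and Benjamini-Hochberg adjustment for the p_adj column.
set.seed(42)

# Hypothetical toy data: 5 variables x 12 samples in 4 groups of 3
toy_groups <- factor(rep(c("H", "M", "Y", "C"), each = 3),
                     levels = c("H", "M", "Y", "C"))
toy_mat <- matrix(rnorm(5 * 12), nrow = 5,
                  dimnames = list(paste0("V", 1:5), paste0("S", 1:12)))
# Plant a strong shift in group "C" for variable V1
toy_mat["V1", toy_groups == "C"] <- toy_mat["V1", toy_groups == "C"] + 10

# Hypothetical helper: one-way ANOVA for one variable, returns the F-test p-value
anova_p <- function(y, g) {
  fit <- stats::aov(y ~ g)
  summary(fit)[[1]][["Pr(>F)"]][1]
}

# apply() passes each row of the matrix as `y`
toy_p <- apply(toy_mat, 1, anova_p, g = toy_groups)
toy_anova <- data.frame(
  variable = rownames(toy_mat),
  p_val = toy_p,
  p_adj = stats::p.adjust(toy_p, method = "BH"),
  row.names = NULL
)
toy_anova
```

Variable `V1` carries a planted shift in group "C", so it should stand out with a very small adjusted p-value, while the pure-noise rows usually will not.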
`gly_anova_()`, for instance, happily accepts a plain expression matrix together with a grouping factor:

```{r}
expr_mat <- get_expr_mat(exp)
groups <- factor(get_sample_info(exp)$group)
anova_res2 <- gly_anova_(expr_mat, groups)
```

This adaptability makes `glystats` a perfect team player 🤝: it integrates into existing analysis pipelines and workflows, whatever data structures you are already using.

## 🎪 The Complete Analytical Arsenal

Ready to explore the full power of `glystats`? Here's your complete toolkit for glycomics and glycoproteomics data analysis:

- **🔬 Differential Expression Analysis:**
  - `gly_ttest()`: Two-sample t-test
  - `gly_wilcox()`: Wilcoxon rank sum test
  - `gly_anova()`: One-way ANOVA
  - `gly_kruskal()`: Kruskal-Wallis rank sum test
  - `gly_limma()`: Linear models for microarray data (limma)
  - `gly_fold_change()`: Calculate fold change
- **📐 Dimensionality Reduction:**
  - `gly_pca()`: Principal component analysis
  - `gly_tsne()`: t-distributed stochastic neighbor embedding (t-SNE)
  - `gly_umap()`: Uniform manifold approximation and projection (UMAP)
  - `gly_oplsda()`: Orthogonal partial least squares discriminant analysis (OPLS-DA)
  - `gly_plsda()`: Partial least squares discriminant analysis (PLS-DA)
- **🧩 Clustering:**
  - `gly_kmeans()`: K-means clustering
  - `gly_hclust()`: Hierarchical clustering
- **⏱️ Survival Analysis:**
  - `gly_cox()`: Cox proportional hazards model
- **🎯 Enrichment Analysis:**
  - `gly_enrich_go()`: Gene Ontology enrichment analysis
  - `gly_enrich_kegg()`: KEGG enrichment analysis
  - `gly_enrich_reactome()`: Reactome enrichment analysis
  - `gly_enrich_wikipathways()`: WikiPathways enrichment analysis
  - `gly_enrich_do()`: Disease Ontology enrichment analysis
  - `gly_enrich_ncg()`: Network of Cancer Genes enrichment analysis
  - `gly_enrich_dgn()`: DisGeNET enrichment analysis
- **🔧 Additional Tools:**
  - `gly_cor()`: Correlation analysis
  - `gly_roc()`: Receiver operating characteristic (ROC) analysis

## 🚀 What's Next on Your Journey?
Ready to dive deeper into the `glycoverse`? Here's your roadmap to success:

1. **📥 Data Import:** Start with [glyread](https://glycoverse.github.io/glyread/articles/glyread.html) to import your data into `glyexp::experiment()` objects
2. **🧹 Data Preprocessing:** Use [glyclean](https://glycoverse.github.io/glyclean/articles/glyclean.html) to clean and prepare your data for analysis
3. **📊 Statistical Analysis:** You're here! Use `glystats` to unlock powerful insights from your glycomics and glycoproteomics data
4. **🎨 Visualization:** Stay tuned! We're crafting an amazing `glyvis` package for stunning data visualizations

Happy analyzing! 🎉✨