---
title: "Getting Started with glystats"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Getting Started with glystats}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

Omics data analysis usually means differential expression analysis,
PCA, survival analysis, and much more! 🧬
`glystats` brings all these powerful analyses together under one unified,
user-friendly interface.

**🎯 Key Feature:** `glystats` offers two levels of interface to fit your workflow:

- `gly_xxx()`: seamlessly works with [glyexp::experiment()],
  the beating heart of the `glycoverse` ecosystem 💓
- `gly_xxx_()`: flexible enough to work with general inputs like matrices or data frames

This vignette focuses on the `gly_xxx()` interface.
New to [glyexp::experiment()]?
No worries! Check out its [introduction](https://glycoverse.github.io/glyexp/articles/glyexp.html) first
to get up to speed.

```{r setup}
library(glystats)
library(glyexp)
library(glyread)
library(glyclean)
library(dplyr)
```

## 🔍 Meet Your Data

Let's start by exploring our demo dataset:

```{r}
# Here we use `glyread` to read in the data and use `glyclean` to perform preprocessing
exp <- read_pglyco3_pglycoquant("glycopeptides.list", sample_info = "sample_info.csv") |>
  auto_clean()
exp
```

Look at that! 🎉 We've got a [glyexp::experiment()] packed with 12 samples and 263 glycoforms.
That's plenty of data to work with!

```{r}
get_var_info(exp)
```

The variable information tibble is like a detailed ID card for each glycoform 🆔 -
it contains everything you need to know: the protein,
glycosylation site, and glycan structures.

```{r}
get_sample_info(exp)
```

Our sample information tibble features a crucial "group" column 🏷️.
Here's the key: "H" = healthy, "M" = hepatitis, "Y" = cirrhosis,
and "C" = hepatocellular carcinoma.
This gives us a perfect setup for comparative analysis!

## 🚀 One Interface to Rule Them All

Here's where `glystats` really shines! ✨
Every function follows a simple, intuitive naming pattern: `gly_` + analysis name.
Think `gly_ttest()` for t-tests, `gly_pca()` for PCA, and so on.

**Pro tip:** leverage RStudio's auto-completion to discover all available functions! 💡
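
Outside RStudio, the same discovery can be done from the console.
The sketch below is only an illustration: `list_prefixed()` is a hypothetical helper (not part of `glystats`) built on base R's `getNamespaceExports()`, and the call is guarded so it only runs when `glystats` is installed.

```{r}
# Hypothetical helper: list a package's exported names that match a pattern
list_prefixed <- function(pkg, pattern) {
  sort(grep(pattern, getNamespaceExports(pkg), value = TRUE))
}

# Show every exported name starting with "gly_" (only if glystats is installed)
if (requireNamespace("glystats", quietly = TRUE)) {
  list_prefixed("glystats", "^gly_")
}
```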

Let's dive into action with an ANOVA analysis to identify differentially expressed glycoforms:

```{r}
anova_res <- gly_anova(exp)
```

Boom! 💥 Analysis complete in just one line!
`gly_anova()` intelligently follows the `glycoverse` naming conventions,
automatically detecting the "group" column in your sample info and fitting an ANOVA model
for each glycoform.
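
To make "one ANOVA model per glycoform" concrete, here is a minimal base-R sketch on simulated data.
It is not the `glystats` implementation: the object names (`sim_mat`, `sim_groups`), the simulated values, and the choice of Benjamini-Hochberg adjustment are all assumptions made for illustration.

```{r}
# Illustrative base-R sketch; NOT how glystats is implemented internally
set.seed(1)

# Simulated data: 5 hypothetical glycoforms (rows) x 12 samples (columns)
sim_groups <- factor(rep(c("H", "M", "Y", "C"), each = 3))
sim_mat <- matrix(
  rnorm(5 * 12, mean = 10),
  nrow = 5,
  dimnames = list(paste0("GF", 1:5), paste0("S", 1:12))
)
# Shift one glycoform in group "C" so that it carries a group effect
sim_mat["GF1", sim_groups == "C"] <- sim_mat["GF1", sim_groups == "C"] + 5

# Fit a one-way ANOVA per row and keep the p-value of the group term
p_values <- apply(sim_mat, 1, function(x) {
  fit <- aov(x ~ sim_groups)
  summary(fit)[[1]][["Pr(>F)"]][1]
})

# Adjust for multiple testing (Benjamini-Hochberg is an assumption here)
p_adj <- p.adjust(p_values, method = "BH")
data.frame(variable = names(p_values), p_val = p_values, p_adj = p_adj)
```

Looping over rows like this is the part that `gly_anova()` saves you from writing by hand.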

## 📋 Understanding Your Results

All `glystats` functions return a consistent, well-structured list with two key components:

- `tidy_result`: Clean, analysis-ready tibbles in tidy format 📊
- `raw_result`: The original result objects from underlying statistical functions 🔧

For `gly_anova()`, the `tidy_result` contains two informative tibbles:
`main_test` and `post_hoc_test`.

Use `get_tidy_result()` to extract either tibble by name:

```{r}
get_tidy_result(anova_res, "main_test")
```

Notice something cool? 😎 `gly_anova()` thoughtfully adds back all the descriptive columns
from your variable tibble.
Want to control this behavior? Just use the `add_info` parameter!

The `raw_result` houses two lists of models - one for the main test,
one for post hoc comparisons:

```{r}
names(get_raw_result(anova_res))
```
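
If the two-layer layout feels abstract, the following mock-up may help.
It builds a hypothetical list (`demo_res`) with the same `tidy_result` / `raw_result` split from a single base-R model.
Everything in it is an assumption for illustration: the use of `aov()` and `TukeyHSD()`, the element names inside `raw_result`, and the column names of the small data frames are not taken from the `glystats` source.

```{r}
# Mock-up only: mirrors the tidy/raw split described above with base-R objects
set.seed(2)
demo_df <- data.frame(
  value = c(rnorm(3, 10), rnorm(3, 10), rnorm(3, 12), rnorm(3, 15)),
  group = factor(rep(c("H", "M", "Y", "C"), each = 3))
)

# "Raw" objects: the fitted model and its post hoc comparisons
main_fit <- aov(value ~ group, data = demo_df)
post_hoc <- TukeyHSD(main_fit)

# "Tidy" objects: plain tables distilled from the raw objects
demo_res <- list(
  tidy_result = list(
    main_test = data.frame(p_val = summary(main_fit)[[1]][["Pr(>F)"]][1]),
    post_hoc_test = data.frame(
      comparison = rownames(post_hoc$group),
      p_adj = unname(post_hoc$group[, "p adj"])
    )
  ),
  raw_result = list(main_test = main_fit, post_hoc_test = post_hoc)
)

names(demo_res)
demo_res$tidy_result$post_hoc_test
```

The tidy tables are what you filter and plot; the raw objects stay available when you need diagnostics that the tables leave out.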

`get_tidy_result()` and `get_raw_result()` are especially handy in pipes:

```{r}
exp |>
  gly_anova() |>
  get_tidy_result("main_test") |>
  filter(p_adj < 0.05)
```

## 🔧 Maximum Flexibility Mode

Feeling constrained by [glyexp::experiment()]? Fear not! 🦸‍♀️
Every `gly_xxx()` function comes with a flexible `gly_xxx_()` sibling that works with
standard R objects.
For instance, `gly_anova_()` happily accepts plain matrices:

```{r}
expr_mat <- get_expr_mat(exp)
groups <- factor(get_sample_info(exp)$group)
anova_res2 <- gly_anova_(expr_mat, groups)
```

This adaptability makes `glystats` a perfect team player 🤝 -
it seamlessly integrates into existing analysis pipelines and workflows,
no matter what data structures you're already using!
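
As a sketch of what "plain inputs" can look like when you never had an experiment object, the chunk below starts from a wide data frame and a separate metadata table.
The tables, their column names, and the simulated values are hypothetical, and the layout (glycoforms in rows, samples in columns) is assumed from what `get_expr_mat()` returns above.
The final call is guarded so it only runs when `glystats` is installed.

```{r}
set.seed(3)

# Hypothetical wide table: one row per glycoform, one column per sample
wide_df <- data.frame(
  glycoform = paste0("GF", 1:4),
  matrix(rlnorm(4 * 9), nrow = 4, dimnames = list(NULL, paste0("S", 1:9)))
)

# Hypothetical sample metadata, deliberately in a different order
meta_df <- data.frame(
  sample = paste0("S", 9:1),
  group = rep(c("C", "Y", "H"), each = 3)
)

# Numeric matrix with glycoforms as row names and samples as column names
plain_mat <- as.matrix(wide_df[, -1])
rownames(plain_mat) <- wide_df$glycoform

# Reorder the metadata to match the matrix columns before building the factor
meta_df <- meta_df[match(colnames(plain_mat), meta_df$sample), ]
stopifnot(identical(meta_df$sample, colnames(plain_mat)))
plain_groups <- factor(meta_df$group)

# Same call pattern as above, run only when glystats is available
if (requireNamespace("glystats", quietly = TRUE)) {
  plain_res <- glystats::gly_anova_(plain_mat, plain_groups)
}
```

The `match()` step matters: a group vector that is out of sync with the matrix columns will not raise an error on its own, it will just give wrong answers.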

## 🎪 The Complete Analytical Arsenal

Ready to explore the full power of `glystats`?
Here's your complete toolkit for glycomics and glycoproteomics data analysis:

- **🔬 Differential Expression Analysis:**
  - `gly_ttest()`: Two-sample t-test
  - `gly_wilcox()`: Wilcoxon rank sum test
  - `gly_anova()`: One-way ANOVA
  - `gly_kruskal()`: Kruskal-Wallis rank sum test
  - `gly_limma()`: Linear models for microarray data (limma)
  - `gly_fold_change()`: Calculate fold change

- **📐 Dimensionality Reduction:**
  - `gly_pca()`: Principal component analysis
  - `gly_tsne()`: t-distributed stochastic neighbor embedding (t-SNE)
  - `gly_umap()`: Uniform manifold approximation and projection (UMAP)
  - `gly_oplsda()`: Orthogonal partial least squares discriminant analysis (OPLS-DA)
  - `gly_plsda()`: Partial least squares discriminant analysis (PLS-DA)

- **🧩 Clustering:**
  - `gly_kmeans()`: K-means clustering
  - `gly_hclust()`: Hierarchical clustering

- **⏱️ Survival Analysis:**
  - `gly_cox()`: Cox proportional hazards model

- **🎯 Enrichment Analysis:**
  - `gly_enrich_go()`: Gene Ontology enrichment analysis
  - `gly_enrich_kegg()`: KEGG enrichment analysis
  - `gly_enrich_reactome()`: Reactome enrichment analysis
  - `gly_enrich_wikipathways()`: WikiPathways enrichment analysis
  - `gly_enrich_do()`: Disease Ontology enrichment analysis
  - `gly_enrich_ncg()`: Network of Cancer Genes enrichment analysis
  - `gly_enrich_dgn()`: DisGeNET enrichment analysis

- **🔧 Additional Tools:**
  - `gly_cor()`: Correlation analysis
  - `gly_roc()`: Receiver operating characteristic (ROC) analysis

## 🚀 What's Next on Your Journey?

Ready to dive deeper into the `glycoverse`? Here's your roadmap to success:

1. **📥 Data Import:** Start with [glyread](https://glycoverse.github.io/glyread/articles/glyread.html)
   to seamlessly import your data into `glyexp::experiment()` objects

2. **🧹 Data Preprocessing:** Use [glyclean](https://glycoverse.github.io/glyclean/articles/glyclean.html)
   to polish and prepare your data for analysis

3. **📊 Statistical Analysis:** You're here! Use `glystats` to unlock powerful insights
   from your glycomics data

4. **🎨 Visualization:** Stay tuned! We're crafting an amazing `glyvis` package
   for stunning data visualizations

Happy analyzing! 🎉✨