---
title: "Get Started with glydet"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Get Started with glydet}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

Derived traits are fabricated features that capture specific structural characteristics of a glycome.
For example, the proportion of core fucosylated glycans within all glycans is a derived trait
that reflects the overall level of core fucosylation.
The glycomics community has long recognized the value of derived traits for interpreting complex glycomics data,
but calculating them has often been a manual, error-prone process.

`glydet` makes this workflow more direct.
The package provides tools for automatically calculating a wide range of derived traits.
It also provides a flexible interface for defining custom traits.
For glycoproteomics data, `glydet` can calculate derived traits in a site-specific manner.

## Important Notes Before You Start

### Prerequisites

This package is built on the [glyrepr](https://github.com/glycoverse/glyrepr) package,
and heavily relies on the [glyexp](https://github.com/glycoverse/glyexp) package.
If you are not familiar with these two packages,
we highly recommend checking out their introductions first.

### Data Types

`glydet` is designed to work with untargeted glycomics and glycoproteomics data.
Label-free quantification data is readily supported by `glyread` and `glyclean`.

For labeling quantification like TMT, ratios between the target and reference channels (TMT ratios)
must be converted to abundance matrix following this procedure:

1. Calculate the median of all reference channel MS2 summed intensities from the unnormalized intensity matrix for each glycopeptide
2. Multiply the TMT ratios by the median MS2 summed intensities for each glycopeptide

This enables the quantification of different glycopeptides in the same sample to be comparable.

## Your First Analysis

```{r setup}
library(glydet)
library(glyrepr)
library(glyexp)
library(glyclean)
library(glystats)
library(dplyr)
```

We'll work with `glyexp::real_experiment2`, a real glycomics dataset containing 144 samples and 67 glycans.

Before calculating derived traits, preprocess your data with `glyclean`.
This makes the trait analysis use the cleaned abundance data.

```{r}
exp <- auto_clean(real_experiment2)  # Preprocess the data
exp
```

Let's inspect the dataset before calculating traits:

```{r}
get_var_info(exp)
```

```{r}
get_sample_info(exp)
```

```{r}
get_expr_mat(exp)[1:5, 1:5]
```

Now let's calculate derived traits:

```{r}
trait_exp <- derive_traits(exp)
trait_exp
```

The result is a new `experiment()` object with "traitomics" type.
Instead of storing the abundance of each glycan in each sample,
it stores the value of each derived trait in each sample.

```{r}
get_var_info(trait_exp)
```

```{r}
# These are the trait values
get_expr_mat(trait_exp)[1:5, 1:5]
```

`derive_traits()` calculated the following built-in derived traits:

- **`TM`**: Proportion of high-mannose glycans
- **`TH`**: Proportion of hybrid glycans  
- **`TC`**: Proportion of complex glycans
- **`MM`**: Average number of mannoses within high-mannose glycans
- **`CA2`**: Proportion of bi-antennary glycans within complex glycans
- **`CA3`**: Proportion of tri-antennary glycans within complex glycans
- **`CA4`**: Proportion of tetra-antennary glycans within complex glycans
- **`TF`**: Proportion of fucosylated glycans
- **`TFc`**: Proportion of core-fucosylated glycans
- **`TFa`**: Proportion of arm-fucosylated glycans
- **`TB`**: Proportion of glycans with bisecting GlcNAc
- **`GS`**: Average degree of sialylation per galactose
- **`AG`**: Average degree of galactosylation per antenna
- **`TS`**: Proportion of sialylated glycans

**Important note:** All built-in derived traits only work for N-glycans.
For other types of glycans, you need to define your own derived traits.
Check out the [Defining Custom Traits](https://glycoverse.github.io/glydet/articles/custom-traits.html) vignette
to learn how to define your own derived traits.
For glycan classes with simpler structural patterns than N-glycans, such as O-glycans,
we recommend using [motif quantification](https://glycoverse.github.io/glydet/articles/quantify-motifs.html) instead.

Because `derive_traits()` returns a `glyexp::experiment()` object,
functions in `glystats` can be applied to the derived traits to perform statistical analyses.

```{r}
anova_res <- gly_anova(trait_exp)
anova_res |>
  get_tidy_result("main_test") |>
  filter(p_adj < 0.05)
```

## Site-Specific Derived Traits in Glycoproteomics

Calculating derived traits for glycoproteomics data requires grouping by glycosite.

Let's first review the structure of glycoproteomics data.
Usually, in glycoproteomics, we analyze the quantification of glycoforms in samples.
A glycoform is a unique combination of a glycosite and a glycan structure.
Note that one glycosite can have many different glycan structures,
resulting in multiple glycoforms for the same site.

To calculate derived traits for glycoproteomics data,
we calculate derived traits separately within each glycosite.
This captures site-specific glycosylation patterns and their variations across samples.

Operationally, there is no difference between calculating derived traits for glycomics and glycoproteomics data.

```{r}
# A glycoproteomics dataset
gp_exp <- auto_clean(glyexp::real_experiment)
gp_exp
```

```{r}
gp_trait_exp <- derive_traits(gp_exp)
gp_trait_exp
```

The result is also a `glyexp::experiment()` object but with "traitproteomics" type.
The only difference is that the variable information now contains a `glycosite` column,
which indicates the glycosite for each trait variable.
The experiment object therefore contains the value of each derived trait for each glycosite in each sample.

You can again apply `glystats` functions to perform statistical analyses.
For example, we can identify glycosites with dysregulated core fucosylation:

```{r}
gly_anova(gp_trait_exp) |>
  get_tidy_result("main_test") |>
  filter(trait == "TFc", p_adj < 0.05) |>
  select(protein, protein_site)
```

## Understanding Meta-Properties

To understand how `glydet` calculates derived traits,
it helps to look at the meta-properties used by the package.

The key concept is **"meta-properties"**:
properties that describe individual glycans.

**What's the difference?**

- **Derived traits** describe entire glycomes, or all glycans on a glycosite, and their values vary across samples
- **Meta-properties** describe individual glycans regardless of their abundance, such as the number of antennae, core fucoses, or sialic acids on a single structure

**The connection:** Derived traits are calculated from meta-properties and abundance values.
When you call `derive_traits()`,
`glydet` automatically calculates meta-properties for all glycans first,
then uses this information to compute the derived traits you see.

You can also work with meta-properties directly through two functions:

- **`get_meta_properties()`**: Calculate meta-properties for any set of glycans
- **`add_meta_properties()`**: Add meta-properties to the variable information of an `experiment()` object

### get_meta_properties()

Let's try `get_meta_properties()` with a few glycan structures from the dataset:

```{r}
glycans <- unique(get_var_info(exp)$glycan_structure)[1:5]
glycans
```

**Note:** `glycans` is a `glyrepr::glycan_structure()` vector.
These are standardized representations of glycan structures.

Now calculate their meta-properties:

```{r}
get_meta_properties(glycans)
```

### add_meta_properties()

When working with `glyexp::experiment()` objects,
you can add meta-properties directly to the variable information:

```{r}
exp_with_mp <- add_meta_properties(exp)
get_var_info(exp_with_mp)
```

The variable information now contains multiple meta-property columns.
This supports filtering based on structural features.

For instance, filter for all glycoforms containing high-mannose glycans:

```{r}
exp_with_mp |>
  filter_var(Tp == "highmannose")
```

### Meta-Property Functions

Meta-properties are implemented as functions that take `glyrepr::glycan_structure()` vectors
and return corresponding property values.
`glydet` includes a set of built-in meta-property functions:

```{r}
names(all_mp_fns())
```

Here is the full list of built-in meta-property functions:

| Name | Function | Description |
|------|----------|-------------|
| `Tp` | `n_glycan_type()` | Type of the glycan, either "complex", "hybrid", "highmannose", or "pausimannose" |
| `B` | `has_bisecting()` | Whether the glycan has a bisecting GlcNAc |
| `nA` | `n_antennae()` | Number of antennae |
| `nF` | `n_fuc()` | Number of fucoses |
| `nFc` | `n_core_fuc()` | Number of core fucoses |
| `nFa` | `n_arm_fuc()` | Number of arm fucoses |
| `nG` | `n_gal()` | Number of galactoses |
| `nGt` | `n_terminal_gal()` | Number of terminal galactoses |
| `nS` | `n_sia()` | Number of sialic acids |
| `nM` | `n_man()` | Number of mannoses |

Each function can be called directly:

```{r}
n_glycan_type(glycans)
```

## Working with Structural Ambiguity

An important design principle of `glydet` is its ability to handle glycan structures with varying levels of detail.
All built-in meta-properties and derived traits are designed to work with the **minimum information typically available**
for N-glycans in most experimental scenarios.

### Generic vs. Specific Monosaccharides

`glydet` works with generic monosaccharide names, such as "Hex", "HexNAc", and "dHex", as well as structures lacking linkage information.
This level of structural resolution reflects what is commonly achievable in glycoproteomics workflows,
where complete structural determination is often challenging.

For example, this ambiguous structure is supported:

```{r}
# Generic monosaccharides with unknown linkages
ambiguous_glycan <- "HexNAc(??-?)Hex(??-?)[Hex(??-?)]Hex(??-?)HexNAc(??-?)[dHex(??-?)]HexNAc(??-"
```

### Handling Detailed Structures 

This design philosophy doesn't limit `glydet`'s applicability to well-characterized structures.
The package also handles glycans with complete structural information:

```{r}
# Fully specified structure with specific monosaccharides and linkages
detailed_glycan <- "GlcNAc(b1-2)Man(a1-3)[Man(a1-6)]Man(b1-3)GlcNAc(b1-4)[Fuc(a1-6)]GlcNAc(b1-"
```

### Extending Functionality

When working with highly detailed structural information,
you may want to create specialized meta-property functions that leverage specific monosaccharide identities or linkage patterns.
This allows you to define custom derived traits that use structural features beyond those covered by the built-in functions.

## What's Next?

Now you have the main concepts needed to use `glydet`.
You can try `traits_detailed()` to calculate more detailed derived traits.
You can also start to define your own meta-property functions and derived traits.
Check out the [Custom Traits](https://glycoverse.github.io/glydet/articles/custom-traits.html) vignette
to learn how to define your own derived traits.
Or you can check out a special type of derived traits:
[motif quantification](https://glycoverse.github.io/glydet/articles/quantify-motifs.html).