---
title: "Quantifying Glycan Motifs"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Quantifying Glycan Motifs}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

In this vignette, we'll explore a special type of derived trait: motif quantification.
This powerful feature allows us to measure the abundance of biologically meaningful
glycan substructures across your samples.

```{r setup}
library(glydet)
library(glyexp)
library(glyclean)

exp <- auto_clean(real_experiment)
```

## What is motif quantification?

Glycan motifs are substructures with special biological or structural significance.
Think of them as functional building blocks that carry specific biological messages.
For example, the Lewis x antigen is a fucosylated carbohydrate epitope commonly found on
glycoproteins and glycolipids. It plays crucial roles in cell–cell recognition, adhesion, and immune responses.
Understanding the abundance of Lewis x antigens can provide valuable insights into immune system dynamics.

But here's the challenge: how do we quantify Lewis x antigens across samples?
Different glycans contain varying numbers of Lewis x motifs,
and these glycans themselves have different abundances across samples.
Our solution elegantly combines both pieces of information to estimate the true abundance of Lewis x antigens in your dataset.

## The `quantify_motifs()` function

Glydet provides the `quantify_motifs()` function to handle this complex task seamlessly.
This function takes a `glyexp::experiment()` object along with your motifs of interest
and returns a new experiment object enriched with motif quantifications.
Just like `derive_traits()`, `quantify_motifs()` works beautifully with both glycomics and glycoproteomics data.

Let's dive into a practical example:

```{r}
# Define our motifs of interest using IUPAC-condensed format
motifs <- c(
  Lxa = "Hex(??-?)[dHex(??-?)]HexNAc(??-",  # Lewis x/a antigen
  SLxa = "NeuAc(??-?)Hex(??-?)[dHex(??-?)]HexNAc(??-"  # Sialyl Lewis x/a antigen
)

# Quantify the motifs in our dataset
motif_exp <- quantify_motifs(exp, motifs)
motif_exp
```

```{r}
get_var_info(motif_exp)
```

Notice how the variable information now includes a `motif` column instead of the usual `trait` column.
Don't worry—this is just a cosmetic difference.
Under the hood, the functionality works exactly the same way as `derive_traits()`.

## Absolute vs. relative motif quantification

Here's where things get a little bit complex: motif quantification can be performed in two distinct ways—absolute and relative.
By default, `quantify_motifs()` uses relative quantification,
but understanding both approaches is crucial for choosing the right method for your research.

**Important note:** Don't confuse this with the absolute and relative quantification of glycans or glycopeptides themselves—
that's a different concept entirely.

In traditional omics contexts, "absolute quantification" means determining actual concentrations (like mg/L) or counts (like mRNA copy numbers),
while "relative quantification" means comparing abundance between samples without meaningful absolute values.
Label-free quantification, for instance, is a relative method.

However, in motif quantification,
"absolute" and "relative" refer to whether we normalize motif abundances by the total abundance of
the glycome (or individual glycosite mini-glycomes).

Let's illustrate this with a concrete example.
Imagine a glycomics dataset with two samples, A and B,
each containing three glycans (G1, G2, G3) with these abundances:

- **Sample A:** G1 = 10, G2 = 20, G3 = 30
- **Sample B:** G1 = 20, G2 = 40, G3 = 60

For simplicity, let's assume our target motif appears exactly once in each glycan.

### Absolute motif quantification

We simply sum the motif abundances across all glycans:

- **Sample A:** 10 × 1 + 20 × 1 + 30 × 1 = 60
- **Sample B:** 20 × 1 + 40 × 1 + 60 × 1 = 120

The results clearly differ between samples.

### Relative motif quantification

We normalize by the total glycan abundance:

- **Sample A:** (10 × 1 + 20 × 1 + 30 × 1) ÷ (10 + 20 + 30) = 60 ÷ 60 = 1
- **Sample B:** (20 × 1 + 40 × 1 + 60 × 1) ÷ (20 + 40 + 60) = 120 ÷ 120 = 1

The results are identical in two samples!

This distinction matters because these methods answer different biological questions:

- **Absolute quantification** asks: "How many motifs are present in this sample?"
- **Relative quantification** asks: "If I randomly select one glycan molecule,
  how many motifs would I expect to find on it in average?"

### Choosing the right approach

So which method should you use? The answer depends on both your data type and research objectives.
Here are some practical guidelines:

**For glycomics data:** Use relative motif quantification in most cases,
since glycomics data is inherently compositional
(look up "compositional data analysis" if you're curious about the statistical theory behind this).

**For glycoproteomics data:**

- **Choose absolute quantification** when you want to compare observed motif abundance across samples.
  Keep in mind that many factors can cause differences between samples,
  including upregulation of enzymes responsible for the motif or changes in overall glycosylation site occupancy.
- **Choose relative quantification** when you want to understand underlying regulatory mechanisms
  by correcting for differences in site occupancy.
  This approach helps isolate the biological signal from technical variation.

## Connection to derived traits

You might wonder why `quantify_motifs()` lives in the `glydet` package rather than `glymotif`.
There's actually an interesting history here!
Before `glymotif` v0.9.0, there was indeed a `quantify_motifs()` function in `glymotif`.
However, we eventually realized that motif quantification is fundamentally a special case of derived traits,
so we reimplemented it in `glydet` with a more consistent and powerful interface.

To demonstrate this connection, let's manually perform motif quantification using `derive_traits()`.
This will help you understand that motif quantification is essentially just a specialized application of trait derivation:
`wsum()` traits for absolute quantification and `wmean()` traits for relative quantification.

```r
# First, add the meta-properties to the variable information
motifs <- c(
  nLxa = "Hex(??-?)[dHex(??-?)]HexNAc(??-",  # Lewis x/a antigen
  nSLxa = "NeuAc(??-?)Hex(??-?)[dHex(??-?)]HexNAc(??-"  # Sialyl Lewis x/a antigen
)
exp_with_mps <- glymotif::add_motifs_int(exp, motifs)

# Define the traits using wsum() for absolute quantification
trait_fns <- list(Lxa = wsum(nLxa), SLxa = wsum(nSLxa))

# Calculate the traits
derive_traits(exp_with_mps, trait_fns = trait_fns)
```

This code snippet is functionally equivalent to:

```r
# The much simpler approach using quantify_motifs()
quantify_motifs(exp, motifs, method = "absolute")
```

Pretty neat, right?
The `quantify_motifs()` function is essentially doing all the heavy lifting shown in the manual approach,
but with additional robustness and user-friendly features.

For relative motif quantification,
you'd simply replace `wsum()` with `wmean()` in the manual approach,
while everything else stays the same.
This flexibility showcases the power of the underlying trait derivation system.