Quantifying Glycan Motifs

In this vignette, we’ll explore a special type of derived trait: motif quantification. It measures the abundance of biologically meaningful glycan substructures across your samples.

library(glydet)
library(glyexp)
library(glyclean)

exp <- auto_clean(real_experiment)
#> 
#> ── Normalizing data ──
#> 
#> ℹ Normalization method: `normalize_median()`
#> ℹ Reason: default for "glycoproteomics".
#> ✔ Normalization completed.
#> 
#> ── Removing variables with too many missing values ──
#> 
#> ℹ Applying preset "discovery"...
#> ℹ Total removed: 24 (0.56%) variables.
#> ✔ Variable removal completed.
#> 
#> ── Imputing missing values ──
#> 
#> ℹ Imputation method: `impute_min_prob()`
#> ℹ Reason: default for "glycoproteomics" with n_samples < 30.
#> ✔ Imputation completed.
#> 
#> ── Aggregating data ──
#> 
#> ℹ Aggregating to "gfs" level
#> ✔ Aggregation completed.
#> 
#> ── Normalizing data again ──
#> 
#> ℹ Normalization method: `normalize_median()`
#> ℹ Reason: default for "glycoproteomics".
#> ✔ Normalization completed.
#> 
#> ── Correcting batch effects ──
#> 
#> ℹ Batch column batch not found in sample_info. Skipping batch correction.
#> ✔ Batch correction completed.

What is motif quantification?

Glycan motifs are substructures with special biological or structural significance. Think of them as functional building blocks that carry specific biological messages. For example, the Lewis x antigen is a fucosylated carbohydrate epitope commonly found on glycoproteins and glycolipids. It plays crucial roles in cell–cell recognition, adhesion, and immune responses. Understanding the abundance of Lewis x antigens can provide valuable insights into immune system dynamics.

But here’s the challenge: how do we quantify Lewis x antigens across samples? Different glycans contain different numbers of Lewis x motifs, and the glycans themselves differ in abundance across samples. Motif quantification combines both pieces of information, the number of motif copies on each glycan and the abundance of that glycan, to estimate the abundance of Lewis x antigens in your dataset.

The `quantify_motifs()` function

The glydet package provides the quantify_motifs() function for this task. It takes a glyexp::experiment() object along with your motifs of interest and returns a new experiment object containing the motif quantifications. Just like derive_traits(), quantify_motifs() works with both glycomics and glycoproteomics data.

Let’s dive into a practical example:

# Define our motifs of interest using IUPAC-condensed format
motifs <- c(
  Lxa = "Hex(??-?)[dHex(??-?)]HexNAc(??-",  # Lewis x/a antigen
  SLxa = "NeuAc(??-?)Hex(??-?)[dHex(??-?)]HexNAc(??-"  # Sialyl Lewis x/a antigen
)

# Quantify the motifs in our dataset
motif_exp <- quantify_motifs(exp, motifs)
#> Warning: `is_known_motif()` was deprecated in glymotif 0.16.0.
#> ℹ Please use `db_motif_info()` instead.
#> ℹ The deprecated feature was likely used in the glydet package.
#>   Please report the issue to the authors.
#> This warning is displayed once per session.
#> Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
#> generated.
motif_exp
#> 
#> ── Traitproteomics Experiment ──────────────────────────────────────────────────
#> ℹ Expression matrix: 12 samples, 552 variables
#> ℹ Sample information fields: group <fct>
#> ℹ Variable information fields: protein <chr>, protein_site <int>, motif <chr>, gene <chr>, motif_structure <struct>

get_var_info(motif_exp)
#> # A tibble: 552 × 6
#>    variable        protein protein_site motif gene   motif_structure            
#>    <glue>          <chr>          <int> <chr> <chr>  <struct>                   
#>  1 A6NJW9-49-Lxa   A6NJW9            49 Lxa   CD8B2  Hex(??-?)[dHex(??-?)]HexNA…
#>  2 A6NJW9-49-SLxa  A6NJW9            49 SLxa  CD8B2  NeuAc(??-?)Hex(??-?)[dHex(…
#>  3 O14786-150-Lxa  O14786           150 Lxa   NRP1   Hex(??-?)[dHex(??-?)]HexNA…
#>  4 O14786-150-SLxa O14786           150 SLxa  NRP1   NeuAc(??-?)Hex(??-?)[dHex(…
#>  5 O43866-226-Lxa  O43866           226 Lxa   CD5L   Hex(??-?)[dHex(??-?)]HexNA…
#>  6 O43866-226-SLxa O43866           226 SLxa  CD5L   NeuAc(??-?)Hex(??-?)[dHex(…
#>  7 O75437-244-Lxa  O75437           244 Lxa   ZNF254 Hex(??-?)[dHex(??-?)]HexNA…
#>  8 O75437-244-SLxa O75437           244 SLxa  ZNF254 NeuAc(??-?)Hex(??-?)[dHex(…
#>  9 O75581-281-Lxa  O75581           281 Lxa   LRP6   Hex(??-?)[dHex(??-?)]HexNA…
#> 10 O75581-281-SLxa O75581           281 SLxa  LRP6   NeuAc(??-?)Hex(??-?)[dHex(…
#> # ℹ 542 more rows

Notice how the variable information now includes a motif column instead of the usual trait column. Don’t worry: this is just a cosmetic difference. Under the hood, the functionality works exactly the same way as derive_traits().

Absolute vs. relative motif quantification

Here’s where things get a little more complex: motif quantification can be performed in two distinct ways, absolute and relative. By default, quantify_motifs() uses relative quantification, but understanding both approaches is crucial for choosing the right method for your research.

Important note: Don’t confuse this with the absolute and relative quantification of glycans or glycopeptides themselves. That’s a different concept entirely.

In traditional omics contexts, “absolute quantification” means determining actual concentrations (like mg/L) or counts (like mRNA copy numbers), while “relative quantification” means comparing abundance between samples without meaningful absolute values. Label-free quantification, for instance, is a relative method.

However, in motif quantification, “absolute” and “relative” refer to whether we normalize motif abundances by the total abundance of the glycome (or individual glycosite mini-glycomes).

Let’s illustrate this with a concrete example. Imagine a glycomics dataset with two samples, A and B, each containing three glycans (G1, G2, G3) with these abundances:

Sample A: G1 = 10, G2 = 20, G3 = 30
Sample B: G1 = 20, G2 = 40, G3 = 60

For simplicity, let’s assume our target motif appears exactly once in each glycan.

Absolute motif quantification

We simply sum the motif abundances across all glycans:

Sample A: 10 × 1 + 20 × 1 + 30 × 1 = 60
Sample B: 20 × 1 + 40 × 1 + 60 × 1 = 120

The results clearly differ between samples.

Relative motif quantification

We normalize by the total glycan abundance:

Sample A: (10 × 1 + 20 × 1 + 30 × 1) ÷ (10 + 20 + 30) = 60 ÷ 60 = 1
Sample B: (20 × 1 + 40 × 1 + 60 × 1) ÷ (20 + 40 + 60) = 120 ÷ 120 = 1

The results are identical in both samples!

This distinction matters because these methods answer different biological questions:

Absolute quantification asks: “How many motifs are present in this sample?”
Relative quantification asks: “If I randomly select one glycan molecule, how many motifs would I expect to find on it on average?”
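The arithmetic above can be reproduced in a few lines of base R. This is only a sketch of the idea, not how quantify_motifs() is implemented; the object names are made up for illustration, and the second motif-count vector (G3 carrying two copies) is a hypothetical variation that is not part of the example above.

```r
# Glycan abundances from the example above (rows: glycans, columns: samples)
abundance <- matrix(
  c(10, 20, 30,   # sample A
    20, 40, 60),  # sample B
  nrow = 3,
  dimnames = list(c("G1", "G2", "G3"), c("A", "B"))
)

# Motif copies per glycan: one copy each, as assumed in the text
motif_count <- c(G1 = 1, G2 = 1, G3 = 1)

# Absolute: abundance-weighted sum of motif counts in each sample
absolute <- colSums(abundance * motif_count)

# Relative: the same sum divided by the total glycan abundance in each sample
relative <- absolute / colSums(abundance)

# Hypothetical variation: G3 now carries two copies of the motif
motif_count_2 <- c(G1 = 1, G2 = 1, G3 = 2)
absolute_2 <- colSums(abundance * motif_count_2)
relative_2 <- absolute_2 / colSums(abundance)
```

With one copy per glycan, absolute is 60 and 120 while relative is 1 in both samples. In the hypothetical variation, absolute becomes 90 and 180 and relative becomes 1.5 in both samples: the relative value reflects how the motif is distributed over the glycans, not how much material was measured.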

Choosing the right approach

So which method should you use? The answer depends on both your data type and research objectives. Here are some practical guidelines:

For glycomics data: Use relative motif quantification in most cases, since glycomics data is inherently compositional (look up “compositional data analysis” if you’re curious about the statistical theory behind this).

For glycoproteomics data:

Choose absolute quantification when you want to compare observed motif abundance across samples. Keep in mind that many factors can cause differences between samples, including upregulation of enzymes responsible for the motif or changes in overall glycosylation site occupancy.
Choose relative quantification when you want to understand underlying regulatory mechanisms by correcting for differences in site occupancy. This approach helps separate changes in the glycan structures on a site from changes in how much of that site is glycosylated.
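To make the glycoproteomics case concrete, here is a minimal base R sketch with invented numbers. It assumes that relative quantification normalizes within each glycosite, as described earlier for "individual glycosite mini-glycomes". The data frame, its column names, and the abundances are all hypothetical, and the code does not call glydet.

```r
# Hypothetical glycopeptide data: two glycosites with two glycoforms each.
# site1: occupancy doubles from S1 to S2, glycoform proportions unchanged.
# site2: occupancy unchanged, but abundance shifts to the motif-bearing form.
gp <- data.frame(
  site        = c("site1", "site1", "site2", "site2"),
  motif_count = c(1, 0, 1, 0),
  S1          = c(10, 30, 20, 20),
  S2          = c(20, 60, 30, 10)
)

abund <- as.matrix(gp[c("S1", "S2")])

# Absolute: abundance-weighted motif counts, summed within each glycosite
absolute <- rowsum(abund * gp$motif_count, group = gp$site)

# Total abundance of each glycosite (its "mini-glycome")
site_total <- rowsum(abund, group = gp$site)

# Relative: absolute values normalized by the per-site totals
relative <- absolute / site_total
```

For site1, the absolute value doubles (10 to 20) while the relative value stays at 0.25, so the change is explained by occupancy alone. For site2, the relative value rises from 0.5 to 0.75, which points to a shift in the glycoforms themselves.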

Connection to derived traits

You might wonder why quantify_motifs() lives in the glydet package rather than glymotif. There’s actually an interesting history here! Before glymotif v0.9.0, there was indeed a quantify_motifs() function in glymotif. However, we eventually realized that motif quantification is fundamentally a special case of derived traits, so we reimplemented it in glydet with a more consistent and powerful interface.

To demonstrate this connection, let’s manually perform motif quantification using derive_traits(). This will help you understand that motif quantification is essentially just a specialized application of trait derivation: wsum() traits for absolute quantification and wmean() traits for relative quantification.

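# First, add the meta-properties to the variable information.
# The names used here (nLxa, nSLxa) are the meta-property names, so we keep
# them in a separate vector and leave `motifs` (Lxa, SLxa) from above untouched.
mp_motifs <- c(
  nLxa = "Hex(??-?)[dHex(??-?)]HexNAc(??-",  # Lewis x/a antigen
  nSLxa = "NeuAc(??-?)Hex(??-?)[dHex(??-?)]HexNAc(??-"  # Sialyl Lewis x/a antigen
)
exp_with_mps <- glymotif::add_motifs_int(exp, mp_motifs)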

# Define the traits using wsum() for absolute quantification
trait_fns <- list(Lxa = wsum(nLxa), SLxa = wsum(nSLxa))

# Calculate the traits
derive_traits(exp_with_mps, trait_fns = trait_fns)

This code snippet is functionally equivalent to:

# The much simpler approach using quantify_motifs()
quantify_motifs(exp, motifs, method = "absolute")

Pretty neat, right? The quantify_motifs() function is essentially doing all the heavy lifting shown in the manual approach, but with additional robustness and user-friendly features.

For relative motif quantification, you’d simply replace wsum() with wmean() in the manual approach, while everything else stays the same. This flexibility showcases the power of the underlying trait derivation system.
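The relationship between the two trait types can be sketched in base R for a single sample. The helpers wsum_sketch() and wmean_sketch() below are hypothetical stand-ins written for this illustration; they are not glydet's wsum() and wmean(), and the motif counts and abundances are invented.

```r
# Hypothetical stand-ins for the idea behind wsum() and wmean();
# these are NOT glydet's implementations.
wsum_sketch  <- function(count, abundance) sum(count * abundance)
wmean_sketch <- function(count, abundance) weighted.mean(count, w = abundance)

# One sample: motif counts per glycan (the meta-property) and glycan abundances
n_motif   <- c(G1 = 0, G2 = 1, G3 = 2)
abundance <- c(G1 = 10, G2 = 20, G3 = 30)

abs_value <- wsum_sketch(n_motif, abundance)   # absolute-style value
rel_value <- wmean_sketch(n_motif, abundance)  # relative-style value

# The weighted mean is the weighted sum divided by the total abundance
same <- isTRUE(all.equal(rel_value, abs_value / sum(abundance)))
```

Here abs_value is 80 and rel_value is 80 / 60, which is why swapping one trait function for the other is enough to move between the two kinds of quantification.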