In this vignette, we’ll explore a special type of derived trait: motif quantification. This feature lets you measure the abundance of biologically meaningful glycan substructures across your samples.
library(glydet)
library(glyexp)
library(glyclean)
exp <- auto_clean(real_experiment)
#>
#> ── Normalizing data ──
#>
#> ℹ Normalization method: `normalize_median()`
#> ℹ Reason: default for "glycoproteomics".
#> ✔ Normalization completed.
#>
#> ── Removing variables with too many missing values ──
#>
#> ℹ Applying preset "discovery"...
#> ℹ Total removed: 24 (0.56%) variables.
#> ✔ Variable removal completed.
#>
#> ── Imputing missing values ──
#>
#> ℹ Imputation method: `impute_min_prob()`
#> ℹ Reason: default for "glycoproteomics" with n_samples < 30.
#> ✔ Imputation completed.
#>
#> ── Aggregating data ──
#>
#> ℹ Aggregating to "gfs" level
#> ✔ Aggregation completed.
#>
#> ── Normalizing data again ──
#>
#> ℹ Normalization method: `normalize_median()`
#> ℹ Reason: default for "glycoproteomics".
#> ✔ Normalization completed.
#>
#> ── Correcting batch effects ──
#>
#> ℹ Batch column batch not found in sample_info. Skipping batch correction.
#> ✔ Batch correction completed.
Glycan motifs are substructures with special biological or structural significance. Think of them as functional building blocks that carry specific biological messages. For example, the Lewis x antigen is a fucosylated carbohydrate epitope commonly found on glycoproteins and glycolipids. It plays crucial roles in cell–cell recognition, adhesion, and immune responses. Understanding the abundance of Lewis x antigens can provide valuable insights into immune system dynamics.
But here’s the challenge: how do we quantify Lewis x antigens across samples? Different glycans contain varying numbers of Lewis x motifs, and these glycans themselves have different abundances across samples. Motif quantification combines both pieces of information, the number of motif occurrences in each glycan and the abundance of that glycan, to estimate the abundance of Lewis x antigens in your dataset.
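The idea of combining motif counts with glycan abundances can be sketched in a few lines of base R. This is only an illustration under stated assumptions: the glycan names, motif counts, and abundances are made up, and the code is not how glydet implements the calculation internally.

```r
# Hypothetical toy data (NOT taken from real_experiment).
# Number of times the motif occurs in each glycan.
motif_count <- c(G1 = 0, G2 = 1, G3 = 2)

# Glycan abundances: rows are glycans, columns are samples.
abundance <- matrix(
  c(10, 20, 30,   # sample S1
    30, 20, 10),  # sample S2
  nrow = 3,
  dimnames = list(c("G1", "G2", "G3"), c("S1", "S2"))
)

# Weight each glycan's abundance by its motif count, then sum per sample.
# The count vector is recycled down each column of the matrix.
motif_abundance <- colSums(motif_count * abundance)
motif_abundance  # S1 = 80, S2 = 40
```

Both samples have the same total glycan abundance here, yet S1 carries twice as much motif signal because its most abundant glycan (G3) contains the motif twice.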
quantify_motifs() function
The glydet package provides the quantify_motifs() function to handle
this task. This function takes a
glyexp::experiment() object along with your motifs of
interest and returns a new experiment object containing the motif
quantifications. Just like derive_traits(),
quantify_motifs() works with both glycomics and
glycoproteomics data.
Let’s dive into a practical example:
# Define our motifs of interest using IUPAC-condensed format
motifs <- c(
  Lxa = "Hex(??-?)[dHex(??-?)]HexNAc(??-", # Lewis x/a antigen
  SLxa = "NeuAc(??-?)Hex(??-?)[dHex(??-?)]HexNAc(??-" # Sialyl Lewis x/a antigen
)
# Quantify the motifs in our dataset
motif_exp <- quantify_motifs(exp, motifs)
#> Warning: `is_known_motif()` was deprecated in glymotif 0.16.0.
#> ℹ Please use `db_motif_info()` instead.
#> ℹ The deprecated feature was likely used in the glydet package.
#> Please report the issue to the authors.
#> This warning is displayed once per session.
#> Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
#> generated.
motif_exp
#>
#> ── Traitproteomics Experiment ──────────────────────────────────────────────────
#> ℹ Expression matrix: 12 samples, 552 variables
#> ℹ Sample information fields: group <fct>
#> ℹ Variable information fields: protein <chr>, protein_site <int>, motif <chr>, gene <chr>, motif_structure <struct>
get_var_info(motif_exp)
#> # A tibble: 552 × 6
#> variable protein protein_site motif gene motif_structure
#> <glue> <chr> <int> <chr> <chr> <struct>
#> 1 A6NJW9-49-Lxa A6NJW9 49 Lxa CD8B2 Hex(??-?)[dHex(??-?)]HexNA…
#> 2 A6NJW9-49-SLxa A6NJW9 49 SLxa CD8B2 NeuAc(??-?)Hex(??-?)[dHex(…
#> 3 O14786-150-Lxa O14786 150 Lxa NRP1 Hex(??-?)[dHex(??-?)]HexNA…
#> 4 O14786-150-SLxa O14786 150 SLxa NRP1 NeuAc(??-?)Hex(??-?)[dHex(…
#> 5 O43866-226-Lxa O43866 226 Lxa CD5L Hex(??-?)[dHex(??-?)]HexNA…
#> 6 O43866-226-SLxa O43866 226 SLxa CD5L NeuAc(??-?)Hex(??-?)[dHex(…
#> 7 O75437-244-Lxa O75437 244 Lxa ZNF254 Hex(??-?)[dHex(??-?)]HexNA…
#> 8 O75437-244-SLxa O75437 244 SLxa ZNF254 NeuAc(??-?)Hex(??-?)[dHex(…
#> 9 O75581-281-Lxa O75581 281 Lxa LRP6 Hex(??-?)[dHex(??-?)]HexNA…
#> 10 O75581-281-SLxa O75581 281 SLxa LRP6 NeuAc(??-?)Hex(??-?)[dHex(…
#> # ℹ 542 more rows
Notice how the variable information now includes a motif
column instead of the usual trait column. Don’t worry: this
is just a cosmetic difference. Under the hood, the functionality works
exactly the same way as derive_traits().
Here’s where things get a little more involved: motif quantification
can be performed in two distinct ways, absolute and relative. By default,
quantify_motifs() uses relative quantification, but
understanding both approaches is crucial for choosing the right method
for your research.
Important note: Don’t confuse this with the absolute and relative quantification of glycans or glycopeptides themselves; that is a different concept entirely.
In traditional omics contexts, “absolute quantification” means determining actual concentrations (like mg/L) or counts (like mRNA copy numbers), while “relative quantification” means comparing abundance between samples without meaningful absolute values. Label-free quantification, for instance, is a relative method.
However, in motif quantification, “absolute” and “relative” refer to whether we normalize motif abundances by the total abundance of the glycome (or individual glycosite mini-glycomes).
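For glycoproteomics data, a glycosite “mini-glycome” can be read as the set of glycoforms observed at one site. The base-R sketch below shows what normalizing within such a group could look like. All values are made up for a single sample, the column names site, motif_count, and abundance are hypothetical, and this is not glydet’s implementation.

```r
# Hypothetical glycoforms observed in ONE sample; all values are made up.
glycoforms <- data.frame(
  site        = c("P1-49", "P1-49", "P2-150", "P2-150", "P2-150"),
  motif_count = c(1, 0, 2, 1, 0),
  abundance   = c(30, 10, 5, 10, 5)
)

# Motif-weighted abundance and total abundance, summed within each glycosite.
weighted <- tapply(glycoforms$motif_count * glycoforms$abundance,
                   glycoforms$site, sum)
total <- tapply(glycoforms$abundance, glycoforms$site, sum)

# "Absolute": the weighted sum per site, without normalization.
absolute <- weighted
# "Relative": the same sum divided by the site's total abundance.
relative <- weighted / total

absolute  # P1-49 = 30, P2-150 = 20
relative  # P1-49 = 0.75, P2-150 = 1
```

Site P1-49 has the larger unnormalized value simply because it is more abundant overall, while the normalized value is higher for P2-150, where the motif-bearing glycoforms make up a larger share of the site’s signal.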
Let’s illustrate this with a concrete example. Imagine a glycomics dataset with two samples, A and B, each containing three glycans (G1, G2, G3) with these abundances:
For simplicity, let’s assume our target motif appears exactly once in each glycan.
For absolute quantification, we simply sum the motif abundances across all glycans:
The results clearly differ between samples.
For relative quantification, we normalize by the total glycan abundance:
The results are identical in the two samples!
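The comparison can be reproduced in a few lines of base R. The abundances below are illustrative values chosen for this sketch (sample B is simply sample A doubled), and the code is a plain re-statement of the arithmetic rather than glydet’s implementation.

```r
# Illustrative abundances only; these values are assumptions for this sketch.
# Rows are glycans, columns are samples.
abundance <- matrix(
  c(1, 2, 3,   # sample A
    2, 4, 6),  # sample B
  nrow = 3,
  dimnames = list(c("G1", "G2", "G3"), c("A", "B"))
)

# The motif appears exactly once in every glycan.
motif_count <- c(G1 = 1, G2 = 1, G3 = 1)

# Absolute: sum of motif-weighted abundances per sample.
absolute <- colSums(motif_count * abundance)

# Relative: the same sum divided by the total glycan abundance per sample.
relative <- absolute / colSums(abundance)

absolute  # A = 6, B = 12
relative  # A = 1, B = 1
```

Doubling every glycan abundance doubles the absolute value but leaves the relative value unchanged, because the relative value depends only on the proportions of the glycans within a sample.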
This distinction matters because these methods answer different biological questions:
So which method should you use? The answer depends on both your data type and research objectives. Here are some practical guidelines:
For glycomics data: Use relative motif quantification in most cases, since glycomics data is inherently compositional (look up “compositional data analysis” if you’re curious about the statistical theory behind this).
For glycoproteomics data:
You might wonder why quantify_motifs() lives in the
glydet package rather than glymotif. There’s
actually an interesting history here! Before glymotif
v0.9.0, there was indeed a quantify_motifs() function in
glymotif. However, we eventually realized that motif
quantification is fundamentally a special case of derived traits, so we
reimplemented it in glydet with a more consistent and
powerful interface.
To demonstrate this connection, let’s manually perform motif
quantification using derive_traits(). This will help you
understand that motif quantification is essentially just a specialized
application of trait derivation: wsum() traits for absolute
quantification and wmean() traits for relative
quantification.
# First, add the meta-properties to the variable information
motifs <- c(
  nLxa = "Hex(??-?)[dHex(??-?)]HexNAc(??-", # Lewis x/a antigen
  nSLxa = "NeuAc(??-?)Hex(??-?)[dHex(??-?)]HexNAc(??-" # Sialyl Lewis x/a antigen
)
exp_with_mps <- glymotif::add_motifs_int(exp, motifs)
# Define the traits using wsum() for absolute quantification
trait_fns <- list(Lxa = wsum(nLxa), SLxa = wsum(nSLxa))
# Calculate the traits
derive_traits(exp_with_mps, trait_fns = trait_fns)
This code snippet is functionally equivalent to:
# The much simpler approach using quantify_motifs()
quantify_motifs(exp, motifs, method = "absolute")
Pretty neat, right? The quantify_motifs() function is
essentially doing all the heavy lifting shown in the manual approach,
but with additional robustness and user-friendly features.
For relative motif quantification, you’d simply replace
wsum() with wmean() in the manual approach,
while everything else stays the same. This flexibility showcases the
power of the underlying trait derivation system.