| Title: | Describe Glycosylation Structural Properties in a Site Specific Manner |
|---|---|
| Description: | Calculate and analyze glycosylation derived traits for site-specific glycoproteomics data. Derived traits are summary features that aggregate individual glycan abundances into biologically meaningful groups (e.g., galactosylation, sialylation, fucosylation, branching). This package provides functions to compute well-established derived traits from the literature, and features a domain-specific language for defining custom derived traits tailored to specific research needs. |
| Authors: | Bin Fu [aut, cre] (ORCID: <https://orcid.org/0000-0001-8567-2997>) |
| Maintainer: | Bin Fu <[email protected]> |
| License: | MIT + file LICENSE |
| Version: | 0.11.0 |
| Built: | 2026-05-17 02:52:57 UTC |
| Source: | https://github.com/glycoverse/glydet |
This function adds meta-properties to the variable information of a glyexp::experiment().
Under the hood, it uses get_meta_properties() to calculate the meta-properties
on the "glycan_structure" column (or column specified by struc_col) of the variable information tibble,
and then adds the result back as new columns.
    add_meta_properties(exp, mp_fns = NULL, struc_col = "glycan_structure", overwrite = FALSE)
exp |
An |
mp_fns |
A named list of meta-property functions.
Names of the list are the names of the meta-properties. Default is |
struc_col |
The column name of the glycan structures in the variable information tibble. Default is "glycan_structure". |
overwrite |
Whether to overwrite the existing meta-property columns. Default is FALSE, raising an error if the existing columns are found. |
A glyexp::experiment() object with meta-properties added to the variable information.
get_meta_properties(), glyexp::experiment()
    library(glyexp)
    # Compare the columns in the variable information before and after adding meta-properties
    exp <- real_experiment |> slice_sample_var(n = 10)
    colnames(get_var_info(exp))
    exp2 <- add_meta_properties(exp)
    colnames(get_var_info(exp2))
This function returns a named list of all meta-property functions:
"Tp": n_glycan_type(): type of the glycan
"B": has_bisecting(): whether the glycan has a bisecting GlcNAc
"nA": n_antennae(): number of antennae
"nF": n_fuc(): number of fucoses
"nFc": n_core_fuc(): number of core fucoses
"nFa": n_arm_fuc(): number of arm fucoses
"nG": n_gal(): number of galactoses
"nGt": n_terminal_gal(): number of terminal galactoses
"nS": n_sia(): number of sialic acids
"nM": n_man(): number of mannoses
    all_mp_fns()
A named list of meta-property functions.
This function calculates derived traits from a glyexp::experiment() object.
For glycomics data, it calculates the derived traits directly.
For glycoproteomics data, each glycosite is treated as a separate glycome,
and derived traits are calculated in a site-specific manner.
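The idea of treating each glycosite as its own glycome can be sketched in base R. This is only an illustration, not the package's implementation: the toy table, its values, and the helper `prop_core_fuc()` are hypothetical.

```r
# Toy glycoproteomics data for one sample (hypothetical values).
# Each row is one glycoform: a glycan on a glycosite, with its abundance.
glycoforms <- data.frame(
  protein      = c("P1", "P1", "P1", "P2", "P2"),
  protein_site = c(10L, 10L, 10L, 55L, 55L),
  nFc          = c(1L, 0L, 1L, 0L, 0L),  # number of core fucoses
  abundance    = c(30, 50, 20, 40, 60)
)

# Hypothetical helper: proportion of core-fucosylated abundance in one glycome.
prop_core_fuc <- function(df) {
  sum(df$abundance[df$nFc > 0]) / sum(df$abundance)
}

# Split by glycosite, so that each site is treated as a separate glycome.
site_id <- paste(glycoforms$protein, glycoforms$protein_site, sep = "_")
site_traits <- vapply(split(glycoforms, site_id), prop_core_fuc, numeric(1))
print(site_traits)
```

In real data the same grouping would also be repeated per sample, giving one trait value per glycosite per sample.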
    derive_traits(exp, trait_fns = NULL, mp_fns = NULL, mp_cols = NULL)
exp |
A |
trait_fns |
A named list of derived trait functions created by trait factories.
Names of the list are the names of the derived traits.
Default is |
mp_fns |
A named list of meta-property functions.
This parameter is useful if your trait functions use custom meta-properties
other than those in |
mp_cols |
A character vector of column names in the |
A new glyexp::experiment() object for derived traits.
Instead of "quantification of each glycan on each glycosite in each sample",
the new experiment() contains "the value of each derived trait on each glycosite in each sample",
with the following columns in the var_info table:
variable: variable ID
trait: derived trait name
explanation: a concise English explanation of the trait
For glycoproteomics data, with additional columns:
protein: protein ID
protein_site: the glycosite position on the protein
Other columns in the var_info table (e.g. gene) are retained if they have "many-to-one"
relationship with glycosites (unique combinations of protein, protein_site).
That is, each glycosite cannot have multiple values for these columns.
gene is a common example, as a glycosite can only be associated with one gene.
Descriptions about glycans are not such a column, as a glycosite can have multiple glycans,
thus having multiple descriptions.
Columns not having this relationship with glycosites will be dropped.
Don't worry if you cannot understand this logic,
as long as you know that this function will try its best to preserve useful information.
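The "many-to-one" rule can be checked by hand. The sketch below is a hedged base R illustration of the rule as described, not the package's code; the toy `var_info` table and the helper `is_many_to_one()` are hypothetical.

```r
# Toy variable information (hypothetical values and column names).
var_info <- data.frame(
  protein      = c("P1", "P1", "P2", "P2"),
  protein_site = c(10L, 10L, 55L, 55L),
  gene         = c("G1", "G1", "G2", "G2"),
  glycan_desc  = c("H5N2", "H5N4F1", "H5N4", "H5N4S1"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: TRUE if `col` takes exactly one value per glycosite.
is_many_to_one <- function(df, col) {
  site_id <- paste(df$protein, df$protein_site, sep = "_")
  n_values <- tapply(df[[col]], site_id, function(x) length(unique(x)))
  all(n_values == 1)
}

candidate_cols <- setdiff(names(var_info), c("protein", "protein_site"))
keep <- vapply(candidate_cols, function(col) is_many_to_one(var_info, col), logical(1))
print(keep)
```

Here `gene` would be retained because each glycosite has a single gene, while `glycan_desc` would be dropped because each glycosite carries several glycans.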
sample_info and meta_data are not modified,
except that the exp_type field of meta_data is set to "traitomics" for glycomics data,
and "traitproteomics" for glycoproteomics data.
traits_basic(), traits_detailed()
    library(glyexp)
    library(glyclean)
    exp <- real_experiment |> auto_clean() |> slice_sample_var(n = 100)
    trait_exp <- derive_traits(exp)
    trait_exp
    # By default, only basic traits are calculated
    names(traits_basic())
    # You can calculate detailed traits in `traits_detailed()`
    more_trait_exp <- derive_traits(exp, trait_fns = traits_detailed())
    more_trait_exp
This function calculates derived traits from a tibble in tidy format.
Use this function if you are not using the glyexp package.
For glycomics data, it calculates the derived traits directly.
For glycoproteomics data, each glycosite is treated as a separate glycome,
and derived traits are calculated in a site-specific manner.
    derive_traits_(tbl, data_type, trait_fns = NULL, mp_fns = NULL)
tbl |
A tibble in tidy format, with the following columns:
For glycoproteomics data, additional columns are needed:
Other columns are ignored. Please make sure that the data has been properly preprocessed, including normalization, missing value handling, etc. Specifically, for glycoproteomics data, please make sure that the data has been aggregated to the "glycoforms with structures" level. That is the quantification of each glycan structure on each glycosite in each sample. |
data_type |
Either "glycomics" or "glycoproteomics". |
trait_fns |
A named list of derived trait functions created by trait factories.
Names of the list are the names of the derived traits.
Default is |
mp_fns |
A named list of meta-property functions.
This parameter is useful if your trait functions use custom meta-properties
other than those in |
A tidy tibble containing the following columns:
sample: sample ID
trait: derived trait name
value: the value of the derived trait
explanation: a concise English explanation of the trait
For glycoproteomics data, with additional columns:
protein: protein ID
protein_site: the glycosite position on the protein
Other columns in the original tibble are not included.
derive_traits(), traits_basic(), traits_detailed()
    # Create example tidy data
    library(dplyr)
    library(glyexp)
    library(tibble)
    tidy_data <- as_tibble(real_experiment2)
    # Calculate traits
    traits <- derive_traits_(tidy_data, data_type = "glycomics")
    traits
This function provides a human-readable English explanation of what a derived trait represents.
It works with trait functions created by the trait factories prop(), ratio(), wmean(),
total(), and wsum(), and works best with traits defined by built-in meta-properties.
    explain_trait(
      trait_fn,
      use_ai = FALSE,
      custom_mp = NULL,
      provider = getOption("glydet.ai_provider", "deepseek"),
      model = getOption("glydet.ai_model", NULL),
      api_key = getOption("glydet.ai_api_key", NULL),
      base_url = getOption("glydet.ai_base_url", NULL)
    )
A character string containing a concise English explanation of the trait.
    # Explain built-in traits
    explain_trait(traits_basic()$TM)
    explain_trait(traits_basic()$GS)
    # Explain custom traits
    explain_trait(prop(nFc > 0))
    explain_trait(prop(nFc > 0, within = (Tp == "complex")))
    explain_trait(ratio(Tp == "complex", Tp == "hybrid"))
    explain_trait(wmean(nA, within = (Tp == "complex")))
    explain_trait(wmean(nS / nG, within = nA == 4 & nFc > 0))
    # Explain total and wsum traits
    explain_trait(total(Tp == "complex"))
    explain_trait(wsum(nS))
    explain_trait(wsum(nS, within = (Tp == "complex")))
This function calculates the meta-properties of the given glycans. Meta-properties are properties describing certain structural characteristics of glycans. For example, the number of antennae, the number of core fucoses, etc.
    get_meta_properties(glycans, mp_fns = NULL)
glycans |
A |
mp_fns |
A named list of meta-property functions.
Names of the list are the names of the meta-properties. Default is |
A tibble with the meta-properties. Column names are the names of the meta-properties.
    glycans <- c(
      "Man(a1-3)[Man(a1-6)]Man(b1-4)GlcNAc(b1-4)GlcNAc(b1-",
      "Fuc(a1-3)GlcNAc(b1-2)Man(a1-3)[GlcNAc(b1-2)Man(a1-6)]Man(b1-4)GlcNAc(b1-4)GlcNAc(?1-",
      "GlcNAc(b1-2)Man(a1-3)[Man(a1-6)]Man(b1-4)GlcNAc(b1-4)GlcNAc(?1-"
    )
    # Use default meta-property functions
    get_meta_properties(glycans)
    # Use custom meta-property functions
    fns <- list(
      nN = ~ glyrepr::count_mono(.x, "HexNAc"),  # purrr-style lambda function
      nH = ~ glyrepr::count_mono(.x, "Hex")
    )
    get_meta_properties(glycans, fns)
This function allows you to create a derived trait function using natural language.
Note that LLMs can be unreliable, so the result should be verified manually.
If the description is not clear, an error will be raised.
Try to read the descriptions of built-in traits to get ideas.
Currently, only
prop(), ratio(), and wmean() are supported.
To use this feature, you need to install the ellmer package.
DeepSeek is used by default for backward compatibility. Other ellmer
providers can be selected with provider, model, and provider-specific
API key configuration.
    make_trait(
      description,
      custom_mp = NULL,
      max_retries = 2,
      verbose = FALSE,
      provider = getOption("glydet.ai_provider", "deepseek"),
      model = getOption("glydet.ai_model", NULL),
      api_key = getOption("glydet.ai_api_key", NULL),
      base_url = getOption("glydet.ai_base_url", NULL)
    )
description |
A description of the trait in natural language. |
custom_mp |
A named character vector of custom meta-properties.
The names are the meta-property names, and the values are in the format
"(type) description". For example:
|
max_retries |
Maximum number of reflection retries when the AI-generated formula's explanation doesn't match the original description. Default is 2. |
verbose |
Whether to print verbose output. Default is FALSE. This is useful for inspecting how LLMs generate trait functions. |
provider |
AI provider passed to |
model |
Model to use. Defaults to |
api_key |
API key for the selected provider. If |
base_url |
Optional base URL for custom or OpenAI-compatible endpoints.
Defaults to |
A derived trait function.
    # Sys.setenv(DEEPSEEK_API_KEY = "your_api_key")
    # my_traits <- list(
    #   nS = make_trait("the average number of sialic acids"),
    #   nG = make_trait("the average number of galactoses")
    # )
    # The trait function can then be used in `derive_traits()`:
    # derive_traits(exp, trait_fns = my_traits)
These functions check key properties of an N-glycan:
n_glycan_type(): Determine the N-glycan type.
has_bisecting(): Check if the glycan has a bisecting GlcNAc.
n_antennae(): Count the number of antennae.
n_fuc(): Count the number of fucoses.
n_core_fuc(): Count the number of core fucoses.
n_arm_fuc(): Count the number of arm fucoses.
n_gal(): Count the number of galactoses.
n_terminal_gal(): Count the number of terminal galactoses.
n_sia(): Count the number of sialic acids.
n_man(): Count the number of mannoses.
All functions assume the glycans are N-glycans and do not validate this, so they may return meaningless values for non-N-glycans. Please make sure to pass in N-glycans only.
All functions place minimal requirements on the glycans, i.e. they work with glycans that have generic monosaccharides (e.g. "Hex", "HexNAc") and no linkage information. This type of structure is common in glycoproteomics and glycomics studies.
    n_glycan_type(glycans)
    has_bisecting(glycans)
    n_antennae(glycans)
    n_fuc(glycans)
    n_core_fuc(glycans)
    n_arm_fuc(glycans)
    n_gal(glycans)
    n_terminal_gal(glycans)
    n_sia(glycans)
    n_man(glycans)
glycans |
A |
n_glycan_type(): A factor vector indicating the N-glycan type,
either "highmannose", "hybrid", "complex", or "paucimannose".
has_bisecting(): A logical vector indicating if the glycan has a bisecting GlcNAc.
n_antennae(): An integer vector indicating the number of antennae.
n_fuc(): An integer vector indicating the number of fucoses.
n_core_fuc(): An integer vector indicating the number of core fucoses.
n_arm_fuc(): An integer vector indicating the number of arm fucoses.
n_gal(): An integer vector indicating the number of galactoses.
n_terminal_gal(): An integer vector indicating the number of terminal galactoses.
n_sia(): An integer vector indicating the number of sialic acids.
n_man(): An integer vector indicating the number of mannoses.
n_glycan_type(): N-Glycan Types
Four types of N-glycans are recognized: high mannose, hybrid, complex, and paucimannose. For more information about N-glycan types, see Essentials of Glycobiology.
has_bisecting(): Bisecting GlcNAc
Bisecting GlcNAc is a GlcNAc residue attached to the core mannose of N-glycans.
Man
\
GlcNAc - Man - GlcNAc - GlcNAc -
~~~~~~ /
Man
n_antennae(): Number of Antennae
The number of antennae is the number of branching GlcNAc residues attached to the core mannoses.
n_fuc(): Number of Fucoses
The number of fucoses. This function assumes the fucose is a dHex.
n_core_fuc(): Number of Core Fucoses
Core fucoses are those fucose residues attached to the core GlcNAc of an N-glycan.
Man Fuc <- core fucose
\ |
Man - GlcNAc - GlcNAc -
/
Man
n_arm_fuc(): Number of Arm Fucoses
Arm fucoses are those fucose residues attached to the branching GlcNAc of an N-glycan.
Fuc <- arm fucose
|
GlcNAc - Man
\
Man - GlcNAc - GlcNAc -
/
GlcNAc - Man
n_gal(): Number of Galactoses
This function seems useless and silly. It is, if you have a well-structured glycan with concrete monosaccharides. However, if you only have "Hex" or "H" at hand, it is tricky to know how many of them are "Gal" and how many are "Man". This function makes the simple assumption that the rightmost "H" in every "H-H-N-H" unit is a galactose. The two "H" on the left are mannoses of the N-glycan core. The "N" is a GlcNAc attached to one core mannose.
n_terminal_gal(): Number of Terminal Galactoses
Terminal galactoses are those galactose residues on the non-reducing end without sialic acid capping.
Gal - GlcNAc - Man
~~~ \
terminal Gal Man - GlcNAc - GlcNAc -
/
Neu5Ac - Gal - GlcNAc - Man
~~~
not terminal Gal
n_sia(): Number of Sialic Acids
The number of sialic acids (Neu5Ac). Neu5Gc is not counted.
n_man(): Number of Mannoses
The number of mannoses. This function assumes that the Hex residues of the N-glycan core are mannoses. Also, for high-mannose and paucimannose glycans, all Hex are mannoses. Finally, for hybrid glycans, all the rightmost Hex (on the side without branching HexNAc) are mannoses.
A proportion trait is the proportion of a certain group of glycans within a larger group of glycans.
For example, the proportion of sialylated glycans within all glycans,
or the proportion of tetra-antennary glycans within all complex glycans.
This is the most common type of glycan derived trait.
It can be regarded as a special case of the ratio trait (see ratio()).
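To make the definition concrete, here is a minimal base R sketch of the arithmetic behind a proportion trait. It is an illustration under assumptions, not how prop() is implemented: the toy vectors and the helper `prop_sketch()` are hypothetical.

```r
# Toy glycome in one sample (hypothetical values).
abundance <- c(10, 20, 30, 40)
Tp  <- c("complex", "complex", "hybrid", "complex")  # glycan type
nFc <- c(1L, 0L, 1L, 1L)                              # core fucose count

# Hypothetical helper mirroring the idea of a proportion trait:
# abundance of glycans meeting `cond` inside `within`, over abundance of `within`.
prop_sketch <- function(abundance, cond, within = rep(TRUE, length(abundance))) {
  sum(abundance[cond & within], na.rm = TRUE) / sum(abundance[within], na.rm = TRUE)
}

prop_all     <- prop_sketch(abundance, nFc > 0)
prop_complex <- prop_sketch(abundance, nFc > 0, within = Tp == "complex")
print(c(all = prop_all, complex = prop_complex))
```

The first value uses all glycans as the larger group (80 / 100), the second restricts the larger group to complex glycans (50 / 70).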
    prop(cond, within = NULL, na_action = "keep")
cond |
Condition to use for defining the smaller group.
An expression that evaluates to a logical vector.
The names of all built-in meta-properties (see |
within |
Condition to use for defining the larger group,
with the same format as |
na_action |
How to handle missing values.
|
A derived trait function.
You can use prop() to create a proportion trait easily.
For example:
    # Proportion of core-fucosylated glycans within all glycans
    prop(nFc > 0)
    # Proportion of complex glycans within all glycans
    prop(Tp == "complex")
    # Proportion of sialylated and core-fucosylated glycans within all glycans
    prop(nS > 0 & nFc > 0)
Note that the last example uses & for logical AND.
In fact, you can use any R logical operator in the expression (e.g., |, !, etc.).
If you want to perform a pre-filtering before calculating the proportion,
for example, you want to calculate the proportion of core-fucosylated glycans within only complex glycans,
you can use within to define the denominator.
    # Proportion of core-fucosylated glycans within complex glycans
    prop(nFc > 0, within = (Tp == "complex"))
    # Proportion of core-fucosylated glycans within tetra-antennary complex glycans
    prop(nFc > 0, within = (Tp == "complex" & nA == 4))
The parentheses around the condition in within are optional,
but it is recommended to use them for clarity.
All the internal summation operations ignore NAs by default. Therefore, NAs in the expression matrix and meta-property values will not result in NAs in the derived traits. However, as all derived traits calculate a ratio of two values, NAs will be introduced when:
The denominator is 0. This can happen when the within condition selects no glycans.
Both the numerator and denominator are 0.
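The two rules above (sums skip NAs, a zero denominator gives NA) can be sketched in base R. How the package produces the NA internally is not shown in this documentation, so the explicit check in the hypothetical helper `safe_ratio()` is an assumption made for illustration.

```r
# Hypothetical helper: sums ignore NAs, but a zero denominator yields NA.
safe_ratio <- function(abundance, num, denom) {
  num_sum   <- sum(abundance[num], na.rm = TRUE)
  denom_sum <- sum(abundance[denom], na.rm = TRUE)
  if (denom_sum == 0) NA_real_ else num_sum / denom_sum
}

# Toy data (hypothetical values); the second abundance is missing.
abundance <- c(10, NA, 30)
nFc <- c(1L, 1L, 0L)
nA  <- c(2L, 2L, 3L)

# The NA abundance is skipped by the summation: 10 / (10 + 30).
with_na <- safe_ratio(abundance, nFc > 0, rep(TRUE, 3))
# No glycan is tetra-antennary, so the denominator is 0 and the result is NA.
empty_within <- safe_ratio(abundance, nFc > 0 & nA == 4, nA == 4)
print(c(with_na = with_na, empty_within = empty_within))
```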
    # Proportion of core-fucosylated glycans within all glycans
    prop(nFc > 0)
    # Proportion of bisecting glycans within all glycans
    prop(B)
    # Proportion of sialylated and arm-fucosylated glycans within all glycans
    prop(nS > 0 & nFa > 0)
    # Proportion of bi-antennary glycans within complex glycans
    prop(nA == 2, within = (Tp == "complex"))
    # Proportion of sialylated glycans within core-fucosylated tetra-antennary glycans
    prop(nS > 0, within = (nFc > 0 & nA == 4))
This function quantifies motifs from glycomic or glycoproteomic profiles. For glycomics data, it calculates motif quantifications directly. For glycoproteomics data, each glycosite is treated as a separate glycome, and motif quantifications are calculated in a site-specific manner.
The function takes a glyexp::experiment() object and returns a new glyexp::experiment()
object with motif quantifications. Instead of containing quantifications of individual glycans
on each glycosite in each sample, the new experiment contains quantifications
of each motif on each glycosite in each sample (for glycoproteomics data) or
motif quantifications in each sample (for glycomics data).
    quantify_motifs(
      exp,
      motifs,
      method = "relative",
      alignments = NULL,
      ignore_linkages = FALSE
    )
exp |
A |
motifs |
A character vector of motif names, glycan structure strings,
a 'glyrepr_structure' object, or a motif specification from |
method |
A character string specifying the quantification method. Must be either "absolute" or "relative". Default is "relative". See "Relative and Absolute Motif Quantification" section for details. |
alignments |
A character vector specifying the alignment method for each motif.
Can be "terminal", "substructure", "core", "whole" or "exact". Default is "substructure".
See |
ignore_linkages |
A logical value. If |
A new glyexp::experiment() object containing motif quantifications.
Instead of containing quantifications of individual glycans on each glycosite in each sample,
the new experiment contains quantifications of each motif on each glycosite in each sample
(for glycoproteomics data) or motif quantifications in each sample (for glycomics data).
The var_info table includes a motif_structure column containing the parsed glycan structure
for each motif, allowing traceability of motif definitions.
For glycoproteomics data, with additional columns:
protein: protein ID
protein_site: the glycosite position on the protein
Other columns in the var_info table (e.g., gene) are retained if they have a "many-to-one"
relationship with glycosites (unique combinations of protein, protein_site).
That is, each glycosite cannot have multiple values for these columns.
gene is a common example, as a glycosite can only be associated with one gene.
Glycan descriptions are not such columns, as a glycosite can have multiple glycans,
thus having multiple descriptions.
Columns that do not have this relationship with glycosites will be dropped.
Don't worry if you cannot understand this logic—
just know that this function will do its best to preserve useful information.
The sample_info and meta_data tables are not modified,
except that the exp_type field in meta_data is set to "traitomics" for glycomics data
and "traitproteomics" for glycoproteomics data.
Motif quantification can be performed in two ways: absolute and relative. Do not confuse this with the absolute and relative quantification of glycans or glycopeptides.
In an omics context, absolute quantification means we can determine the actual concentration (e.g., in mg/L) or counts (e.g., mRNA copy numbers), while relative quantification means we can only compare the abundance of the same molecule between different samples, but the absolute values themselves are not meaningful. Label-free quantification is a relative quantification method.
In the context of motif quantification, absolute and relative refer to whether we normalize the motif quantifications by the total abundance of the glycome (or the mini-glycomes of individual glycosites).
For example, let's say we have a glycomics dataset with two samples, A and B. In each sample, three glycans (G1, G2, G3) are present, with the following abundances:
Sample A: G1 = 10, G2 = 20, G3 = 30
Sample B: G1 = 20, G2 = 40, G3 = 60
For simplicity, let's say the motif we want to quantify happens to appear once in all glycans.
For absolute motif quantification, we simply sum the abundances of the motif across all glycans:
Sample A: 10 × 1 + 20 × 1 + 30 × 1 = 60 (× 1 because the motif appears once in each glycan)
Sample B: 20 × 1 + 40 × 1 + 60 × 1 = 120
The results are clearly different.
However, if we quantify the motif in a relative way:
Sample A: (10 × 1 + 20 × 1 + 30 × 1) / (10 + 20 + 30) = 60 / 60 = 1
Sample B: (20 × 1 + 40 × 1 + 60 × 1) / (20 + 40 + 60) = 120 / 120 = 1
The results are identical!
The absolute motif quantification answers the question "how many motifs are there in the sample?", while the relative motif quantification answers the question "if I take out one glycan molecule, how many motifs are on it on average?"
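The worked example with samples A and B can be reproduced with a few lines of base R. This sketch only mirrors the arithmetic described above and is not the package's implementation; the object names are chosen for illustration.

```r
# Glycan abundances from the example (rows: glycans, columns: samples).
abundance <- cbind(
  A = c(G1 = 10, G2 = 20, G3 = 30),
  B = c(G1 = 20, G2 = 40, G3 = 60)
)
# The motif appears once in every glycan.
motif_count <- c(G1 = 1, G2 = 1, G3 = 1)

# Absolute: abundance-weighted sum of motif counts per sample.
absolute <- colSums(abundance * motif_count)
# Relative: the same sum divided by the total abundance of the glycome.
relative <- absolute / colSums(abundance)

print(absolute)  # A = 60, B = 120
print(relative)  # A = 1,  B = 1
```

Changing `motif_count` to unequal values shows how the relative quantity becomes an abundance-weighted average number of motifs per glycan.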
So which method should you use? It depends on both your data type and research objectives. Here are some general guidelines:
For glycomics data, you should typically use relative motif quantification, because glycomics data is inherently compositional (search "compositional data" for more details).
For glycoproteomics data, the choice depends on your research question. If you want to compare observed motif abundance across samples, use absolute motif quantification. Note that many factors can cause one sample to have higher values than another, such as upregulation of enzymes responsible for the motif, or higher overall glycosylation site occupancy. If you want to understand the underlying regulatory mechanisms, use relative motif quantification to correct for differences in site occupancy.
Motif quantification is a special type of derived trait.
It is simply a weighted sum (wsum()) or weighted mean (wmean()) of the motif counts,
for absolute and relative motif quantification, respectively.
You can perform motif quantification manually using derive_traits().
Let's perform absolute motif quantification manually using derive_traits()
to better understand the process (we will use custom column meta-properties here):
    # Add the meta-properties to the variable information tibble
    motifs <- c(
      nLx = "Hex(??-?)[dHex(??-?)]HexNAc(??-",  # Lewis x antigen
      nSLx = "NeuAc(??-?)Hex(??-?)[dHex(??-?)]HexNAc(??-"  # Sialyl Lewis x antigen
    )
    exp_with_mps <- glymotif::add_motifs_int(exp, motifs)
    # Define the traits
    trait_fns <- list(Lx = wsum(nLx), SLx = wsum(nSLx))
    # Calculate the traits
    derive_traits(exp_with_mps, trait_fns = trait_fns)
The code snippet above is equivalent to:
    # Quantify the motifs
    quantify_motifs(exp, motifs, method = "absolute")
In fact, this is essentially the implementation of the quantify_motifs() function,
except the actual implementation is more robust and user-friendly.
For relative motif quantification, simply replace wsum() with wmean(),
and everything else remains the same.
derive_traits(), glymotif::have_motifs()
    library(glyexp)
    library(glyclean)
    exp <- real_experiment |> auto_clean() |> slice_head_var(n = 10)
    motifs <- c(
      nLx = "Hex(??-?)[dHex(??-?)]HexNAc(??-",  # Lewis x antigen
      nSLx = "NeuAc(??-?)Hex(??-?)[dHex(??-?)]HexNAc(??-"  # Sialyl Lewis x antigen
    )
    quantify_motifs(exp, motifs)
    # Using dynamic motifs (auto-extracted from data)
    quantify_motifs(exp, glymotif::dynamic_motifs(max_size = 3))
    # Using branch motifs (auto-extracted from data)
    quantify_motifs(exp, glymotif::branch_motifs())
A ratio trait is the ratio of the total abundances of two groups of glycans. For example, the ratio of complex glycans to hybrid glycans, or the ratio of bisecting to unbisecting glycans.
    ratio(num_cond, denom_cond, within = NULL, na_action = "keep")
num_cond |
Condition to use for defining the numerator.
An expression that evaluates to a logical vector.
The names of all built-in meta-properties (see |
denom_cond |
Condition to use for defining the denominator. Same format as |
within |
Condition to set a restriction for the glycans. Same format as |
na_action |
How to handle missing values.
|
A derived trait function.
You can use ratio() to create a ratio trait easily.
For example:
    # Ratio of complex glycans and hybrid glycans
    ratio(Tp == "complex", Tp == "hybrid")
    # Ratio of bisecting and unbisecting glycans
    ratio(B, !B)
    # Ratio of core-fucosylated and non-core-fucosylated glycans within complex glycans
    ratio(nFc > 0 & Tp == "complex", nFc == 0 & Tp == "complex")
    # The above example can be simplified as:
    ratio(nFc > 0, nFc == 0, within = (Tp == "complex"))  # more readable
Note that the third example uses & for logical AND.
In fact, you can use any R logical operator in the expression (e.g., |, !, etc.).
prop() is a special case of ratio(),
i.e., prop(cond, within) is equivalent to ratio(cond & within, within).
We recommend using prop() instead of ratio() for clarity if possible.
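The stated equivalence can be checked numerically with plain abundance sums. The sketch below uses hypothetical stand-ins (`ratio_sketch()`, `prop_sketch()`) written independently of each other; they are not the package's ratio() and prop(), which build trait functions rather than compute values directly.

```r
# Toy glycome in one sample (hypothetical values).
abundance <- c(10, 20, 30, 40)
Tp  <- c("complex", "complex", "hybrid", "complex")
nFc <- c(1L, 0L, 1L, 1L)

# Hypothetical stand-in for a ratio trait: two abundance sums divided.
ratio_sketch <- function(abundance, num, denom) {
  sum(abundance[num]) / sum(abundance[denom])
}
# Hypothetical stand-in for a proportion trait: filter first, then take the share.
prop_sketch <- function(abundance, cond, within) {
  in_group <- abundance[within]
  sum(in_group[cond[within]]) / sum(in_group)
}

cond   <- nFc > 0
within <- Tp == "complex"
via_prop  <- prop_sketch(abundance, cond, within)
via_ratio <- ratio_sketch(abundance, cond & within, within)
print(c(prop = via_prop, ratio = via_ratio))
```

Both routes give 50 / 70, which is the relationship prop(cond, within) = ratio(cond & within, within) expressed on raw numbers.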
All the internal summation operations ignore NAs by default. Therefore, NAs in the expression matrix and meta-property values will not result in NAs in the derived traits. However, as all derived traits calculate a ratio of two values, NAs will be introduced when:
The denominator is 0. This can happen when the within condition selects no glycans.
Both the numerator and denominator are 0.
    # Ratio of complex glycans and hybrid glycans
    ratio(Tp == "complex", Tp == "hybrid")
    # Ratio of bisecting and unbisecting glycans within bi-antennary glycans
    ratio(B, !B, within = (nA == 2))
A total abundance trait is the total abundance of a group of glycans. For example, the total abundance of all complex glycans, or the total abundance of all tetra-antennary glycans.
    total(cond)
cond |
Condition to use for defining the group of glycans.
An expression that evaluates to a logical vector.
The names of all built-in meta-properties (see |
A derived trait function.
You can use total() to create a total abundance trait easily.
For example:
    # Total abundance of all complex glycans
    total(Tp == "complex")
    # Total abundance of all tetra-antennary glycans
    total(nA == 4)
    # Total abundance of all complex glycans
    total(Tp == "complex")
    # Total abundance of all tetra-antennary glycans
    total(nA == 4)
These derived traits are the most basic and commonly used derived traits. They describe global properties of a glycome including the type of glycans, fucosylation level, sialylation level, galactosylation level, and branching level.
    traits_basic(sia_link = FALSE)
    basic_traits(sia_link = FALSE)
sia_link |
A boolean indicating whether to include sialic acid linkage traits. Default is |
A named list of derived traits.
The explanations of the derived traits are as follows:
TM: Proportion of high-mannose glycans
TH: Proportion of hybrid glycans
TC: Proportion of complex glycans
MM: Average number of mannoses within high-mannose glycans
CA2: Proportion of bi-antennary glycans within complex glycans
CA3: Proportion of tri-antennary glycans within complex glycans
CA4: Proportion of tetra-antennary glycans within complex glycans
TF: Proportion of fucosylated glycans
TFc: Proportion of core-fucosylated glycans
TFa: Proportion of arm-fucosylated glycans
TB: Proportion of glycans with bisecting GlcNAc
GS: Average degree of sialylation per galactose
AG: Average degree of galactosylation per antenna
TS: Proportion of sialylated glycans
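To make the "proportion ... within ..." wording concrete, here is a minimal base-R sketch that computes TC-like and CA2-like values by hand for one sample. The toy data and the `proportion()` helper are hypothetical. The actual trait definitions are the ones returned by traits_basic().

```r
# Toy data: one row per glycan in a single sample (values are made up).
glycans <- data.frame(
  Tp = c("complex", "complex", "complex", "hybrid"),
  nA = c(2, 2, 3, 1),
  abundance = c(30, 20, 30, 20)
)

# Hypothetical helper: abundance share of `cond` among the glycans in `within`.
proportion <- function(df, cond, within = rep(TRUE, nrow(df))) {
  sum(df$abundance[cond & within]) / sum(df$abundance[within])
}

is_complex <- glycans$Tp == "complex"

# TC-like: complex glycans among all glycans (80 / 100).
tc <- proportion(glycans, is_complex)

# CA2-like: bi-antennary glycans among complex glycans (50 / 80).
ca2 <- proportion(glycans, glycans$nA == 2, within = is_complex)
```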
Four additional sialic acid linkage traits are included if sia_link = TRUE.
GE: Average degree of a2,6-linked sialylation per galactose
GL: Average degree of a2,3-linked sialylation per galactose
TE: Proportion of a2,6-linked sialylated glycans
TL: Proportion of a2,3-linked sialylated glycans
To use these sialic acid linkage traits,
var_info of the input glyexp::experiment() must have the following columns:
nE: Number of a2,6-linked sialic acids
nL: Number of a2,3-linked sialic acids
Note that you have to add these two columns even if the glycan_structure column has intact linkages.
This is because by convention all traits work with glycan structures with "basic" structure levels
(i.e., with generic monosaccharides like "Hex" and "HexNAc" and no linkages specified).
    traits_basic()
These traits are the ones used by Clerc et al. 2018 (https://doi.org/10.1053/j.gastro.2018.05.030). We generally don't recommend using this trait set because some of its traits are redundant and some important traits are missing. We include this function because this paper is very influential in the field.
    traits_clerc_2018(sia_link = FALSE)
sia_link |
A boolean indicating whether to include sialic acid linkage traits. Default is |
A named list of derived traits.
To use these sialic acid linkage traits,
var_info of the input glyexp::experiment() must have the following columns:
nE: Number of a2,6-linked sialic acids
nL: Number of a2,3-linked sialic acids
Note that you have to add these two columns even if the glycan_structure column has intact linkages.
This is because by convention all traits work with glycan structures with "basic" structure levels
(i.e., with generic monosaccharides like "Hex" and "HexNAc" and no linkages specified).
    traits_clerc_2018()[1:5]
This function returns a named list of detailed derived traits.
Compared to traits_basic(), this function includes derived traits with more
detailed within conditions.
    traits_detailed(sia_link = FALSE)
    all_traits(sia_link = FALSE)
sia_link |
A boolean indicating whether to include sialic acid linkage traits. Default is |
The explanations of the derived traits are as follows:
A1F: Proportion of fucosylated glycans within mono-antennary glycans
A2F: Proportion of fucosylated glycans within bi-antennary glycans
A3F: Proportion of fucosylated glycans within tri-antennary glycans
A4F: Proportion of fucosylated glycans within tetra-antennary glycans
A1Fc: Proportion of core-fucosylated glycans within mono-antennary glycans
A2Fc: Proportion of core-fucosylated glycans within bi-antennary glycans
A3Fc: Proportion of core-fucosylated glycans within tri-antennary glycans
A4Fc: Proportion of core-fucosylated glycans within tetra-antennary glycans
A1Fa: Proportion of arm-fucosylated glycans within mono-antennary glycans
A2Fa: Proportion of arm-fucosylated glycans within bi-antennary glycans
A3Fa: Proportion of arm-fucosylated glycans within tri-antennary glycans
A4Fa: Proportion of arm-fucosylated glycans within tetra-antennary glycans
A1SFa: Proportion of arm-fucosylated glycans within sialylated mono-antennary glycans
A2SFa: Proportion of arm-fucosylated glycans within sialylated bi-antennary glycans
A3SFa: Proportion of arm-fucosylated glycans within sialylated tri-antennary glycans
A4SFa: Proportion of arm-fucosylated glycans within sialylated tetra-antennary glycans
A1S0Fa: Proportion of arm-fucosylated glycans within asialylated mono-antennary glycans
A2S0Fa: Proportion of arm-fucosylated glycans within asialylated bi-antennary glycans
A3S0Fa: Proportion of arm-fucosylated glycans within asialylated tri-antennary glycans
A4S0Fa: Proportion of arm-fucosylated glycans within asialylated tetra-antennary glycans
A1B: Proportion of bisecting glycans within mono-antennary glycans
A2B: Proportion of bisecting glycans within bi-antennary glycans
A3B: Proportion of bisecting glycans within tri-antennary glycans
A4B: Proportion of bisecting glycans within tetra-antennary glycans
A1FcB: Proportion of bisecting glycans within core-fucosylated mono-antennary glycans
A2FcB: Proportion of bisecting glycans within core-fucosylated bi-antennary glycans
A3FcB: Proportion of bisecting glycans within core-fucosylated tri-antennary glycans
A4FcB: Proportion of bisecting glycans within core-fucosylated tetra-antennary glycans
A1Fc0B: Proportion of bisecting glycans within non-core-fucosylated mono-antennary glycans
A2Fc0B: Proportion of bisecting glycans within non-core-fucosylated bi-antennary glycans
A3Fc0B: Proportion of bisecting glycans within non-core-fucosylated tri-antennary glycans
A4Fc0B: Proportion of bisecting glycans within non-core-fucosylated tetra-antennary glycans
A1G: Average degree of galactosylation per antenna within mono-antennary glycans
A2G: Average degree of galactosylation per antenna within bi-antennary glycans
A3G: Average degree of galactosylation per antenna within tri-antennary glycans
A4G: Average degree of galactosylation per antenna within tetra-antennary glycans
A1Gt: Average degree of terminal galactose per antenna within mono-antennary glycans
A2Gt: Average degree of terminal galactose per antenna within bi-antennary glycans
A3Gt: Average degree of terminal galactose per antenna within tri-antennary glycans
A4Gt: Average degree of terminal galactose per antenna within tetra-antennary glycans
A1S: Average degree of sialylation per antenna within mono-antennary glycans
A2S: Average degree of sialylation per antenna within bi-antennary glycans
A3S: Average degree of sialylation per antenna within tri-antennary glycans
A4S: Average degree of sialylation per antenna within tetra-antennary glycans
A1GS: Average degree of sialylation per galactose within mono-antennary glycans
A2GS: Average degree of sialylation per galactose within bi-antennary glycans
A3GS: Average degree of sialylation per galactose within tri-antennary glycans
A4GS: Average degree of sialylation per galactose within tetra-antennary glycans
These additional sialic acid linkage traits are included if sia_link = TRUE.
A1E: Average degree of a2,6-linked sialylation per antenna within mono-antennary glycans
A2E: Average degree of a2,6-linked sialylation per antenna within bi-antennary glycans
A3E: Average degree of a2,6-linked sialylation per antenna within tri-antennary glycans
A4E: Average degree of a2,6-linked sialylation per antenna within tetra-antennary glycans
A1L: Average degree of a2,3-linked sialylation per antenna within mono-antennary glycans
A2L: Average degree of a2,3-linked sialylation per antenna within bi-antennary glycans
A3L: Average degree of a2,3-linked sialylation per antenna within tri-antennary glycans
A4L: Average degree of a2,3-linked sialylation per antenna within tetra-antennary glycans
A1GE: Average degree of a2,6-linked sialylation per galactose within mono-antennary glycans
A2GE: Average degree of a2,6-linked sialylation per galactose within bi-antennary glycans
A3GE: Average degree of a2,6-linked sialylation per galactose within tri-antennary glycans
A4GE: Average degree of a2,6-linked sialylation per galactose within tetra-antennary glycans
A1GL: Average degree of a2,3-linked sialylation per galactose within mono-antennary glycans
A2GL: Average degree of a2,3-linked sialylation per galactose within bi-antennary glycans
A3GL: Average degree of a2,3-linked sialylation per galactose within tri-antennary glycans
A4GL: Average degree of a2,3-linked sialylation per galactose within tetra-antennary glycans
A named list of derived traits.
To use these sialic acid linkage traits,
var_info of the input glyexp::experiment() must have the following columns:
nE: Number of a2,6-linked sialic acids
nL: Number of a2,3-linked sialic acids
Note that you have to add these two columns even if the glycan_structure column has intact linkages.
This is because by convention all traits work with glycan structures with "basic" structure levels
(i.e., with generic monosaccharides like "Hex" and "HexNAc" and no linkages specified).
    traits_detailed()
These traits are the ones used by Fu et al. 2026 (https://doi.org/10.1038/s41467-026-68579-x).
This set is much like traits_detailed(), but it doesn't differentiate core- and arm-fucosylation,
and it doesn't include the sialic acid linkage traits.
Also, many of the traits are too specific to be useful or interpretable.
    traits_fu_2026()
A named list of derived traits.
    traits_fu_2026()[1:5]
These traits are the ones used by Li et al. 2025 (https://doi.org/10.1038/s41467-025-57633-9). These traits were created based only on glycan compositions, not structures. Here the traits are remapped to use glycan structures, so they are not exactly the same as the ones in the paper. Generally we don't recommend using these traits because they capture too little information. We include this function because this paper is very influential in the field.
    traits_li_2025()
A named list of derived traits.
    traits_li_2025()[1:5]
A weighted-mean trait is the average value of some quantitative property within a group of glycans, weighted by the abundance of the glycans. For example, the average number of antennae within all complex glycans, or the average number of sialic acids within all glycans.
    wmean(val, within = NULL, na_action = "keep")
val |
Expression to use for defining the value.
An expression that evaluates to a numeric vector.
The names of all built-in meta-properties (see |
within |
Condition to set a restriction for the glycans. Same format as |
na_action |
How to handle missing values.
|
A derived trait function.
You can use wmean() to create a weighted-mean trait easily.
For example:
    # Weighted mean of the number of sialic acids within all glycans
    wmean(nS)

    # Average degree of sialylation per antenna within all glycans
    wmean(nS / nA)
Note that the last example uses / for division.
In fact, you can use any R arithmetic operator in the expression (e.g., *, +, -).
If you want to filter the glycans before calculating the weighted mean,
for example, to calculate the average number of antennae within only complex glycans,
you can use within to define the restriction.
    # Average number of antennae within complex glycans
    wmean(nA, within = (Tp == "complex"))
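The arithmetic behind the three examples above can be sketched in base R for a single sample. The toy data and the `weighted_mean()` helper are hypothetical and only illustrate an abundance-weighted average. They are not glydet's internals.

```r
# Toy data: one row per glycan in a single sample (values are made up).
glycans <- data.frame(
  Tp = c("complex", "complex", "hybrid"),
  nA = c(2, 4, 1),
  nS = c(2, 2, 0),
  abundance = c(60, 20, 20)
)

# Hypothetical helper: average of `val` weighted by abundance,
# optionally restricted to the glycans selected by `within`.
weighted_mean <- function(df, val, within = rep(TRUE, nrow(df))) {
  sum(val[within] * df$abundance[within]) / sum(df$abundance[within])
}

# Analogue of wmean(nS): (2*60 + 2*20 + 0*20) / 100
mean_ns <- weighted_mean(glycans, glycans$nS)

# Analogue of wmean(nS / nA): (1*60 + 0.5*20 + 0*20) / 100
sia_per_antenna <- weighted_mean(glycans, glycans$nS / glycans$nA)

# Analogue of wmean(nA, within = (Tp == "complex")): (2*60 + 4*20) / 80
antennae_complex <- weighted_mean(
  glycans, glycans$nA, within = glycans$Tp == "complex"
)
```

Without a `within` restriction, the helper agrees with base R's `weighted.mean(glycans$nS, glycans$abundance)`.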
All the internal summation operations ignore NAs by default. Therefore, NAs in the expression matrix and meta-property values will not result in NAs in the derived traits. However, as all derived traits calculate a ratio of two values, NAs will be introduced when:
The denominator is 0. This can happen when the within condition selects no glycans.
Both the numerator and denominator are 0.
    # Weighted mean of the number of sialic acids within all glycans
    wmean(nS)

    # Average degree of sialylation per antenna within all glycans
    wmean(nS / nA)

    # Average number of antennae within complex glycans
    wmean(nA, within = (Tp == "complex"))
A weighted sum trait is the sum of a quantitative property within a group of glycans, weighted by the abundance of the glycans. For example, the sum of the number of sialic acids within all glycans, or the sum of the number of Lewis x antigens within all glycans.
    wsum(val, within = NULL)
val |
Expression to use for defining the value.
An expression that evaluates to a numeric vector.
The names of all built-in meta-properties (see |
within |
Condition to set a restriction for the glycans. Same format as |
A derived trait function.
You can use wsum() to create a weighted sum trait easily.
For example:
    # Weighted sum of the number of sialic acids within all glycans
    wsum(nS)
This can be regarded as the quantification of sialic acids. If some glycan has only one sialic acid, its abundance is added to the results. If another glycan has two sialic acids, its abundance is doubled before being added to the results.
You can also use within to restrict the weighted sum calculation to specific glycan subsets.
For example, you can calculate the weighted sum of the number of sialic acids within complex glycans:
wsum(nS, within = (Tp == "complex"))
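The "abundance is doubled" description can be checked with a short base-R sketch for one sample. The toy data and the `weighted_sum()` helper are hypothetical and are not how glydet implements `wsum()`.

```r
# Toy data: one row per glycan in a single sample (values are made up).
glycans <- data.frame(
  Tp = c("complex", "complex", "hybrid"),
  nS = c(1, 2, 1),
  abundance = c(50, 30, 20)
)

# Hypothetical helper: each glycan contributes abundance * val,
# optionally restricted to the glycans selected by `within`.
weighted_sum <- function(df, val, within = rep(TRUE, nrow(df))) {
  sum(val[within] * df$abundance[within], na.rm = TRUE)
}

# Analogue of wsum(nS): 50*1 + 30*2 + 20*1
all_sia <- weighted_sum(glycans, glycans$nS)

# Analogue of wsum(nS, within = (Tp == "complex")): 50*1 + 30*2
complex_sia <- weighted_sum(
  glycans, glycans$nS, within = glycans$Tp == "complex"
)
```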
    # Weighted sum of the number of sialic acids within all glycans
    wsum(nS)

    # Weighted sum of the number of sialic acids within complex glycans
    wsum(nS, within = (Tp == "complex"))