| Title: | Deconstruction Glycosylation Biosynthesis |
|---|---|
| Description: | Simulates the biosynthetic process of glycosylation in silico, enabling computational analysis of glycan biosynthesis pathways. Key features include: identifying specific glycosyltransferases responsible for each glycosidic bond at the bond level; reconstructing complete biosynthetic pathways for given glycans; exploring possible transformations between two glycan structures via glycosyltransferases and glycosidases; and predicting glycans that can be synthesized from substrate structures and enzyme sets. The package supports multiple glycan types including N-glycans, O-GalNAc, O-Man, O-GlcNAc, O-Fuc, and O-Glc. As a downstream component of the glycoverse ecosystem, it integrates seamlessly with 'glyrepr', 'glyparse', and 'glymotif' to provide practical, analysis-ready insights into glycan biosynthesis for omics researchers. |
| Authors: | Bin Fu [aut, cre] (ORCID: <https://orcid.org/0000-0001-8567-2997>) |
| Maintainer: | Bin Fu <[email protected]> |
| License: | MIT + file LICENSE |
| Version: | 0.5.3 |
| Built: | 2026-06-09 07:51:12 UTC |
| Source: | https://github.com/glycoverse/glyenzy |
This function simulates the action of an enzyme on one or more glycans. It returns all possible products the enzyme can generate from the given glycans.
    apply_enzyme(glycans, enzyme, return_list = NULL)
glycans |
A |
enzyme |
An |
return_list |
If |
A glyrepr::glycan_structure() vector, or a list of such vectors.
Here are some important notes for all functions in the glyenzy package.
All algorithms and enzyme information in glyenzy are applicable only to humans, and specifically to N-glycans and O-GalNAc glycans. Results may be inaccurate for other types of glycans (e.g., GAGs, glycolipids) or for glycans in other species (e.g., plants, insects).
The algorithm takes an intentionally inclusive approach, assuming that all possible isoenzymes capable of catalyzing a given reaction may be involved. Therefore, results should be interpreted with caution.
For example, in humans, detection of the motif "Neu5Ac(a2-3)Gal(b1-" will return both "ST3GAL3" and "ST3GAL4". In reality, only one of them might be active, depending on factors such as tissue specificity.
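One way to act on this caveat is to intersect the inclusive candidate list with the genes known to be expressed in the sample. The sketch below uses base R only. The `candidates` vector copies the two isoenzymes named above, while `expressed_genes` is a hypothetical, hand-written vector standing in for real expression data.

```r
# Candidate isoenzymes reported for the motif "Neu5Ac(a2-3)Gal(b1-"
# (copied from the example above).
candidates <- c("ST3GAL3", "ST3GAL4")

# HYPOTHETICAL: genes detected in the tissue of interest. In practice this
# would come from transcriptomics or proteomics data, not a literal vector.
expressed_genes <- c("ST3GAL4", "B4GALT1", "MGAT2")

# Keep only the candidates that are also expressed.
plausible <- intersect(candidates, expressed_genes)
print(plausible)
```

This kind of post-filtering does not change what glyenzy reports; it only narrows the interpretation using evidence from outside the package.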
The function only works for glycans containing concrete residues
(e.g., "Glc", "GalNAc"), and not for glycans with generic
residues (e.g., "Hex", "HexNAc").
Substituents (e.g. sulfation, phosphorylation) are not supported yet,
and the algorithms might fail for glycans with substituents.
If your glycans contain substituents,
use glyrepr::remove_substituents() to get clean glycans.
If the glycan structure is incomplete or partially degraded, the result may be misleading.
For N-glycans, the starting structure is assumed to be "Glc(3)Man(9)GlcNAc(2)", the N-glycan precursor transferred to Asn by OST.
For O-GalNAc glycans, the starting structure is assumed to be "GalNAc(a1-".
For O-GlcNAc glycans, the starting structure is assumed to be "GlcNAc(b1-".
For O-Man glycans, the starting structure is assumed to be "Man(a1-".
For O-Fuc glycans, the starting structure is assumed to be "Fuc(a1-".
For O-Glc glycans, the starting structure is assumed to be "Glc(b1-".
    library(glyenzy)
    library(glyrepr)
    library(glyparse)

    # Use `glycan_structure()` and `enzyme()`
    glycan <- auto_parse("GlcNAc(b1-2)Man(a1-3)[Man(a1-6)]Man(b1-4)GlcNAc(b1-4)GlcNAc(b1-")
    apply_enzyme(glycan, enzyme("MGAT3"))

    # Or use characters directly
    apply_enzyme("GlcNAc(b1-2)Man(a1-3)[Man(a1-6)]Man(b1-4)GlcNAc(b1-4)GlcNAc(b1-", "MGAT3")

    # Vectorized input
    glycans <- c(
      "GlcNAc(b1-2)Man(a1-3)[Man(a1-6)]Man(b1-4)GlcNAc(b1-4)GlcNAc(b1-",
      "GlcNAc(b1-2)Man(a1-3)[GlcNAc(b1-2)Man(a1-6)]Man(b1-4)GlcNAc(b1-4)GlcNAc(b1-"
    )
    apply_enzyme(glycans, "MGAT3")
Count how many times an enzyme is involved in the biosynthesis of a glycan.
    count_enzyme(glycans, enzyme)
glycans |
A |
enzyme |
An |
An integer vector of the same length as glycans.
    library(glyenzy)
    library(glyrepr)
    library(glyparse)

    # Use `glycan_structure()` and `enzyme()`
    glycan <- auto_parse("Neu5Ac(a2-6)Gal(b1-4)GlcNAc(b1-")
    count_enzyme(glycan, enzyme("ST6GAL1"))

    # Or use characters directly
    count_enzyme("Neu5Ac(a2-6)Gal(b1-4)GlcNAc(b1-", "ST6GAL1")

    # Vectorized input
    glycans <- c(
      "Neu5Ac(a2-6)Gal(b1-4)GlcNAc(b1-",
      "Gal(b1-4)GlcNAc(b1-"
    )
    count_enzyme(glycans, "ST6GAL1")
Return a named list of all built-in enzymes,
or a character vector of gene symbols if return_str is TRUE.
    db_enzymes(
      return_str = FALSE,
      include_starter_gt = TRUE,
      include_npre_gt = TRUE
    )
return_str |
If |
include_starter_gt |
If |
include_npre_gt |
If |
A list of enzyme()s or a character vector.
    db_enzymes(return_str = TRUE)
Two types of enzymes are involved in glycosylation: glycosyltransferases (GTs) and glycoside hydrolases (GHs).
GTs catalyze the transfer of a sugar residue from a donor to an acceptor, thus building up glycan structures.
GHs catalyze the removal of a sugar residue from a substrate, thus breaking down glycan structures.
One special subtype of GT is the starter GT (or initiating GT), which catalyzes the addition of the first sugar residue onto a non-glycan substrate, thus initiating glycosylation.
Use enzyme() with a gene symbol to load a predefined enzyme.
For example, use enzyme("ST3GAL3") to load the enzyme ST3GAL3.
Throughout the package, you can use enzyme()s for any enzyme argument,
or just use the gene symbol directly.
For example, have_enzyme("Neu5Ac(a2-3)Gal(b1-3)GalNAc(a1-", "ST3GAL3") and
have_enzyme("Neu5Ac(a2-3)Gal(b1-3)GalNAc(a1-", enzyme("ST3GAL3")) are equivalent.
    enzyme(symbol)
symbol |
The gene symbol of the enzyme. |
A glyenzy_enzyme object.
glyenzy_enzyme
An enzyme() is a list with the following elements:
name: the name of the enzyme, usually the gene symbol.
rules: a list of glyenzy_enzyme_rule objects.
Each rule is a list with the following fields:
acceptor: the motif that the enzyme recognizes
acceptor_alignment: the alignment of the acceptor
rejects: the motifs that the enzyme refuses to act on
product: the product generated by the enzyme
acceptor_idx: the node index in the acceptor at which the enzyme acts.
For GTs, this is the node the new residue is attached to.
For GHs, this is the node that is removed.
For starter GTs, acceptor_idx is 0.
product_idx: the node index of the product residue in the product structure.
For GTs, this is the index of the newly added residue in the product.
For GHs, this is NULL (no new residue is added).
new_residue: the new residue added by the enzyme. For GHs, this is NULL.
new_linkage: the linkage of the new residue. For GHs, this is NULL.
type: the broad type of the enzyme, "GT" for glycosyltransferase or
"GH" for glycoside hydrolase. Starter GTs are encoded as type = "GT".
species: the species of the enzyme, e.g. "human" or "mouse".
You can see all of this information by printing the enzyme object.
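To make the layout of these fields concrete, the sketch below builds a plain R list that mirrors the documented field names and then reads values back out of it. It is a mock written by hand, not an object produced by `enzyme()`: it carries no class, the index fields are left out because node numbering is not specified here, and the string format of `new_linkage` is an assumption.

```r
# MOCK object: a plain list that copies the documented field names.
# It is NOT created by glyenzy and has no "glyenzy_enzyme" class.
mock_enzyme <- list(
  name = "MySiaT",
  rules = list(
    list(
      acceptor = "Gal(b1-4)GlcNAc(b1-",
      acceptor_alignment = "terminal",
      rejects = NULL,
      product = "Neu5Ac(a2-3)Gal(b1-4)GlcNAc(b1-",
      new_residue = "Neu5Ac",
      new_linkage = "a2-3"  # ASSUMED string format
    )
  ),
  type = "GT",
  species = "human"
)

# Top-level fields are read with `$`.
print(mock_enzyme$type)

# Rules are a list, so per-rule fields are collected with vapply().
products <- vapply(mock_enzyme$rules, function(rule) rule$product, character(1))
print(products)
```

If real enzyme objects follow the same list layout, the same `$` and `vapply()` access pattern should apply to them.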
    library(glyenzy)
    library(glyrepr)

    enzyme("ST3GAL3")
This function returns all possible isoenzymes associated with the biosynthetic steps of the input glycan. Note that this function ignores the residues in glycans that cannot be matched to any enzyme rules.
    find_enzyme(glycans, return_list = NULL)
glycans |
A |
return_list |
If |
A character vector or a list of character vectors (see return_list parameter),
each containing the names of enzymes involved in the biosynthesis of the corresponding glycan.
    library(glyenzy)
    library(glyrepr)
    library(glyparse)

    # Use `glycan_structure()`
    glycans <- auto_parse(c(
      "GlcNAc(b1-2)Man(a1-3)[Man(a1-6)]Man(b1-4)GlcNAc(b1-4)GlcNAc(b1-",
      "GlcNAc(b1-2)Man(a1-3)[GlcNAc(b1-2)Man(a1-6)]Man(b1-4)GlcNAc(b1-4)GlcNAc(b1-"
    ))
    find_enzyme(glycans)

    # Or use characters directly
    find_enzyme("GlcNAc(b1-2)Man(a1-3)[Man(a1-6)]Man(b1-4)GlcNAc(b1-4)GlcNAc(b1-")
This function simulates the action of enzymes on glycans. Think of it like a primordial soup where you put in a few glycans and enzymes, and let them interact to generate new glycans.
grow_glycans_step() performs one round of enzyme action,
while grow_glycans() performs multiple rounds.
The only difference between grow_glycans_step() and grow_glycans(n_steps = 1)
is that the latter returns the original input glycans as well.
For both, a vector of unique glycan structures is returned.
The number of glycans generated by grow_glycans() typically grows exponentially
with the number of steps, and can quickly become very large.
Therefore, it is recommended to use a small number
of steps and carefully select the enzymes.
Also, you can use the filter argument to prune the results after each round.
    grow_glycans_step(glycans, enzymes)

    grow_glycans(glycans, enzymes, n_steps = 5, filter = NULL)
glycans |
A |
enzymes |
A character vector of gene symbols,
or a list of |
n_steps |
The maximum number of rounds to perform. The actual number of rounds may be less if no new glycans can be generated. |
filter |
A function to filter the generated glycans.
It should have a single |
A glyrepr::glycan_structure() vector of all unique glycans generated.
    library(glyenzy)

    # Use `grow_glycans_step()` to build glycans step by step
    glycan <- "GlcNAc(b1-2)Man(a1-3)[Man(a1-6)]Man(b1-4)GlcNAc(b1-4)GlcNAc(b1-"
    glycan |>
      grow_glycans_step("MGAT2") |>
      grow_glycans_step("B4GALT1") |>
      grow_glycans_step("ST3GAL3")

    # Use `grow_glycans()` to simulate a primordial soup
    glycans <- c(
      "GlcNAc(b1-2)Man(a1-3)[Man(a1-6)]Man(b1-4)GlcNAc(b1-4)GlcNAc(b1-",
      "GlcNAc(b1-2)Man(a1-3)[GlcNAc(b1-2)Man(a1-6)]Man(b1-4)GlcNAc(b1-4)GlcNAc(b1-"
    )
    enzymes <- c("B4GALT1", "ST3GAL3")
    grow_glycans(glycans, enzymes, n_steps = 5)

    # Use `filter` to prune the results after each round
    # Here we keep only glycans that are synthesized by MGAT2
    grow_glycans(glycans, enzymes, n_steps = 5, filter = ~ have_enzyme(.x, "MGAT2"))
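The round-based behaviour described for these two functions (one round versus several, unique results, early stopping, pruning with `filter`) can be pictured with a toy model in base R. This is only an illustration of the idea, not glyenzy's implementation: the "glycans" are arbitrary strings, and `toy_enzymes`, `grow_step()` and `grow()` are hypothetical names invented for this sketch.

```r
# TOY enzymes: each maps one structure string to zero or more products.
# They are hypothetical and do not model real enzyme specificity.
toy_enzymes <- list(
  add_A = function(g) if (nchar(g) < 4) paste0("A", g) else character(0),
  add_B = function(g) if (startsWith(g, "A")) paste0("B", g) else character(0)
)

# One round: apply every enzyme to every structure, keep unique products.
grow_step <- function(glycans, enzymes) {
  products <- unlist(lapply(glycans, function(g) {
    unlist(lapply(enzymes, function(e) e(g)), use.names = FALSE)
  }), use.names = FALSE)
  unique(as.character(products))
}

# Several rounds: keep the inputs, add new products each round, optionally
# prune them with `filter`, and stop early once nothing new appears.
grow <- function(glycans, enzymes, n_steps = 5, filter = NULL) {
  pool <- unique(glycans)
  frontier <- pool
  for (i in seq_len(n_steps)) {
    new <- setdiff(grow_step(frontier, enzymes), pool)
    if (!is.null(filter)) new <- new[filter(new)]
    if (length(new) == 0) break
    pool <- c(pool, new)
    frontier <- new
  }
  pool
}

print(grow_step("X", toy_enzymes))
print(grow("X", toy_enzymes, n_steps = 2))
print(grow("X", toy_enzymes, n_steps = 10,
           filter = function(x) !startsWith(x, "B")))
```

In the toy model, as in the description above, the multi-round function includes the starting structures in its result while the single-step function returns only the products.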
Glycans are produced through a series of enzymatic reactions. This function checks whether a specific enzyme participates in the biosynthesis of a given glycan (or glycans).
    have_enzyme(glycans, enzyme)
glycans |
A |
enzyme |
An |
A logical vector of the same length as glycans.
The basic approach is straightforward: for each reaction rule
associated with the enzyme, the function checks whether the
corresponding product motif appears in the glycan.
If any rule matches, the function returns TRUE.
For N-glycans, additional logic is applied to handle special cases. Products of MGAT1 are often further trimmed by glycoside hydrolases, meaning that the final glycan product may no longer contain the original motif. In these cases, the function instead looks for specific motif markers to determine enzyme involvement.
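The basic rule check described above ("return TRUE if any product motif appears in the glycan") can be sketched in base R. This is a deliberately crude stand-in: it uses fixed substring search on IUPAC-condensed strings, which is not real motif matching (it ignores branching and alignment) and does not include the N-glycan special cases. `toy_have_enzyme()` and the motif vector are hypothetical names and values chosen for illustration.

```r
# HYPOTHETICAL product motifs for one enzyme, one string per rule.
st6gal1_products <- c("Neu5Ac(a2-6)Gal(b1-4)GlcNAc(b1-")

# For each glycan, return TRUE if ANY product motif occurs in it.
# Substring search is a simplification of true motif matching.
toy_have_enzyme <- function(glycans, product_motifs) {
  vapply(glycans, function(g) {
    hits <- vapply(product_motifs,
                   function(m) grepl(m, g, fixed = TRUE),
                   logical(1))
    any(hits)
  }, logical(1), USE.NAMES = FALSE)
}

glycans <- c(
  "Neu5Ac(a2-6)Gal(b1-4)GlcNAc(b1-",
  "Gal(b1-4)GlcNAc(b1-"
)
result <- toy_have_enzyme(glycans, st6gal1_products)
print(result)
```

For branched structures, a string search like this can miss or mis-detect motifs, which is why a dedicated motif matcher is needed in practice.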
    library(glyenzy)
    library(glyrepr)
    library(glyparse)

    # Use `glycan_structure()` and `enzyme()`
    glycan <- auto_parse("Neu5Ac(a2-6)Gal(b1-4)GlcNAc(b1-")
    have_enzyme(glycan, enzyme("ST6GAL1"))

    # Or use characters directly
    have_enzyme("Neu5Ac(a2-6)Gal(b1-4)GlcNAc(b1-", "ST6GAL1")

    # Vectorized input
    glycans <- c(
      "Neu5Ac(a2-6)Gal(b1-4)GlcNAc(b1-",
      "Gal(b1-4)GlcNAc(b1-"
    )
    have_enzyme(glycans, "ST6GAL1")
This function creates a custom enzyme object,
which can be used in all functions in this package that accept an enzyme argument.
Most of the time, you can use enzyme() with a gene symbol to get a predefined enzyme,
or just use the gene symbol directly in function arguments.
Use this function only when you need to create an enzyme with custom rules.
See the "Enzyme rules" section below for more details.
For now you can only create regular glycosyltransferases (GTs) and glycoside hydrolases (GHs) with this function. Special enzymes like starter GTs for glycosylation initiation are reserved for predefined enzymes.
    make_enzyme(name, rules, type, species)
name |
The name of the enzyme. |
rules |
A list of enzyme rules, each rule being a list with the following fields:
|
type |
The type of the enzyme, "GT" for glycosyltransferase or "GH" for glycoside hydrolase. |
species |
The species of the enzyme. Default is "human". Currently no validation is done on the species. This field is only used for information purposes. |
A glyenzy_enzyme object.
The enzyme rules are the core of an enzyme object. Each rule defines one possible reaction of an enzyme, and an enzyme can have multiple rules.
We now explain each field of an enzyme rule in detail.
The acceptor is the glycan motif (substructure) that the enzyme recognizes.
It can be as simple as a single residue, like Man(a1-,
or a more complex motif, like Man(a1-3)[Man(a1-6)]Man(b1-4)GlcNAc(b1-4)GlcNAc(b1-.
Note that the acceptor alone does not define where the enzyme acts;
the site of action is determined jointly by the acceptor and product fields.
Under the hood, the acceptor is detected by glymotif::match_motif().
To match a motif, we need to specify the alignment of the motif.
Possible values are "substructure", "core", "terminal" and "whole".
See glymotif::have_motif() for details.
Some enzymes are really picky.
They refuse to act on certain glycans if some motifs are present,
even if the acceptor is perfectly matched.
For example, if we want to define an enzyme that adds a Neu5Ac residue to the Gal residue
of a Gal(b1-4)GlcNAc(b1- motif,
but only when there is no a1-3 Fuc on that GlcNAc,
we can set "Gal(b1-4)GlcNAc(b1-" as the acceptor and "Fuc(a1-3)[Gal(b1-4)]GlcNAc(b1-" as the rejects.
You can provide multiple rejects as a vector.
For example, if we want to reject Gal(b1-4)GlcNAc(b1- with either
a1-3 Fuc on GlcNAc or a1-2 Fuc on Gal,
we can set c("Fuc(a1-3)[Gal(b1-4)]GlcNAc(b1-", "Fuc(a1-2)Gal(b1-4)GlcNAc(b1-") as the rejects.
Note that there are some restrictions on rejects:
When acceptor_alignment is "whole", rejects cannot be set.
The acceptor must be a substructure of every reject.
For example, in the a1-3 Fuc example above, you cannot set just "Fuc(a1-3)GlcNAc(b1-" as the rejects,
because "Gal(b1-4)GlcNAc(b1-" (the acceptor) is not a substructure of "Fuc(a1-3)GlcNAc(b1-" (the reject).
A reject cannot be the same structure as the acceptor.
If rejects is NULL, the enzyme does not reject any motifs.
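Two of these restrictions need no motif matching and can be checked on plain strings before a rule is built. The helper below is a hypothetical pre-flight check in base R, not part of glyenzy; the substructure restriction is intentionally left out because it requires real motif matching.

```r
# HYPOTHETICAL helper: checks the string-level restrictions on rejects.
# The "acceptor must be a substructure of every reject" rule is NOT checked.
check_rejects <- function(acceptor, acceptor_alignment, rejects = NULL) {
  if (is.null(rejects)) {
    return(invisible(TRUE))  # NULL means the enzyme rejects nothing
  }
  if (acceptor_alignment == "whole") {
    stop("rejects cannot be set when acceptor_alignment is \"whole\"")
  }
  if (acceptor %in% rejects) {
    stop("a reject cannot be the same structure as the acceptor")
  }
  invisible(TRUE)
}

# Passes: both rejects differ from the acceptor, alignment is "terminal".
ok <- check_rejects(
  acceptor = "Gal(b1-4)GlcNAc(b1-",
  acceptor_alignment = "terminal",
  rejects = c(
    "Fuc(a1-3)[Gal(b1-4)]GlcNAc(b1-",
    "Fuc(a1-2)Gal(b1-4)GlcNAc(b1-"
  )
)
print(ok)
```

A comparison on raw strings only catches rejects that are written identically to the acceptor; two different spellings of the same structure would need structural comparison.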
The product is what the acceptor is transformed into by the enzyme.
For glycosyltransferases (GTs), the product should have exactly one more residue than the acceptor.
For glycoside hydrolases (GHs), the product should have exactly one fewer residue than the acceptor.
If you have multiple possible products, put them in different rules.
Taking the same example as above, if we want to add a Neu5Ac residue to the Gal residue
of a Gal(b1-4)GlcNAc(b1- motif,
we can set "Neu5Ac(a2-3)Gal(b1-4)GlcNAc(b1-" as the product.
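The "one residue more for GTs, one residue fewer for GHs" requirement can be sanity-checked on the strings themselves. The sketch below counts residues by counting opening parentheses, assuming every residue is written with exactly one linkage bracket (including the reducing end, as in the strings used throughout this page) and that no substituents are present. `count_residues()` and `residue_delta()` are hypothetical helper names.

```r
# Count residues in IUPAC-condensed strings by counting "(".
# ASSUMPTION: one linkage bracket per residue, no substituents.
count_residues <- function(iupac) {
  lengths(regmatches(iupac, gregexpr("(", iupac, fixed = TRUE)))
}

# Difference in residue count between product and acceptor.
residue_delta <- function(acceptor, product) {
  count_residues(product) - count_residues(acceptor)
}

# GT-style rule: the product gains one residue.
gt_delta <- residue_delta("Gal(b1-4)GlcNAc(b1-",
                          "Neu5Ac(a2-3)Gal(b1-4)GlcNAc(b1-")

# GH-style rule: the product loses one residue.
gh_delta <- residue_delta("Gal(b1-4)GlcNAc(b1-", "GlcNAc(b1-")

print(c(GT = gt_delta, GH = gh_delta))
```

A delta other than +1 for a GT rule or -1 for a GH rule would suggest the acceptor and product were paired incorrectly.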
    library(glyenzy)
    library(glyrepr)

    # Create a custom enzyme that adds a Neu5Ac residue to the Gal residue
    make_enzyme(
      name = "MySiaT",
      rules = list(
        list(
          acceptor = "Gal(b1-4)GlcNAc(b1-",
          acceptor_alignment = "terminal",
          rejects = NULL,
          product = "Neu5Ac(a2-3)Gal(b1-4)GlcNAc(b1-"
        )
      ),
      type = "GT",
      species = "human"
    )

    # Create a custom enzyme that removes a Gal residue from a `Gal(b1-4)GlcNAc(b1-` motif
    make_enzyme(
      name = "MyGalH",
      rules = list(
        list(
          acceptor = "Gal(b1-4)GlcNAc(b1-",
          acceptor_alignment = "terminal",
          rejects = NULL,
          product = "GlcNAc(b1-"
        )
      ),
      type = "GH",
      species = "human"
    )

    # Create a custom enzyme with rejects
    make_enzyme(
      name = "MySiaT",
      rules = list(
        list(
          acceptor = "Gal(b1-4)GlcNAc(b1-",
          acceptor_alignment = "terminal",
          rejects = c(
            # Reject a1-3 Fuc on GlcNAc
            "Fuc(a1-3)[Gal(b1-4)]GlcNAc(b1-",
            # Reject a1-2 Fuc on Gal
            "Fuc(a1-2)Gal(b1-4)GlcNAc(b1-"
          ),
          product = "Neu5Ac(a2-3)Gal(b1-4)GlcNAc(b1-"
        )
      ),
      type = "GT",
      species = "human"
    )

    # Create a custom enzyme with more than one rule
    make_enzyme(
      name = "MySiaT",
      rules = list(
        list(
          acceptor = "Gal(b1-4)GlcNAc(b1-",
          acceptor_alignment = "terminal",
          rejects = NULL,
          product = "Neu5Ac(a2-3)Gal(b1-4)GlcNAc(b1-"
        ),
        # same acceptor, different product
        list(
          acceptor = "Gal(b1-4)GlcNAc(b1-",
          acceptor_alignment = "terminal",
          rejects = NULL,
          product = "Neu5Gc(a2-3)Gal(b1-4)GlcNAc(b1-"
        )
      ),
      type = "GT",
      species = "human"
    )
This function finds residues in glycans that match the product motifs of a glycosyltransferase and returns their node indices.
    match_enzyme(glycans, enzyme)
glycans |
A |
enzyme |
A glycosyltransferase |
A list of integer vectors with the same length as glycans.
Each integer vector contains node indices for residues added by enzyme
in the corresponding glycan.
    library(glyenzy)

    glycan <- glyrepr::as_glycan_structure("Neu5Ac(a2-3)Gal(b1-3)GlcNAc(b1-")
    match_enzyme(glycan, "ST3GAL3")
Find a synthesis path from one glycan structure to another using enzymatic reactions. This function uses breadth-first search to find the shortest path or all possible paths within a given number of steps.
    path_biosynthesis(from, to, enzymes = NULL, max_steps = 10, filter = NULL)
from |
A |
to |
A |
enzymes |
A character vector of gene symbols, or a list of |
max_steps |
Integer, maximum number of enzymatic steps to search. Default is 10. |
filter |
Optional function to filter generated glycans at each step.
Should take a |
An igraph::igraph() object representing the synthesis path(s).
Vertices represent glycan structures with name attribute containing
IUPAC-condensed strings. Edges represent enzymatic reactions with
enzyme attribute containing gene symbols and step attribute
indicating the step number.
    library(glyenzy)
    library(glyrepr)
    library(glyparse)

    # Find shortest path
    from <- "Gal(b1-4)GlcNAc(b1-"
    to <- "Neu5Ac(a2-6)Gal(b1-4)GlcNAc(b1-"
    path <- path_biosynthesis(from, to, enzymes = "ST6GAL1", max_steps = 3)

    # View the path
    igraph::as_data_frame(path, what = "edges")
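The breadth-first search mentioned in the description can be illustrated on a small, pre-built reaction table in base R. This is a generic BFS sketch, not the package's code: glyenzy expands structures with enzymes as it searches, whereas here the edges are given up front. The node names ("A" to "E"), enzyme labels ("E1" to "E5") and the function `bfs_shortest_path()` are all hypothetical.

```r
# HYPOTHETICAL reaction table: each row is "enzyme converts from -> to".
toy_edges <- data.frame(
  from   = c("A", "B", "A", "D", "C"),
  to     = c("B", "C", "D", "C", "E"),
  enzyme = c("E1", "E2", "E3", "E4", "E5"),
  stringsAsFactors = FALSE
)

# Breadth-first search: explore all structures reachable in 1 step, then
# 2 steps, and so on, so the first time `to` is seen is a shortest path.
bfs_shortest_path <- function(edges, from, to, max_steps = 10) {
  visited <- from
  frontier <- from
  parent <- list()  # parent[[node]] = edge by which node was first reached
  for (step in seq_len(max_steps)) {
    next_frontier <- character(0)
    for (node in frontier) {
      out <- edges[edges$from == node, , drop = FALSE]
      for (i in seq_len(nrow(out))) {
        nb <- out$to[i]
        if (!(nb %in% visited)) {
          visited <- c(visited, nb)
          parent[[nb]] <- list(from = node, enzyme = out$enzyme[i], step = step)
          next_frontier <- c(next_frontier, nb)
        }
      }
    }
    if (to %in% visited) break
    if (length(next_frontier) == 0) return(NULL)  # nothing left to explore
    frontier <- next_frontier
  }
  if (!(to %in% visited)) return(NULL)  # not reachable within max_steps

  # Walk back from the target to the start to recover the path edges.
  path <- list()
  node <- to
  while (node != from) {
    p <- parent[[node]]
    row <- data.frame(from = p$from, to = node, enzyme = p$enzyme,
                      step = p$step, stringsAsFactors = FALSE)
    path <- c(list(row), path)
    node <- p$from
  }
  do.call(rbind, path)
}

print(bfs_shortest_path(toy_edges, "A", "E"))
```

The returned data frame has one row per reaction with `enzyme` and `step` columns, loosely mirroring the edge attributes described for the igraph result above.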
Print method for glyenzy_enzyme objects
    ## S3 method for class 'glyenzy_enzyme'
    print(x, ...)
| Argument | Description |
|---|---|
| x | A |
| ... | Additional arguments passed to print methods. |
Reconstruct the biosynthetic pathway for one or more glycans using enzymatic reactions. This function uses a multi-target breadth-first search to find all feasible pathways that can synthesize all the target glycans.
    trace_biosynthesis(glycans, enzymes = NULL, max_steps = 20, filter = NULL)
| Argument | Description |
|---|---|
| glycans | A |
| enzymes | A character vector of gene symbols, or a list of |
| max_steps | Integer, maximum number of enzymatic steps to search. Default is 20. |
| filter | Optional function to filter generated glycans at each step. Should take a |
An igraph::igraph() object representing the synthesis path(s).
Vertices represent glycan structures with name attribute containing
IUPAC-condensed strings. Edges represent enzymatic reactions with
enzyme attribute containing gene symbols and step attribute
indicating the step number. For multiple targets, the graph includes
all synthesis paths needed to reach every target glycan.
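To make the multi-target breadth-first search and the edge attributes described above more concrete, here is a toy base-R sketch. It is not the glyenzy implementation: `toy_rules` and `toy_trace()` are hypothetical, the rules are heavily simplified (linear glycans only, each enzyme prepends one residue to a fixed acceptor prefix), and the result is a plain edge table rather than an igraph object. Unlike the real function, it keeps every explored reaction instead of pruning to the paths that reach a target.

```r
# Toy rules (assumption): an enzyme acts when the glycan string starts with
# `acceptor`, and the product is `adds` prepended to the glycan.
toy_rules <- list(
  B4GALT1 = c(acceptor = "GlcNAc(b1-", adds = "Gal(b1-4)"),
  ST6GAL1 = c(acceptor = "Gal(b1-4)GlcNAc(b1-", adds = "Neu5Ac(a2-6)")
)

# Hypothetical helper: breadth-first search from `start` until every target
# has been reached or `max_steps` levels have been expanded.
toy_trace <- function(start, targets, rules, max_steps = 20) {
  edges <- data.frame(from = character(), to = character(),
                      enzyme = character(), step = integer())
  visited <- start
  frontier <- start
  for (step in seq_len(max_steps)) {
    if (all(targets %in% visited) || length(frontier) == 0) break
    next_frontier <- character()
    for (glycan in frontier) {
      for (enzyme in names(rules)) {
        rule <- rules[[enzyme]]
        if (!startsWith(glycan, rule[["acceptor"]])) next
        product <- paste0(rule[["adds"]], glycan)
        # Record the reaction as an edge with enzyme and step attributes.
        edges <- rbind(edges, data.frame(from = glycan, to = product,
                                         enzyme = enzyme, step = step))
        if (!product %in% visited) {
          visited <- c(visited, product)
          next_frontier <- c(next_frontier, product)
        }
      }
    }
    frontier <- next_frontier
  }
  edges
}

edges <- toy_trace(
  start = "GlcNAc(b1-",
  targets = c("Gal(b1-4)GlcNAc(b1-", "Neu5Ac(a2-6)Gal(b1-4)GlcNAc(b1-"),
  rules = toy_rules
)
edges
```

Expanding one level per step means the `step` value of an edge is the depth at which that reaction was first applied, which mirrors the `step` edge attribute described for the returned graph.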
Here are some important notes for all functions in the glyenzy package.
All algorithms and enzyme information in glyenzy are applicable only to humans, and specifically to N-glycans and O-GalNAc glycans. Results may be inaccurate for other types of glycans (e.g., GAGs, glycolipids) or for glycans in other species (e.g., plants, insects).
The algorithm takes an intentionally inclusive approach, assuming that all possible isoenzymes capable of catalyzing a given reaction may be involved. Therefore, results should be interpreted with caution.
For example, in humans, detection of the motif "Neu5Ac(a2-3)Gal(b1-" will return both "ST3GAL3" and "ST3GAL4". In reality, only one of them might be active, depending on factors such as tissue specificity.
The function only works for glycans containing concrete residues
(e.g., "Glc", "GalNAc"), and not for glycans with generic
residues (e.g., "Hex", "HexNAc").
Substituents (e.g. sulfation, phosphorylation) are not supported yet,
and the algorithms might fail for glycans with substituents.
If your glycans contain substituents,
use glyrepr::remove_substituents() to get clean glycans.
If the glycan structure is incomplete or partially degraded, the result may be misleading.
For N-glycans, the starting structure is assumed to be "Glc(3)Man(9)GlcNAc(2)", the N-glycan precursor transferred to Asn by OST.
For O-GalNAc glycans, the starting structure is assumed to be "GalNAc(a1-".
For O-GlcNAc glycans, the starting structure is assumed to be "GlcNAc(b1-".
For O-Man glycans, the starting structure is assumed to be "Man(a1-".
For O-Fuc glycans, the starting structure is assumed to be "Fuc(a1-".
For O-Glc glycans, the starting structure is assumed to be "Glc(b1-".
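The starting structures listed above can be kept in a small lookup table when preparing inputs. This is a base-R sketch using only the strings given in the notes; the type labels used as names and the helper `get_starting_structure()` are hypothetical and are not glyenzy identifiers.

```r
# Assumed starting structure per glycan type, copied from the notes above.
# The names ("N", "O-GalNAc", ...) are illustrative labels only.
starting_structures <- c(
  "N"        = "Glc(3)Man(9)GlcNAc(2)",
  "O-GalNAc" = "GalNAc(a1-",
  "O-GlcNAc" = "GlcNAc(b1-",
  "O-Man"    = "Man(a1-",
  "O-Fuc"    = "Fuc(a1-",
  "O-Glc"    = "Glc(b1-"
)

# Hypothetical helper: look up the starting structure, failing loudly
# for glycan types that the notes do not cover.
get_starting_structure <- function(type) {
  if (!type %in% names(starting_structures)) {
    stop("No starting structure listed for glycan type: ", type)
  }
  starting_structures[[type]]
}

get_starting_structure("O-Man")
```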
    library(glyenzy)
    library(glyrepr)
    library(glyparse)

    # Rebuild the biosynthetic pathway of a single glycan
    glycan <- "Neu5Ac(a2-3)Gal(b1-4)[Fuc(a1-3)]GlcNAc(b1-3)Gal(b1-3)GalNAc(a1-"
    path <- trace_biosynthesis(glycan, max_steps = 20)

    # Rebuild pathways for multiple glycans
    glycans <- c(
      "Neu5Ac(a2-3)Gal(b1-4)[Fuc(a1-3)]GlcNAc(b1-3)Gal(b1-3)GalNAc(a1-",
      "Gal(b1-4)[Fuc(a1-3)]GlcNAc(b1-3)Gal(b1-3)GalNAc(a1-"
    )
    path <- trace_biosynthesis(glycans, max_steps = 20)

    # View the path
    igraph::as_data_frame(path, what = "edges")
Visualize where an enzyme contributes residues to a glycan structure.
    view_enzyme(glycan, enzyme)
| Argument | Description |
|---|---|
| glycan | A |
| enzyme | A glycosyltransferase |
view_enzyme() matches one enzyme against one glycan with the same
matching rules used by match_enzyme(), then draws the glycan with the
matched residues highlighted.
A ggplot object returned by glydraw::draw_cartoon(). If no match is found,
the glycan is drawn without highlighted residues and a cli alert is emitted.
Here are some important notes for all functions in the glyenzy package.
All algorithms and enzyme information in glyenzy are applicable only to humans, and specifically to N-glycans and O-GalNAc glycans. Results may be inaccurate for other types of glycans (e.g., GAGs, glycolipids) or for glycans in other species (e.g., plants, insects).
The algorithm takes an intentionally inclusive approach, assuming that all possible isoenzymes capable of catalyzing a given reaction may be involved. Therefore, results should be interpreted with caution.
For example, in humans, detection of the motif "Neu5Ac(a2-3)Gal(b1-" will return both "ST3GAL3" and "ST3GAL4". In reality, only one of them might be active, depending on factors such as tissue specificity.
The function only works for glycans containing concrete residues
(e.g., "Glc", "GalNAc"), and not for glycans with generic
residues (e.g., "Hex", "HexNAc").
Substituents (e.g. sulfation, phosphorylation) are not supported yet,
and the algorithms might fail for glycans with substituents.
If your glycans contain substituents,
use glyrepr::remove_substituents() to get clean glycans.
If the glycan structure is incomplete or partially degraded, the result may be misleading.
For N-glycans, the starting structure is assumed to be "Glc(3)Man(9)GlcNAc(2)", the N-glycan precursor transferred to Asn by OST.
For O-GalNAc glycans, the starting structure is assumed to be "GalNAc(a1-".
For O-GlcNAc glycans, the starting structure is assumed to be "GlcNAc(b1-".
For O-Man glycans, the starting structure is assumed to be "Man(a1-".
For O-Fuc glycans, the starting structure is assumed to be "Fuc(a1-".
For O-Glc glycans, the starting structure is assumed to be "Glc(b1-".
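Because the matching is intentionally inclusive, one way to interpret results with caution is to narrow the reported isoenzymes using evidence from your own data. The sketch below is base R only and is not a glyenzy feature: `expressed_genes` is a made-up placeholder standing in for genes detected in the tissue of interest, and `narrow_isoenzymes()` is a hypothetical helper.

```r
# Candidate isoenzymes, as in the "Neu5Ac(a2-3)Gal(b1-" example above.
candidates <- c("ST3GAL3", "ST3GAL4")

# Placeholder (assumption): genes found to be expressed in the sample,
# for example from transcriptomics. These values are illustrative only.
expressed_genes <- c("ST3GAL4", "B4GALT1", "MGAT1")

# Hypothetical helper: keep only candidates supported by expression evidence.
narrow_isoenzymes <- function(candidates, expressed) {
  intersect(candidates, expressed)
}

plausible <- narrow_isoenzymes(candidates, expressed_genes)
plausible
```

An empty result here would suggest that the expression data and the inclusive enzyme assignment disagree, which is itself worth checking before drawing conclusions.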
match_enzyme(), glydraw::draw_cartoon()
    glycan <- glyrepr::as_glycan_structure("Neu5Ac(a2-3)Gal(b1-3)GlcNAc(b1-")

    ## Not run: 
    view_enzyme(glycan, "ST3GAL3")

    ## End(Not run)