| Title: | Hierarchical Glycan Annotation |
|---|---|
| Description: | Hierarchical annotation of glycans based on molecule mass, composition, and structure. Starting from a molecule mass, glyanno calculates possible glycan compositions. Given a glycan composition, glyanno further deduces possible glycan structures. For glycan structures with generic monosaccharides (e.g., "Hex", "HexNAc"), glyanno refines them into specific types (e.g., "Glc", "Gal"). For structures lacking linkage information (e.g., "Gal(??-?)GalNAc(??-"), glyanno infers the most likely linkages (e.g., "Gal(b1-3)GalNAc(a1-"). |
| Authors: | Bin Fu [aut, cre] (ORCID: <https://orcid.org/0000-0001-8567-2997>) |
| Maintainer: | Bin Fu <[email protected]> |
| License: | MIT + file LICENSE |
| Version: | 0.5.0 |
| Built: | 2026-06-03 08:45:29 UTC |
| Source: | https://github.com/glycoverse/glyanno |
This function calculates m/z values from glycans. Different adducts, mass types, and derivatives are supported. Custom mass dictionaries are also supported.
calculate_mz(glycans, charge = 1, adduct = "H+", mass_dict = NULL, safe = TRUE)
glycans |
Glycans to calculate m/z values from. Valid inputs include:
|
charge |
Charge to use. Can be 0, 1, 2, 3, -1, -2, -3, etc. 0 means neutral. Default is 1. |
adduct |
Adduct to use. Can be "H+", "K+", "Na+", "NH4+", "H-", "Cl-", "HCO3-". Default is "H+".
|
mass_dict |
A named numeric vector of the mass of each monosaccharide residue.
Default is |
safe |
Whether to raise an error when unsupported monosaccharides or substituents are found.
|
A numeric vector of m/z values.
library(glyrepr)

# Different input types
calculate_mz(glycan_composition(c(Gal = 1, GalNAc = 1)), charge = 0)
calculate_mz("Gal(1)GalNAc(1)", charge = 0)
calculate_mz(as_glycan_structure("Gal(b1-3)GalNAc(a1-"), charge = 0)
calculate_mz("Gal(b1-3)GalNAc(a1-", charge = 0)

# For common situation in MALDI-TOF MS
calculate_mz("Man(5)GlcNAc(2)", charge = 1, adduct = "Na+")

# For common situation in ESI MS
calculate_mz("Man(5)GlcNAc(2)", charge = 1, adduct = "H+")

# Calculate permethylated m/z values
calculate_mz("Man(5)GlcNAc(2)", charge = 0, mass_dict = glyanno_mass_dict(deriv = "permethyl"))

# Calculate average mass
calculate_mz("Man(5)GlcNAc(2)", charge = 0, mass_dict = glyanno_mass_dict(mass_type = "average"))

# Vectorization
calculate_mz(c("Man(5)GlcNAc(2)", "Gal(1)GalNAc(1)"), charge = 1, adduct = "Na+")
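The arithmetic behind a neutral mass and a sodium adduct can be checked by hand. The sketch below is not glyanno's implementation. It uses base R only, a hypothetical helper `sketch_mz()`, and standard monoisotopic masses typed in as assumptions (the values returned by `glyanno_mass_dict()` may differ in precision). It assumes a free reducing end (one added water) and covers only positive adducts, using m/z = (M + n × adduct) / n for a charge of n.

```r
# Assumed monoisotopic masses (Da); illustrative values, not read from glyanno.
masses <- c(Hex = 162.052824, HexNAc = 203.079373, H2O = 18.010565,
            "H+" = 1.007276, "Na+" = 22.989218)

# Hypothetical helper: m/z of a composition given as a named count vector.
sketch_mz <- function(comp, charge = 1, adduct = "H+") {
  # Neutral mass = sum of residue masses plus one water for the reducing end.
  neutral <- sum(masses[names(comp)] * comp) + masses[["H2O"]]
  if (charge == 0) return(neutral)
  n <- abs(charge)
  (neutral + n * masses[[adduct]]) / n
}

man5 <- c(Hex = 5, HexNAc = 2)  # generic form of Man(5)GlcNAc(2)
sketch_mz(man5, charge = 0)                  # neutral monoisotopic mass
sketch_mz(man5, charge = 1, adduct = "Na+")  # singly charged sodium adduct
```

With these assumed masses the neutral value is about 1234.43 Da and the sodium adduct about 1257.42, which can serve as a sanity check against the output of calculate_mz().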
Given glycan compositions, this function matches them to
all possible glycan structures in the glydb database.
comp_to_struc(comps, db = NULL, return_best = FALSE)
comps |
Glycan compositions to match against. Can be either:
|
db |
Glycan structures to match against.
Can be a |
return_best |
If |
If return_best=TRUE:
A glyrepr::glycan_structure() vector with the same length as comps.
Unmatched compositions are returned as NA.
If return_best=FALSE:
A tibble with the following columns:
composition: The glycan compositions, as glyrepr::glycan_composition() vector.
structure: The possible glycan structures, as glyrepr::glycan_structure() vector.
Note that one glycan composition can have multiple rows in the result,
corresponding to different possible glycan structures.
db
The db parameter is very important for all functions in this package.
By default, it uses all available glycans in the glydb package,
which is usually larger than what you need.
You can use helper functions in glydb to narrow down the database,
e.g. glydb::glydb_compositions() or glydb::glydb_structures().
You can use the species and glycan_type parameters to focus on a specific species and glycan type.
For example, if you are only interested in N-glycan compositions in human,
you can use glydb::glydb_compositions(species = "Homo sapiens", glycan_type = "N").
Also, you can decide the level of information in the database by setting mono_type
of glydb::glydb_compositions() and structure_level of glydb::glydb_structures().
You can then pass the result to the db parameter of this function.
For example,
my_db <- glydb::glydb_compositions(species = "Homo sapiens", glycan_type = "N")
mz_to_comp(mz, db = my_db)
comp_to_struc("H5N2")
Given a generic glycan composition (e.g. Hex(5)HexNAc(2)), this function gives all possible concrete glycan compositions (e.g. Man(5)GlcNAc(2)).
enhance_comp(comps, db = NULL, return_best = FALSE)
comps |
A |
db |
A |
return_best |
Logical. If |
If return_best=TRUE:
A glyrepr::glycan_composition() vector with the same length as comps.
Unmatched compositions are returned as NA.
If return_best=FALSE:
A tibble with the following columns:
raw: The original compositions.
enhanced: The enhanced compositions.
Note that one raw composition can have different enhanced compositions
as multiple rows in the result.
db
The db parameter is very important for all functions in this package.
By default, it uses all available glycans in the glydb package,
which is usually larger than what you need.
You can use helper functions in glydb to narrow down the database,
e.g. glydb::glydb_compositions() or glydb::glydb_structures().
You can use the species and glycan_type parameters to focus on a specific species and glycan type.
For example, if you are only interested in N-glycan compositions in human,
you can use glydb::glydb_compositions(species = "Homo sapiens", glycan_type = "N").
Also, you can decide the level of information in the database by setting mono_type
of glydb::glydb_compositions() and structure_level of glydb::glydb_structures().
You can then pass the result to the db parameter of this function.
For example,
my_db <- glydb::glydb_compositions(species = "Homo sapiens", glycan_type = "N")
mz_to_comp(mz, db = my_db)
enhance_comp("Hex(5)HexNAc(2)")
Given a glycan structure vector of any resolution level (see
glyrepr::get_structure_level() for details), this function gives all
possible glycan structures of higher resolution level.
enhance_struc(strucs, db = NULL, return_best = FALSE)
strucs |
A |
db |
A |
return_best |
Logical. If |
The target resolution level is determined from db. When db is NULL,
the default glydb::glydb_structures() at "intact" level is used.
If return_best=TRUE:
An unnamed glyrepr::glycan_structure() vector with the same length as strucs.
Unmatched structures are returned as NA.
If return_best=FALSE:
A tibble with the following columns:
raw: The original glycan structures.
enhanced: The enhanced glycan structures.
Note that one raw glycan structure can have different enhanced glycan structures
as multiple rows in the result.
# From topological level to intact level
db_intact <- c("Gal(b1-3)GalNAc(a1-", "Gal(b1-4)GalNAc(a1-")
enhance_struc("Gal(??-?)GalNAc(??-", db = db_intact)

# From basic level to topological level
db_topo <- "Gal(??-?)GalNAc(??-"
enhance_struc("Hex(??-?)HexNAc(??-", db = db_topo)

# From partial level to intact level
enhance_struc("Gal(b1-?)GalNAc(a1-", db = db_intact)
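To build intuition for how a structure with unknown linkages can correspond to several fully specified ones, the sketch below treats every "?" as a one-character wildcard and filters a small candidate set. This is a string-level simplification in base R with a hypothetical helper `match_unknowns()`. It is an assumption for illustration only, not how enhance_struc() works: real glycan structures are graphs, and string matching only succeeds when residues are written in the same order.

```r
# Hypothetical helper: treat every "?" in a structure string as a one-character wildcard.
match_unknowns <- function(pattern, candidates) {
  escaped <- gsub("([()])", "\\\\\\1", pattern)  # make parentheses literal in the regex
  regex <- paste0("^", gsub("?", ".", escaped, fixed = TRUE), "$")
  candidates[grepl(regex, candidates)]
}

# Illustrative candidate set (not taken from glydb).
db_intact <- c("Gal(b1-3)GalNAc(a1-", "Gal(b1-4)GalNAc(a1-", "Gal(a1-3)GalNAc(a1-")

match_unknowns("Gal(??-?)GalNAc(??-", db_intact)  # every candidate is compatible
match_unknowns("Gal(b1-?)GalNAc(a1-", db_intact)  # only the two b1-linked candidates
```

The second call shows why a partially specified input narrows the result: the known "b1" anomer rules out the a1-linked candidate.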
Add anomer positions to glycan structures with missing anomer information based on biological knowledge. For example, "Gal(??-?)GalNAc(??-" will be converted to "Gal(?1-?)GalNAc(?1-". For anomer positions that are already specified in the input structures, this function will not modify them.
fill_anomer_pos(strucs)
strucs |
A |
This function is intended to be used by struc_to_glytoucan()
to ensure that glycan structures have complete anomer information
before querying the GlyTouCan database.
For these monosaccharides, the anomer position is "2": "Neu5Ac", "Neu5Gc", "Neu", "Kdn", "Pse", "Leg", "Aci", "4eLeg", "Kdo", "Dha", "Fru", "Tag", "Sor", and "Psi". All other monosaccharides are assumed to have the anomer position "1".
A glyrepr::glycan_structure() vector with anomer positions added where missing.
library(glyrepr)

glycans <- as_glycan_structure(c(
  "Gal(??-?)GalNAc(??-",
  "Neu5Ac(??-?)Gal(??-?)GalNAc(??-"
))
fill_anomer_pos(glycans)
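The position rule described for fill_anomer_pos() amounts to a simple lookup. The base R sketch below restates it with a hypothetical helper `anomer_pos()`. It operates on monosaccharide names only and is not the package's code, which works on glycan structure objects.

```r
# Monosaccharides documented as having anomer position "2".
pos2 <- c("Neu5Ac", "Neu5Gc", "Neu", "Kdn", "Pse", "Leg", "Aci",
          "4eLeg", "Kdo", "Dha", "Fru", "Tag", "Sor", "Psi")

# Hypothetical helper: anomer position for each monosaccharide name.
anomer_pos <- function(mono) ifelse(mono %in% pos2, "2", "1")

anomer_pos(c("Gal", "GalNAc", "Neu5Ac"))
```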
A named numeric vector of the mass of each monosaccharide residue.
Used by default as the mass_dict argument in calculate_mz().
The names include:
Monosaccharides: Hex, HexNAc, dHex, dHexNAc, ddHex, Pen, HexA, HexN, NeuAc, NeuGc, Kdn, Neu
Ions: H+, H, H2O, K+, Na+, NH4+, H-, Cl-, HCO3-
Substituents: S, P
Reducing end: red_end, which is the additional mass of the reducing end of the glycan.
glyanno_mass_dict(deriv = "none", mass_type = "mono")
deriv |
A character scalar of the derivatization method to use. Can be "none", "permethyl", or "peracetyl". Default is "none". |
mass_type |
"mono" for monoisotopic mass, "average" for average mass. Default is "mono". |
A named numeric vector.
glyanno_mass_dict(deriv = "none", mass_type = "mono")
glyanno_mass_dict(deriv = "permethyl", mass_type = "mono")
glyanno_mass_dict(deriv = "none", mass_type = "average")
Given m/z values, this function matches them to all possible glycan compositions in the glydb database.
mz_to_comp(
  mz,
  tol = ppm(10),
  db = NULL,
  return_best = FALSE,
  charge = 1,
  adduct = "H+",
  mass_dict = NULL
)
mz |
A numeric vector of m/z values. |
tol |
A numeric scalar of the tolerance for the m/z value in Da or a |
db |
Glycan compositions to match against.
Can be a |
return_best |
A logical scalar. If |
charge |
Charge to use. Can be 0, 1, 2, 3, -1, -2, -3, etc. 0 means neutral. Default is 1. |
adduct |
Adduct to use. Can be "H+", "K+", "Na+", "NH4+", "H-", "Cl-", "HCO3-". Default is "H+".
|
mass_dict |
A named numeric vector of the mass of each monosaccharide residue.
Default is |
If return_best=TRUE:
An unnamed glyrepr::glycan_composition() vector with the same length as mz.
Unmatched m/z values are returned as NA.
If return_best=FALSE:
A tibble with the following columns:
mz: The molecule m/z values, same as the input mz.
composition: The possible glycan compositions, as glyrepr::glycan_composition() vector.
Note that one m/z value can have multiple rows in the result,
corresponding to different possible glycan compositions.
db
The db parameter is very important for all functions in this package.
By default, it uses all available glycans in the glydb package,
which is usually larger than what you need.
You can use helper functions in glydb to narrow down the database,
e.g. glydb::glydb_compositions() or glydb::glydb_structures().
You can use the species and glycan_type parameters to focus on a specific species and glycan type.
For example, if you are only interested in N-glycan compositions in human,
you can use glydb::glydb_compositions(species = "Homo sapiens", glycan_type = "N").
Also, you can decide the level of information in the database by setting mono_type
of glydb::glydb_compositions() and structure_level of glydb::glydb_structures().
You can then pass the result to the db parameter of this function.
For example,
my_db <- glydb::glydb_compositions(species = "Homo sapiens", glycan_type = "N")
mz_to_comp(mz, db = my_db)
mz_to_comp(933.3175, charge = 1, adduct = "Na+")
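The matching step can be pictured as comparing an observed m/z against theoretical values and keeping those inside the tolerance window. The base R sketch below is an assumption about the general idea, not glyanno's implementation. The candidate table, its theoretical sodium-adduct values (computed from standard monoisotopic masses, not read from glydb), and the helper `match_mz()` are all hypothetical.

```r
# Hypothetical candidate table with theoretical singly charged sodium-adduct m/z values (Da).
candidates <- data.frame(
  composition = c("Hex(3)HexNAc(2)", "Hex(5)HexNAc(2)"),
  theo_mz = c(933.317001, 1257.422649),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep candidates whose theoretical m/z lies
# within `ppm_tol` parts per million of the observed value.
match_mz <- function(obs, ppm_tol = 10) {
  tol <- obs * ppm_tol / 1e6  # the window in Da grows with m/z
  candidates$composition[abs(candidates$theo_mz - obs) <= tol]
}

match_mz(933.3175)  # falls inside the 10 ppm window of Hex(3)HexNAc(2)
match_mz(1000)      # no candidate within tolerance
```

An observed value with no candidate inside the window yields an empty result here; in mz_to_comp() with return_best = TRUE the documented behaviour is to return NA for such values.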
This helper function is meant to be passed to mz_to_comp() as the tol argument.
It makes mz_to_comp() use a dynamic tolerance expressed in parts per million (PPM) of each m/z value.
ppm(x)
x |
A numeric scalar of the PPM value. |
A function that takes a numeric vector of m/z values and returns a numeric vector of tolerances.
ppm(10)(2368.84)
ppm(10)(c(2368.84, 2368.85))
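Because ppm() returns a function, it behaves like a closure that remembers the PPM value. The sketch below shows one way such a helper could be written in base R. The name `ppm_sketch()` is hypothetical, and the assumption that the returned tolerances are in Da (m/z × PPM / 10^6) is not confirmed by the source.

```r
# Hypothetical stand-in for ppm(): returns a function that maps
# m/z values to absolute tolerances (assumed to be in Da).
ppm_sketch <- function(x) {
  function(mz) mz * x / 1e6
}

tol_fun <- ppm_sketch(10)
tol_fun(2368.84)               # one tolerance for one m/z value
tol_fun(c(2368.84, 2368.85))   # vectorized over m/z values
```

Returning a function rather than a number lets the caller compute a separate tolerance for every m/z value, so heavier ions get a proportionally wider window.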
This function takes a vector of glycan structures and returns the corresponding GlyTouCan accessions. Under the hood, it uses the GlycanFormatConverter API maintained by the Glycosmos project.
struc_to_glytoucan(strucs)
strucs |
A |
For "topological" structures (e.g., "Gal(??-?)GalNAc(??-"),
this function will first call fill_anomer_pos() to fill in the missing anomeric positions before querying the API.
This is necessary because all glycan structures in GlyTouCan must have defined anomeric positions.
A character vector of GlyTouCan accessions corresponding to the input glycan structures.
If a structure cannot be converted to a GlyTouCan accession, the corresponding entry will be NA.