---
title: "Get Started with glyanno"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Get Started with glyanno}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

## The Challenge

Mass spectrometry-based glycomics and glycoproteomics often yield data with limited structural resolution. For example, MALDI-TOF MS-based glycomics typically provides only m/z values, while many LC-MS/MS-based glycoproteomics workflows identify basic glycan compositions but lack isomer specificity.

Mass spectrometry alone frequently fails to distinguish monosaccharide isomers (e.g., mannose from galactose) or to resolve linkage details (e.g., a1-3 vs b1-4). In theory, full structural resolution is achievable by combining MS with orthogonal techniques such as enzymatic digestion or NMR; however, these approaches are often resource-intensive and time-consuming.

Unfortunately, advanced downstream analyses, such as the glycan biosynthetic pathway reconstruction offered by [glyenzy](https://github.com/glycoverse/glyenzy), require fully resolved glycan structures, including specific monosaccharide identities and linkage information.

## What is glyanno?

`glyanno` bridges this gap. The package is designed to maximize the utility of your mass spectrometry data by annotating it with probable biological context derived from knowledge bases.

The package features four core functions:

- `mz_to_comp()`: Identifies possible glycan compositions from m/z values.
- `comp_to_struc()`: Maps glycan compositions to potential glycan structures.
- `enhance_comp()`: Refines generic glycan compositions into concrete, monosaccharide-specific compositions.
- `enhance_struc()`: Resolves low-resolution glycan structures into high-resolution structures with linkage information.

![](diagram.png){width=500}

```{r setup}
library(glyanno)
library(glydb)
```

**Note:** This package relies on [glyrepr](https://github.com/glycoverse/glyrepr) and [glydb](https://github.com/glycoverse/glydb). We recommend familiarizing yourself with these packages, especially `glyrepr`, before using `glyanno`. A basic understanding of IUPAC-condensed notation is also recommended for this vignette; you can refer to this [tutorial](https://glycoverse.github.io/glyrepr/articles/iupac.html).

## Workflow: m/z -> Composition -> Structure

Let's demonstrate these functions step by step, starting with a single m/z value.

```{r}
mz_to_comp(406.1325, charge = 1, adduct = "Na+")
```

This returns every composition in `glydb` matching the m/z of 406.1325. In practice, returning "all" possibilities can be overwhelming; you will often want to constrain the search space based on biological context.

`glydb` provides helper functions to filter the database. For example, you might want to limit the search to human O-GalNAc glycans only.

```{r}
my_db <- glydb_compositions(species = "Homo sapiens", glycan_type = "O-GalNAc")
my_db
```

`my_db` is simply a `glyrepr::glycan_composition()` vector. You can pass it to the `db` argument of `mz_to_comp()` to filter the results.

```{r}
mz_to_comp(406.1325, charge = 1, adduct = "Na+", db = my_db)
```

The results are now much more relevant to our specific context.

With the composition identified, we can proceed to determine potential structures. The m/z value 406.1325 corresponds to the composition Gal(1)GalNAc(1). What are the possible structures for this composition?

```{r}
struc_db <- glydb_structures(species = "Homo sapiens", glycan_type = "O-GalNAc")
comp_to_struc("Gal(1)GalNAc(1)", db = struc_db)
```

Note that here we use `glydb_structures()` instead of `glydb_compositions()` to create a structure-specific database.

Two structures are possible: Core 1 and Core 5. Sometimes we just want the single most likely result.
In this case, you can set `return_best` to `TRUE`:

```{r}
comp_to_struc("Gal(1)GalNAc(1)", db = struc_db, return_best = TRUE)
```

Core 1 is more common than Core 5, so it was kept. When there are multiple matches, the function keeps the structure with the most citations. Note that when `return_best = TRUE`, the function returns a vector with the same length as the input instead of a tibble. When a glycan has no match, `NA` appears at the corresponding position.

Note that all functions in `glyanno` are vectorized:

```{r}
comp_to_struc(c("Gal(1)GalNAc(1)", "GlcNAc(1)GalNAc(1)"), db = struc_db)
```

If you set `return_best` to `TRUE`, the function directly returns a vector:

```{r}
comp_to_struc(c("Gal(1)GalNAc(1)", "GlcNAc(1)GalNAc(1)"), db = struc_db, return_best = TRUE)
```

As before, this vector always has the same length as the input, with `NA` for glycans with no match.

One last thing to mention before we move on: you can set a custom structure level for `db`. For example, you can use a database that contains only topology-level structures, without linkage information.

```{r}
struc_db <- glydb_structures(
  structure_level = "topological",
  species = "Homo sapiens",
  glycan_type = "O-GalNAc"
)
comp_to_struc(c("Gal(1)GalNAc(1)", "GlcNAc(1)GalNAc(1)"), db = struc_db)
```

## Enhancing Compositions and Structures

Researchers often need to refine generic annotations into more specific ones. For instance, you may wish to convert generic compositions into compositions with specific monosaccharides, or assign potential linkages to a topology-only structure.

`enhance_comp()` and `enhance_struc()` are designed for this purpose. Both functions return a tibble with two columns: `raw` (the original input) and `enhanced` (the potential high-resolution candidates).

```{r}
enhance_comp("Hex(1)HexNAc(1)")
```

```{r}
enhance_struc("Gal(??-?)GalNAc(??-")
```

As with the functions above, providing a custom database narrows the results down to biologically relevant candidates.

```{r}
enhance_comp(
  "Hex(1)HexNAc(1)",
  db = glydb_compositions(species = "Homo sapiens", glycan_type = "O-GalNAc")
)
```

```{r}
enhance_struc(
  "Gal(??-?)GalNAc(??-",
  db = glydb_structures(species = "Homo sapiens", glycan_type = "O-GalNAc")
)
```

You can set `return_best` to `TRUE` here as well.

```{r}
enhance_struc(
  "Gal(??-?)GalNAc(??-",
  db = glydb_structures(species = "Homo sapiens", glycan_type = "O-GalNAc"),
  return_best = TRUE
)
```

## Bidirectional Conversion

While `glyanno` focuses on inferring high-resolution information from low-resolution data, the `glycoverse` ecosystem also provides complementary functions for the reverse operations: simplifying detailed structures back to their lower-resolution representations.

A summary of these relationships:

| Enhance | Back |
|---------|------|
| `mz_to_comp()` | `calculate_mz()` |
| `comp_to_struc()` | `glyrepr::as_glycan_composition()` |
| `enhance_comp()` | `glyrepr::convert_to_generic()` |
| `enhance_struc()` | `glyrepr::reduce_structure_level()` |
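
The first row of the table can be checked by hand. The chunk below is a minimal sketch in base R, not the implementation behind `mz_to_comp()` or `calculate_mz()`. It assumes an underivatized glycan with a free reducing end, standard monoisotopic masses, and a single sodium adduct; all variable names are illustrative only. It recomputes the m/z of Hex(1)HexNAc(1), the generic form of Gal(1)GalNAc(1), and compares it with the value 406.1325 used in the workflow section.

```{r}
# Monoisotopic residue masses in Da (standard values, assumed here;
# they are not read from glyanno or glydb).
residue_mass <- c(Hex = 162.052823, HexNAc = 203.079373)
water_mass   <- 18.010565   # one water for the free reducing end
sodium_ion   <- 22.989221   # Na atom (22.989770) minus one electron (0.000549)

# Hex(1)HexNAc(1): one hexose and one N-acetylhexosamine
counts <- c(Hex = 1, HexNAc = 1)
charge <- 1

# Neutral mass = sum of residues + water; then add one Na+ per charge
neutral_mass <- sum(counts * residue_mass[names(counts)]) + water_mass
mz <- (neutral_mass + charge * sodium_ion) / charge

# Deviation of the observed value from the theoretical one, in ppm
observed  <- 406.1325
ppm_error <- (observed - mz) / mz * 1e6

round(c(theoretical_mz = mz, ppm_error = ppm_error), 4)
```

Under these assumptions the theoretical value agrees with 406.1325 to within about 2 ppm. Derivatized glycans (for example, permethylated ones) or other adducts would need different masses.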