---
title: "Working with glyexp"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Working with glyexp}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

## Working with Motifs in Experiments

`glymotif` is most useful when combined with other tools in the `glycoverse` ecosystem. If you're already using `glyread` to import your glycoproteomics results and `glyexp` to manage your experimental data, you can add motif-level information directly to your experiment. `glymotif` provides functions for adding motif information to `experiment` objects.

**Important note:** This vignette assumes you're familiar with the `glyexp` package. If you haven't used it yet, start with its [introduction](https://glycoverse.github.io/glyexp/articles/glyexp.html).

```{r setup}
library(glymotif)
library(glyexp)
library(dplyr)
```

## Real Glycoproteomics Data

Let's work with a real-world dataset that shows how `glymotif` can be used with experimental data. We will use `real_experiment` from the `glyexp` package, a serum N-glycoproteomics study with 12 samples. First, let's use `glyclean::auto_clean()` to preprocess the data.

```{r}
library(glyclean)
exp <- auto_clean(real_experiment)
exp
```

Now, let's inspect the variable and sample information.

```{r}
get_var_info(exp)
```

```{r}
get_sample_info(exp)
```

This is an N-glycoproteomics dataset with 500 PSMs (Peptide Spectrum Matches) across 12 samples, which is a useful setting for motif analysis.

**Tip:** In real-world data analysis, you will usually want to use `glyclean` to perform data preprocessing before diving into any analysis.

## Adding Motif Annotations to an Experiment

The variable information tibble contains details about each glycoform - the proteins, sites, and glycan structures. We can add columns that describe whether specific motifs are present in each glycan.

**The simple approach:** One motif at a time

```{r}
exp |>
  mutate_var(n_hex = have_motif(glycan_structure, "Hex(a1-")) |>
  get_var_info() |>
  select(variable, protein, glycan_structure, n_hex)
```

**A repetitive approach:** Multiple motifs one call at a time

```{r, eval=FALSE}
# Don't do this
exp |>
  mutate_var(
    n_hex = have_motif(glycan_structure, "Hex(a1-"),
    n_hexnac = have_motif(glycan_structure, "HexNAc(a1-"),
    n_dhex = have_motif(glycan_structure, "dHex(a1-")
  )
```

This approach works, but it is inefficient. Each of the three `have_motif()` calls repeats the same time-consuming validation and conversion of the same set of glycan structures. It is also repetitive to write: you have to type `have_motif` and `glycan_structure` three times.

**The preferred approach:** Use the `add_motifs_lgl()` or `add_motifs_int()` functions instead. They might look like simple syntactic sugar, but they are optimized for this scenario:

```{r}
exp2 <- exp |>
  add_motifs_lgl(c(motif1 = "Hex(??-", motif2 = "HexNAc(??-", motif3 = "dHex(??-"))
```

The motif annotations are now part of your variable information tibble, one column per named motif.

```{r}
exp2 |>
  get_var_info() |>
  select(variable, motif1, motif2, motif3)
```

You can use these new columns for downstream filtering and analysis. For example:

```{r, eval=FALSE}
# You can perform pathway enrichment on all glycoproteins containing some motif:
exp2 |>
  filter_var(motif1 == TRUE) |>
  gly_enrich_reactome() # from the `glystats` package
```

## Motif Quantification in Experiments

Want to quantify the motifs in your experiment? Try the `glydet` package. It provides the `quantify_motifs()` function to perform relative and absolute motif quantification.
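
If you only need to know how many times each motif occurs in each glycan structure, rather than a quantification across samples, `add_motifs_int()` mentioned earlier may be enough. The chunk below is a minimal sketch, not evaluated here. It assumes that `add_motifs_int()` accepts the same named motif vector as `add_motifs_lgl()` and adds one count column per motif name. The column names `n_hex`, `n_hexnac`, and `n_dhex` are chosen for illustration only.

```{r, eval=FALSE}
library(glymotif)
library(glyexp)
library(glyclean)
library(dplyr)

# Same preprocessing as earlier in this vignette
exp <- auto_clean(real_experiment)

# Assumption: add_motifs_int() takes the same arguments as add_motifs_lgl(),
# but records how often each motif occurs instead of TRUE/FALSE
exp_counts <- exp |>
  add_motifs_int(c(n_hex = "Hex(??-", n_hexnac = "HexNAc(??-", n_dhex = "dHex(??-"))

# Inspect the new per-glycan count columns
exp_counts |>
  get_var_info() |>
  select(variable, glycan_structure, n_hex, n_hexnac, n_dhex)
```

These counts describe the glycan structures themselves and do not depend on the measured abundances in any sample.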