---
title: "Working with glyexp"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Working with glyexp}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

## Working with Motifs in Experiments

`glymotif` is most useful when combined with other tools in the `glycoverse` ecosystem.
If you're already using `glyread` to import your glycoproteomics results
and `glyexp` to manage your experimental data,
`glymotif` provides functions that add motif-level information
directly to your `experiment` objects.

**Important note:** This vignette assumes you're familiar with the `glyexp` package.
If you haven't used it yet, start with its
[introduction](https://glycoverse.github.io/glyexp/articles/glyexp.html).

```{r setup}
library(glymotif)
library(glyexp)
library(dplyr)
```

## Real Glycoproteomics Data

Let's work with a real-world dataset that shows how `glymotif` can be used with experimental data.
We will use `real_experiment` from the `glyexp` package, a serum N-glycoproteomics study with 12 samples.
First, let's use `glyclean::auto_clean()` to preprocess the data.

```{r}
library(glyclean)

exp <- auto_clean(real_experiment)
exp
```

Now, let's inspect the variable and sample information.

```{r}
get_var_info(exp)
```

```{r}
get_sample_info(exp)
```

This is an N-glycoproteomics dataset with 500 PSMs (peptide-spectrum matches)
across 12 samples, which makes it a convenient test case for motif analysis.

**Tip:** In real-world data analysis, you will usually want to use `glyclean` to perform
data preprocessing before diving into any analysis.

## Adding Motif Annotations to an Experiment

The variable information tibble contains details about each glycoform:
the proteins, sites, and glycan structures.
We can add columns that describe whether specific motifs are present in each glycan.

**The simple approach:** One motif at a time

```{r}
exp |> 
  mutate_var(n_hex = have_motif(glycan_structure, "Hex(a1-")) |>
  get_var_info() |>
  select(variable, protein, glycan_structure, n_hex)
```

**A repetitive approach:** Multiple motifs one call at a time

```{r, eval=FALSE}
# Don't do this
exp |>
  mutate_var(
    n_hex = have_motif(glycan_structure, "Hex(a1-"),
    n_hexnac = have_motif(glycan_structure, "HexNAc(a1-"),
    n_dhex = have_motif(glycan_structure, "dHex(a1-")
  )
```

While this approach works, it is inefficient.
Each of the three `have_motif()` calls repeats the same time-consuming validations and conversions
on the same set of glycan structures.
It is also repetitive to write:
`have_motif` and `glycan_structure` each have to be typed three times.

**The preferred approach:** Use the `add_motifs_lgl()` or `add_motifs_int()` functions instead.
They might look like simple syntactic sugar,
but they are optimized for annotating many motifs on the same set of glycan structures:

```{r}
exp2 <- exp |> 
  add_motifs_lgl(c(motif1 = "Hex(??-", motif2 = "HexNAc(??-", motif3 = "dHex(??-"))
```

The motif annotations are now integrated into your variable information tibble.

```{r}
exp2 |>
  get_var_info() |>
  select(variable, motif1, motif2, motif3)
```
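
The integer variant can be used in the same way when counts matter more than presence.
The following is a minimal sketch, assuming `add_motifs_int()` accepts the same named motif vector
as `add_motifs_lgl()` and stores one integer column per motif.
The column names `n_hex`, `n_hexnac`, and `n_dhex` are illustrative only,
and the chunk is not evaluated here.

```{r, eval=FALSE}
# Same setup as earlier in this vignette, repeated so the chunk stands alone.
library(glymotif)
library(glyexp)
library(dplyr)

exp <- glyclean::auto_clean(real_experiment)

# Assumption: add_motifs_int() takes the same arguments as add_motifs_lgl().
exp_counts <- exp |>
  add_motifs_int(c(n_hex = "Hex(??-", n_hexnac = "HexNAc(??-", n_dhex = "dHex(??-"))

# Inspect the new columns in the variable information tibble.
exp_counts |>
  get_var_info() |>
  select(variable, n_hex, n_hexnac, n_dhex)
```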

You can use these new columns for downstream filtering and analysis.
For example:

```{r, eval=FALSE}
# You can perform pathway enrichment on the glycoproteins whose glycans contain `motif1`:
exp2 |>
  filter_var(motif1) |>  # `motif1` is already logical, so no `== TRUE` is needed
  gly_enrich_reactome()  # from the `glystats` package
```
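
Because the annotations are ordinary logical columns, they can also be summarised with base R.
The sketch below uses a small made-up data frame, `var_info`, as a stand-in for the tibble
returned by `get_var_info(exp2)`; the values are invented for illustration
and do not come from `real_experiment`.

```{r}
# Toy stand-in for the variable information of an annotated experiment.
# All values here are hypothetical.
var_info <- data.frame(
  variable = c("V1", "V2", "V3", "V4"),
  motif1 = c(TRUE, TRUE, FALSE, TRUE),
  motif2 = c(TRUE, FALSE, FALSE, TRUE),
  motif3 = c(FALSE, FALSE, FALSE, TRUE)
)

# Number of glycoforms carrying each motif (TRUE counts as 1).
motif_counts <- colSums(var_info[, c("motif1", "motif2", "motif3")])
motif_counts

# Glycoforms that carry both motif1 and motif2.
both <- var_info$variable[var_info$motif1 & var_info$motif2]
both
```

The same expressions should work on the real tibble, provided the motif columns carry the names used here.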

## Motif Quantification in Experiments

Want to quantify the motifs in your experiment?
Try the `glydet` package.
It provides the `quantify_motifs()` function to perform relative and absolute motif quantification.