Package 'glyexp' reference manual

Title:	Glycoproteomics and Glycomics Experiments
Description:	Provides a tidy data framework for managing glycoproteomics and glycomics experimental data. The 'GlycomicSE' and 'GlycoproteomicSE' classes extend 'SummarizedExperiment' with validated glycomics and glycoproteomics schemas. The package provides dplyr-style data manipulation functions (filter, mutate, select, arrange, slice, join) for seamless data wrangling. As the data core of the 'glycoverse' ecosystem, it provides consistent interfaces for data exchange and analysis workflows.
Authors:	Bin Fu [aut, cre, cph] (ORCID: <https://orcid.org/0000-0001-8567-2997>)
Maintainer:	Bin Fu <[email protected]>
License:	MIT + file LICENSE
Version:	0.16.0
Built:	2026-07-21 05:15:13 UTC
Source:	https://github.com/glycoverse/glyexp

Arrange sample or variable information

Description

Arrange the sample or variable information of an experiment() or SummarizedExperiment.

The same syntax as dplyr::arrange() is used. For example, to arrange samples by the "group" column, use arrange_col(exp, group). This actually calls dplyr::arrange() on the sample information tibble with the group column, and then updates the expression matrix accordingly to match the new order.

Usage

arrange_col(exp, ...)

arrange_row(exp, ...)

Arguments

exp

An experiment() or SummarizedExperiment object.

...

<data-masking> Variables to arrange by, passed to dplyr::arrange() internally.

Value

An object of the same class as exp.

Identifier columns

For an experiment() object, sample is a physical column in sample_info, and variable is a physical column in var_info.

For a SummarizedExperiment, sample and variable identifiers live in colnames(exp) and rownames(exp), rather than in SummarizedExperiment::colData() or SummarizedExperiment::rowData(). Observation verbs expose colnames(exp) as a virtual .sample column, and variable verbs expose rownames(exp) as a virtual .variable column. These dot-prefixed names distinguish dimension identifiers from regular metadata columns. After the operation, the virtual column is removed and its values are written back to the corresponding dimension names.

Consequently, sample in colData(exp) and variable in rowData(exp) remain ordinary metadata columns. The names .sample and .variable are reserved; an input containing either name in the corresponding metadata raises an error rather than overwriting that column.
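
The round trip through the virtual .sample column can be sketched in base R. This is only an illustration of the idea, not the glyexp implementation: a plain matrix and data frame stand in for assay(exp) and colData(exp), and all object names are made up for the example.

```r
# Illustrative stand-ins for assay(exp) and colData(exp); not glyexp objects.
abundance <- matrix(
  1:6,
  nrow = 2,
  dimnames = list(c("V1", "V2"), c("S3", "S1", "S2"))
)
col_data <- data.frame(group = c("M", "H", "H"))

# 1. Expose the column names as a virtual .sample column.
tbl <- cbind(.sample = colnames(abundance), col_data)

# 2. Arrange by group, then by .sample.
tbl <- tbl[order(tbl$group, tbl$.sample), ]

# 3. Reorder the matrix to match, drop the virtual column, and keep the
#    identifiers only as dimension names.
abundance <- abundance[, tbl$.sample, drop = FALSE]
col_data <- tbl[, setdiff(names(tbl), ".sample"), drop = FALSE]
rownames(col_data) <- tbl$.sample

colnames(abundance)
```

After the last step the identifiers exist only as dimension names again, so a regular metadata column called sample would not have been touched.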

Examples

library(SummarizedExperiment)

# Add a variable annotation to a bundled experiment
exp <- real_experiment |>
  mutate_row(type = "glycopeptide")

# Arrange samples by group column
arranged_exp <- arrange_col(exp, group)
colData(arranged_exp)
assay(arranged_exp)

# Arrange variables by type column
arranged_exp <- arrange_row(exp, type)
rowData(arranged_exp)
assay(arranged_exp)

# Arrange by multiple columns
arrange_col(exp, group, .sample)

Coerce to GlycomicSE

Description

as_glycomic_se() converts supported objects to a GlycomicSE object. Existing GlycomicSE objects are returned unchanged. experiment() objects are first converted with as_se(), and SummarizedExperiment objects are reclassified after GlycomicSE validity checks pass.

is_glycomic_se() checks whether an object inherits from GlycomicSE.

Usage

as_glycomic_se(x)

is_glycomic_se(x)

Arguments

x

An object to coerce or check.

Value

as_glycomic_se() returns a GlycomicSE object. is_glycomic_se() returns a logical value.

Coerce to GlycoproteomicSE

Description

as_glycoproteomic_se() converts supported objects to a GlycoproteomicSE object. Existing GlycoproteomicSE objects are returned unchanged. experiment() objects are first converted with as_se(), and SummarizedExperiment objects are reclassified after GlycoproteomicSE validity checks pass.

is_glycoproteomic_se() checks whether an object inherits from GlycoproteomicSE.

Usage

as_glycoproteomic_se(x)

is_glycoproteomic_se(x)

Arguments

x

An object to coerce or check.

Value

as_glycoproteomic_se() returns a GlycoproteomicSE object. is_glycoproteomic_se() returns a logical value.

Convert a glycoproteomics experiment to a pseudo-glycome experiment

Description

Transforms a glycoproteomics-type experiment into a glycomics-type experiment by aggregating expression values by glycan structure (if available) or glycan composition.

This function implements the "pseudo-glycome" method described in doi:10.1038/s41467-026-68579-x, which aggregates glycoproteomic data by glycans to simulate glycome data when real glycome is unavailable.

Usage

as_pseudo_glycome(exp, aggr_method = c("sum", "mean", "median"))

Arguments

exp

A glycoproteomics experiment() or a GlycoproteomicSE().

aggr_method

Aggregation method to use. One of "sum", "mean", or "median". Default is "sum". Note that glycopeptides can have different ionization efficiencies, so none of these methods are technically rigorous.

Details

Aggregation behavior:

If glycan_structure column exists in var_info, aggregation is done by glycan structure (more specific)
Otherwise, aggregation is done by glycan_composition
Expression values are aggregated within each glycan group using the specified aggr_method

Limitation: Glycopeptides can have different ionization efficiencies, so the aggregation operation is not technically rigorous regardless of the method used. Use results with caution.
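
The aggregation step can be pictured with a small base-R sketch. It is an assumption-laden illustration rather than the package's code: aggregate_by_glycan() is a hypothetical helper, the glycan labels are plain character strings instead of glyrepr vectors, and missing values are not handled.

```r
# Hypothetical helper, not part of glyexp: aggregate the rows of an
# abundance matrix by a grouping vector (one glycan label per row).
aggregate_by_glycan <- function(mat, glycan,
                                aggr_method = c("sum", "mean", "median")) {
  aggr_method <- match.arg(aggr_method)
  fun <- switch(aggr_method,
    sum = sum,
    mean = mean,
    median = stats::median
  )
  # Row indices that belong to each glycan.
  groups <- split(seq_len(nrow(mat)), glycan)
  # One aggregated row per glycan, one column per sample.
  do.call(rbind, lapply(groups, function(idx) {
    apply(mat[idx, , drop = FALSE], 2, fun)
  }))
}

# Three glycopeptides, two of which carry the same glycan.
mat <- matrix(
  c(1, 2, 3, 4, 5, 6),
  nrow = 3,
  dimnames = list(c("GP1", "GP2", "GP3"), c("S1", "S2"))
)
glycan <- c("Hex(5)HexNAc(2)", "Hex(5)HexNAc(2)", "Hex(6)HexNAc(2)")

pseudo_sum <- aggregate_by_glycan(mat, glycan, "sum")
pseudo_mean <- aggregate_by_glycan(mat, glycan, "mean")
pseudo_sum
```

The result has one row per glycan, which is why the variable metadata of the output can only keep glycan-level columns.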

Value

If exp is an experiment(), a glycomics-type experiment() with aggregated expression values.

If exp is a GlycoproteomicSE(), a GlycomicSE() with aggregated expression values.

The variable metadata will contain only glycan_composition and glycan_structure (if present in input) columns.

Examples

library(glyrepr)
as_pseudo_glycome(real_experiment)

Filter samples or variables of an experiment

Description

Get a subset of an experiment() or SummarizedExperiment by filtering samples or variables.

The same syntax as dplyr::filter() is used. For example, to get a subset of an experiment keeping only "HC" samples, use filter_col(exp, group == "HC"). This actually calls dplyr::filter() on the sample information tibble with condition group == "HC", and then updates the expression matrix accordingly.

Usage

filter_col(exp, ..., .drop_levels = FALSE)

filter_row(exp, ..., .drop_levels = FALSE)

Arguments

exp

An experiment() or SummarizedExperiment object.

...

<data-masking> Expressions used to filter samples or variables, passed to dplyr::filter() internally.

.drop_levels

Logical. If TRUE, drop unused factor levels for columns referenced in the filtering expressions.

Details

One difference between filter_col() or filter_row() and dplyr::filter() is that, when filtering on factor columns, unused levels can be dropped by setting .drop_levels to TRUE.
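
What "unused levels" means can be shown with base R alone. The sketch below uses a plain data frame and matrix as stand-ins for the sample information and the expression matrix; it illustrates the effect of dropping levels and is not how glyexp implements .drop_levels.

```r
# Illustrative stand-ins for the sample information and expression matrix.
sample_info <- data.frame(
  sample = c("S1", "S2", "S3"),
  group = factor(c("H", "M", "H"))
)
abundance <- matrix(
  1:6,
  nrow = 2,
  dimnames = list(c("V1", "V2"), sample_info$sample)
)

# Keep only the "H" samples in both the metadata and the matrix.
keep <- sample_info$group == "H"
sub_info <- sample_info[keep, ]
sub_abundance <- abundance[, keep, drop = FALSE]

# The filtered factor still remembers the level "M" ...
levels_before <- levels(sub_info$group)

# ... until the unused level is dropped explicitly.
sub_info$group <- droplevels(sub_info$group)
levels_after <- levels(sub_info$group)

levels_before
levels_after
```

Leftover levels matter downstream, for example when a model or plot creates an empty category for a group that no longer has samples.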

Value

An object of the same class as exp.

Identifier columns

For an experiment() object, sample is a physical column in sample_info, and variable is a physical column in var_info.

Examples

library(SummarizedExperiment)

# Add a variable annotation to a bundled experiment
exp <- real_experiment |>
  mutate_row(type = "glycopeptide")

# Filter samples
sub_exp_1 <- filter_col(exp, group == "H")
colData(sub_exp_1)

# Filter variables
sub_exp_2 <- filter_row(exp, type == "glycopeptide")
rowData(sub_exp_2)

# Use pipe
sub_exp_3 <- exp |>
  filter_col(group == "H") |>
  filter_row(type == "glycopeptide")
sub_exp_3

Create a GlycomicSE object

Description

GlycomicSE() creates a single-assay SummarizedExperiment::SummarizedExperiment() subclass for glycomics data. It is a thin wrapper around SummarizedExperiment() with additional Glycoverse validation:

Exactly one assay is allowed. Extra assays are rejected to avoid ambiguous glycomics measurements.
The assay must contain only non-negative values. Raw glycomics abundance data are non-negative; log transformation should be handled by downstream Glycoverse packages.
rowData must contain a glycan_composition column, and that column must be a glyrepr::glycan_composition() vector. A glycan_structure column is optional, but if present it must be a glyrepr::glycan_structure() vector.
metadata must contain a glycan_type field.

This container is experimental and is not recognized by all Glycoverse packages yet. It is intended to become a recommended entry point as package contracts migrate toward SummarizedExperiment-based containers.

Usage

GlycomicSE(abundance, ...)

Arguments

abundance

A numeric abundance matrix with glycans as rows and samples as columns.

...

Arguments passed to SummarizedExperiment::SummarizedExperiment().

rowData: A S4Vectors::DataFrame() with at least the following columns:
- glycan_composition: required, a glyrepr::glycan_composition() vector
- glycan_structure: optional, a glyrepr::glycan_structure() vector
colData: A S4Vectors::DataFrame().
metadata: A list. It must include a glycan_type field with one of "N", "O", "O-GalNAc", "O-Man", "O-Fuc", "O-GlcNAc", "O-Glc", "HMO", "GSL", "GAG", or "GPI".
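
The kind of checks listed above can be sketched as a small stand-alone function. check_glycomic_inputs() is a hypothetical helper written for this illustration; it is not the validity method used by GlycomicSE, it skips the rowData rules, and treating NA as acceptable is an assumption of the sketch.

```r
# Allowed glycan_type values, as listed in the metadata argument.
allowed_glycan_types <- c(
  "N", "O", "O-GalNAc", "O-Man", "O-Fuc", "O-GlcNAc",
  "O-Glc", "HMO", "GSL", "GAG", "GPI"
)

# Hypothetical checker mirroring two of the documented rules; not glyexp code.
# Returns a character vector of problems (empty if none were found).
check_glycomic_inputs <- function(abundance, metadata) {
  problems <- character()

  if (!is.matrix(abundance) || !is.numeric(abundance)) {
    problems <- c(problems, "abundance must be a numeric matrix")
  } else if (any(abundance < 0, na.rm = TRUE)) {
    # Assumption of this sketch: NA values are ignored by the check.
    problems <- c(problems, "abundance must be non-negative")
  }

  glycan_type <- metadata[["glycan_type"]]
  if (is.null(glycan_type)) {
    problems <- c(problems, "metadata must contain glycan_type")
  } else if (length(glycan_type) != 1 ||
    !glycan_type %in% allowed_glycan_types) {
    problems <- c(problems, "glycan_type is not an allowed value")
  }

  problems
}

ok <- check_glycomic_inputs(
  matrix(c(1, 2, 3, 4), nrow = 2),
  list(glycan_type = "N")
)
bad <- check_glycomic_inputs(matrix(c(1, -1), nrow = 1), list())
bad
```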

Value

A GlycomicSE object.

S4 class

GlycomicSE is an S4 class that extends SummarizedExperiment::SummarizedExperiment().

Create a GlycoproteomicSE object

Description

GlycoproteomicSE() creates a single-assay SummarizedExperiment::SummarizedExperiment() subclass for glycoproteomics data. It is a thin wrapper around SummarizedExperiment() with additional Glycoverse validation:

Exactly one assay is allowed. Extra assays are rejected to avoid ambiguous glycoproteomics measurements.
The assay must contain only non-negative values. Raw glycoproteomics abundance data are non-negative; log transformation should be handled by downstream Glycoverse packages.
rowData must contain protein, protein_site, and glycan_composition columns. The protein column must be character, protein_site must be integer-like, and glycan_composition must be a glyrepr::glycan_composition() vector. A glycan_structure column is optional, but if present it must be a glyrepr::glycan_structure() vector.
metadata must contain a glycan_type field.

Usage

GlycoproteomicSE(abundance, ...)

Arguments

abundance

A numeric abundance matrix with glycopeptides or glycoforms as rows and samples as columns.

...

Arguments passed to SummarizedExperiment::SummarizedExperiment().

rowData: A S4Vectors::DataFrame() with at least the following columns:
- protein: required, a character vector
- protein_site: required, an integer-like vector
- glycan_composition: required, a glyrepr::glycan_composition() vector
- glycan_structure: optional, a glyrepr::glycan_structure() vector
colData: A S4Vectors::DataFrame().
metadata: A list. It must include a glycan_type field with one of "N", "O", "O-GalNAc", "O-Man", "O-Fuc", "O-GlcNAc", "O-Glc", "HMO", "GSL", "GAG", or "GPI".

Value

A GlycoproteomicSE object.

S4 class

GlycoproteomicSE is an S4 class that extends SummarizedExperiment::SummarizedExperiment().

Join data to sample or variable information

Description

These functions allow you to join additional data to the sample information or variable information of an experiment() or SummarizedExperiment. They work similarly to dplyr::left_join(), dplyr::inner_join(), dplyr::semi_join(), and dplyr::anti_join(), while keeping assay dimensions synchronized with the joined metadata.

After joining, the assay dimensions are automatically updated to reflect any changes in the number of samples or variables.

Important Notes:

The relationship parameter is locked to "many-to-one" to ensure that the number of observations never increases, which would violate the experiment object assumptions.
right_join() and full_join() are not supported as they could add new observations to the experiment.
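
Why a "many-to-one" relationship keeps the number of observations fixed can be seen in a base-R sketch. left_join_many_to_one() is a hypothetical helper for a single key column, written only to illustrate the constraint; glyexp delegates to the dplyr join functions instead.

```r
# Hypothetical helper, not glyexp code: a left join on one key column that
# enforces a many-to-one relationship, so rows of x are never duplicated.
left_join_many_to_one <- function(x, y, by) {
  if (anyDuplicated(y[[by]]) > 0) {
    stop("`y` must have at most one row per key (many-to-one).")
  }
  # For each row of x, find the single matching row of y (NA if none).
  idx <- match(x[[by]], y[[by]])
  extra <- y[idx, setdiff(names(y), by), drop = FALSE]
  rownames(extra) <- NULL
  cbind(x, extra)
}

sample_info <- data.frame(
  sample = c("S1", "S2", "S3"),
  group = c("H", "M", "H")
)
group_info <- data.frame(
  group = c("H", "M"),
  label = c("healthy", "hepatitis")
)

joined <- left_join_many_to_one(sample_info, group_info, by = "group")
joined
```

Because every sample matches at most one row of y, the joined table still has one row per sample and the assay needs no new columns.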

Usage

left_join_col(exp, y, by = NULL, ...)

inner_join_col(exp, y, by = NULL, ...)

semi_join_col(exp, y, by = NULL, ...)

anti_join_col(exp, y, by = NULL, ...)

left_join_row(exp, y, by = NULL, ...)

inner_join_row(exp, y, by = NULL, ...)

semi_join_row(exp, y, by = NULL, ...)

anti_join_row(exp, y, by = NULL, ...)

Arguments

exp

An experiment() or SummarizedExperiment object.

y

A data frame to join to sample_info or var_info.

by

A join specification created with dplyr::join_by(), or a character vector of variables to join by. See dplyr::left_join() for details.

...

Other arguments passed to the underlying dplyr join function, except relationship which is locked to "many-to-one".

Value

An object of the same class as exp, with updated sample or variable information.

Identifier columns

For an experiment() object, sample is a physical column in sample_info, and variable is a physical column in var_info.

Mutate sample or variable information

Description

Mutate the sample or variable information of an experiment() or SummarizedExperiment.

The same syntax as dplyr::mutate() is used. For example, to add a new column to the sample information tibble, use mutate_col(exp, new_column = value). This actually calls dplyr::mutate() on the sample information tibble with new_column = value.

If an identifier column is modified, its new values must be unique; otherwise, an error is thrown. The assay column names or row names will be updated accordingly.
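
The uniqueness rule for identifiers can be illustrated with a short base-R sketch. set_sample_ids() is a hypothetical helper operating on a plain matrix; it only mirrors the documented behavior and is not taken from glyexp.

```r
# Hypothetical helper, not glyexp code: replace sample identifiers and
# refuse duplicates, since column names must identify samples uniquely.
set_sample_ids <- function(abundance, new_ids) {
  stopifnot(length(new_ids) == ncol(abundance))
  if (anyDuplicated(new_ids) > 0) {
    stop("New sample identifiers must be unique.")
  }
  colnames(abundance) <- new_ids
  abundance
}

abundance <- matrix(
  1:4,
  nrow = 2,
  dimnames = list(c("V1", "V2"), c("S1", "S2"))
)

renamed <- set_sample_ids(abundance, paste0("new_", colnames(abundance)))
colnames(renamed)
```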

Usage

mutate_col(exp, ...)

mutate_row(exp, ...)

Arguments

exp

An experiment() or SummarizedExperiment object.

...

<data-masking> Name-value pairs, passed to dplyr::mutate() internally.

Value

An object of the same class as exp.

Identifier columns

For an experiment() object, sample is a physical column in sample_info, and variable is a physical column in var_info.

Examples

library(SummarizedExperiment)

# Add metadata to a bundled experiment
exp <- real_experiment |>
  mutate_row(type = "glycopeptide")

# Add a new column to sample information tibble or variable information tibble
exp |>
  mutate_col(new_column = 1) |>
  colData()

exp |>
  mutate_row(new_column = "A") |>
  rowData()

# Modify existing columns
exp |>
  mutate_col(group = dplyr::if_else(group == "H", "healthy", "other")) |>
  colData()

exp |>
  mutate_row(type = dplyr::if_else(type == "glycopeptide", "good", "bad")) |>
  rowData()

# SummarizedExperiment identifiers use virtual dot-prefixed columns
mutate_col(exp, .sample = paste0("new_", .sample))
mutate_row(exp, .variable = paste0("new_", .variable))

Real glycoproteomics experiment

Description

A real glycoproteomics experiment object. This dataset is derived from the unpublished ProteomeXchange dataset PXD063749. It contains serum N-glycoproteome profiles from 12 patients with different liver conditions: healthy (H), hepatitis (M), cirrhosis (Y), and hepatocellular carcinoma (C). Glycopeptide identification and annotation were performed using pGlyco3 (https://github.com/pFindStudio/pGlyco3), and quantification was carried out with pGlycoQuant (https://github.com/Power-Quant/pGlycoQuant). The raw data were imported with glyread::read_pglyco3_pglycoquant().

Usage

real_experiment

Format

A GlycoproteomicSE() object with 4262 variables and 12 samples.

Variable information:

peptide: peptide sequence
peptide_site: site position on the peptide
protein: protein accession
protein_site: site position on the protein
gene: gene symbol
glycan_composition: glycan composition (glyrepr::glycan_composition())
glycan_structure: glycan structure (glyrepr::glycan_structure())

Sample information:

group: disease group, one of "H" (healthy), "M" (hepatitis), "Y" (cirrhosis), and "C" (hepatocellular carcinoma)

Real glycomics experiment

Description

A real glycomics experiment object. This dataset is derived from the unpublished glycoPOST dataset GPST000313. It contains serum N-glycome profiles from 144 samples with different liver conditions: healthy (H), hepatitis (M), cirrhosis (Y), and hepatocellular carcinoma (C).

Usage

real_experiment2

Format

A GlycomicSE() object with 67 variables and 144 samples.

Variable information:

glycan_composition: glycan composition (glyrepr::glycan_composition())
glycan_structure: glycan structure (glyrepr::glycan_structure())

Sample information:

group: disease group, one of "H" (healthy), "M" (hepatitis), "Y" (cirrhosis), and "C" (hepatocellular carcinoma)

Rename columns in the sample or variable information tibble

Description

These two functions provide a way to rename columns in the sample or variable information of an experiment() or SummarizedExperiment.

The same syntax as dplyr::rename() is used. For example, to rename the "group" column in the sample information tibble to "condition", use rename_col(exp, condition = group). Note that you can't rename the "sample" column in the sample information tibble or the "variable" column in the variable information tibble. These two columns link the sample and variable information tibbles to the expression matrix.

Usage

rename_col(exp, ...)

rename_row(exp, ...)

Arguments

exp

An experiment() or SummarizedExperiment object.

...

<data-masking> Name pairs to rename. Use new_name = old_name to rename columns.

Value

An object of the same class as exp.

Identifier columns

For an experiment() object, sample is a physical column in sample_info, and variable is a physical column in var_info.

Examples

toy_exp <- real_experiment
toy_exp

# Rename columns in sample information tibble
rename_col(toy_exp, condition = group)

# Rename columns in variable information tibble
rename_row(toy_exp, composition = glycan_composition)

Select columns of the sample or variable information tibble

Description

These two functions trim the sample or variable information tibble of an experiment() or SummarizedExperiment down to only the columns of interest.

The same syntax as dplyr::select() is used. For example, to get a new experiment() with only the "sample" and "group" columns in the sample information tibble, use select_col(exp, group). Note that you don't need to (and you can't) explicitly select or deselect the sample column in sample_info. The same applies to the variable column in var_info. Whatever the selection expression is, the sample or variable column will always be kept.

Usage

select_col(exp, ...)

select_row(exp, ...)

Arguments

exp

An experiment() or SummarizedExperiment object.

...

<data-masking> Column names to select. If empty, all columns except the sample or variable column will be discarded.

Value

An object of the same class as exp.

Identifier columns

For an experiment() object, sample is a physical column in sample_info, and variable is a physical column in var_info.

Examples

toy_exp <- real_experiment

toy_exp_2 <- toy_exp |>
  select_col(group) |>
  select_row(protein, peptide)

toy_exp_2

Slice sample or variable information

Description

Slice the sample or variable information of an experiment() or SummarizedExperiment.

These functions provide row-wise slicing operations similar to dplyr's slice functions. They select rows by position or based on values in specified columns, and update the expression matrix accordingly to match the new selection.

slice_col() and slice_row(): Select rows by position
slice_head_col() and slice_head_row(): Select first n rows
slice_tail_col() and slice_tail_row(): Select last n rows
slice_sample_col() and slice_sample_row(): Select random n rows
slice_max_col() and slice_max_row(): Select rows with highest values
slice_min_col() and slice_min_row(): Select rows with lowest values
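
All of these verbs reduce to picking row positions in the metadata and applying the same positions to the expression matrix. The base-R sketch below illustrates that idea for samples; slice_samples() is a hypothetical helper, the objects are plain stand-ins, and tie handling (with_ties) is deliberately left out.

```r
# Illustrative stand-ins for the sample information and expression matrix.
sample_info <- data.frame(
  sample = c("S1", "S2", "S3", "S4"),
  score = c(0.2, 0.9, 0.5, 0.7)
)
abundance <- matrix(
  1:8,
  nrow = 2,
  dimnames = list(c("V1", "V2"), sample_info$sample)
)

# Hypothetical helper, not glyexp code: keep the samples at positions `idx`
# in both the metadata rows and the matrix columns.
slice_samples <- function(info, mat, idx) {
  list(
    info = info[idx, , drop = FALSE],
    mat = mat[, idx, drop = FALSE]
  )
}

# By position.
by_position <- slice_samples(sample_info, abundance, c(1, 3))

# First two samples.
first_two <- slice_samples(
  sample_info, abundance,
  head(seq_len(nrow(sample_info)), 2)
)

# Two samples with the highest score (no ties in this toy data).
top_two <- slice_samples(
  sample_info, abundance,
  head(order(sample_info$score, decreasing = TRUE), 2)
)

colnames(top_two$mat)
```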

Usage

slice_col(exp, ...)

slice_row(exp, ...)

slice_head_col(exp, n, prop)

slice_head_row(exp, n, prop)

slice_tail_col(exp, n, prop)

slice_tail_row(exp, n, prop)

slice_sample_col(exp, n, prop, weight_by = NULL, replace = FALSE)

slice_sample_row(exp, n, prop, weight_by = NULL, replace = FALSE)

slice_max_col(exp, order_by, ..., n, prop, with_ties = TRUE, na_rm = FALSE)

slice_max_row(exp, order_by, ..., n, prop, with_ties = TRUE, na_rm = FALSE)

slice_min_col(exp, order_by, ..., n, prop, with_ties = TRUE, na_rm = FALSE)

slice_min_row(exp, order_by, ..., n, prop, with_ties = TRUE, na_rm = FALSE)

Arguments

exp

An experiment() or SummarizedExperiment object.

...

<data-masking> For slice_*(), integer row positions. For slice_max() and slice_min(), variables to order by. Other arguments passed to the corresponding dplyr function.

n

For slice_head(), slice_tail(), slice_sample(), slice_max(), and slice_min(), the number of rows to select.

prop

For slice_head(), slice_tail(), slice_sample(), slice_max(), and slice_min(), the proportion of rows to select.

weight_by

For slice_sample(), sampling weights.

replace

For slice_sample(), should sampling be with replacement?

order_by

For slice_max() and slice_min(), variable to order by.

with_ties

For slice_max() and slice_min(), should ties be kept?

na_rm

For slice_max() and slice_min(), should missing values be removed?

Value

An object of the same class as exp.

Identifier columns

For an experiment() object, sample is a physical column in sample_info, and variable is a physical column in var_info.

Examples

# Add values used for slicing to a bundled experiment
exp <- real_experiment |>
  mutate_col(score = seq_len(dplyr::n())) |>
  mutate_row(value = seq_len(dplyr::n()))

# Select specific rows by position
slice_col(exp, 1, 3, 5)

# Select first 3 samples
slice_head_col(exp, n = 3)

# Select last 2 variables
slice_tail_row(exp, n = 2)

# Select 2 random samples
slice_sample_col(exp, n = 2)

# Select samples with highest scores
slice_max_col(exp, order_by = score, n = 2)

# Select variables with lowest values
slice_min_row(exp, order_by = value, n = 2)

Standardize variable IDs in an experiment

Description

Converts meaningless variable IDs (like "GP1", "V1") into meaningful, human-readable IDs based on the experiment type and available columns.

The format of the new IDs depends on the exp_type if format is not specified:

glycomics: {glycan_composition}, e.g., "Hex(5)HexNAc(2)"
glycoproteomics: {protein}-{protein_site}-{glycan_composition}
traitomics: {motif} or {trait} depending on which column is present
traitproteomics: {protein}-{protein_site}-{motif} or {protein}-{protein_site}-{trait}

If duplicate IDs are generated (e.g., same composition with multiple PSMs), a unique integer suffix is appended using the unique_suffix pattern.
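
The ID construction and the suffix rule can be sketched in base R. make_unique_ids() is a hypothetical helper, and two details are assumptions of the sketch rather than documented behavior: suffixes are numbered in order of appearance, and IDs that are already unique are left untouched.

```r
# Hypothetical helper, not glyexp code: append a numbered suffix to every
# ID that occurs more than once, using a pattern that contains {N}.
make_unique_ids <- function(ids, unique_suffix = "-{N}") {
  # TRUE for every member of a duplicated group, including the first one.
  is_dup <- duplicated(ids) | duplicated(ids, fromLast = TRUE)
  # Running counter within each group of identical IDs.
  counter <- ave(seq_along(ids), ids, FUN = seq_along)
  suffix <- vapply(
    counter,
    function(n) sub("{N}", as.character(n), unique_suffix, fixed = TRUE),
    character(1)
  )
  ifelse(is_dup, paste0(ids, suffix), ids)
}

# Default glycoproteomics format: {protein}-{protein_site}-{glycan_composition}
protein <- c("P12345", "P12345", "P67890")
protein_site <- c(32L, 32L, 45L)
glycan_composition <- c("Hex(5)HexNAc(2)", "Hex(5)HexNAc(2)", "Hex(5)HexNAc(2)")

ids <- paste(protein, protein_site, glycan_composition, sep = "-")
new_ids <- make_unique_ids(ids)
new_ids
```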

Usage

standardize_variable(exp, format = NULL, unique_suffix = "-{N}")

Arguments

exp

An experiment(), GlycomicSE(), or GlycoproteomicSE().

format

A format string specifying how to construct variable IDs. Use {column_name} to insert values from var_info columns. For example, "{gene}-{glycan_composition}" would produce "GENE1-Hex(5)". If NULL (default), a sensible format is chosen based on exp_type.

unique_suffix

A string pattern for making IDs unique when duplicates exist. Must contain {N} which will be replaced with the numeric suffix (1, 2, 3...). Default is "-{N}" which produces IDs like "Hex(5)-1", "Hex(5)-2".

Value

exp with standardized variable IDs. The input container class is preserved.

Examples

# Glycomics example
expr_mat <- matrix(1:4, nrow = 2)
rownames(expr_mat) <- c("V1", "V2")
colnames(expr_mat) <- c("S1", "S2")
sample_info <- tibble::tibble(sample = c("S1", "S2"))
var_info <- tibble::tibble(
  variable = c("V1", "V2"),
  glycan_composition = glyrepr::glycan_composition(c(Hex = 5, HexNAc = 2))
)
exp <- experiment(expr_mat, sample_info, var_info,
  exp_type = "glycomics", glycan_type = "N"
)
standardize_variable(exp)

# Glycoproteomics example
expr_mat <- matrix(1:4, nrow = 2)
rownames(expr_mat) <- c("GP1", "GP2")
colnames(expr_mat) <- c("S1", "S2")
sample_info <- tibble::tibble(sample = c("S1", "S2"))
var_info <- tibble::tibble(
  variable = c("GP1", "GP2"),
  protein = c("P12345", "P12345"),
  protein_site = c(32L, 45L),
  glycan_composition = glyrepr::glycan_composition(c(Hex = 5, HexNAc = 2))
)
exp <- experiment(expr_mat, sample_info, var_info,
  exp_type = "glycoproteomics", glycan_type = "N"
)
standardize_variable(exp)

# Custom format example
standardize_variable(exp, format = "{protein}-{glycan_composition}")

Identification overview

Description

This function summarizes the number of glycan compositions, glycan structures, glycopeptides, peptides, glycoforms, glycoproteins, and glycosites in an experiment(), GlycomicSE(), or GlycoproteomicSE().

Usage

summarize_experiment(x, count_struct = NULL)

Arguments

x

An experiment(), GlycomicSE(), or GlycoproteomicSE() object.

count_struct

Controls how glycopeptides and glycoforms are counted. If TRUE, glycopeptides or glycoforms bearing different glycan structures with the same glycan composition are counted as distinct. If FALSE, they are counted by glycan composition. If not provided (NULL), defaults to TRUE if a glycan_structure column exists in the variable information tibble, otherwise FALSE.

Details

The following columns are required in the variable information tibble:

composition: glycan_composition
structure: glycan_structure
peptide: peptide
glycopeptide: glycan_composition or glycan_structure, peptide, peptide_site
glycoform: glycan_composition or glycan_structure, protein or proteins, protein_site or protein_sites
protein: protein or proteins
glycosite: protein or proteins, protein_site or protein_sites

Use the count_struct parameter to control whether glycopeptides and glycoforms are counted by glycan structure or by glycan composition.

Value

A tibble with columns item and n summarizing the results. The items include:

total_composition: The number of glycan compositions.
total_structure: The number of glycan structures.
total_peptide: The number of peptides.
total_glycopeptide: The number of unique combinations of peptides, sites, and glycans.
total_glycoform: The number of unique combinations of proteins, sites, and glycans.
total_protein: The number of proteins.
total_glycosite: The number of unique combinations of proteins and sites.

In addition, _per_sample items are calculated as the average number of detected items per sample. For example, composition_per_sample is the average number of glycan compositions detected per sample.
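
The counting logic can be pictured with a small base-R sketch. It is an illustration under stated assumptions, not the function's implementation: the variable information is a plain data frame with character glycan labels, count_unique() is a hypothetical helper, and an item counts as "detected" in a sample when its abundance is not NA.

```r
# Illustrative stand-ins for the variable information and expression matrix.
var_info <- data.frame(
  protein = c("P1", "P1", "P2"),
  protein_site = c(10L, 10L, 20L),
  glycan_composition = c("Hex(5)HexNAc(2)", "Hex(6)HexNAc(2)", "Hex(5)HexNAc(2)")
)
abundance <- matrix(
  c(1, NA, 3, # sample S1
    4, 5, NA), # sample S2
  nrow = 3,
  dimnames = list(NULL, c("S1", "S2"))
)

# Hypothetical helper: number of unique combinations of the given columns.
count_unique <- function(df, cols) {
  nrow(unique(df[, cols, drop = FALSE]))
}

total_composition <- count_unique(var_info, "glycan_composition")
total_glycosite <- count_unique(var_info, c("protein", "protein_site"))
total_glycoform <- count_unique(
  var_info,
  c("protein", "protein_site", "glycan_composition")
)

# Average number of compositions detected per sample.
# Assumption of this sketch: detected means the abundance is not NA.
composition_per_sample <- mean(apply(abundance, 2, function(x) {
  count_unique(var_info[!is.na(x), , drop = FALSE], "glycan_composition")
}))

c(
  total_composition = total_composition,
  total_glycosite = total_glycosite,
  total_glycoform = total_glycoform,
  composition_per_sample = composition_per_sample
)
```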

Examples

exp <- real_experiment
summarize_experiment(exp)
