| Title: | Read and process glycomics and glycoproteomics data |
|---|---|
| Description: | Provides a unified interface for reading and processing quantification results from various glycomics and glycoproteomics software tools, including Byonic (ByoLogic, pGlycoQuant), StrucGP, pGlyco3 (pGlycoQuant), Glyco-Decipher, GlyHunter, and MSFragger. The package standardizes raw data from different sources, performs peptide-spectrum match (PSM) aggregation, parses glycan compositions and structures, and converts the data into a standardized experiment object from the 'glyexp' package for downstream analysis within the glycoverse ecosystem. |
| Authors: | Bin Fu [aut, cre] (ORCID: <https://orcid.org/0000-0001-8567-2997>) |
| Maintainer: | Bin Fu <[email protected]> |
| License: | MIT + file LICENSE |
| Version: | 0.11.0 |
| Built: | 2026-05-14 15:37:27 UTC |
| Source: | https://github.com/glycoverse/glyread |
If you used Byonic for intact glycopeptide identification,
and used Byologic for quantification, this is the function for you.
It reads in a result file and returns a glyexp::experiment() object.
Currently only label-free quantification is supported.
read_byonic_byologic(fp, sample_info = NULL, quant_method = "label-free", glycan_type = "N", sample_name_converter = NULL, orgdb = "org.Hs.eg.db", multisite = "expand")
| Argument | Description |
|---|---|
| fp | File path of the Byologic result file (the .csv file exported from PMI-Byos). |
| sample_info | File path of the sample information file (csv), or a sample information data.frame/tibble. |
| quant_method | Quantification method. Either "label-free" or "TMT". |
| glycan_type | Glycan type. One of "N", "O-GalNAc", "O-GlcNAc", "O-Man", "O-Fuc", or "O-Glc". Default is "N". |
| sample_name_converter | A function to convert sample names from file paths. The function should take a character vector of old sample names and return new sample names. Note that sample names in |
| orgdb | Name of the OrgDb package to use for UniProt to gene symbol conversion. Default is "org.Hs.eg.db". |
| multisite | How to handle multisite glycopeptides. |
A glyexp::experiment() object.
Open the .blgc file in the result folder with PMI-Byos. In the "Peptide List" panel (usually on the bottom right), click "Export content of the table to a CSV file" button. The exported .csv file is the file you should use.
Some glycopeptides can have more than one glycosylation site.
By default, they are expanded into multiple rows with the same quantification value
but different protein_site and glycan_composition.
Set multisite = "drop" to remove multisite glycopeptides instead.
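The difference between the two multisite modes can be illustrated with a small base R sketch. The input layout (";"-separated sites and compositions within one row), the example values, and the helper handle_multisite() are assumptions made for illustration only; they are not the package's internal format or code.

```r
# Hypothetical input: one row per glycopeptide. Multisite entries carry
# ";"-separated sites and compositions (an assumed layout, for illustration).
gp <- data.frame(
  peptide = c("NGTK", "NASNGTR"),
  protein_site = c("25", "40;43"),
  glycan_composition = c("Hex(5)HexNAc(2)", "Hex(5)HexNAc(4);Hex(6)HexNAc(2)"),
  value = c(100, 250),
  stringsAsFactors = FALSE
)

# Hypothetical helper mimicking the documented "expand" and "drop" behaviours.
handle_multisite <- function(df, multisite = c("expand", "drop")) {
  multisite <- match.arg(multisite)
  sites <- strsplit(df$protein_site, ";", fixed = TRUE)
  comps <- strsplit(df$glycan_composition, ";", fixed = TRUE)
  n_sites <- lengths(sites)
  if (multisite == "drop") {
    # Keep only glycopeptides with a single glycosylation site.
    return(df[n_sites == 1, , drop = FALSE])
  }
  # Repeat each row once per site; the quantification value is copied as-is.
  out <- df[rep(seq_len(nrow(df)), times = n_sites), , drop = FALSE]
  out$protein_site <- unlist(sites)
  out$glycan_composition <- unlist(comps)
  rownames(out) <- NULL
  out
}

expanded <- handle_multisite(gp, "expand")
dropped <- handle_multisite(gp, "drop")
```

In this sketch, the expanded rows of a multisite glycopeptide share one quantification value, so sums taken over the expanded table count that value once per site.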
The following columns could be found in the variable information tibble:
peptide: character, peptide sequence
peptide_site: integer, site of glycosylation on peptide
protein: character, protein accession
protein_site: integer, site of glycosylation on protein
gene: character, gene name (symbol)
glycan_composition: glyrepr::glycan_composition(), glycan compositions.
The sample information file should be a csv file with the first column
named sample, and the rest of the columns being sample information.
The sample column must match the RawName column in the pGlyco3 result file,
although the order can be different.
You can put any useful information in the sample information file. Recommended columns are:
group: grouping or conditions, e.g. "control" or "tumor",
required for most downstream analyses
batch: batch information, required for batch effect correction
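As a minimal sketch of the layout described above, a sample information table can be built and written with base R. The sample names, groups, and batches below are made up for illustration, and result_samples is a hypothetical stand-in for the sample names found in a result file.

```r
# Hypothetical sample information: first column "sample", then metadata.
sample_info <- data.frame(
  sample = c("C_1", "C_2", "T_1", "T_2"),
  group = c("control", "control", "tumor", "tumor"),
  batch = c(1L, 1L, 2L, 2L)
)

# Write it as csv so the file path can be passed as `sample_info`.
path <- file.path(tempdir(), "sample_info.csv")
write.csv(sample_info, path, row.names = FALSE)

# Sanity checks before handing the file to a reader function.
check <- read.csv(path)
stopifnot(names(check)[1] == "sample")
stopifnot(!anyDuplicated(check$sample))

# Order does not matter, but the set of names must agree.
result_samples <- c("T_2", "C_1", "T_1", "C_2")  # hypothetical
stopifnot(setequal(check$sample, result_samples))
```

The data.frame itself can also be passed directly instead of the file path, since sample_info accepts either form.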
pGlyco3 performs quantification at the PSM level, which is too detailed for most downstream analyses. This function aggregates PSMs into glycopeptides by summation: for each glycopeptide (a unique combination of "peptide", "peptide_site", "protein", "protein_site", "gene", "glycan_composition", "glycan_structure"), the quantifications of all PSMs that belong to it are summed.
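The summation step can be sketched in base R with aggregate(). The PSM table below, including the accession, gene name, and intensities, is made up for illustration, and the sketch does not reproduce the package's actual implementation.

```r
# Hypothetical PSM-level table: two PSMs map to the same glycopeptide in
# sample S1, and a third PSM carries a different glycan composition.
psm <- data.frame(
  peptide = c("NGTK", "NGTK", "NGTK"),
  peptide_site = 1L,
  protein = "P12345",
  protein_site = 25L,
  gene = "GENE1",
  glycan_composition = c("Hex(5)HexNAc(2)", "Hex(5)HexNAc(2)", "Hex(6)HexNAc(2)"),
  sample = "S1",
  intensity = c(10, 5, 7),
  stringsAsFactors = FALSE
)

# Sum intensities over every unique combination of the identifying columns,
# separately for each sample.
agg <- aggregate(
  intensity ~ peptide + peptide_site + protein + protein_site + gene +
    glycan_composition + sample,
  data = psm,
  FUN = sum
)
```

Here the two PSMs sharing a composition collapse into one row with intensity 15, while the third PSM stays as its own glycopeptide.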
glyexp::experiment(), glyrepr::glycan_composition()
If you used Byonic for intact glycopeptide identification,
and used pGlycoQuant for quantification, this is the function for you.
It reads in a pGlycoQuant result file and returns a glyexp::experiment() object.
Currently only label-free quantification is supported.
read_byonic_pglycoquant(fp, sample_info = NULL, quant_method = "label-free", glycan_type = "N", sample_name_converter = NULL, orgdb = "org.Hs.eg.db", parse_structure = TRUE, multisite = "expand")
| Argument | Description |
|---|---|
| fp | File path of the pGlycoQuant result file. |
| sample_info | File path of the sample information file (csv), or a sample information data.frame/tibble. |
| quant_method | Quantification method. Either "label-free" or "TMT". |
| glycan_type | Glycan type. One of "N", "O-GalNAc", "O-GlcNAc", "O-Man", "O-Fuc", or "O-Glc". Default is "N". |
| sample_name_converter | A function to convert sample names from file paths. The function should take a character vector of old sample names and return new sample names. Note that sample names in |
| orgdb | Name of the OrgDb package to use for UniProt to gene symbol conversion. Default is "org.Hs.eg.db". |
| parse_structure | Logical. Whether to parse glycan structures. If |
| multisite | How to handle multisite glycopeptides. |
A glyexp::experiment() object.
You should use the "Quant.spectra.list" file in the pGlycoQuant result folder. Files from Byonic result folder are not needed. For instructions on how to use Byonic and pGlycoQuant, please refer to the manual: pGlycoQuant.
Some glycopeptides can have more than one glycosylation site.
By default, they are expanded into multiple rows with the same quantification value
but different protein_site and glycan_composition.
Set multisite = "drop" to remove multisite glycopeptides instead.
The following columns could be found in the variable information tibble:
peptide: character, peptide sequence
peptide_site: integer, site of glycosylation on peptide
protein: character, protein accession
protein_site: integer, site of glycosylation on protein
gene: character, gene name (symbol)
glycan_composition: glyrepr::glycan_composition(), glycan compositions.
glycan_structure: glyrepr::glycan_structure(), glycan structures (if parse_structure = TRUE).
The sample information file should be a csv file with the first column
named sample, and the rest of the columns being sample information.
The sample column must match the RawName column in the pGlyco3 result file,
although the order can be different.
You can put any useful information in the sample information file. Recommended columns are:
group: grouping or conditions, e.g. "control" or "tumor",
required for most downstream analyses
batch: batch information, required for batch effect correction
pGlyco3 performs quantification at the PSM level, which is too detailed for most downstream analyses. This function aggregates PSMs into glycopeptides by summation: for each glycopeptide (a unique combination of "peptide", "peptide_site", "protein", "protein_site", "gene", "glycan_composition", "glycan_structure"), the quantifications of all PSMs that belong to it are summed.
glyexp::experiment(), glyrepr::glycan_composition()
GlycanFinder is software for intact glycopeptide identification.
This function reads in the result file and returns a glyexp::experiment() object.
Currently only label-free quantification is supported.
read_glycan_finder(fp, sample_info = NULL, quant_method = "label-free", glycan_type = "N", sample_name_converter = NULL, orgdb = "org.Hs.eg.db", parse_structure = TRUE)
| Argument | Description |
|---|---|
| fp | File path of the GlycanFinder result file. |
| sample_info | File path of the sample information file (csv), or a sample information data.frame/tibble. |
| quant_method | Quantification method. Either "label-free" or "TMT". |
| glycan_type | Glycan type. One of "N", "O-GalNAc", "O-GlcNAc", "O-Man", "O-Fuc", or "O-Glc". Default is "N". |
| sample_name_converter | A function to convert sample names from file paths. The function should take a character vector of old sample names and return new sample names. Note that sample names in |
| orgdb | Name of the OrgDb package to use for UniProt to gene symbol conversion. Default is "org.Hs.eg.db". |
| parse_structure | Logical. Whether to parse glycan structures. If |
A glyexp::experiment() object.
You should use the lfq/lfq.protein-glycopeptides.csv file from the GlycanFinder
result folder. This file contains the quantification information for glycopeptides.
The sample information file should be a csv file with the first column
named sample, and the rest of the columns being sample information.
The sample column must match the sample names in the Area columns of the
GlycanFinder result file (e.g., "C_3", "H_3"), although the order can be different.
The following columns could be found in the variable information tibble:
peptide: character, peptide sequence
peptide_site: integer, site of glycosylation on peptide
protein: character, protein accession
protein_site: integer, site of glycosylation on protein
gene: character, gene name (symbol)
glycan_composition: glyrepr::glycan_composition(), glycan compositions.
glycan_structure: glyrepr::glycan_structure(), glycan structures (if parse_structure = TRUE).
glyexp::experiment(), glyrepr::glycan_composition(),
glyrepr::glycan_structure()
Glyco-Decipher is software for glycopeptide identification and quantification.
This function reads in the result file and returns a glyexp::experiment() object.
Currently only label-free quantification is supported.
read_glyco_decipher(fp, sample_info = NULL, quant_method = "label-free", glycan_type = "N", sample_name_converter = NULL, orgdb = "org.Hs.eg.db")
| Argument | Description |
|---|---|
| fp | File path of the Glyco-Decipher result file. |
| sample_info | File path of the sample information file (csv), or a sample information data.frame/tibble. |
| quant_method | Quantification method. Either "label-free" or "TMT". |
| glycan_type | Glycan type. One of "N", "O-GalNAc", "O-GlcNAc", "O-Man", "O-Fuc", or "O-Glc". Default is "N". |
| sample_name_converter | A function to convert sample names from file paths. The function should take a character vector of old sample names and return new sample names. Note that sample names in |
| orgdb | Name of the OrgDb package to use for UniProt to gene symbol conversion. Default is "org.Hs.eg.db". |
A glyexp::experiment() object.
You should use the "site.csv" file in the result folder. This file contains "Site" and "Glycan" columns, followed by quantification result for each sample.
Glyco-Decipher reports uncertain sites and proteins.
This function automatically performs protein inference using the
parsimony method to find the leader proteins.
This ensures each glycopeptide is uniquely mapped to a single protein, gene, and glycosite.
In addition, uncertain sites on proteins are assigned NA.
The following columns could be found in the variable information tibble:
protein: character, protein accession (after protein inference)
protein_site: integer, site of glycosylation on protein (after protein inference)
gene: character, gene name (symbol) (after protein inference)
glycan_composition: glyrepr::glycan_composition(), glycan compositions.
The sample information file should be a csv file with the first column
named sample, and the rest of the columns being sample information.
The sample column must match the RawName column in the pGlyco3 result file,
although the order can be different.
You can put any useful information in the sample information file. Recommended columns are:
group: grouping or conditions, e.g. "control" or "tumor",
required for most downstream analyses
batch: batch information, required for batch effect correction
This function reads in a GlyHunter
result file and returns a glyexp::experiment() object.
read_glyhunter(fp, sample_info = NULL, preset = c("Fu_NC_2026", "Fu_NP_2026"), glycan_type = "N", sample_name_converter = NULL)
| Argument | Description |
|---|---|
| fp | File path of the GlyHunter result file. |
| sample_info | File path of the sample information file (csv), or a sample information data.frame/tibble. |
| preset | "Fu_NC_2026" (default) or "Fu_NP_2026". Use "Fu_NC_2026" to use the configuration following "DOI: 10.1038/s41467-026-68579-x". Use "Fu_NP_2026" to use the configuration in an unpublished Nature Protocols paper. |
| glycan_type | Glycan type. One of "N", "O-GalNAc", "O-GlcNAc", "O-Man", "O-Fuc", or "O-Glc". Default is "N". |
| sample_name_converter | A function to convert sample names from file paths. The function should take a character vector of old sample names and return new sample names. Note that sample names in |
A glyexp::experiment() object.
Use the "summary_area.csv" file in the GlyHunter result folder. No editing is needed.
The following columns could be found in the variable information tibble:
glycan_composition: glyrepr::glycan_composition(), glycan compositions.
If the "Fu_NP_2026" preset is used, two additional columns will be included:
nL: Number of α2,3-linked sialic acids.
nE: Number of α2,6-linked sialic acids.
The sample information file should be a csv file with the first column
named sample, and the rest of the columns being sample information.
The sample column must match the RawName column in the pGlyco3 result file,
although the order can be different.
You can put any useful information in the sample information file. Recommended columns are:
group: grouping or conditions, e.g. "control" or "tumor",
required for most downstream analyses
batch: batch information, required for batch effect correction
glyexp::experiment(), glyrepr::glycan_composition()
MSFragger-Glyco is software for glycopeptide identification and quantification.
This function reads in the result file and returns a glyexp::experiment() object.
read_msfragger(dp, sample_info = NULL, quant_method = "label-free", glycan_type = "N", sample_name_converter = NULL)
| Argument | Description |
|---|---|
| dp | The directory path of the MSFragger-Glyco result folder. |
| sample_info | File path of the sample information file (csv), or a sample information data.frame/tibble. |
| quant_method | Quantification method. Either "label-free" or "TMT". |
| glycan_type | Glycan type. One of "N", "O-GalNAc", "O-GlcNAc", "O-Man", "O-Fuc", or "O-Glc". Default is "N". |
| sample_name_converter | A function to convert sample names from file paths. The function should take a character vector of old sample names and return new sample names. Note that sample names in |
This function uses the "psm.tsv" file in each sample folder. Sample names are extracted from the file paths. They are the parent of each "psm.tsv" file. For example, "msfragger_result/H1/psm.tsv" will be named "H1".
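The naming rule above amounts to taking the parent directory of each "psm.tsv" file, which base R expresses as basename(dirname(path)). The sketch below also shows a possible sample_name_converter; the function rename_samples() and its naming scheme are hypothetical, and the paths are examples only.

```r
# Example paths following the layout described above.
paths <- c(
  "msfragger_result/H1/psm.tsv",
  "msfragger_result/H2/psm.tsv",
  "msfragger_result/C1/psm.tsv"
)

# The parent folder of each psm.tsv file gives the sample name.
sample_names <- basename(dirname(paths))

# Hypothetical converter: takes a character vector of old sample names and
# returns new ones, as required for `sample_name_converter`.
rename_samples <- function(old) {
  renamed <- sub("^C", "cancer_", old)
  sub("^H", "healthy_", renamed)
}

new_names <- rename_samples(sample_names)

# The converter would then be supplied to the reader (not run here, since it
# needs the package and a real result folder):
# exp <- glyread::read_msfragger("msfragger_result",
#                                sample_name_converter = rename_samples)
```

A converter should return one name per input and keep the names unique, so that each sample can still be matched to its row in the sample information.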
A glyexp::experiment() object.
The following columns could be found in the variable information tibble:
peptide: character, peptide sequence
peptide_site: integer, site of glycosylation on peptide
protein: character, protein accession
protein_site: integer, site of glycosylation on protein
gene: character, gene name (symbol)
glycan_composition: glyrepr::glycan_composition(), glycan compositions.
glyexp::experiment(), glyrepr::glycan_composition()
pGlyco3 is software for intact glycopeptide identification and quantification.
This function reads in the result file and returns a glyexp::experiment() object.
Currently only label-free quantification is supported.
Use this function if you only use pGlyco3.
If you also use pGlycoQuant for quantification, use read_pglyco3_pglycoquant() instead.
read_pglyco3(fp, sample_info = NULL, quant_method = "label-free", glycan_type = "N", sample_name_converter = NULL, parse_structure = FALSE)
| Argument | Description |
|---|---|
| fp | File path of the pGlyco3 result file. |
| sample_info | File path of the sample information file (csv), or a sample information data.frame/tibble. |
| quant_method | Quantification method. Either "label-free" or "TMT". |
| glycan_type | Glycan type. One of "N", "O-GalNAc", "O-GlcNAc", "O-Man", "O-Fuc", or "O-Glc". Default is "N". |
| sample_name_converter | A function to convert sample names from file paths. The function should take a character vector of old sample names and return new sample names. Note that sample names in |
| parse_structure | Logical. Whether to parse glycan structures. If |
A glyexp::experiment() object.
You should use the file with "-Pro-Quant" suffix that contains quantification information.
The file should have columns including RawName, MonoArea, Peptide, Proteins,
Genes, GlycanComposition, PlausibleStruct, GlySite, and ProSites.
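Before reading a file, it can help to confirm that the listed columns are present. The following base R sketch does this by inspecting only the header; the helper missing_columns() is hypothetical, and the tab separator is an assumption about the file format rather than something stated above.

```r
# Columns listed above as required in the "-Pro-Quant" file.
required <- c(
  "RawName", "MonoArea", "Peptide", "Proteins", "Genes",
  "GlycanComposition", "PlausibleStruct", "GlySite", "ProSites"
)

# Hypothetical helper: returns the required columns absent from the file.
# Assumes a delimited text file with a header row (tab-separated by default).
missing_columns <- function(path, required, sep = "\t") {
  header <- names(read.delim(path, sep = sep, nrows = 1, check.names = FALSE))
  setdiff(required, header)
}

# Demonstration on a small made-up file that lacks the MonoArea column.
demo <- tempfile(fileext = ".txt")
writeLines(
  c(
    paste(required[-2], collapse = "\t"),
    paste(rep("x", length(required) - 1), collapse = "\t")
  ),
  demo
)
absent <- missing_columns(demo, required)
```

An empty result means every listed column was found; any names returned point to a file that is probably not the quantification output.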
The sample information file should be a csv file with the first column
named sample, and the rest of the columns being sample information.
The sample column must match the RawName column in the pGlyco3 result file,
although the order can be different.
You can put any useful information in the sample information file. Recommended columns are:
group: grouping or conditions, e.g. "control" or "tumor",
required for most downstream analyses
batch: batch information, required for batch effect correction
pGlyco3 reports protein groups. That is, shared glycopeptides are reported as a group of proteins separated by ";". This function automatically performs protein inference using the parsimony method to find the leader proteins. This ensures each glycopeptide is uniquely mapped to a single protein, gene, and glycosite.
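One common way to realise a parsimony rule is a greedy set cover: repeatedly pick the protein that explains the most still-unassigned glycopeptides. The sketch below illustrates that idea only. The helper infer_leader() and the accessions are hypothetical, every group is assumed to be non-empty, and the package's actual algorithm and tie-breaking may differ.

```r
# Hypothetical protein groups, one ";"-separated string per glycopeptide.
groups <- c(gp1 = "P1;P2", gp2 = "P2", gp3 = "P2;P3", gp4 = "P4")

# Greedy parsimony sketch: assign each glycopeptide to a single leader protein
# while using as few proteins as the greedy choice allows.
infer_leader <- function(groups) {
  candidates <- strsplit(groups, ";", fixed = TRUE)
  leader <- rep(NA_character_, length(candidates))
  names(leader) <- names(groups)
  while (anyNA(leader)) {
    open <- which(is.na(leader))
    # Count how many unassigned glycopeptides each protein could explain.
    counts <- table(unlist(candidates[open]))
    # which.max() takes the first maximum, so ties resolve alphabetically here.
    best <- names(counts)[which.max(counts)]
    hit <- open[vapply(candidates[open], function(p) best %in% p, logical(1))]
    leader[hit] <- best
  }
  leader
}

leaders <- infer_leader(groups)
```

In this toy input P2 covers three glycopeptides, so it becomes their leader, and P4 remains the leader of the one glycopeptide only it can explain.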
pGlyco3 performs quantification at the PSM level, which is too detailed for most downstream analyses. This function aggregates PSMs into glycopeptides by summation: for each glycopeptide (a unique combination of "peptide", "peptide_site", "protein", "protein_site", "gene", "glycan_composition", "glycan_structure"), the quantifications of all PSMs that belong to it are summed.
The following columns could be found in the variable information tibble:
peptide: character, peptide sequence
peptide_site: integer, site of glycosylation on peptide
protein: character, protein accession (after protein inference)
protein_site: integer, site of glycosylation on protein (after protein inference)
gene: character, gene name (symbol) (after protein inference)
glycan_composition: glyrepr::glycan_composition(), glycan compositions.
glycan_structure: glyrepr::glycan_structure(), glycan structures (if parse_structure = TRUE).
pGlyco3 reports a "plausible structure" for each glycan.
You can set parse_structure = TRUE to parse these structures into a "glycan_structure"
column as a glyrepr::glycan_structure() vector.
However, use these structures with caution,
because pGlyco3 does not apply strict quality control to glycan structure annotations.
glyexp::experiment(), glyrepr::glycan_composition(),
glyrepr::glycan_structure()
If you used pGlyco3 for intact glycopeptide identification,
and used pGlycoQuant for quantification, this is the function for you.
It reads in a pGlycoQuant result file and returns a glyexp::experiment() object.
Currently only label-free quantification is supported.
read_pglyco3_pglycoquant(fp, sample_info = NULL, quant_method = "label-free", glycan_type = "N", sample_name_converter = NULL, parse_structure = FALSE)
| Argument | Description |
|---|---|
| fp | File path of the pGlycoQuant result file. |
| sample_info | File path of the sample information file (csv), or a sample information data.frame/tibble. |
| quant_method | Quantification method. Either "label-free" or "TMT". |
| glycan_type | Glycan type. One of "N", "O-GalNAc", "O-GlcNAc", "O-Man", "O-Fuc", or "O-Glc". Default is "N". |
| sample_name_converter | A function to convert sample names from file paths. The function should take a character vector of old sample names and return new sample names. Note that sample names in |
| parse_structure | Logical. Whether to parse glycan structures. If |
A glyexp::experiment() object.
You should use the "Quant.spectra.list" file in the pGlycoQuant result folder. Files from pGlyco3 result folder are not needed. For instructions on how to use pGlyco3 and pGlycoQuant, please refer to the manual: pGlycoQuant.
The following columns could be found in the variable information tibble:
peptide: character, peptide sequence
peptide_site: integer, site of glycosylation on peptide
protein: character, protein accession (after protein inference)
protein_site: integer, site of glycosylation on protein (after protein inference)
gene: character, gene name (symbol) (after protein inference)
glycan_composition: glyrepr::glycan_composition(), glycan compositions.
glycan_structure: glyrepr::glycan_structure(), glycan structures (if parse_structure = TRUE).
The sample information file should be a csv file with the first column
named sample, and the rest of the columns being sample information.
The sample column must match the RawName column in the pGlyco3 result file,
although the order can be different.
You can put any useful information in the sample information file. Recommended columns are:
group: grouping or conditions, e.g. "control" or "tumor",
required for most downstream analyses
batch: batch information, required for batch effect correction
pGlyco3 reports protein groups. That is, shared glycopeptides are reported as a group of proteins separated by ";". This function automatically performs protein inference using the parsimony method to find the leader proteins. This ensures each glycopeptide is uniquely mapped to a single protein, gene, and glycosite.
pGlyco3 performs quantification at the PSM level, which is too detailed for most downstream analyses. This function aggregates PSMs into glycopeptides by summation: for each glycopeptide (a unique combination of "peptide", "peptide_site", "protein", "protein_site", "gene", "glycan_composition", "glycan_structure"), the quantifications of all PSMs that belong to it are summed.
pGlyco3 reports a "plausible structure" for each glycan.
You can set parse_structure = TRUE to parse these structures into a "glycan_structure"
column as a glyrepr::glycan_structure() vector.
However, use these structures with caution,
because pGlyco3 does not apply strict quality control to glycan structure annotations.
glyexp::experiment(), glyrepr::glycan_composition(),
glyrepr::glycan_structure()
StrucGP is software for intact glycopeptide identification.
As StrucGP doesn't support quantification,
this function returns a glyexp::experiment() object with a binary (0/1) expression matrix
indicating whether each glycopeptide was identified in each sample.
read_strucgp(fp, sample_info = NULL, glycan_type = "N", parse_structure = TRUE)
| Argument | Description |
|---|---|
| fp | File path of the StrucGP result file. |
| sample_info | File path of the sample information file (csv), or a sample information data.frame/tibble. If |
| glycan_type | Glycan type. One of "N", "O-GalNAc", "O-GlcNAc", "O-Man", "O-Fuc", or "O-Glc". Default is "N". |
| parse_structure | Logical. Whether to parse glycan structures. If |
A glyexp::experiment() object with a binary (0/1) expression matrix,
where 1 indicates the glycopeptide was identified in that sample and 0 indicates it was not.
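A presence/absence matrix of this kind can be derived from an identification table with base R. In the sketch below, the file_name column follows the StrucGP column named in this documentation, while the glycopeptide labels and the data are made up for illustration; the package's own construction may differ.

```r
# Hypothetical identification table: one row per identification.
# GP_A is identified twice in sample S2.
ids <- data.frame(
  file_name = c("S1", "S1", "S2", "S2", "S2"),
  glycopeptide = c("GP_A", "GP_B", "GP_A", "GP_A", "GP_C"),
  stringsAsFactors = FALSE
)

# Cross-tabulate glycopeptides (rows) against samples (columns), then reduce
# the counts to 0/1 so repeated identifications still count as 1.
counts <- unclass(table(ids$glycopeptide, ids$file_name))
binary <- (counts > 0) * 1L
```

Because the values only encode detection, such a matrix suits presence-based comparisons rather than abundance-based statistics.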
The following columns could be found in the variable information tibble:
peptide: character, peptide sequence
protein: character, protein accession
gene: character, gene name (symbol)
protein_site: integer, site of glycosylation on protein
glycan_composition: glyrepr::glycan_composition(), glycan compositions.
glycan_structure: glyrepr::glycan_structure(), glycan structures (if parse_structure = TRUE).
The sample information file should be a csv file with the first column
named sample, and the rest of the columns being sample information.
The sample column must match the file_name column in the StrucGP result file,
although the order can be different.
You can put any useful information in the sample information file. Recommended columns are:
group: grouping or conditions, e.g. "control" or "tumor",
required for most downstream analyses
batch: batch information, required for batch effect correction
glyexp::experiment(), glyrepr::glycan_composition(),
glyrepr::glycan_structure()