- filter_obs() and filter_var() now consistently drop unused factor levels by default, matching the documented behavior.
- Added as_pseudo_glycome() to convert glycoproteomics data into pseudo-glycomics format.
- Replaced %>% with the native pipe |> throughout the package. The magrittr package is no longer a dependency.
- set_meta_data() for exp_type changes.
- standardize_variable() has been simplified and no longer requires internet access:
  - The fasta and taxid parameters were removed.
  - The <site> token is no longer supported. Use the protein_site column directly instead.
  - UniProt.ws package.
- Updated real_experiment and real_experiment2 to use the new implementation of glyrepr_structure and glyrepr_composition in the development version of glyrepr.
- Fixed standardize_variable() failing for many proteins due to UniProt API URL length limits. Queries are now batched to stay within these limits.
- standardize_variable() now uses "X" for the site when protein_site or peptide_site is NA.
- Added standardize_variable() to standardize the variable names of an experiment. This function makes variable names more human-readable and meaningful.
- real_experiment and real_experiment2 now have standardized variable names.
- Fixed a bug in real_experiment where the peptide column contained "J" instead of "N" for glycosites.
- filter_obs() and filter_var() now drop unused levels for columns used for filtering. For example, filter_obs(exp, group %in% c("A", "B")) will drop all other levels from the group column in sample_info, keeping only "A" and "B". This behavior makes downstream analysis easier.
- Fixed a bug where type coercion was performed when check_col_types = TRUE, even if coerce_col_types = FALSE. Now, type coercion is performed as long as coerce_col_types = TRUE.
- experiment() for type coercion and checking.
- merge().
- dplyr::join(), which also returns an empty tibble when no rows are left after joining.
- Updated the documentation of select_obs() and select_var() to explain that the sample or variable column will always be kept.
- filter_obs() and filter_var() no longer throw an error when no samples or variables are left after filtering. Instead, they return an empty experiment. This change aligns with the behavior of dplyr::filter(), which also returns an empty tibble when no rows are left after filtering.
- Removed the count_xxx() functions, as their functionality is now covered by summarize_experiment().
- summarize_experiment() now counts both total and per-sample numbers.
  For example, the returned tibble includes both the total number of compositions identified (total_composition) and the average number of compositions detected per sample (composition_per_sample).
- summarize_experiment() that it fails when required columns are missing.
- Added summarize_experiment() to summarize an experiment.
- The glycan_type argument of experiment() and set_glycan_type() now accepts more values, including "N", "O-GalNAc", "O-GlcNAc", "O-Man", "O-Fuc", "O-Glc", and NULL.

This version is a big update for experiment(). We have made this function more flexible, robust, and user-friendly.
The breaking changes introduced in this version affect many glycoverse packages, so we recommend updating all of them to their latest versions.
- experiment() now checks whether certain columns are present in var_info. For "glycomics" experiments, glycan_composition is required. For "glycoproteomics" experiments, protein, protein_site, and glycan_composition are required. If any required column is missing, an error is raised.
- experiment() now coerces common column types to improve the user experience. For example, it converts character to factor for the "group" column in sample_info. This behavior can be controlled by the coerce_col_types argument.
- experiment() now checks the types of important columns, such as protein, protein_site, and glycan_composition. The check is performed after column coercion (if coerce_col_types is TRUE), and only a message is printed if column type conventions are violated. This behavior can be controlled by the check_col_types argument.
- glycan_type can be NULL. Also, type coercion and checking are skipped for this type.
- experiment() is only an expression matrix. If not provided, sample_info and var_info will be automatically generated from the column and row names of the expression matrix, exp_type will be set to "others", and glycan_type will be set to NULL.
  This gives users more flexibility to create experiment() objects.
- exp_type of experiment() now.
- toy_experiment now has exp_type "others".
- real_experiment and real_experiment2 now store the "group" column in sample_info as a factor.
- set_meta_data(), set_exp_type(), and set_glycan_type() now check the meta data.
- get_meta_data() now returns all meta data fields if x is NULL.
- experiment() object now includes the experiment type in the title.
- experiment() object from scratch.
- := from rlang to fix the "could not find function "%||%"" bug in as_se() and from_se() on some systems.
- Added real_experiment2, a glycomics example experiment.
- Added from_se() and as_se() to convert experiment() objects to and from SummarizedExperiment objects.
- Added split() to split an experiment into a list of experiments.
- Added merge() and split() to the getting started vignette.
- Fixed a typo in toy_experiment: "N3N2" should be "H3N2".
- select_var().
- toy_experiment is no longer a function but a data object. Instead of calling toy_experiment() to get the example experiment, use toy_experiment directly.
- Added left_join_obs(), left_join_var(), inner_join_obs(), inner_join_var(), semi_join_obs(), semi_join_var(), anti_join_obs(), and anti_join_var(). These functions are useful for adding new information to an experiment() from tibbles.
- Added the real_experiment data object.
  It is derived from a real-world N-glycoproteome dataset.
- merge().
- sample_info and var_info are now displayed in the console when printing the experiment() object.
- set_exp_type() and set_glycan_type() now validate the input values.
- Added the merge() function to merge two experiments.
- experiment().
- count_compositions(): The number of glycan compositions.
- count_structures(): The number of glycan structures.
- count_peptides(): The number of peptides.
- count_glycopeptides(): The number of unique combinations of peptides, sites, and glycans.
- count_glycoforms(): The number of unique combinations of proteins, sites, and glycans.
- count_proteins(): The number of proteins.
- count_glycosites(): The number of unique combinations of proteins and sites.
- Removed add_glycan_descriptions(), add_struc_descriptions(), and add_comp_descriptions().
  These functions have been moved to the glymotif package.