- New get_anomer_pos() helper to get the anomer position of a monosaccharide.
- New fill_anomer_pos() function to fill the anomer position of a glycan structure with missing anomer information.
- get_structure_level() now returns one character scalar for a glyrepr_structure vector instead of one value per element. The vector-wide level is "intact", "partial", "topological", or "basic" according to the combined residue and linkage detail of the non-missing structures in the vector (#42).
- as_glycan_composition() now supports parsing "E" and "L" in the input composition strings as "NeuAc". For example, as_glycan_composition("H5N4F1L1E1") is now correctly parsed as Hex(5)HexNAc(4)Fuc(1)NeuAc(2), with a warning about dropping the sialic acid linkage information (#41).
- Fix glycan_composition() and as_glycan_composition() not handling duplications in the input. For example, as_glycan_composition("Hex(2)Hex(1)HexNAc(2)") is now correctly regarded as Hex(3)HexNAc(2) (#40).
- as_glycan_structure(NA_character_) now creates a missing structure instead of erroring.
- get_structure_level() now ignores missing structures when determining the vector-wide level, and returns NA_character_ for empty or all-missing structure vectors.
- reduce_structure_level() now preserves missing structures in its output.
- simap() and simap_structure() now skip missing structures like the other smap variants.
- get_mono_type.glyrepr_composition() now ignores missing composition elements and returns NA_character_ for all-NA composition vectors.
- Use dplyr::recode_values() to replace the deprecated dplyr::case_match(), preventing warnings from dplyr.

We have redesigned the internal implementation of glyrepr_composition and glyrepr_structure. This brought native support for names to glyrepr_structure, and for NA values to both glyrepr_structure and glyrepr_composition.
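
The composition-parsing entries above (one-letter codes, "E"/"L" read as NeuAc, and merging of duplicated residues) can be illustrated with a minimal sketch. It is written in Python purely for illustration and is not glyrepr's R implementation. The code table is an assumption inferred only from the example "H5N4F1L1E1", and the function names (`parse_simple_composition`, `parse_named_composition`, `merge_duplicates`, `format_composition`) are hypothetical.

```python
import re
import warnings

# Assumed one-letter codes, inferred only from the changelog example
# "H5N4F1L1E1" -> Hex(5)HexNAc(4)Fuc(1)NeuAc(2). Other codes are omitted.
CODE_TO_MONO = {"H": "Hex", "N": "HexNAc", "F": "Fuc", "L": "NeuAc", "E": "NeuAc"}


def merge_duplicates(pairs):
    """Sum the counts of repeated residues, keeping first-seen order."""
    totals = {}
    for mono, count in pairs:
        totals[mono] = totals.get(mono, 0) + count
    return totals


def parse_simple_composition(text):
    """Parse a string such as "H5N4F1L1E1" into a residue -> count dict."""
    if not re.fullmatch(r"(?:[A-Z]\d+)+", text):
        raise ValueError(f"not a simple composition string: {text!r}")
    tokens = re.findall(r"([A-Z])(\d+)", text)
    unknown = sorted({code for code, _ in tokens if code not in CODE_TO_MONO})
    if unknown:
        raise ValueError(f"unknown one-letter codes: {unknown}")
    if any(code in ("L", "E") for code, _ in tokens):
        # "L" and "E" both collapse to NeuAc, so the linkage detail is lost.
        warnings.warn("sialic acid linkage information (E/L) is dropped")
    return merge_duplicates((CODE_TO_MONO[code], int(num)) for code, num in tokens)


def parse_named_composition(text):
    """Parse a string such as "Hex(2)Hex(1)HexNAc(2)", merging duplicates."""
    if not re.fullmatch(r"(?:[A-Za-z0-9]+\(\d+\))+", text):
        raise ValueError(f"not a composition string: {text!r}")
    tokens = re.findall(r"([A-Za-z0-9]+)\((\d+)\)", text)
    return merge_duplicates((mono, int(num)) for mono, num in tokens)


def format_composition(totals):
    """Render a residue -> count dict as e.g. "Hex(3)HexNAc(2)"."""
    return "".join(f"{mono}({count})" for mono, count in totals.items())


print(format_composition(parse_simple_composition("H5N4F1L1E1")))
# prints: Hex(5)HexNAc(4)Fuc(1)NeuAc(2)
print(format_composition(parse_named_composition("Hex(2)Hex(1)HexNAc(2)")))
# prints: Hex(3)HexNAc(2)
```

Summing counts in a plain dictionary keeps the first-seen residue order, which is why the merged NeuAc count appears last in the first example.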
- smap(), smap2(), spmap(), simap() and their variants now preserve names from input glyrepr_structure vectors in their output.
- glyrepr_structure now formally supports names. All operations on a named glyrepr_structure vector preserve the names.
- NA values are now supported in glyrepr_structure and glyrepr_composition. Any operation on a glyrepr_structure or glyrepr_composition vector with NA values behaves intuitively. is.na() now works for these two classes.
- glycan_composition() now accepts another glyrepr_composition vector as input, returning it as-is.
- glyrepr_composition and glyrepr_structure now enforce the same monosaccharide type ("concrete" or "generic") within a vector. Mixed types are no longer allowed. This invariant is enforced both when creating new vectors and when combining existing vectors.
- glycan_structure() no longer supports multiple glyrepr_structure vectors as input. For example, glycan_structure(o_glycan_core_1(), o_glycan_core_2()) is no longer valid. Please use c(o_glycan_core_1(), o_glycan_core_2()) instead.
- get_mono_type() now returns a character scalar instead of a character vector for glyrepr_structure and glyrepr_composition.
- glyrepr_structure with integer(0) and NULL correctly removes all underlying graphs.
- [[<- is now forbidden on glyrepr_structure vectors. Previously, the operation could be performed silently, but resulted in an invalid object.
- count_mono() now supports counting substituents, with a new argument include_subs.
- New reduce_structure_level() to reduce a glycan structure to a lower resolution level.
- Fix convert_to_generic() failing on glycan compositions containing substituents.
- Fix has_linkages() not considering reducing end anomers, which influenced the results of get_structure_level().
- normalize_substituents() is now an internal function, as it is not useful outside of glyrepr.
- glycan_composition() no longer accepts empty integer vectors. Therefore, glycan_composition(integer(0)) is no longer valid.
- glycan_composition() now checks input types more strictly.
- as_glycan_composition() now handles NA values and empty strings ("") more strictly. An error is raised if NA values or empty strings are passed, instead of dropping them silently. This update makes as_glycan_composition() size-stable and consistent with as_glycan_structure().
- New get_structure_level() to get the structure resolution levels of a glycan structure vector.
- as_glycan_composition() now supports simple composition strings like "H5N2", "H5N4F1S2", "H5N4A1G1", etc.
- count_mono() now returns the total number of monosaccharides when mono is NULL.
- has_linkages() now has a strict parameter to control the strictness of the check.
- glycan_composition() now supports !!!.
- as_glycan_composition().
- smap().
- Fix structure_to_iupac() returning sequences with incorrect backbone or branch order.
- Fix smap2(), spmap(), and related functions returning unexpected results when the input y is a list.
- glycan_structure(), including the new behavior of vertex and edge order introduced in 0.7.0.
- convert_mono_type() is now replaced by convert_to_generic(). convert_mono_type() was created when three monosaccharide types existed: "concrete", "generic", and "simple". When "simple" was removed, convert_mono_type() became redundant, as the only valid conversion is now from "concrete" to "generic". Therefore, we removed this function and added a more straightforward convert_to_generic().
- get_structure_graphs() is redesigned.
  - The i parameter is removed, as indexing can easily be done manually on the input glyrepr_structure vector or on the returned list.
  - New return_list parameter to control the return type. This parameter makes the function "type-stable".
- glycan_structure() and as_glycan_structure() now reorder the underlying graphs to be in line with the IUPAC-style sequence. For example, the vertex order of "Gal(b1-3)[GlcNAc(b1-6)]GalNAc(b1-" is always 1. Gal, 2. GlcNAc, 3. GalNAc, with edges b1-3 and b1-6, no matter what the original graphs are. Users can easily determine the indices of vertices and edges by printing the structure to the console. This update makes glymotif::match_motif() more meaningful.
- glyrepr_structure now has colors in tibbles when printed to the console.
- glycan_structure() is printed in the console. For example, the "Neu5Ac" part in "Neu5Ac9Ac(a2-" was printed in black. Now it is printed in purple, while the "9Ac" part remains in black.
- n_glycan_core() now has a "b1" reducing end anomer, not "?1".
- New validation in glycan_structure() to ensure no duplicated linkage positions. For example, "Gal(b1-3)[Fuc(a1-3)]GalNAc(b1-" is now invalid because both "Gal" and "Fuc" are linked to "GalNAc" at position 3.
- glycan_structure() documentation.
- remove_linkages() now also removes reducing end anomers.
- n_glycan_core(), o_glycan_core_1(), and o_glycan_core_2() now have "??" anomers when linkage = FALSE.
- Fix a bug in smap_structure(), smap2_structure(), spmap_structure(), and simap_structure() where modifying the structures could create identical structures, but the unique structures were not updated correctly. This automatically fixes a similar bug in remove_linkages().
- glycan_structure() objects. This information is rarely used in glycomics and glycoproteomics data analysis. It is removed according to the razor principle.
- as_glycan_structure() no longer allows input IUPAC-condensed strings to omit the anomer information. Previously, something like "Glc(a1-3)GlcNAc" was valid: as_glycan_structure() assumed that the core "GlcNAc" had a "?1-" anomer and added it automatically. The problem is that users were not easily aware of this behavior, which might cause confusion. Again, less is more, so we removed it.
- glyrepr_structure from structure to struct.
- Glycan structures now support multiple substituents on a single monosaccharide. Substituents are stored as comma-separated strings internally and concatenated in IUPAC format for display.
- Glycan compositions now support substituents. The glycan_composition class can now represent and count substituents alongside monosaccharides.
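
The duplicated-linkage-position check mentioned earlier for glycan_structure() can be sketched as follows. This is an illustrative Python sketch under stated assumptions, not glyrepr's R implementation: edges are assumed to be (child, parent, linkage) tuples with linkages written like "b1-3", unknown positions ("?") are assumed to be skipped, and the function name `find_duplicated_positions` is hypothetical. Vertices are numbered as in the example above (1. Gal, 2. GlcNAc, 3. GalNAc).

```python
from collections import Counter


def find_duplicated_positions(edges):
    """Return (parent, position) pairs that are occupied more than once.

    `edges` is an iterable of (child, parent, linkage) tuples, where
    `linkage` looks like "b1-3" and the part after "-" is the position
    on the parent residue.
    """
    counts = Counter()
    for _child, parent, linkage in edges:
        position = linkage.split("-", 1)[1]
        if "?" in position:
            # Assumption: an unknown position cannot be checked, so skip it.
            continue
        counts[(parent, position)] += 1
    return sorted(key for key, n in counts.items() if n > 1)


# "Gal(b1-3)[Fuc(a1-3)]GalNAc(b1-": Gal (1) and Fuc (2) both sit on
# position 3 of GalNAc (3), so the structure is rejected.
invalid = [(1, 3, "b1-3"), (2, 3, "a1-3")]
print(find_duplicated_positions(invalid))  # prints: [(3, '3')]

# "Gal(b1-3)[GlcNAc(b1-6)]GalNAc(b1-": positions 3 and 6 are distinct.
valid = [(1, 3, "b1-3"), (2, 3, "b1-6")]
print(find_duplicated_positions(valid))  # prints: []
```

Counting by (parent, position) rather than by position alone allows the same position number to be reused on different parent residues.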