- auto_clean() now works out of the box without needing to install other packages (#15).
- Add impute_min_prob() to avoid the dependency on the imputeLCMD package (#16).

We have made significant updates to auto_clean(). QC-related behaviors are now removed from auto_impute(), auto_normalize(), and auto_remove(), because we realized that relying on CVs in QC samples to determine the imputation or normalization strategy is not robust.
- info parameter in auto_xxx() functions is removed. This should not be a problem because this parameter was only used internally (#13).
- auto_impute() now uses a different strategy. When sample size < 30, use impute_min_prob(). When 30 <= sample size < 100, use impute_bpca() for glycomics data and impute_min_prob() for glycoproteomics data. When sample size >= 100, use impute_miss_forest() for glycomics data and impute_bpca() for glycoproteomics data (#8).
- auto_impute() and auto_normalize() no longer rely on QC samples to determine the strategy (#8, #9).
- auto_remove() no longer takes QC samples into account (a1cb616).
- qc_name argument in auto_clean(), auto_impute(), auto_normalize(), and auto_remove() is deprecated (#8, #9, #10).
- to_try argument in auto_impute() and auto_normalize() is deprecated. impute_to_try and normalize_to_try arguments in auto_clean() are also deprecated (#8, #9).
- auto_impute() and auto_normalize() now support fallbacks for experiments with "others" type (#8, #9).
- batch_col argument in auto_clean() is ignored (#12).
- auto_xxx() functions (#14).
- transform_clr() and transform_alr() now align with the methods described in DOI: 10.1038/s41467-025-56249-3.
- auto_normalize() now only performs total abundance normalization for glycomics data.
- Add auto_coda() to use the CoDA strategy in glycowork for glycomics data.
- Add transform_clr() for Centered Log-Ratio (CLR) transformation.
- Add transform_alr() for Additive Log-Ratio (ALR) transformation.
- auto_clean() for QC-related behaviors.
- Add method parameter to correct_batch_effect() supporting limma method for batch correction.
- correct_batch_effect() for glyexp_experiment method.
- add_site_seq().
- aggregate() now calls glyexp::standardize_variable() after aggregation to ensure meaningful variable names.
- qc_name argument and group_col argument were ignored in auto_clean().
- qc_name argument in auto_xxx() functions can now be NULL.
- plot_missing_heatmap(): Binary heatmap of missing value patterns.
- plot_missing_bar(): Bar plot of missing
proportions by sample or variable.
- plot_tic_bar(): Total intensity (TIC) bar plot by sample.
- plot_rank_abundance(): Protein rank abundance plot.
- plot_int_boxplot(): Log2-intensity boxplots by sample, with optional grouping.
- plot_rle(): Relative Log Expression (RLE) boxplots for detecting sample-wise bias.
- plot_cv_dent(): CV density plot, with optional stratified groups.
- plot_batch_pca(): PCA score plot colored by batch.
- plot_rep_scatter(): Scatter plots of replicate sample pairs with $R^2$ values.
- Add pheatmap, ggplotify, factoextra, and patchwork to Suggests to support enhanced QC visualizations.
- auto_xxx() functions.
- auto_clean() has been redesigned to be more flexible and robust. It now calls auto_normalize(), auto_remove(), auto_impute(), auto_aggregate(), and auto_correct_batch_effect() in sequence, depending on the experiment type.
- Add auto_normalize() to automatically normalize the data.
- Add auto_remove() to automatically remove variables with too many missing values.
- Add auto_impute() to automatically impute the missing values.
- Add auto_aggregate() to automatically aggregate the data.
- Add auto_correct_batch_effect() to automatically correct the batch effects.
- remove_xxx() functions now print a message about the number and proportion of variables removed.
- Add seed argument to impute_miss_forest() to make the imputation results reproducible.
- normalize_median() now issues a warning if any sample has a median value of 0, which would produce all NaNs in the result.
- correct_batch_effect() now uses a new method to detect group-batch confounding. It uses Cramér's V to measure the strength of the association between batch and group variables.
- correct_batch_effect() examples.
- Rename remove_missing_variables() to remove_rare().
- glyclean are generic now.
This makes it easier to extend glyclean to other data types.
- Add remove_low_var() for removing variables with low variance.
- Add remove_low_cv() for removing variables with low coefficient of variation.
- Add remove_constant() for removing constant variables.
- Add remove_low_expr() for removing variables with low expression or abundance.
- glyexp 0.10.0.
- aggregate() now uses new logic for aggregating glycoproteomics data. Instead of dropping all other columns, aggregate() now keeps columns intelligently. Common columns including "gene" will be kept in this way. This new logic has an important implication: columns added by functions like glymotif::add_motif_lgl() or glydet::add_meta_properties() will be kept.
- aggregate() when the user tries to aggregate to a level demanding structure but the structure column is missing.
- sva package is installed in correct_batch_effect().
- tibble and glyexp.
- glyexp 0.8.0.
- Add scales to Imports.
- aggregate().
- aggregate() where it failed on experiments returned by most glyread functions.
- Add add_site_seq() function to add site sequences to a glycopeptide experiment.
- Add infer_protein() function to resolve multiple protein assignments for glycopeptides.
- Add adjust_protein() function to remove protein expression from glycopeptide expression.
- aggregate() from x to exp to be consistent with other functions.
- glyexp_experiment objects.
- by parameter in data processing functions now accepts factors in addition to column names,
enabling direct use with matrix inputs.
- correct_batch_effect() and detect_batch_effect() when column names are not found in sample information.
- batch or group arguments with matrix inputs.
- by argument in filter_missing_variable().
- auto_clean() now detects batch effects before batch correction.
- to_level argument is removed from auto_clean().
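
The sample-size rule listed above for auto_impute() (#8) can be read as a small decision table. The sketch below restates it in base R under stated assumptions: choose_impute_method() is a hypothetical helper written for illustration, it is not part of glyclean, and it only returns the name of the imputation function the changelog entry points to. The fallback for the "others" experiment type is not specified in this changelog, so the sketch stops with an error instead of guessing.

```r
# Hypothetical helper (not a glyclean function): restates the sample-size
# rule from the auto_impute() changelog entry as a lookup.
choose_impute_method <- function(n_samples, exp_type) {
  stopifnot(is.numeric(n_samples), length(n_samples) == 1, n_samples >= 0)
  stopifnot(is.character(exp_type), length(exp_type) == 1)

  if (!exp_type %in% c("glycomics", "glycoproteomics")) {
    # The changelog mentions a fallback for "others" but does not describe it.
    stop("No rule sketched here for experiment type: ", exp_type)
  }

  if (n_samples < 30) {
    # Small studies: same choice for both experiment types.
    return("impute_min_prob")
  }
  if (n_samples < 100) {
    # Medium studies: 30 <= n < 100.
    return(if (exp_type == "glycomics") "impute_bpca" else "impute_min_prob")
  }
  # Large studies: n >= 100.
  if (exp_type == "glycomics") "impute_miss_forest" else "impute_bpca"
}

choose_impute_method(12, "glycomics")
choose_impute_method(60, "glycoproteomics")
choose_impute_method(250, "glycomics")
```

Note that the boundaries are inclusive on the lower side: 30 samples already falls in the middle tier and 100 samples in the top tier.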
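
The correct_batch_effect() entry above says group-batch confounding is now measured with Cramér's V. For readers unfamiliar with the statistic, here is a minimal base-R sketch of the textbook definition, V = sqrt(chi^2 / (n * (k - 1))), where k is the smaller of the number of rows and columns of the contingency table. The function name cramers_v() is an assumption made for this example, and the sketch does not claim to reproduce how glyclean computes the value or which threshold it applies.

```r
# Textbook Cramer's V between two categorical vectors (illustrative only;
# cramers_v() is a hypothetical name, not glyclean's internal function).
cramers_v <- function(x, y) {
  # Convert to character so unused factor levels do not create empty rows.
  tab <- table(as.character(x), as.character(y))
  n <- sum(tab)

  # Expected counts under independence, then the Pearson chi-squared statistic.
  expected <- outer(rowSums(tab), colSums(tab)) / n
  chi2 <- sum((tab - expected)^2 / expected)

  # k - 1 is 0 when either variable has a single level; the result is NaN then.
  k <- min(nrow(tab), ncol(tab))
  sqrt(chi2 / (n * (k - 1)))
}

# Fully confounded design: each batch contains exactly one group.
cramers_v(c("A", "A", "B", "B"), c("ctrl", "ctrl", "case", "case"))

# Balanced design: every group appears equally in every batch.
cramers_v(c("A", "A", "B", "B"), c("ctrl", "case", "ctrl", "case"))
```

V ranges from 0 (no association between batch and group) to 1 (batch and group are perfectly confounded), which is why it is a convenient single number for flagging designs where batch correction could remove the biological signal.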