PyOpenMS

Complete mass spectrometry analysis platform. Use for proteomics and metabolomics workflows: feature detection, peptide/protein identification, label-free and isobaric quantification, adduct/accurate-mass annotation, and complex LC-MS/MS pipelines. Supports a wide range of file formats and algorithms. For simple spectral comparison and small-molecule library matching, use matchms.

Published by @K-Dense-AI

PyOpenMS

Overview

PyOpenMS provides Python bindings to the OpenMS library for computational mass spectrometry, enabling analysis of proteomics and metabolomics data. Use it to read/write MS file formats, process raw spectra, detect and quantify features, identify peptides and proteins, and run end-to-end LC-MS/MS pipelines.

This skill ships ready-to-run scripts in scripts/ covering the most common high-level workflows. Prefer running a script over writing new code—each is a parameterized CLI tool that handles loading, processing, and export. Drop into the Python API (and the references/) only when no script fits.

Installation

uv pip install pyopenms

Verify the installation. Importing the bundled binary prints a harmless one-line memory-status notice, and __version__ works as expected:

import pyopenms as ms
print(ms.__version__)  # 3.5.0

Scripts (start here)

Run with python scripts/<name>.py --help for full options. All accept standard MS file formats and write featureXML/consensusXML/CSV/mzTab/PNG as appropriate.

Inspect & convert

| Script | What it does |
| --- | --- |
| inspect_ms_data.py | Summarize any mzML/mzXML/featureXML/consensusXML/idXML (counts, RT/m/z ranges, TIC, metadata); optional per-spectrum CSV. |
| convert_format.py | Convert between mzML/mzXML/MGF with optional MS-level, RT, and intensity filtering. |
| process_spectra.py | Configurable signal-processing chain: smoothing (Gauss/SGolay), centroiding (PeakPickerHiRes), normalization, S/N and intensity thresholds. |

Feature detection & quantification

| Script | What it does |
| --- | --- |
| detect_features_metabo.py | Untargeted metabolomics feature finding: MassTraceDetection → ElutionPeakDetection → FeatureFindingMetabo. |
| detect_features_centroided.py | Peptide/centroided feature detection via FeatureFinderAlgorithmPicked. |
| align_link_quantify.py | Multi-sample pipeline: detect (or load) features → RT alignment → consensus linking → quant matrix CSV. |
| consensus_to_matrix.py | consensusXML → wide intensity matrix + metadata, with optional median/quantile normalization and long format. |
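
The median normalization option listed for consensus_to_matrix.py can be pictured with a few lines of plain Python. This is a minimal sketch under stated assumptions, not the script's actual code: the matrix is a dict mapping sample name to a list of intensities, zeros are treated as missing, and each sample is scaled so that its median matches the mean of all sample medians. The function name median_normalize and the zero-as-missing rule are illustrative choices.

```python
from statistics import median


def median_normalize(matrix):
    """Scale each sample so that all samples share the same median intensity.

    matrix maps sample name -> list of feature intensities (same feature
    order in every sample). Zeros are treated as missing (an assumption)
    and are ignored when computing medians.
    """
    medians = {}
    for sample, values in matrix.items():
        observed = [v for v in values if v > 0]
        if not observed:
            raise ValueError(f"sample {sample!r} has no non-zero intensities")
        medians[sample] = median(observed)

    # Common target: the mean of the per-sample medians.
    target = sum(medians.values()) / len(medians)

    return {
        sample: [v * target / medians[sample] for v in values]
        for sample, values in matrix.items()
    }


# Toy data: s2 has the same profile as s1 but twice the loading.
raw = {
    "s1": [100.0, 200.0, 300.0],
    "s2": [200.0, 400.0, 600.0],
}
normalized = median_normalize(raw)
print(normalized)
```

After scaling, both toy samples carry identical values, which is the intended effect when intensity differences come only from sample loading.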

Annotation

| Script | What it does |
| --- | --- |
| detect_adducts.py | Group adducts/charge variants of the same neutral mass (MetaboliteFeatureDeconvolution). |
| accurate_mass_search.py | Annotate features against HMDB by accurate mass (AccurateMassSearchEngine → mzTab/CSV). |
| export_gnps_sirius.py | Export GNPS FBMN inputs (MGF + quant table) or a SIRIUS .ms file. |
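
Accurate-mass annotation boils down to converting an observed m/z to a neutral mass and comparing it with reference masses within a ppm tolerance. The sketch below illustrates that idea only. It assumes a protonated [M+H]+ style ion and uses a two-entry lookup table in place of HMDB; the names CANDIDATES, neutral_mass_from_mz, ppm_error, and annotate are hypothetical and are not part of AccurateMassSearchEngine.

```python
PROTON_MASS = 1.007276466  # Da

# Tiny illustrative lookup table of monoisotopic neutral masses (Da).
# A real search would use the HMDB mapping files instead.
CANDIDATES = {
    "glucose (C6H12O6)": 180.063388,
    "caffeine (C8H10N4O2)": 194.080376,
}


def neutral_mass_from_mz(mz, charge=1):
    """Neutral mass of a protonated ion with the given positive charge."""
    return mz * charge - charge * PROTON_MASS


def ppm_error(observed, theoretical):
    """Signed mass error in parts per million."""
    return (observed - theoretical) / theoretical * 1e6


def annotate(mz, charge=1, tol_ppm=5.0):
    """Return (name, ppm error) for every candidate within the tolerance."""
    mass = neutral_mass_from_mz(mz, charge)
    hits = []
    for name, reference in CANDIDATES.items():
        err = ppm_error(mass, reference)
        if abs(err) <= tol_ppm:
            hits.append((name, err))
    return hits


for name, err in annotate(195.0877):
    print(f"{name}: {err:+.2f} ppm")
```

A production search also has to consider several adducts per feature and negative mode, which is what the dedicated engine is for.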

Identification

| Script | What it does |
| --- | --- |
| process_identifications.py | Re-index against FASTA, estimate FDR/q-values, filter (FDR/length/best-per-spectrum), export idXML + CSV. |
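
FDR and q-value estimation is commonly based on a target-decoy search. The following plain-Python sketch shows one simple variant and is not necessarily the formula that the script or OpenMS applies. It assumes higher scores are better, each PSM is a (score, is_decoy) pair, the FDR at a rank is decoys divided by targets, and the q-value is the smallest FDR at that rank or any lower-scoring rank. The function name q_values is illustrative.

```python
def q_values(psms):
    """Return (score, is_decoy, q_value) tuples, best score first.

    psms: iterable of (score, is_decoy) pairs; higher score = better match.
    """
    ranked = sorted(psms, key=lambda p: p[0], reverse=True)

    # Running FDR estimate at each rank: decoys seen / targets seen.
    fdrs = []
    targets = decoys = 0
    for _, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        fdrs.append(decoys / max(targets, 1))

    # q-value: smallest FDR at this rank or at any more permissive rank.
    q = []
    best = float("inf")
    for fdr in reversed(fdrs):
        best = min(best, fdr)
        q.append(best)
    q.reverse()

    return [(score, is_decoy, qv) for (score, is_decoy), qv in zip(ranked, q)]


psms = [(9.1, False), (8.7, False), (8.0, True),
        (7.5, False), (6.2, False), (5.9, True)]
results = q_values(psms)

# Keep target hits that pass a (deliberately loose) q-value cutoff.
accepted = [score for score, is_decoy, qv in results
            if not is_decoy and qv <= 0.25]
print(accepted)
```

The cutoff of 0.25 only suits the six-PSM toy input; real analyses typically filter at 0.01, as in the --fdr 0.01 recipe.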

Chemistry

| Script | What it does |
| --- | --- |
| mass_calculator.py | Monoisotopic/average mass, charged m/z, formula, and isotope pattern for peptides or empirical formulas. |
| digest_protein.py | In-silico protease digestion of FASTA/sequence → theoretical peptides with masses and m/z. |
| theoretical_spectrum.py | Generate annotated theoretical fragment spectra (b/y/a/c/x/z, losses) for a peptide. |
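
To make the digestion and mass rows concrete, here is a minimal plain-Python sketch of tryptic digestion and monoisotopic m/z calculation. It is an illustration, not the scripts' implementation: it assumes the common trypsin rule (cleave after K or R unless the next residue is P), handles only the 20 standard unmodified residues, and uses hypothetical function names tryptic_digest, mono_mass, and mz.

```python
# Monoisotopic residue masses in Da (standard amino acids, no modifications).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565   # Da, added once per peptide for the free termini
PROTON = 1.007276   # Da


def tryptic_digest(sequence, missed_cleavages=0, min_length=1):
    """Cleave after K or R unless the next residue is P."""
    pieces, start = [], 0
    for i, aa in enumerate(sequence):
        next_aa = sequence[i + 1] if i + 1 < len(sequence) else ""
        if aa in "KR" and next_aa != "P":
            pieces.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        pieces.append(sequence[start:])

    # Join neighbouring pieces to model up to `missed_cleavages` missed sites.
    peptides = []
    for i in range(len(pieces)):
        last = min(i + missed_cleavages, len(pieces) - 1)
        for j in range(i, last + 1):
            peptide = "".join(pieces[i:j + 1])
            if len(peptide) >= min_length:
                peptides.append(peptide)
    return peptides


def mono_mass(peptide):
    """Monoisotopic neutral mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


def mz(peptide, charge):
    """m/z of the protonated peptide at the given positive charge."""
    return (mono_mass(peptide) + charge * PROTON) / charge


for pep in tryptic_digest("MKWVTFRPKLLK", missed_cleavages=1):
    print(pep, round(mono_mass(pep), 4), round(mz(pep, 2), 4))
```

Modified sequences such as PEPTIDEM(Oxidation)K need a parser for the modification syntax and mass deltas, which the sketch leaves out.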

Targeted & visualization

| Script | What it does |
| --- | --- |
| extract_chromatograms.py | Build TIC/BPC and XIC traces for target m/z (CSV + optional plot). |
| plot_ms_data.py | Quick plots: single spectrum, TIC, 2D feature map, MS1 signal map. |
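
The TIC, BPC, and XIC traces named in the extract_chromatograms.py row each reduce a spectrum to one number per retention time. A minimal plain-Python sketch follows, assuming spectra are given as (rt, [(mz, intensity), ...]) pairs and that the XIC window is a symmetric ppm tolerance. The function names and toy data are illustrative and do not reflect the script's internals.

```python
def tic(spectra):
    """Total ion current: sum of all intensities per spectrum."""
    return [(rt, sum(i for _, i in peaks)) for rt, peaks in spectra]


def bpc(spectra):
    """Base peak chromatogram: most intense peak per spectrum."""
    return [(rt, max((i for _, i in peaks), default=0.0))
            for rt, peaks in spectra]


def xic(spectra, target_mz, tol_ppm=10.0):
    """Extracted ion chromatogram: summed intensity near target_mz."""
    tol = target_mz * tol_ppm * 1e-6  # absolute half-width of the window
    return [
        (rt, sum(i for mz, i in peaks if abs(mz - target_mz) <= tol))
        for rt, peaks in spectra
    ]


# Toy MS1 scans: (retention time in s, [(m/z, intensity), ...]).
scans = [
    (10.0, [(100.0, 5.0), (200.0, 20.0)]),
    (11.0, [(100.0005, 8.0), (300.0, 2.0)]),
    (12.0, []),
]

print("TIC:", tic(scans))
print("BPC:", bpc(scans))
print("XIC 100.0:", xic(scans, 100.0))
```

With real data the peak arrays would come from the loaded spectra rather than literal lists, and only MS1 scans would normally be included.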

Common script recipes

# Inspect a file
python scripts/inspect_ms_data.py sample.mzML --spectra-csv spectra.csv

# Untargeted metabolomics: features for one sample
python scripts/detect_features_metabo.py sample.mzML --out-csv features.csv

# Full multi-sample quantification study
python scripts/align_link_quantify.py s1.mzML s2.mzML s3.mzML --out-prefix study
python scripts/consensus_to_matrix.py study.consensusXML --out quant.csv --normalize median

# Peptide chemistry
python scripts/mass_calculator.py --peptide "PEPTIDEM(Oxidation)K" --charges 1 2 3 --isotopes 5
python scripts/digest_protein.py proteins.fasta --enzyme Trypsin --missed 2 --out peptides.csv

# Identification post-processing
python scripts/process_identifications.py search.idXML --fasta db.fasta --fdr 0.01 --out filtered.idXML --csv hits.csv

Key 3.5.0 API notes

These APIs changed relative to older OpenMS releases, so older tutorials and code will break:

  • Feature finding: FeatureFinder("centroided") was removed. Use FeatureFinderAlgorithmPicked (proteomics/centroided) or the MassTraceDetection → ElutionPeakDetection → FeatureFindingMetabo pipeline (metabolomics). See detect_features_*.py.
  • idXML I/O: IdXMLFile().load/store require an ms.PeptideIdentificationList() for peptide IDs (a plain Python list raises "can not handle type"). Protein IDs remain a plain list.
  • Adduct decharging: the class is MetaboliteFeatureDeconvolution, and adducts use Elements:Charge:Probability syntax (e.g. H:+:0.4, H-2O-1:0:0.05)—not bracket notation like [M+H]+.
  • DataFrame columns: FeatureMap.get_df() uses lowercase rt/mz (not RT). ConsensusMap provides get_intensity_df() and get_metadata_df().
  • Bundled data caveat: the pip wheel ships HMDBMappingFile.tsv but not HMDB2StructMapping.tsv; accurate_mass_search.py detects this and explains how to supply it.
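
To see how the Elements:Charge:Probability adduct syntax from the notes above breaks down, the sketch below parses such strings in plain Python. It is an illustration rather than OpenMS code. It parses only the three fields shown above and assumes that the charge field is either 0 or a run of + or - signs whose length gives the charge magnitude. The function name parse_adduct is hypothetical.

```python
import re


def parse_adduct(spec):
    """Split 'Elements:Charge:Probability' into (counts, charge, probability)."""
    elements, charge_field, probability = spec.split(":")[:3]

    # Element part, e.g. 'H' -> {'H': 1}, 'H-2O-1' -> {'H': -2, 'O': -1}.
    counts = {}
    for symbol, number in re.findall(r"([A-Z][a-z]?)(-?\d+)?", elements):
        counts[symbol] = counts.get(symbol, 0) + (int(number) if number else 1)

    # Charge part: '0' for neutral, otherwise count the signs (assumption).
    if charge_field == "0":
        charge = 0
    elif charge_field and set(charge_field) == {"+"}:
        charge = len(charge_field)
    elif charge_field and set(charge_field) == {"-"}:
        charge = -len(charge_field)
    else:
        raise ValueError(f"unrecognized charge field: {charge_field!r}")

    return counts, charge, float(probability)


for spec in ("H:+:0.4", "H-2O-1:0:0.05"):
    print(spec, "->", parse_adduct(spec))
```

Read this way, H-2O-1:0:0.05 describes a neutral water loss (two hydrogens and one oxygen removed) with probability 0.05.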

Core data structures

  • MSExperiment – collection of spectra and chromatograms
  • MSSpectrum / MSChromatogram – a single spectrum / chromatographic trace
  • Feature / FeatureMap – a detected LC-MS peak / collection of features
  • ConsensusMap – features linked across samples (the quant table)
  • PeptideIdentification / ProteinIdentification – search results
  • AASequence / EmpiricalFormula – sequence and formula chemistry

For details: see references/data_structures.md.

Parameter management

Most algorithms expose an OpenMS Param object:

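import pyopenms as ms

algo = ms.FeatureFindingMetabo()
p = algo.getDefaults()

# List every parameter with its default value and description (keys are bytes).
for key in p.keys():
    print(key.decode(), "=", p.getValue(key), "|", p.getDescription(key))

# Override one default, then hand the modified Param back to the algorithm.
p.setValue("charge_lower_bound", 1)
algo.setParameters(p)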

Export to pandas

import pyopenms as ms

fm = ms.FeatureMap()
ms.FeatureXMLFile().load("features.featureXML", fm)
df = fm.get_df()             # columns include lowercase rt, mz, intensity, charge, quality

cm = ms.ConsensusMap()
ms.ConsensusXMLFile().load("study.consensusXML", cm)
intensities = cm.get_intensity_df()   # features x samples
metadata = cm.get_metadata_df()       # rt, mz, charge, quality, ...

Integration with other tools

PyOpenMS works alongside pandas (DataFrames), NumPy (peak arrays), scikit-learn (ML), and Matplotlib/Seaborn (plots), and feeds downstream tools via export: GNPS (FBMN), SIRIUS, and mzTab.

Resources

  • Official docs (3.5.0): https://pyopenms.readthedocs.io/en/release-3.5.0/
  • OpenMS: https://www.openms.org
  • GitHub: https://github.com/OpenMS/OpenMS

References

  • references/file_io.md – file format handling
  • references/signal_processing.md – signal processing algorithms
  • references/feature_detection.md – feature detection and linking
  • references/identification.md – peptide and protein identification
  • references/metabolomics.md – metabolomics-specific workflows
  • references/data_structures.md – core objects and data structures

Bundled with this artifact

22 files

Reference files that ship alongside this artifact. Agents pull these in only when the task needs them.
