2. Quick Start

2.1. Parsing a mzML file and setting measured precision

#!/usr/bin/env python
import sys
import pymzml


def main(mzml_file):
    """
    Basic example script to demonstrate the usage of pymzML. Requires a mzML
    file as first argument.

    usage:

        ./simple_parser.py <path_to_mzml_file>

    Note:

        This script uses the new syntax with the MS level being a property of
        the spectrum class ( Spectrum.ms_level ). The old syntax can be found in
        the script simple_parser_v2.py where the MS level can be queried as a key
        (Spectrum['ms level'])


    """
    run = pymzml.run.Reader(mzml_file)
    n = 0  # stays 0 if the file contains no spectra
    # Count from 1 so that n is the number of spectra, not the last index
    for n, spec in enumerate(run, start=1):
        print(
            "Spectrum {0}, MS level {ms_level} @ RT {scan_time:1.2f}".format(
                spec.ID, ms_level=spec.ms_level, scan_time=spec.scan_time_in_minutes()
            )
        )
    print("Parsed {0} spectra from file {1}".format(n, mzml_file))
    print()


if __name__ == "__main__":
    if len(sys.argv) < 2:
        print(main.__doc__)
        sys.exit()
    mzml_file = sys.argv[1]
    main(mzml_file)

2.2. Seeking in a mzML file

One of the features of pymzML is the ability to create and read indexed gzip files, which allows mzML file sizes to reach the levels of the original RAW format. Random access into a mzML file is provided by the __getitem__ method of the Reader class in pymzML's run module, so a spectrum can be fetched by its ID with square-bracket indexing.

Alternatively, pymzML can also rapidly seek into any uncompressed mzML file, whether or not the file includes an index.

#!/usr/bin/env python
import pymzml

run = pymzml.run.Reader("tests/data/BSA1.mzML.gz")
# Square-bracket access seeks directly to the spectrum with this ID
spectrum_with_id_2540 = run[2540]
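
The idea behind offset-based random access can be illustrated with the standard library alone. The following is a conceptual sketch, not pymzML's implementation: the sample data and the helper names build_offset_index and read_spectrum are hypothetical. It records the byte offset of each spectrum tag once, then seeks straight to a requested spectrum instead of parsing the file from the start.

```python
import io
import re

# A tiny stand-in for an uncompressed mzML body (hypothetical content).
data = (
    b'<spectrumList>\n'
    b'<spectrum id="scan=1">first</spectrum>\n'
    b'<spectrum id="scan=2">second</spectrum>\n'
    b'<spectrum id="scan=3">third</spectrum>\n'
    b'</spectrumList>\n'
)


def build_offset_index(raw):
    """Map each spectrum ID (int) to the byte offset of its opening tag."""
    pattern = re.compile(rb'<spectrum id="scan=(\d+)"')
    return {int(m.group(1)): m.start() for m in pattern.finditer(raw)}


def read_spectrum(handle, index, spec_id):
    """Seek to the stored offset and read up to the closing tag."""
    closing = b"</spectrum>"
    handle.seek(index[spec_id])
    chunk = b""
    while closing not in chunk:
        block = handle.read(16)
        if not block:  # end of data reached without a closing tag
            break
        chunk += block
    end = chunk.index(closing) + len(closing)
    return chunk[:end]


index = build_offset_index(data)
handle = io.BytesIO(data)
second = read_spectrum(handle, index, 2)
print(second)
```

Building the index costs one pass over the data; every later lookup is a single seek plus a short read, which is why an index (or a fast way to reconstruct one) makes access by ID cheap.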

2.3. Reading mzML indices with a custom regular expression

When reading mzML files whose index IDs are neither plain integers nor of the form "scan=1" or similar, you can set a custom regex to parse the index when initializing the reader.

For example, suppose the index looks like the one in the example file Manuels_customs_ids.mzML:

<offset idRef="ManuelsCustomID=1 diesdas">4026</offset>

#!/usr/bin/env python
import pymzml
import re

index_re = re.compile(
    b'.*idRef="ManuelsCustomID=(?P<ID>.*) diesdas">(?P<offset>[0-9]*)</offset>'
)
your_file_path = "Manuels_customs_ids.mzML"  # placeholder: adjust to the location of your file
run = pymzml.run.Reader(your_file_path, index_regex=index_re)
spec_1 = run[1]

The regular expression has to contain a named group called ID and a named group called offset. Also be aware that your regex needs to be a bytes pattern (note the b prefix), not a str pattern.
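
Before handing a custom pattern to the reader, it can help to check it against a single index line using only the re module. The sketch below does this for the example line above; the variable names are illustrative, and the final check simply demonstrates that Python refuses to apply a str pattern to bytes data.

```python
import re

# Same pattern as in the example above; the b prefix makes it a bytes pattern.
index_re = re.compile(
    b'.*idRef="ManuelsCustomID=(?P<ID>.*) diesdas">(?P<offset>[0-9]*)</offset>'
)

# One index line as it would appear in the raw bytes of the file.
sample_line = b'<offset idRef="ManuelsCustomID=1 diesdas">4026</offset>'

match = index_re.match(sample_line)
spectrum_id = match.group("ID")            # named group ID, returned as bytes
byte_offset = int(match.group("offset"))   # named group offset, converted to int
print(spectrum_id, byte_offset)

# A str version of the same pattern cannot be applied to bytes data.
str_re = re.compile(index_re.pattern.decode())
try:
    str_re.match(sample_line)
    mixed_types_ok = True
except TypeError:
    mixed_types_ok = False
print(mixed_types_ok)
```

Both named groups come back as bytes, so the offset is converted with int() and the ID has to be decoded or converted by whatever code consumes it.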