3.2. Spectrum and Chromatogram

The Spectrum class offers a Python object for mass spectrometry data. A Spectrum object holds the basic information of the spectrum and offers methods to interrogate its properties. The data, i.e. mass over charge (m/z) and intensity values, are decoded on demand and can be accessed via the corresponding properties, e.g. peaks.

The Spectrum class is used by the Reader class, which makes each spectrum accessible as a Spectrum object.

Theoretical spectra can also be created using the setter functions. For example, m/z values, intensities, and peaks can be set by the corresponding properties: pymzml.spec.Spectrum.mz, pymzml.spec.Spectrum.i, pymzml.spec.Spectrum.peaks.

Similar to the Spectrum class, the Chromatogram class allows interrogation of profile data (time, intensity) in a total ion chromatogram.

3.2.1. Spectrum

class pymzml.spec.Spectrum(element=<Element ''>, measured_precision=5e-06)[source]

Spectrum class which inherits from class pymzml.spec.MS_Spectrum

Parameters

element (xml.etree.ElementTree.Element) – spectrum as xml element

Keyword Arguments

measured_precision (float) – in ppm, i.e. 5e-6 equals 5 ppm.

__getitem__(accession)[source]

Access spectrum XML information by tag name

Parameters

accession (str) – name of the XML tag

Returns

value of the XML tag

Return type

value (float or str)

__add__(other_spec)[source]

Adds two pymzml spectra

Parameters

other_spec (Spectrum) – spectrum to add to the current spectrum

Returns

reference to the edited spectrum

Return type

self (Spectrum)

Example:

>>> import pymzml
>>> s = pymzml.spec.Spectrum(measured_precision=20e-6)
>>> file_to_read = "../mzML_example_files/xy.mzML.gz"
>>> run = pymzml.run.Reader(
...     file_to_read,
...     MS_precisions = {
...         1 : 5e-6,
...         2 : 20e-6
...     }
... )
>>> for spec in run:
...     s += spec

__sub__(other_spec)[source]

Subtracts two pymzml spectra.

Parameters

other_spec (spec.Spectrum) – spectrum to subtract from the current spectrum

Returns

returns self after other_spec was subtracted

Return type

self (spec.Spectrum)

__mul__(value)[source]

Multiplies each intensity with a float, i.e. scales the spectrum.

Parameters

value (int, float) – value to multiply the intensities with

Returns

returns self after intensities were scaled by value

Return type

self (spec.Spectrum)

__truediv__(value)[source]

Divides each intensity by a float, i.e. scales the spectrum.

Parameters

value (int, float) – value to divide the intensities by

Returns

returns self after intensities were scaled by value.

Return type

self (spec.Spectrum)

property ID

Access the native id (last number in the id attribute) of the spectrum.

Returns

native ID of the spectrum

Return type

ID (str)

property TIC

Property to access the total ion current for this spectrum.

Returns

Total Ion Current of the spectrum.

Return type

TIC (float)

estimated_noise_level(mode='median')[source]

Calculates noise threshold for function remove_noise.

Different modes are available. Default is ‘median’

Keyword Arguments

mode (str) – define mode for removing noise. Default = “median” (other modes: “mean”, “mad”)

Returns

estimated noise threshold

Return type

noise_level (float)
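The three modes can be illustrated on a plain list of intensities. This is a minimal sketch using only the standard library; the helper name estimate_noise is hypothetical, and the exact formulas pymzml applies for each mode are not given in this section, so the definitions below (plain median, plain mean, median absolute deviation) are assumptions.

```python
import statistics


def estimate_noise(intensities, mode="median"):
    """Return a simple noise estimate for a list of intensities.

    Illustrative only: the formulas are assumptions, not pymzml's code.
    """
    if mode == "median":
        return statistics.median(intensities)
    if mode == "mean":
        return statistics.mean(intensities)
    if mode == "mad":
        # median absolute deviation from the median
        center = statistics.median(intensities)
        return statistics.median([abs(i - center) for i in intensities])
    raise ValueError(f"unknown mode: {mode}")


intensities = [1.0, 2.0, 2.0, 3.0, 100.0]
noise_by_mode = {
    mode: estimate_noise(intensities, mode)
    for mode in ("median", "mean", "mad")
}
print(noise_by_mode)
```

In this toy data the single intense peak pulls the mean far above the median, which is one reason a median-based estimate is a common default.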

extreme_values(key)[source]

Find extreme values, i.e. the minimum and maximum m/z or intensity

Parameters

key (str) – m/z : “mz” or intensity : “i”

Returns

tuple of minimum and maximum m/z or intensity

Return type

extrema (tuple)

get(acc, default=None)[source]

Mimics the get method of dict.

Parameters
  • acc (str) – accession or obo tag to return

  • default (None, optional) – default value if acc is not found

has_overlapping_peak(mz)[source]

Checks if a spectrum has more than one peak for a given m/z value and within the measured precision

Parameters

mz (float) – m/z value which should be checked

Returns

Returns True if a nearby peak is detected, otherwise False

Return type

Boolean (bool)
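The idea of "more than one peak within the measured precision" can be sketched on a plain list of (m/z, intensity) tuples. The helper names below are hypothetical, and the symmetric window of m/z times precision is an assumption about how the tolerance is applied, not pymzml's implementation.

```python
def peaks_within_precision(peaks, mz, precision=5e-6):
    """Return all (mz, i) tuples whose m/z lies within mz * precision of mz."""
    tolerance = mz * precision
    return [(p_mz, p_i) for p_mz, p_i in peaks if abs(p_mz - mz) <= tolerance]


def has_overlapping_peak(peaks, mz, precision=5e-6):
    """True if more than one peak falls into the tolerance window around mz."""
    return len(peaks_within_precision(peaks, mz, precision)) > 1


# At m/z 500 and 5 ppm the window is +/- 0.0025
peaks = [(500.000, 10.0), (500.002, 12.0), (500.100, 8.0)]
overlap_at_500 = has_overlapping_peak(peaks, 500.0)
overlap_at_500_1 = has_overlapping_peak(peaks, 500.1)
print(overlap_at_500, overlap_at_500_1)
```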

has_peak(mz2find)[source]

Checks if a Spectrum has a certain peak. Requires an m/z value as input and returns a list of peaks if the m/z value is found in the spectrum, otherwise [] is returned. Every peak is a tuple of m/z and intensity.

Note

Multiple peaks may be found, depending on the defined precisions

Parameters

mz2find (float) – m/z value which should be found

Returns

list of m/z, i tuples

Return type

peaks (list)

Example:

>>> import pymzml
>>> example_file = 'tests/data/example.mzML'
>>> run = pymzml.run.Reader(
...     example_file,
...     MS_precisions = {
...         1 : 5e-6,
...         2 : 20e-6
...     }
... )
>>> for spectrum in run:
...     if spectrum.ms_level == 2:
...         peak_to_find = spectrum.has_peak(1016.5404)
...         print(peak_to_find)
[(1016.5404, 19141.735187697403)]

highest_peaks(n)[source]

Function to retrieve the n-highest centroided peaks of the spectrum.

Parameters

n (int) – number of highest peaks to return.

Returns

list of (mz, i) tuples of the n highest peaks

Return type

centroided peaks (list)
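Selecting the n most intense peaks from a list of (m/z, intensity) tuples can be sketched with the standard library. The helper below is hypothetical and works on plain tuples rather than on a Spectrum; the order in which pymzml returns the peaks may differ.

```python
import heapq


def n_highest_peaks(peaks, n):
    """Return the n peaks with the largest intensity, most intense first."""
    return heapq.nlargest(n, peaks, key=lambda peak: peak[1])


peaks = [(100.0, 5.0), (200.0, 50.0), (300.0, 20.0), (400.0, 1.0)]
top_two = n_highest_peaks(peaks, 2)
print(top_two)
```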

Example:

>>> import pymzml
>>> run = pymzml.run.Reader(
...     "tests/data/example.mzML.gz",
...     MS_precisions = {
...         1 : 5e-6,
...         2 : 20e-6
...     }
... )
>>> for spectrum in run:
...     if spectrum.ms_level == 2:
...         if spectrum.ID == 1770:
...             for mz, i in spectrum.highest_peaks(5):
...                 print(mz, i)

property i

Returns the list of the intensity values. If the intensity values are encoded, the function _decode() is used to decode the encoded data.

The i property can also be set, e.g. for theoretical data. However, it is recommended to use the peaks property to set m/z and intensity tuples at the same time.

Returns

list of intensity values from the analyzed spectrum

Return type

i (list)

property id_dict

Access to all entries stored in the id attribute of a spectrum.

Returns

key value pairs for all entries in id attribute of a spectrum

Return type

id_dict (dict)

property index

Access the index of the spectrum.

Returns

index of the spectrum

Return type

index (int)

Note

This does not necessarily correspond to the native spectrum ID

property measured_precision

Sets the measured and internal precision

Returns

measured precision (e.g. 5e-6)

Return type

value (float)

property ms_level

Property to access the ms level.

Returns

MS level of the spectrum

Return type

ms_level (int)

property mz

Returns the list of m/z values. If the m/z values are encoded, the function _decode() is used to decode the encoded data. The mz property can also be set, e.g. for theoretical data. However, it is recommended to use the peaks property to set m/z and intensity tuples at the same time.

Returns

list of m/z values of spectrum.

Return type

mz (list)

peaks(peak_type)[source]

Decode and return a list of mz/i tuples.

Parameters

peak_type (str) – currently supported types are: raw, centroided and reprofiled

Returns

list or numpy array of mz/i tuples or arrays

Return type

peaks (list or ndarray)

ppm2abs(value, ppm_value, direction=1, factor=1)[source]

Returns the value plus (or minus, depending on direction) the error (measured precision) for this value.

Parameters
  • value (float) – m/z value

  • ppm_value (int) – ppm value

Keyword Arguments
  • direction (int) – plus or minus the considered m/z value. The argument direction should be 1 or -1

  • factor (int) – multiplication factor for the imprecision. The argument factor should be bigger than 0

Returns

imprecision for the given value

Return type

imprecision (float)
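The arithmetic described above can be written out as a short sketch. The function name ppm_shift is hypothetical, and the sketch assumes that ppm_value is passed as a fraction such as 5e-6, following the convention used for measured_precision.

```python
def ppm_shift(value, ppm_value, direction=1, factor=1):
    """Shift value up (direction=1) or down (direction=-1) by its ppm error.

    Assumption: ppm_value is a fraction, e.g. 5e-6 for 5 ppm.
    """
    error = value * ppm_value * factor
    return value + direction * error


upper = ppm_shift(1000.0, 5e-6)                # about 1000.005
lower = ppm_shift(1000.0, 5e-6, direction=-1)  # about 999.995
wide = ppm_shift(1000.0, 5e-6, factor=2)       # about 1000.010
print(lower, upper, wide)
```

The absolute error grows with the m/z value, so the same ppm setting yields a wider window at m/z 1000 than at m/z 100.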

property precursors

List the precursor information of this spectrum, if available.

Returns

list of precursor ids for this spectrum.

Return type

precursor (list)

reduce(peak_type='raw', mz_range=(None, None))[source]

Remove all m/z values outside the given range.

Parameters

mz_range (tuple) – tuple of min, max values

Returns

list of mz, i tuples in the given range.

Return type

peaks (list)
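Restricting peaks to an m/z range can be sketched as a simple filter. The helper is hypothetical; treating None as an open bound and both limits as inclusive are assumptions based on the default mz_range=(None, None), not confirmed behaviour of pymzml.

```python
def reduce_peaks(peaks, mz_range=(None, None)):
    """Keep only (mz, i) tuples whose m/z lies inside mz_range.

    Assumptions: None means "no limit"; both bounds are inclusive.
    """
    low, high = mz_range
    return [
        (mz, i)
        for mz, i in peaks
        if (low is None or mz >= low) and (high is None or mz <= high)
    ]


peaks = [(100.0, 5.0), (200.0, 50.0), (300.0, 20.0), (400.0, 1.0)]
in_window = reduce_peaks(peaks, mz_range=(150.0, 300.0))
above_250 = reduce_peaks(peaks, mz_range=(250.0, None))
print(in_window, above_250)
```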

remove_noise(mode='median', noise_level=None, signal_to_noise_threshold=1.0)[source]

Function to remove noise from peaks, centroided peaks and reprofiled peaks.

Keyword Arguments
  • mode (str) – define mode for removing noise. Default = “median” (other modes: “mean”, “mad”)

  • noise_level (float) – noise threshold

  • signal_to_noise_threshold (float) – S/N threshold for a peak to be accepted

Returns

Returns a list with tuples of m/z-intensity pairs above the noise threshold

Return type

reprofiled peaks (list)
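The role of noise_level and signal_to_noise_threshold can be sketched as a filter on the ratio of intensity to noise. The helper name is hypothetical, and the use of a greater-or-equal comparison is an assumption.

```python
def filter_by_signal_to_noise(peaks, noise_level, signal_to_noise_threshold=1.0):
    """Keep (mz, i) tuples whose intensity / noise_level reaches the threshold."""
    return [
        (mz, i)
        for mz, i in peaks
        if i / noise_level >= signal_to_noise_threshold
    ]


peaks = [(100.0, 1.0), (200.0, 4.0), (300.0, 20.0)]
kept_default = filter_by_signal_to_noise(peaks, noise_level=2.0)
kept_strict = filter_by_signal_to_noise(
    peaks, noise_level=2.0, signal_to_noise_threshold=5.0
)
print(kept_default, kept_strict)
```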

property scan_time

Property to access the retention time and the retention time unit. Note that no retention time unit is assumed if it is not correctly defined in the mzML; the unit is set to ‘unicorns’ in that case.

Returns

retention time, together with its unit, scan_time_unit (str)

Return type

scan_time (float)

scan_time_in_minutes()[source]

Returns the retention time in minutes. If the retention time unit is defined within the mzML, the retention time is converted into minutes and returned without the unit.

Returns

retention time in minutes

Return type

scan_time (float)
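The unit conversion can be sketched as a lookup of divisors. The function name and the unit strings below are hypothetical; the unit names actually found in an mzML file depend on its controlled vocabulary terms.

```python
# Hypothetical unit names mapped to the number of units per minute
UNITS_PER_MINUTE = {
    "minute": 1.0,
    "second": 60.0,
    "millisecond": 60000.0,
}


def to_minutes(scan_time, unit):
    """Convert a retention time to minutes; raise for unknown units."""
    if unit not in UNITS_PER_MINUTE:
        raise ValueError(f"cannot convert unit {unit!r} to minutes")
    return scan_time / UNITS_PER_MINUTE[unit]


from_seconds = to_minutes(90.0, "second")
from_minutes = to_minutes(2.5, "minute")
print(from_seconds, from_minutes)
```

Raising an error for an unknown unit, rather than guessing, mirrors the decision above not to assume a unit that the file does not define.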

property selected_precursors

Property to access the selected precursors of an MS2 spectrum. Returns a list of dicts containing the m/z of each precursor and, if available, its intensity and charge.

Returns

list of dicts, one per selected precursor

Return type

selected_precursors (list)

set_peaks(peaks, peak_type)[source]

Assign a custom peak array of type peak_type

Parameters
  • peaks (list or ndarray) – list or array of mz/i values

  • peak_type (str) – Either raw, centroided or reprofiled

similarity_to(spec2, round_precision=0)[source]

Compares two spectra and returns their cosine similarity

Parameters

spec2 (Spectrum) – another pymzml spectrum that is compared to the current spectrum.

Keyword Arguments

round_precision (int) – precision mzs are rounded to, i.e. round( mz, round_precision )

Returns

value between 0 and 1, i.e. the cosine between the two spectra.

Return type

cosine (float)

Note

Spectra data is transformed into an n-dimensional vector, where m/z values are binned in bins of 10 m/z and the intensities are added up. Then the cosine is calculated between those two vectors. The more similar the specs are, the closer the value is to 1.
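The binning and cosine calculation can be sketched on plain (m/z, intensity) lists. This is a minimal illustration and not pymzml's implementation: the helper names are hypothetical, and the bins are formed with round(mz, round_precision) as described for the keyword argument.

```python
import math
from collections import defaultdict


def binned_vector(peaks, round_precision=0):
    """Sum intensities per rounded m/z bin."""
    vector = defaultdict(float)
    for mz, intensity in peaks:
        vector[round(mz, round_precision)] += intensity
    return vector


def cosine_similarity(peaks_a, peaks_b, round_precision=0):
    """Cosine between the two binned intensity vectors (0.0 if one is empty)."""
    a = binned_vector(peaks_a, round_precision)
    b = binned_vector(peaks_b, round_precision)
    dot = sum(a[key] * b[key] for key in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


spec_a = [(100.1, 10.0), (200.2, 5.0)]
spec_b = [(100.3, 20.0), (200.4, 10.0)]  # same bins, doubled intensities
spec_c = [(300.0, 7.0)]                  # no shared bins

same_shape = cosine_similarity(spec_a, spec_b)
disjoint = cosine_similarity(spec_a, spec_c)
print(same_shape, disjoint)
```

Because the cosine depends only on the direction of the vectors, scaling all intensities of one spectrum does not change the result.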

property t_mz_set

Creates a set of integers out of transformed m/z values (including all values in the defined imprecision). This is used to accelerate has_peak function and similar.

Returns

set of transformed m/z values

Return type

t_mz_set (set)

transform_mz(value)[source]

pymzml uses an internal precision for different tasks. This precision depends on the measured precision and is calculated when spec.Spectrum.measured_precision is invoked. transform_mz can be used to transform m/z values into the internal standard.

Parameters

value (float) – m/z value

Returns

m/z value transformed to the internal standard. This value can be used to probe internal dictionaries, lists or sets, e.g. pymzml.spec.Spectrum.t_mz_set

Return type

transformed value (float)

Example

>>> import pymzml
>>> run = pymzml.run.Reader(
...     "test.mzML.gz",
...     MS_precisions = {
...         1 : 5e-6,
...         2 : 20e-6
...     }
... )
>>> for spectrum in run:
...     if spectrum.ms_level == 2:
...         # transform the query m/z into the internal standard
...         t_mz = spectrum.transform_mz(1044.5804)
...         # probe the set of transformed m/z values
...         print(t_mz in spectrum.t_mz_set)

property transformed_mz_with_error

Returns transformed m/z value with error

Returns

Transformed m/z values in dictionary

{m/z_with_error : [(m/z, intensity), …], …}

Return type

tmz values (dict)

property transformed_peaks

m/z value is multiplied by the internal precision.

Returns

Returns a list of peaks (tuples of mz and intensity). Float m/z values are adjusted by the internal precision to integers.

Return type

Transformed peaks (list)
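The transformation to integer m/z values can be sketched as a multiplication followed by rounding. The factor used below is a hypothetical stand-in: how pymzml derives its internal precision from the measured precision is not specified in this section.

```python
# Hypothetical scaling factor, chosen only for illustration
INTERNAL_PRECISION = 1000


def transform_mz(value, internal_precision=INTERNAL_PRECISION):
    """Scale an m/z value and round it to the nearest integer."""
    return int(round(value * internal_precision))


def transform_peaks(peaks, internal_precision=INTERNAL_PRECISION):
    """Return (transformed m/z, intensity) tuples."""
    return [(transform_mz(mz, internal_precision), i) for mz, i in peaks]


peaks = [(100.0004, 5.0), (250.1236, 7.5)]
transformed = transform_peaks(peaks)
# Integer keys allow fast membership tests, e.g. in a set
transformed_set = {t_mz for t_mz, _ in transformed}
print(transformed, transform_mz(100.0001) in transformed_set)
```

Two m/z values that differ by less than the resolution of the factor map to the same integer, which is what makes set lookups usable for approximate matching.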

3.2.2. Chromatogram

class pymzml.spec.Chromatogram(element, measured_precision=5e-06, param=None)[source]

Class for Chromatogram access and handling.

peaks()[source]

Return the list of peaks of the chromatogram as tuples (time, intensity).

Returns

list of time, intensity tuples

Return type

peaks (list)

Example:

>>> import pymzml
>>> run = pymzml.run.Reader(
...     "spectra.mzML.gz",
...     MS_precisions = {
...         1 : 5e-6,
...         2 : 20e-6
...     }
... )
>>> for entry in run:
...     if isinstance(entry, pymzml.spec.Chromatogram):
...         for time, intensity in entry.peaks():
...             print(time, intensity)

Note

The peaks property can also be set, e.g. for theoretical data. It requires a list of time/intensity tuples.

property profile

Returns the list of peaks of the chromatogram as tuples (time, intensity).

Returns

list of time, i tuples

Return type

peaks (list)

Example:

>>> import pymzml
>>> run = pymzml.run.Reader(
...     "spectra.mzML.gz",
...     MS_precisions = {
...         1 : 5e-6,
...         2 : 20e-6
...     }
... )
>>> for entry in run:
...     if isinstance(entry, pymzml.spec.Chromatogram):
...         for time, intensity in entry.profile:
...             print(time, intensity)

Note

The profile property can also be set, e.g. for theoretical data. It requires a list of time/intensity tuples.

property time

Returns the list of time values. If the time values are encoded, the function _decode() is used to decode the encoded data.

The time property can also be set, e.g. for theoretical data. However, it is recommended to use the profile property to set time and intensity tuples at the same time.

Returns

list of time values from the analyzed chromatogram

Return type

time (list)

3.2.3. MS_Spectrum

class pymzml.spec.MS_Spectrum[source]

General spectrum class for data handling.

get_element_by_name(name)[source]

Get element from the original tree by its unit name.

Parameters

name (str) – unit name of the mzml element.

Keyword Arguments

obo_version (str, optional) – obo version number.

get_element_by_path(hooks)[source]

Find elements in the spectrum by their path.

Parameters

hooks (list) – list of parent elements for the target element.

Returns

list of XML objects found in the path

Return type

elements (list)
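Looking up elements along a list of parent tags can be sketched with the standard library's ElementTree. The XML snippet and the helper find_by_path are hypothetical and deliberately omit XML namespaces, which real mzML files use; this is not how pymzml resolves the path internally.

```python
import xml.etree.ElementTree as ET

# Simplified, namespace-free stand-in for a spectrum element
XML = """
<spectrum id="scan=1">
  <scanList>
    <scan>
      <scanWindowList>
        <scanWindow>
          <cvParam name="scan window lower limit" value="100"/>
          <cvParam name="scan window upper limit" value="2000"/>
        </scanWindow>
      </scanWindowList>
    </scan>
  </scanList>
</spectrum>
""".strip()


def find_by_path(element, hooks):
    """Return all descendants reached by following the tag names in hooks."""
    return element.findall("/".join(hooks))


spectrum = ET.fromstring(XML)
params = find_by_path(
    spectrum,
    ["scanList", "scan", "scanWindowList", "scanWindow", "cvParam"],
)
values = [param.get("value") for param in params]
print(values)
```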

Example

To access cvParam in scanWindow tag:

>>> spec.get_element_by_path(['scanList', 'scan', 'scanWindowList',
...     'scanWindow', 'cvParam'])

property measured_precision

Set the measured and internal precision.

Returns

measured precision (e.g. 5e-6)

Return type

value (float)

to_string(encoding='latin-1', method='xml')[source]

Return string representation of the xml element the spectrum was initialized with.

Keyword Arguments
  • encoding (str) – text encoding of the returned string. Default is latin-1.

  • method (str) – text format of the returned string. Default is xml, alternatives are html and text.

Returns

xml string representation of the spectrum.

Return type

element (str)
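The encoding and method keyword arguments have the same names as those of xml.etree.ElementTree.tostring in the standard library. The sketch below shows how that standard function treats them on a small hypothetical element; whether to_string delegates to it is an assumption. Note that the standard function returns bytes for a named encoding such as latin-1 and a str only for encoding="unicode".

```python
import xml.etree.ElementTree as ET

# Hypothetical, minimal spectrum element
element = ET.fromstring(
    '<spectrum id="scan=1"><userParam>demo</userParam></spectrum>'
)

# method="xml" serializes the full markup
as_xml = ET.tostring(element, encoding="latin-1", method="xml")

# method="text" keeps only the text content of the element tree
as_text = ET.tostring(element, encoding="unicode", method="text")

print(as_xml)
print(as_text)
```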