3.5. File Access

pymzML supports several kinds of mzML files. The following classes wrap access to each file type, which allows search and data retrieval algorithms to be implemented specifically for that file type. An explanation of how to implement your own file class can be found in the advanced usage section.

3.5.1. File Interface

class pymzml.file_interface.FileInterface(path, encoding, build_index_from_scratch=False, index_regex=None)[source]

Interface to different mzML formats.

__getitem__(identifier)[source]

Access the item with id ‘identifier’ in the file.

Parameters

identifier (str) – native id of the item to access

Returns

text associated with the given identifier

Return type

data (str)

__init__(path, encoding, build_index_from_scratch=False, index_regex=None)[source]

Initialize an object interface to mzML files.

Parameters
  • path (str) – path to the mzML file

  • encoding (str) – encoding of the file

_indexed_gzip(path)[source]

Check if the given file is an indexed gzip file or not.

Parameters

path (str) – path to the file

Returns

True if path is a gzip file with index, else False

Return type

bool
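
A check of this kind could be done by inspecting the gzip header. The following is a minimal sketch, not pymzML's implementation: it assumes the index is stored in the gzip comment field, so it tests the magic bytes and the FCOMMENT flag (bit 4 of the FLG byte, as defined in RFC 1952). The function name looks_like_indexed_gzip is hypothetical.

```python
GZIP_MAGIC = b"\x1f\x8b"
FCOMMENT = 0x10  # bit 4 of the FLG byte (RFC 1952)


def looks_like_indexed_gzip(path):
    """Return True if path is a gzip file whose header announces a comment field.

    Assumption: an "indexed" gzip file keeps its index in the gzip comment.
    """
    with open(path, "rb") as handle:
        header = handle.read(4)  # ID1, ID2, CM, FLG
    if len(header) < 4 or header[:2] != GZIP_MAGIC:
        return False  # too short or not gzip at all
    return bool(header[3] & FCOMMENT)
```

A file written with Python's gzip module sets no comment flag, so this sketch reports False for it.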

_open(path_or_file)[source]

Open a file-like object or a wrapper for a file-like object.

Parameters

path_or_file (str) – path to the mzML file

Returns

instance of StandardGzip, IndexedGzip or StandardMzml, based on the file ending of ‘path’

Return type

file_handler
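
The selection by file ending described above can be sketched as follows. This is an illustration only: choose_handler_name is a hypothetical helper that returns the name of the wrapper class instead of an instance, the case-insensitive comparison is an assumption, and is_indexed_gzip stands in for a check such as _indexed_gzip.

```python
def choose_handler_name(path, is_indexed_gzip):
    """Return the name of the wrapper class that would handle path.

    is_indexed_gzip is a callable taking the path and returning a bool;
    it is only consulted for files ending in ".gz".
    """
    if path.lower().endswith(".gz"):
        return "IndexedGzip" if is_indexed_gzip(path) else "StandardGzip"
    return "StandardMzml"
```

Passing the index check in as a callable keeps the sketch free of file system access, which makes the dispatch rule easy to test in isolation.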

read(size=-1)[source]

Read binary data from file handler.

Keyword Arguments

size (int) – Number of bytes to read from file, -1 to read to end of file

Returns

byte string of length ‘size’ read from the file

Return type

data (str)

3.5.2. File Classes

3.5.2.1. mzML

class pymzml.file_classes.standardMzml.StandardMzml(path, encoding, build_index_from_scratch=False, index_regex=None)[source]

__getitem__(identifier)[source]

Access the item with id ‘identifier’.

Uses either linear, binary, or interpolated search.

Parameters

identifier (str) – native id of the item to access

Returns

text associated with the given identifier

Return type

data (str)

__init__(path, encoding, build_index_from_scratch=False, index_regex=None)[source]

Initialize wrapper object for standard mzML files.

Parameters
  • path (str) – path to the file

  • encoding (str) – encoding of the file

Retrieve the spectrum for a given spectrum ID using binary jumps.

Parameters

target_index (int) – native id of the spectrum to access

Returns

pymzML spectrum

Return type

Spectrum (pymzml.spec.Spectrum)
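
The idea of binary jumps can be illustrated with a minimal sketch. It is not pymzML's code and rests on several assumptions: spectrum ids are integers that increase through the file, tags have the simplified form <spectrum id="N", and the helper names, the regular expression, and MAX_TAG_LEN are all hypothetical. The sketch returns the byte offset of the matching tag rather than a Spectrum object.

```python
import re

SPEC_TAG = re.compile(rb'<spectrum id="(\d+)"')  # simplified tag layout
MAX_TAG_LEN = 64  # assumed upper bound on the length of a matched tag


def first_tag_after(handle, start, stop, chunk_size):
    """Return (offset, id) of the first tag starting in [start, stop), else None."""
    handle.seek(start)
    buffer = b""
    while True:
        chunk = handle.read(chunk_size)
        buffer += chunk
        match = SPEC_TAG.search(buffer)
        if match:
            offset = start + match.start()
            return (offset, int(match.group(1))) if offset < stop else None
        # give up at end of file or once any tag starting before stop
        # would already be fully contained in the buffer
        if not chunk or start + len(buffer) > stop + MAX_TAG_LEN:
            return None


def find_by_binary_jumps(handle, target_id, file_size, chunk_size=64):
    """Return the byte offset of the tag with target_id, or None."""
    low, high = 0, file_size  # the target tag, if any, starts in [low, high)
    while low < high:
        mid = (low + high) // 2
        found = first_tag_after(handle, mid, high, chunk_size)
        if found is None or found[1] > target_id:
            high = mid  # target must start before the midpoint
        elif found[1] < target_id:
            low = found[0] + 1  # target must start after the tag just seen
        else:
            return found[0]
    return None
```

Each iteration jumps to the middle of the remaining byte range and reads only far enough to see one tag, so the number of jumps grows with the logarithm of the file size.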

_build_index(from_scratch=False)[source]

Build an index.

The index is a list of offsets to which a file pointer can seek directly, giving access to a particular spectrum or chromatogram without parsing the entire file.

Parameters

from_scratch (bool) – Whether or not to force building the index from scratch, by parsing the file, if no existing index can be found.

Returns

A file-like object used to access the indexed content by seeking to a particular offset for the file.
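
A minimal sketch of such an offset index follows, under stated assumptions: the whole document is available as a bytes object, the regular expression is a simplification of real mzML tags, and build_offset_index is a hypothetical name. It uses a dict keyed by id instead of a list, purely for readability.

```python
import re

# Matches the opening tag of a spectrum or chromatogram and captures its id.
ITEM_TAG = re.compile(rb'<(spectrum|chromatogram)\s[^>]*?\bid="([^"]+)"')


def build_offset_index(data):
    """Map each spectrum/chromatogram id to the byte offset of its opening tag."""
    return {
        match.group(2).decode("utf-8"): match.start()
        for match in ITEM_TAG.finditer(data)
    }
```

With the index in hand, a reader can call seek() on an open file handle with the stored offset and start reading at the wanted element.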

_build_index_from_scratch(seeker)[source]

Build an index of spectra/chromatogram data with offsets by parsing the file.

Use linear interpolation search to find spectra faster.

Parameters

target_index (str or int) – native id of the item to access

Keyword Arguments

chunk_size (int) – size of the chunk to read in one go in kb
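
Linear interpolation search can be illustrated on an in-memory, sorted list of integer ids. This is a generic textbook sketch and makes no claim about how pymzML maps ids to byte offsets; the function name is hypothetical.

```python
def interpolation_search(sorted_ids, target):
    """Return the position of target in sorted_ids, or None if absent."""
    low, high = 0, len(sorted_ids) - 1
    while low <= high and sorted_ids[low] <= target <= sorted_ids[high]:
        span = sorted_ids[high] - sorted_ids[low]
        if span == 0:
            # all remaining values are equal, and the loop condition
            # guarantees they equal the target
            return low
        # guess the position, assuming ids are spread evenly over the range
        pos = low + (target - sorted_ids[low]) * (high - low) // span
        if sorted_ids[pos] == target:
            return pos
        if sorted_ids[pos] < target:
            low = pos + 1
        else:
            high = pos - 1
    return None
```

When ids are distributed roughly evenly, the first guess usually lands close to the target, which is why this approach can need fewer probes than plain binary search.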

_read_extremes()[source]

Read min and max spectrum ids. Required for binary jumps.

Returns

list of tuples containing spec_id and file_offset

Return type

seek_list (list)

_search_linear(seeker, index, chunk_size=8)[source]

Fallback to linear search if interpolated search fails.
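
A chunked linear scan of this kind might look like the sketch below. It is an illustration under assumptions, not the library's code: tags use the simplified form <spectrum id="N", chunk_size is interpreted as KiB, and the function returns a byte offset. The name search_linear_sketch is hypothetical.

```python
def search_linear_sketch(handle, spec_id, chunk_size=8):
    """Scan from the start in chunk_size KiB steps; return the tag offset or None."""
    needle = b'<spectrum id="%d"' % spec_id
    keep = len(needle) - 1  # tail kept so a tag split across chunks is found
    handle.seek(0)
    offset = 0  # file offset of the first byte currently in buffer
    buffer = b""
    while True:
        chunk = handle.read(chunk_size * 1024)
        if not chunk:
            return None  # end of file reached without a match
        buffer += chunk
        pos = buffer.find(needle)
        if pos != -1:
            return offset + pos
        # drop everything except the tail and remember how much was dropped
        offset += max(len(buffer) - keep, 0)
        buffer = buffer[-keep:]
```

Keeping a short tail of the previous chunk is the important detail: without it, a tag that straddles a chunk boundary would be missed.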

read(size=-1)[source]

Read binary data from file handler.

Keyword Arguments

size (int) – Number of bytes to read from file, -1 to read to end of file

Returns

byte string of length ‘size’ read from the file

Return type

data (str)

3.5.2.2. Gzip

class pymzml.file_classes.standardGzip.StandardGzip(path, encoding)[source]

__getitem__(identifier)[source]

Access the item with id ‘identifier’ in the file by iterating the xml-tree.

Parameters

identifier (str) – native id of the item to access

Returns

text associated with the given identifier

Return type

data (str)
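
Because a standard gzip stream does not support cheap random access, lookup by id has to walk the document. The following minimal sketch shows one way to do that with the standard library's gzip module and xml.etree.ElementTree.iterparse. It is not pymzML's implementation; find_element_by_id is a hypothetical name, and only spectrum and chromatogram elements are considered.

```python
import gzip
import xml.etree.ElementTree as ET


def find_element_by_id(path_or_file, identifier):
    """Stream-parse gzipped XML; return the matching element as text, or None."""
    with gzip.open(path_or_file, "rb") as xml_stream:
        for _event, element in ET.iterparse(xml_stream, events=("end",)):
            local_name = element.tag.rsplit("}", 1)[-1]  # strip any XML namespace
            if local_name not in ("spectrum", "chromatogram"):
                continue
            if element.get("id") == identifier:
                return ET.tostring(element, encoding="unicode")
            element.clear()  # release the content of items that did not match
    return None
```

Every lookup decompresses and parses the file from the beginning, so the cost grows with the position of the requested item.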

__init__(path, encoding)[source]

Initialize wrapper object for gzipped mzML files.

Parameters
  • path (str) – path to the file

  • encoding (str) – encoding of the file

_build_index()[source]

An index cannot be built for standard gzip files.

read(size=-1)[source]

Read binary data from file handler.

Keyword Arguments

size (int) – Number of bytes to read from file, -1 to read to end of file

Returns

byte string of length ‘size’ read from the file

Return type

data (str)

3.5.2.3. iGzip

class pymzml.file_classes.indexedGzip.IndexedGzip(path, encoding)[source]

__getitem__(identifier)[source]

Access the item with id ‘identifier’ in the file.

Parameters

identifier (str) – native id of the item to access

Returns

text associated with the given identifier

Return type

data (str)

__init__(path, encoding)[source]

Initialize wrapper object for indexed gzipped files.

Parameters
  • path (str) – path to the file

  • encoding (str) – encoding of the file

_build_index()[source]

Use the GSGR class to retrieve the index from the file and save it.

read(size=-1)[source]

Read binary data from file handler.

Keyword Arguments

size (int) – Number of bytes to read from file, -1 to read to end of file

Returns

byte string of length ‘size’ read from the file

Return type

data (str)