3.3. Utils

3.3.1. GSGW

class pymzml.utils.GSGW.GSGW(file=None, max_idx=10000, max_idx_len=8, max_offset_len=8, output_path='./test.dat.igzip', comp_str=-1)

Generalized Gzip writer class with random access to indexed offsets.

Keyword Arguments
  • file (str) – Filename for the resulting file

  • max_idx (int) – max number of indices which can be saved in this file

  • max_idx_len (int) – maximal length of the index in bytes, must be between 1 and 255

  • max_offset_len (int) – maximal length of the offset in bytes

  • output_path (str) – path to the output file


Allocate ‘self.max_index_num’ bytes of length ‘self.max_idx_len’ in the header for inserting the index later on.


Write data into file-stream.


Parameters

data (str) – uncompressed data

_write_gen_header(Index=False, FLAGS=None)

Write a valid gzip header with creation time, user defined flag fields and allocated index.

Keyword Arguments
  • Index (bool) – whether or not to write an index into this header.

  • FLAGS (list, optional) – list of flags (FTEXT, FHCRC, FEXTRA, FNAME) to set for this header.


Returns

byte offset of the file pointer

Return type

offset (int)
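The header layout described above can be sketched with the standard library alone. This is an illustration of the gzip member format from RFC 1952, not the code GSGW actually runs: `build_header`, `build_member`, `comment_len` and the padding byte `#` are hypothetical names and choices made for this example. It reserves a comment field of fixed size that an index could later be written into.

```python
import gzip
import struct
import time
import zlib

# Flag bits of the FLG byte as defined in RFC 1952.
FTEXT, FHCRC, FEXTRA, FNAME, FCOMMENT = 1, 2, 4, 8, 16


def build_header(comment_len=0, mtime=None, pad=b"#"):
    """Return a gzip member header, optionally reserving a comment field.

    comment_len and pad are hypothetical parameters of this sketch.
    """
    flg = FCOMMENT if comment_len else 0
    mtime = int(time.time()) if mtime is None else mtime
    # ID1, ID2, CM (8 = deflate), FLG, MTIME, XFL, OS (255 = unknown)
    header = struct.pack("<BBBBIBB", 0x1F, 0x8B, 8, flg, mtime, 0, 255)
    if comment_len:
        # The comment is zero-terminated, so the padding must not contain 0x00.
        header += pad * comment_len + b"\x00"
    return header


def build_member(payload, comment_len=0):
    """Return a complete gzip member: header, raw deflate body, trailer."""
    compressor = zlib.compressobj(wbits=-15)  # raw deflate, no zlib wrapper
    body = compressor.compress(payload) + compressor.flush()
    # Trailer: CRC32 of the uncompressed data and its size modulo 2**32.
    trailer = struct.pack("<II", zlib.crc32(payload), len(payload) & 0xFFFFFFFF)
    return build_header(comment_len) + body + trailer


member = build_member(b"spectrum data", comment_len=32)
# Any gzip reader skips the reserved comment and still finds the payload.
payload = gzip.decompress(member)
```

Because the comment field is part of the standard header, a file built this way stays readable by ordinary gzip tools even though it carries extra reserved bytes.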


Convert and write the identifier into output file.


Parameters

identifier (str or int) – identifier to write into index


Convert and write offset to output file.


Parameters

offset (int) – offset which will be formatted and written into file index
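Since the index area has a fixed size, identifiers and offsets have to be converted to fields of a known width (compare max_idx_len and max_offset_len). A minimal sketch of such a conversion follows; `encode_fixed`, `decode_fixed` and the space padding are assumptions made for illustration, and the byte layout GSGW really uses may differ.

```python
def encode_fixed(value, width, pad=b" "):
    """Encode an identifier or offset as exactly `width` bytes.

    Hypothetical helper: right-pads with `pad` and refuses values
    that do not fit into the field.
    """
    raw = str(value).encode("latin-1")
    if len(raw) > width:
        raise ValueError(f"{value!r} needs {len(raw)} bytes, limit is {width}")
    return raw.ljust(width, pad)


def decode_fixed(raw, pad=b" "):
    """Reverse encode_fixed by stripping the padding."""
    return raw.rstrip(pad).decode("latin-1")


# One index entry: an 8-byte identifier followed by an 8-byte offset.
entry = encode_fixed("scan=42", 8) + encode_fixed(1024, 8)
identifier = decode_fixed(entry[:8])
offset = int(decode_fixed(entry[8:]))
```

With fixed-width fields every entry has the same length, so the total space needed for the index is known before any data is written.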

add_data(data, identifier)

Create a new gzip member with compressed ‘data’ indexed with ‘identifier’.

Parameters
  • data (str) – uncompressed data to write to file

  • identifier (str or int) – unique identifier for the data
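The idea of one gzip member per data block can be illustrated with the standard gzip module. The sketch below is not the GSGW implementation; `append_member` and the plain dict used as index are hypothetical. It only shows why recording the start offset of each member is enough to find a block again later.

```python
import gzip
import io


def append_member(stream, data, identifier, index):
    """Compress `data` as its own gzip member and record where it starts."""
    if identifier in index:
        raise KeyError(f"identifier {identifier!r} already used")
    index[identifier] = stream.tell()
    stream.write(gzip.compress(data.encode("utf-8")))


stream = io.BytesIO()
index = {}
append_member(stream, "<spectrum id='1'/>", 1, index)
append_member(stream, "<spectrum id='2'/>", 2, index)

# The concatenated members still form one valid gzip stream ...
everything = gzip.decompress(stream.getvalue()).decode("utf-8")
# ... and each member can be decompressed on its own from its offset.
second = gzip.decompress(stream.getvalue()[index[2]:]).decode("utf-8")
```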

property encoding

Returns the encoding used for this file

property file_out

Output file handler


Only called after all the data is written, i.e. all calls to add_data() have been done.

Seek back to the beginning of the file and write the index into the allocated comment bytes (see _write_gen_header(Index=True)).
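The seek-back step can be sketched as follows, under stated assumptions: the reserved area size `RESERVED`, the padding byte `PAD` and the `identifier:offset` text format are invented for this example and do not reflect the on-disk format GSGW uses. The point is that overwriting reserved comment bytes in place changes neither the file size nor the offsets of the data members.

```python
import gzip
import io
import struct

FCOMMENT = 16   # FLG bit for a zero-terminated comment (RFC 1952)
RESERVED = 24   # hypothetical size of the reserved index area in bytes
PAD = b"#"      # hypothetical padding byte, must not be 0x00

out = io.BytesIO()
# First member: 10-byte header, reserved comment, empty deflate body, trailer.
out.write(struct.pack("<BBBBIBB", 0x1F, 0x8B, 8, FCOMMENT, 0, 0, 255))
comment_start = out.tell()
out.write(PAD * RESERVED + b"\x00")
out.write(b"\x03\x00" + struct.pack("<II", 0, 0))  # empty payload, CRC 0, size 0

# Data members, with their start offsets collected on the way.
index = {}
for identifier, text in [("a", b"first block"), ("b", b"second block")]:
    index[identifier] = out.tell()
    out.write(gzip.compress(text))
size_before = len(out.getvalue())

# All data is written: seek back and overwrite the reserved bytes in place.
index_bytes = ";".join(f"{k}:{v}" for k, v in index.items()).encode("latin-1")
if len(index_bytes) > RESERVED:
    raise ValueError("index does not fit into the reserved area")
out.seek(comment_start)
out.write(index_bytes.ljust(RESERVED, PAD))

written = out.getvalue()
```

This also shows why the index capacity has to be fixed up front: once data members follow the header, the reserved area cannot grow without shifting every recorded offset.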

3.3.2. GSGR

class pymzml.utils.GSGR.GSGR(file=None)

Generalized Gzip reader class which enables random access in files written with the GSGW class.

Keyword Arguments

file (str) – path to file to read


Check if file is a gzip file.
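A gzip check of this kind usually compares the first two bytes of the file against the magic number from RFC 1952. The helper `is_gzip` below is a hypothetical stand-in, not the method GSGR defines.

```python
import gzip
import os
import tempfile

GZIP_MAGIC = b"\x1f\x8b"  # ID1 and ID2 from RFC 1952


def is_gzip(path):
    """Return True if the file at `path` starts with the gzip magic bytes."""
    with open(path, "rb") as handle:
        return handle.read(2) == GZIP_MAGIC


with tempfile.TemporaryDirectory() as tmp:
    gz_path = os.path.join(tmp, "data.gz")
    txt_path = os.path.join(tmp, "data.txt")
    with gzip.open(gz_path, "wb") as handle:
        handle.write(b"payload")
    with open(txt_path, "wb") as handle:
        handle.write(b"payload")
    results = (is_gzip(gz_path), is_gzip(txt_path))
```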


Read and save compression method, bit flags, modification time, compression speed and OS.
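These fields make up the fixed 10-byte start of every gzip member, so they can be unpacked in one call. The following is a sketch based on RFC 1952; `BasicHeader` and `read_basic_header` are hypothetical names and not part of GSGR.

```python
import gzip
import struct
from collections import namedtuple

# Field names are chosen for this sketch; xfl carries the compression speed hint.
BasicHeader = namedtuple("BasicHeader", "magic method flags mtime xfl os")


def read_basic_header(raw):
    """Unpack the fixed part of a gzip header (little-endian, 10 bytes)."""
    if len(raw) < 10:
        raise ValueError("a gzip header needs at least 10 bytes")
    return BasicHeader(*struct.unpack("<2sBBIBB", raw[:10]))


header = read_basic_header(gzip.compress(b"payload", mtime=0))
has_comment = bool(header.flags & 16)  # FCOMMENT bit
```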


Read and save offset dict from indexed gzip file

read(size=-1)

Read the content of the input file in binary mode

Keyword Arguments

size (int, optional) – number of bytes to read, -1 for everything


Returns

parsed bytes from input file

Return type

data (bytes)


Read and return the data block with the unique index ‘index’.


Parameters

index (int or str) – identifier associated with a specific block


Returns

indexed text block as string

Return type

data (str)
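Random access to a single block can be sketched with zlib: seek to the stored offset and decompress until the end of that one gzip member. `read_member` and the in-memory offset dict are assumptions made for this example; GSGR reads its offsets from the index stored in the file header.

```python
import gzip
import io
import zlib


def read_member(stream, offset):
    """Decompress exactly one gzip member starting at `offset`."""
    stream.seek(offset)
    decompressor = zlib.decompressobj(wbits=31)  # 31 = expect gzip framing
    chunks = []
    # eof becomes True at the end of the member; later members are ignored.
    while not decompressor.eof:
        chunk = stream.read(4096)
        if not chunk:
            raise EOFError("gzip member is truncated")
        chunks.append(decompressor.decompress(chunk))
    return b"".join(chunks).decode("utf-8")


# Build a small multi-member stream and remember where each member starts.
blocks = {"spec_1": "first spectrum", "spec_2": "second spectrum"}
stream, offsets = io.BytesIO(), {}
for identifier, text in blocks.items():
    offsets[identifier] = stream.tell()
    stream.write(gzip.compress(text.encode("utf-8")))

second = read_member(stream, offsets["spec_2"])
first = read_member(stream, offsets["spec_1"])
```

Only the requested member is decompressed, so the cost of a lookup depends on the size of that block and not on the size of the whole file.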


Seek to byte offset in input file.


Parameters

offset (int) – byte offset to seek to in FileIn




class SQLiteDatabase(object):
    """
    Example implementation of a database Connector,
    which can be used to make run accept paths to
    sqlite db files.
    """


def _open(self, path):
    if path.endswith('.gz'):
        if self._indexed_gzip(path):
            self.file_handler = indexedGzip.IndexedGzip(path, self.encoding)
        else:
            self.file_handler = standardGzip.StandardGzip(path, self.encoding)
    # Insert a new condition to enable your new fileclass
    elif path.endswith('.db'):
        self.file_handler = utils.SQLiteConnector.SQLiteDatabase(path, self.encoding)
    else:
        self.file_handler = standardMzml.StandardMzml(path, self.encoding)
    return self.file_handler