3.3. Utils

3.3.1. GSGW

class pymzml.utils.GSGW.GSGW(file=None, max_idx=10000, max_idx_len=8, max_offset_len=8, output_path='./test.dat.igzip', comp_str=-1)[source]

Generalized Gzip writer class with random access to indexed offsets.

Keyword Arguments:
  • file (string) – Filename for the resulting file
  • max_idx (int) – max number of indices which can be saved in this file
  • max_idx_len (int) – maximal length of the index in bytes, must be between 1 and 255
  • max_offset_len (int) – maximal length of the offset in bytes
  • output_path (str) – path to the output file

Allocate space in the header for ‘self.max_index_num’ index entries of ‘self.max_idx_len’ bytes each, so that the index can be inserted later on.


Write data into file-stream.

Parameters:data (str) – uncompressed data
_write_gen_header(Index=False, FLAGS=None)[source]

Write a valid gzip header with creation time, user defined flag fields and allocated index.

Keyword Arguments:
  • Index (bool) – whether or not to write an index into this header.
  • FLAGS (list, optional) – list of flags (FTEXT, FHCRC, FEXTRA, FNAME) to set for this header.

Returns:byte offset of the file pointer
Return type:offset (int)
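The header layout described above can be illustrated with the standard library alone. The following is a minimal sketch, not the GSGW implementation: it assumes the index space is reserved in the gzip comment field (FCOMMENT, as defined in RFC 1952), and the helper names build_gzip_header and build_gzip_member are hypothetical.

```python
import gzip
import struct
import time
import zlib

# Header flag bits as defined in RFC 1952.
FTEXT, FHCRC, FEXTRA, FNAME, FCOMMENT = 1, 2, 4, 8, 16


def build_gzip_header(comment=b"", mtime=None):
    """Return a 10-byte gzip header, optionally followed by a comment field.

    Hypothetical helper: the comment stands in for the allocated index bytes.
    """
    flags = FCOMMENT if comment else 0
    if mtime is None:
        mtime = int(time.time())
    # ID1, ID2, CM (8 = deflate), FLG, MTIME, XFL, OS (255 = unknown)
    header = struct.pack("<BBBBIBB", 0x1F, 0x8B, 8, flags, mtime, 0, 255)
    if comment:
        header += comment + b"\x00"  # the comment field is zero-terminated
    return header


def build_gzip_member(payload, comment=b""):
    """Return one complete gzip member: header, raw deflate body, trailer."""
    compressor = zlib.compressobj(-1, zlib.DEFLATED, -zlib.MAX_WBITS)
    body = compressor.compress(payload) + compressor.flush()
    # Trailer: CRC32 and size of the uncompressed data, both little-endian.
    trailer = struct.pack("<II", zlib.crc32(payload), len(payload) & 0xFFFFFFFF)
    return build_gzip_header(comment) + body + trailer


member = build_gzip_member(b"hello", comment=b"idx-placeholder")
restored = gzip.decompress(member)  # the stdlib reader skips the comment
```

Because the comment is zero-terminated, any placeholder written there must not itself contain a zero byte.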


Convert and write the identifier into output file.

Parameters:identifier (str or int) – identifier to write into index

Convert and write offset to output file.

Parameters:offset (int) – offset which will be formatted and written into file index
add_data(data, identifier)[source]

Create a new gzip member with compressed ‘data’ indexed with ‘identifier’.

Parameters:
  • data (str) – uncompressed data to write to file
  • identifier (str or int) – unique identifier for the data
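The idea of one gzip member per data block can be sketched with the standard gzip module. This is an illustration under assumptions, not the GSGW code: append_member and the in-memory stream are hypothetical, and the offsets are kept in a plain dict.

```python
import gzip
import io


def append_member(stream, data, identifier, offsets):
    """Compress 'data' as its own gzip member and remember where it starts."""
    offsets[identifier] = stream.tell()
    stream.write(gzip.compress(data.encode("utf-8")))


stream = io.BytesIO()
offsets = {}
append_member(stream, "<spectrum id='1'/>", 1, offsets)
append_member(stream, "<spectrum id='2'/>", 2, offsets)

# Concatenated members still form a valid gzip stream as a whole.
everything = gzip.decompress(stream.getvalue()).decode("utf-8")
```

Since each member is self-contained, a reader can later start decompressing at any recorded offset without touching the preceding blocks.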

Returns the encoding used for this file


Output filehandler


Only called after all the data is written, i.e. all calls to add_data() have been done.

Seek back to the beginning of the file and write the index into the allocated comment bytes (see _write_gen_header(Index=True)).
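The reserve-then-fill pattern described here can be sketched as follows. The entry layout (right-justified ASCII identifier followed by right-justified ASCII offset) is an assumption made for illustration and may differ from what GSGW writes; format_entry and the constants are hypothetical.

```python
import io

MAX_IDX = 4          # number of index entries to reserve (hypothetical)
MAX_IDX_LEN = 8      # bytes per identifier
MAX_OFFSET_LEN = 8   # bytes per offset
ENTRY_LEN = MAX_IDX_LEN + MAX_OFFSET_LEN


def format_entry(identifier, offset):
    """Encode one index entry as a fixed-width ASCII record."""
    key = str(identifier).encode("ascii").rjust(MAX_IDX_LEN)
    value = str(offset).encode("ascii").rjust(MAX_OFFSET_LEN)
    if len(key) > MAX_IDX_LEN or len(value) > MAX_OFFSET_LEN:
        raise ValueError("identifier or offset does not fit into the index")
    return key + value


stream = io.BytesIO()
index_start = stream.tell()
stream.write(b" " * (MAX_IDX * ENTRY_LEN))  # reserve space for the index

offsets = {}
for identifier, payload in [(1, b"first"), (2, b"second")]:
    offsets[identifier] = stream.tell()
    stream.write(payload)
end_of_data = stream.tell()

# All data is written: seek back and fill the reserved bytes in place.
stream.seek(index_start)
for identifier, offset in offsets.items():
    stream.write(format_entry(identifier, offset))
stream.seek(end_of_data)

raw = stream.getvalue()
```

Fixed-width entries are what make the in-place overwrite safe: the index can never grow beyond the bytes reserved for it, so the data offsets recorded earlier stay valid.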

3.3.2. GSGR

class pymzml.utils.GSGR.GSGR(file=None)[source]

Generalized Gzip reader class which enables random access in files written with the GSGW class.

Keyword Arguments:
 file (str) – path to file to read

Check if file is a gzip file.


Read and save compression method, bit flags, modification time (MTIME), compression speed and OS.
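Both checks map onto the fixed 10-byte gzip header from RFC 1952. A minimal sketch using only the standard library follows; parse_gzip_header is a hypothetical helper, not a GSGR method.

```python
import gzip
import struct


def parse_gzip_header(raw):
    """Validate the gzip magic bytes and unpack the fixed header fields."""
    if len(raw) < 10 or raw[:2] != b"\x1f\x8b":
        raise ValueError("not a gzip file")
    method, flags, mtime, extra_flags, os_code = struct.unpack("<BBIBB", raw[2:10])
    return {
        "compression_method": method,   # 8 means deflate
        "flags": flags,                 # FTEXT, FHCRC, FEXTRA, FNAME, FCOMMENT bits
        "mtime": mtime,                 # modification time, seconds since epoch
        "extra_flags": extra_flags,     # hints at the compression level used
        "os": os_code,                  # operating system identifier
    }


header = parse_gzip_header(gzip.compress(b"some data"))
```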


Read and save the offset dict from the indexed gzip file.


Read the content of the input file in binary mode.

Keyword Arguments:
 size (int, optional) – number of bytes to read, -1 for everything
Returns:parsed bytes from input file
Return type:data (bytes)

Read and return the data block with the unique index ‘index’.

Parameters:index (int or str) – identifier associated with a specific block
Returns:indexed text block as string
Return type:data (str)
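Random access to a single block amounts to seeking to its offset and decompressing exactly one gzip member. The sketch below shows this with zlib; it is not the GSGR implementation, and read_member as well as the in-memory test stream are hypothetical.

```python
import gzip
import io
import zlib


def read_member(stream, offset, chunk_size=4096):
    """Seek to 'offset' and decompress exactly one gzip member."""
    stream.seek(offset)
    # wbits = 16 + MAX_WBITS tells zlib to expect a gzip header and trailer.
    decompressor = zlib.decompressobj(16 + zlib.MAX_WBITS)
    chunks = []
    while not decompressor.eof:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        chunks.append(decompressor.decompress(chunk))
    return b"".join(chunks).decode("utf-8")


# Build a small two-member stream to read from.
stream = io.BytesIO()
offsets = {}
for identifier, text in [(1, "first block"), (2, "second block")]:
    offsets[identifier] = stream.tell()
    stream.write(gzip.compress(text.encode("utf-8")))

second = read_member(stream, offsets[2])
first = read_member(stream, offsets[1])
```

The decompressor stops at the end of the member it started in, so bytes belonging to later members are left untouched.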

Seek to byte offset in input file.

Parameters:offset (int) – byte offset to seek to in FileIn


..      class SQLiteDatabase(object):
..              """
..              Example implementation of a database Connector,
..              which can be used to make run accept paths to
..              sqlite db files.


..      def _open(self, path):
..              if path.endswith('.gz'):
..                      if self._indexed_gzip(path):
..                              self.file_handler = indexedGzip.IndexedGzip(path, self.encoding)
..                      else:
..                              self.file_handler = standardGzip.StandardGzip(path, self.encoding)
..              # Insert a new condition to enable your new fileclass
..              elif path.endswith('.db'):
..                      self.file_handler = utils.SQLiteConnector.SQLiteDatabase(path, self.encoding)
..              else:
..                      self.file_handler     = standardMzml.StandardMzml(path, self.encoding)
..              return self.file_handler