OpenSWATH - GUI Models

The main models used by the GUI are the PeptideTree and the MSData.DataModel models. Internally, PeptideTree uses ChromatogramTransition objects to represent single rows of the tree data structure, while MSData.DataModel uses SwathRunCollection to keep track of multiple SWATH-MS runs.

MSData Data model Module

Contains classes that provide access to the raw data

MSData

class openswathgui.models.MSData.DataModel(fdr_cutoff=0.01, only_quantified=True)

Bases: object

The main data model, which provides access to all raw data

It stores the references to individual SwathRun objects and can be initialized from a list of files. Each “load” method is responsible for setting the self.runs parameter.

self.runs

list of SwathRun or SqlSwathRun

The MS runs which are handled by this class

self.fdr_cutoff

float

Selected FDR cutoff

self.only_show_quantified

bool

Whether to only show peptides that are quantified

self.draw_transitions_

bool

Whether to draw individual transitions

getDrawTransitions()
Returns:Whether to draw transitions or not
Return type:bool
getStatus()
Returns:Its own status (number of transitions etc.) for the status bar.
Return type:str
get_precursor_tree()

Returns the data model's precursor tree structure

Returns a list of ChromatogramTransition root elements (rows) to display in the left side tree view. Each element may contain nested ChromatogramTransition elements (tree elements).

Returns:Root element(s) for the peptide tree
Return type:list of ChromatogramTransition
get_runs()

Returns the list of SwathRun objects of this current data model

Returns:The list of SwathRun objects, which is the main content of this class
Return type:list of SwathRun
loadFiles(filenames)

Load a set of chromatogram files (no peakgroup information).

Parameters:filenames (list of str) – List of filepaths containing the chromatograms
loadMixedFiles(rawdata_files, aligned_pg_files, fileType)

Load files that contain raw data files and aligned peakgroup files.

Since no mapping is present here, we need to infer it from the data. Basically, we try to map the column align_runid to the filenames of the input .chrom.mzML hoping that the user did not change the filenames.

Parameters:
  • rawdata_files (list of str) – List of paths to chrom.mzML files
  • aligned_pg_files (list of str) – List of paths to output files of the FeatureAligner
  • fileType (str) – Type of the metadata file (valid: simple, traml, openswath)
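
The inference described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the actual loading code: the helper name infer_run_mapping, the suffix list, and the matching rule (comparing basenames with a chromatogram suffix stripped) are all hypothetical.

```python
import os


def infer_run_mapping(align_runids, rawdata_files,
                      suffixes=(".chrom.mzML", ".mzML")):
    """Map each align_runid value to the raw data file it most likely names.

    Hypothetical helper: matching is done on the file basename with a known
    chromatogram suffix stripped, so it only works if files were not renamed.
    """
    def stem(path):
        base = os.path.basename(path)
        for suffix in suffixes:
            if base.endswith(suffix):
                return base[: -len(suffix)]
        return base

    by_stem = {stem(f): f for f in rawdata_files}
    mapping = {}
    for runid in align_runids:
        key = stem(runid)
        if key in by_stem:
            mapping[runid] = by_stem[key]
    return mapping


example = infer_run_mapping(
    ["run_A.chrom.mzML", "run_B", "renamed_run"],
    ["/data/run_A.chrom.mzML", "/data/run_B.chrom.mzML"],
)
```

Run ids that match no file (here "renamed_run") are simply left out of the result, which makes a renamed input file easy to detect.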
loadSqMassFiles(filenames)
load_from_yaml(yamlfile)

Load a yaml file containing a mapping of chromatogram files and aligned peakgroup files.

Parameters:yamlfile (str) – Filepath to the yaml file for loading
setDrawTransitions(draw_transitions)

Whether to draw individual transitions or not

TreeModels Module

Contains classes that provide access to the hierarchical tree containing protein, precursor, peptide and transition level data.

While TreeNode and TreeModel are generic models for trees and nodes, the derived classes PeptideTreeNode and PeptideTree are implementations specific to TAPIR.

TreeNode

class openswathgui.models.TreeModels.TreeNode(parent, row)

Bases: object

Generic model of a tree node

Adapted from http://www.hardcoded.net/articles/using_qtreeview_with_qabstractitemmodel.htm

See PeptideTreeNode for implementation.
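
As a rough, Qt-free illustration of the generic node idea, the sketch below keeps a parent reference and a row index and lets a subclass decide how children are produced. The class names SketchTreeNode and ListBackedNode and the (name, children) tuple layout are assumptions made for this example only.

```python
class SketchTreeNode:
    """Qt-free stand-in for a generic tree node (hypothetical name)."""

    def __init__(self, parent, row):
        self.parent = parent  # parent node, or None for a root element
        self.row = row        # position of this node among its siblings
        self.subnodes = self._get_children()

    def _get_children(self):
        # Subclasses decide how children are produced.
        raise NotImplementedError


class ListBackedNode(SketchTreeNode):
    """Node whose children come from a nested (name, children) tuple."""

    def __init__(self, ref, parent, row):
        self.ref = ref  # must be set before the base class asks for children
        super().__init__(parent, row)

    def _get_children(self):
        _name, children = self.ref
        return [ListBackedNode(child, self, i)
                for i, child in enumerate(children)]


root = ListBackedNode(
    ("PEPTIDER", [("PEPTIDER/2", []), ("PEPTIDER/3", [])]), None, 0
)
```

Storing both the parent and the row on each node is what lets a Qt item model answer parent and index queries without searching the tree.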

TreeModel

class openswathgui.models.TreeModels.TreeModel

Bases: PyQt4.QtCore.QAbstractItemModel

Generic tree model

Adapted from http://www.hardcoded.net/articles/using_qtreeview_with_qabstractitemmodel.htm

See parent class http://qt-project.org/doc/qt-5/QAbstractItemModel.html

See PeptideTree for implementation.

PeptideTreeNode

class openswathgui.models.PeptideTree.PeptideTreeNode(ref, parent, row)

Bases: openswathgui.models.TreeModels.TreeNode

Data model of a node in the left-hand peptide tree in the GUI

PeptideTree

class openswathgui.models.PeptideTree.PeptideTree(rootElements, firstColumnName='Peptide Sequence')

Bases: openswathgui.models.TreeModels.TreeModel

Data model of the underlying hierarchical data (proteins, precursors, peptides, transitions).

This class represents the data model, see openswathgui.views.PeptideTree.PeptidesTreeView for the view implementation.

columnCount(parent)

Returns how many columns we have

data(index, role)

Get data for a specific index (and role)

Currently, the only supported role is Qt.DisplayRole (for displaying the tree). The three columns are:

  • Compound name (generally peptide sequence or compound sum formula)
  • Charge
  • Name
Parameters:
  • index (QModelIndex) – Index of the element to be accessed
  • role (Qt::ItemDataRole) – Item role to be used (only Qt.DisplayRole supported)
headerData(section, orientation, role)

Get header data (column header) for a specific index (and role)

The three columns are:
  • Peptide Sequence
  • Charge
  • Name

Note that the user can set the name of the first column manually in order to accommodate other data (e.g. metabolomics) where “Peptide Sequence” would not make sense.
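
The column lookup can be pictured with a small Qt-free sketch. The function name header_for_section and the choice to return None for unknown sections are assumptions; the real method also receives an orientation and a role, which are omitted here.

```python
def header_for_section(section, first_column_name="Peptide Sequence"):
    """Return the header text for a column index, or None if out of range."""
    columns = [first_column_name, "Charge", "Name"]
    if 0 <= section < len(columns):
        return columns[section]
    return None


default_header = header_for_section(0)
metabolomics_header = header_for_section(0, first_column_name="Compound")
```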

set_precursor_tree_structure(data, sortData=True)

Initialize tree structure with data from DataModel.get_precursor_tree

The tree is initialized by giving it a pointer to the root element(s)

Parameters:
  • data (list of ChromatogramTransition) – Root element(s) for the peptide tree
  • sortData (bool) – Whether to sort data

SWATH MS Run Module

Raw chromatographic data is handled using the SwathRunCollection class which can either hold references to mzML or to SqMass data.

SwathRunCollection

class openswathgui.models.SwathRunCollection.SwathRunCollection

Bases: object

A collection of SWATH files

Contains multiple SwathRun objects, each of which represents a single mass spectrometric injection. It can be initialized in three different ways: from a set of directories (assuming each directory is one run), from a set of files mapped to a run id (multiple files may be mapped to one run id), or from a simple flat list of chromatogram files.
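
The three kinds of input can be pictured as plain Python values. The paths and run ids below are invented for illustration, and the helper number_runs (including the choice to start numbering at 0) is an assumption about how a flat list might be turned into run ids.

```python
# Shape 1: { run_id : directory }, one directory per run
directory_mapping = {"run_1": "/data/run_1", "run_2": "/data/run_2"}

# Shape 2: { run_id : [filename, ...] }, several files may share a run id
chromatogram_mapping = {
    "run_1": ["/data/run_1/swath_01.chrom.mzML",
              "/data/run_1/swath_02.chrom.mzML"],
    "run_2": ["/data/run_2/swath_01.chrom.mzML"],
}

# Shape 3: flat list of chromatogram files, one run per file
flat_files = ["/data/a.chrom.mzML", "/data/b.chrom.mzML"]


def number_runs(filenames):
    """Assign increasing integer run ids to a flat list of files.

    Hypothetical helper; starting the numbering at 0 is an assumption.
    """
    return {run_id: [filename] for run_id, filename in enumerate(filenames)}


numbered = number_runs(flat_files)
```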

self.swath_chromatograms

Dictionary mapping of the form { run_id : SwathRun}

getRunIds()

Returns all available run ids

Returns:runlist – A list of all available runs
Return type:list of str
getSwathFile(key)
Parameters:key (str) – The requested run
Returns:run – The run corresponding to the requested run
Return type:SwathRun
getSwathFiles()
Returns:runs – All runs found in this collection
Return type:list of SwathRun or SqlSwathRun
initialize_from_chromatograms(runid_mapping, precursor_mapping=None, sequences_mapping=None, protein_mapping={})

Initialize from a set of mapped chromatogram files. There may be multiple chromatogram (chrom.mzML) files mapped to one run id.

Parameters:
  • runid_mapping (dict) – A mapping dictionary of form { run_id : [filename, filename, ...] }
  • precursor_mapping (dict) – An optional mapping of the form { FullPrecursorName : [transition_id, transition_id, ...] }
  • sequences_mapping (dict) – An optional mapping of the form { StrippedSequence : [FullPrecursorName, FullPrecursorName, ...]}
initialize_from_directories(runid_mapping)

Initialize from a directory

This assumes that all .mzML files in the same directory are from the same run. There may be multiple chromatogram (chrom.mzML) files mapped to one run id.

Parameters:runid_mapping (dict) – A mapping dictionary of form { run_id : directory }
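
A minimal sketch of the directory convention, assuming a simple glob for the .mzML extension. The helper name expand_directories is hypothetical and the demonstration uses empty placeholder files in a temporary directory.

```python
import glob
import os
import tempfile


def expand_directories(runid_mapping):
    """Turn { run_id : directory } into { run_id : [mzML files found there] }."""
    expanded = {}
    for run_id, directory in runid_mapping.items():
        expanded[run_id] = sorted(glob.glob(os.path.join(directory, "*.mzML")))
    return expanded


# Demonstration with a throwaway directory containing empty placeholder files
with tempfile.TemporaryDirectory() as tmp:
    for name in ("swath_01.chrom.mzML", "swath_02.chrom.mzML", "notes.txt"):
        open(os.path.join(tmp, name), "w").close()
    expanded = expand_directories({"run_1": tmp})
    found = [os.path.basename(f) for f in expanded["run_1"]]
```

Files with other extensions in the same directory (here notes.txt) are ignored, so every .mzML file found ends up under the same run id.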
initialize_from_files(filenames)

Initialize from individual files, setting the runid as increasing integers.

This assumes that each .mzML file is from a separate run.

Parameters:filenames (list of str) – A list of filenames
initialize_from_sql(filenames, precursor_mapping=None, sequences_mapping=None, protein_mapping={})

Initialize from a set of sqMass chromatogram files.

Parameters:
  • filenames (list(str)) – A list of files
  • precursor_mapping (dict) – An optional mapping of the form { FullPrecursorName : [transition_id, transition_id, ...] }
  • sequences_mapping (dict) – An optional mapping of the form { StrippedSequence : [FullPrecursorName, FullPrecursorName, ...]}
initialize_from_sql_map(runid_mapping, filenames, precursor_mapping=None, sequences_mapping=None, protein_mapping={})

Initialize from a set of sqMass chromatogram files.

Parameters:
  • filenames (list(str)) – A list of files
  • precursor_mapping (dict) – An optional mapping of the form { FullPrecursorName : [transition_id, transition_id, ...] }
  • sequences_mapping (dict) – An optional mapping of the form { StrippedSequence : [FullPrecursorName, FullPrecursorName, ...]}

SqMass

class openswathgui.models.SqlSwathRun.SqlSwathRun(runid, filename, load_in_memory=False, precursor_mapping=None, sequences_mapping=None, protein_mapping={})

Data Model for a single sqMass file.

TODO: each file may contain multiple runs!

runid

Current run id

Private Attributes:
  • _run: A SqlDataAccess object
  • _filename: Original filename
  • _basename: Original filename basename
  • _precursor_mapping: Dictionary { FullPrecursorName : [transition_id, transition_id] }
  • _sequences_mapping: Dictionary { StrippedSequence : [FullPrecursorName, FullPrecursorName]}
add_peakgroup_data(precursor_id, leftWidth, rightWidth, fdrscore, intensity, assay_rt)
getTransitionCount()

Get total number of transitions

get_all_peptide_sequences()

Get all (stripped) sequences

get_all_precursor_ids()

Get all precursor ids (full sequence + charge)

get_all_proteins()
get_assay_data(precursor)
get_data_for_precursor(precursor)

Retrieve raw data for a specific precursor - data will be as list of pairs (timearray, intensityarray)

get_data_for_transition(transition_id)

Retrieve raw data for a specific transition

get_id()
get_intensity_data(precursor)
get_precursors_for_sequence(sequence)

Get all precursors mapping to one stripped sequence

get_range_data(precursor)
get_score_data(precursor)
get_sequence_for_protein(protein)
get_transitions_for_precursor(precursor)

Return the transition names for a specific precursor

get_transitions_for_precursor_display(precursor)
remove_precursors(toremove)

Remove a set of precursors from the run (this can be done to filter down the list of precursors to display).

class openswathgui.models.SqlDataAccess.SqlDataAccess(filename)

Bases: object

getDataForChromatogram(myid)

Get data from a single chromatogram

  • compression is one of 0 = no, 1 = zlib, 2 = np-linear, 3 = np-slof, 4 = np-pic, 5 = np-linear + zlib, 6 = np-slof + zlib, 7 = np-pic + zlib
  • data_type is one of 0 = mz, 1 = int, 2 = rt
  • data contains the raw (blob) data for a single data array
getDataForChromatogramFromNativeId(native_id)

Get data from a single chromatogram

  • compression is one of 0 = no, 1 = zlib, 2 = np-linear, 3 = np-slof, 4 = np-pic, 5 = np-linear + zlib, 6 = np-slof + zlib, 7 = np-pic + zlib
  • data_type is one of 0 = mz, 1 = int, 2 = rt
  • data contains the raw (blob) data for a single data array
getDataForChromatograms(ids)

Get data from multiple chromatograms

  • compression is one of 0 = no, 1 = zlib, 2 = np-linear, 3 = np-slof, 4 = np-pic, 5 = np-linear + zlib, 6 = np-slof + zlib, 7 = np-pic + zlib
  • data_type is one of 0 = mz, 1 = int, 2 = rt
  • data contains the raw (blob) data for a single data array
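
The compression and data_type codes listed above can be handled with simple lookup tables. The sketch below decodes only codes 0 and 1. It assumes that the uncompressed payload is a sequence of little-endian 64-bit floats, which is an assumption made for illustration, and it leaves the Numpress variants (codes 2 to 7) unimplemented. The function name decode_blob is hypothetical.

```python
import struct
import zlib

COMPRESSION = {
    0: "no", 1: "zlib", 2: "np-linear", 3: "np-slof", 4: "np-pic",
    5: "np-linear + zlib", 6: "np-slof + zlib", 7: "np-pic + zlib",
}
DATA_TYPE = {0: "mz", 1: "int", 2: "rt"}


def decode_blob(compression, data_type, data):
    """Decode one data array; only codes 0 and 1 are handled in this sketch.

    Assumption: the uncompressed payload is a run of little-endian doubles.
    """
    if compression == 1:
        data = zlib.decompress(data)
    elif compression != 0:
        raise NotImplementedError(
            "Numpress decoding not shown: " + COMPRESSION[compression]
        )
    count = len(data) // 8
    values = list(struct.unpack("<%dd" % count, data[: count * 8]))
    return DATA_TYPE[data_type], values


raw = struct.pack("<3d", 1.0, 2.5, 4.0)
label, values = decode_blob(1, 2, zlib.compress(raw))
```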

MzML File

class openswathgui.models.SwathRun.SwathRun(files, runid=None, precursor_mapping=None, sequences_mapping=None, protein_mapping={})

Bases: object

Data model for an individual SWATH injection (may contain multiple mzML files).

This contains the model for all data from a single run (e.g. one panel in the viewer). In reality this could be multiple actual MS runs, since in SRM not all peptides can be measured in the same injection, or just multiple files generated by SWATH MS.

It abstracts all the interfaces of SingleChromatogramFile; usually all other classes communicate directly with this class.

runid

Current run id

Private Attributes:

_all_swathes: Dictionary of { mz : SingleChromatogramFile }

_files: List of files containing data for this run

_in_memory: Whether data should be held in memory

_range_mapping: Dictionary of { precursor_id : [leftWidth, rightWidth] }

_score_mapping: Dictionary of { precursor_id : FDR_score }

_intensity_mapping: Dictionary of { precursor_id : Intensity }
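
How the per-precursor dictionaries might be filled and queried can be sketched as below. The class PeakgroupStore is a hypothetical stand-in, the _assay_mapping attribute is not listed in the documentation and is assumed here only to have somewhere to keep assay_rt, and returning None for unknown precursors is likewise an assumption.

```python
class PeakgroupStore:
    """Hypothetical stand-in showing the per-precursor dictionaries."""

    def __init__(self):
        self._range_mapping = {}      # precursor_id -> [leftWidth, rightWidth]
        self._score_mapping = {}      # precursor_id -> FDR score
        self._intensity_mapping = {}  # precursor_id -> intensity
        self._assay_mapping = {}      # assumed: precursor_id -> assay RT

    def add_peakgroup_data(self, precursor_id, leftWidth, rightWidth,
                           fdrscore, intensity, assay_rt):
        self._range_mapping[precursor_id] = [leftWidth, rightWidth]
        self._score_mapping[precursor_id] = fdrscore
        self._intensity_mapping[precursor_id] = intensity
        self._assay_mapping[precursor_id] = assay_rt

    def get_range_data(self, precursor):
        return self._range_mapping.get(precursor)

    def get_score_data(self, precursor):
        return self._score_mapping.get(precursor)

    def get_intensity_data(self, precursor):
        return self._intensity_mapping.get(precursor)


store = PeakgroupStore()
store.add_peakgroup_data("PEPTIDER/2", 1200.5, 1235.0, 0.002, 48000.0, 1218.3)
```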

add_peakgroup_data(precursor_id, leftWidth, rightWidth, fdrscore, intensity, assay_rt)
getTransitionCount()

Aggregate transition count over all files

get_all_peptide_sequences()
get_all_precursor_ids()
get_all_proteins()
get_assay_data(precursor)
get_data_for_precursor(precursor)

Retrieve raw data for a specific precursor (using the correct run).

get_data_for_transition(transition_id)

Retrieve raw data for a specific transition (using the correct run).

get_id()
get_intensity_data(precursor)
get_precursors_for_sequence(sequence)
get_range_data(precursor)
get_score_data(precursor)
get_sequence_for_protein(protein)
get_transitions_for_precursor(precursor)
get_transitions_for_precursor_display(precursor)
remove_precursors(toremove)

Remove a set of precursors from the run (this can be done to filter down the list of precursors to display).
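
One way a caller could build the set passed to remove_precursors is to compare stored scores against an FDR cutoff. The sketch below is an assumption about such a filter, not the documented behaviour: the helper name precursors_to_remove and the rule that precursors without a score count as not quantified are both hypothetical.

```python
def precursors_to_remove(score_mapping, all_precursors,
                         fdr_cutoff=0.01, only_quantified=True):
    """Collect precursors that fail the cutoff or have no score at all."""
    toremove = set()
    for precursor in all_precursors:
        score = score_mapping.get(precursor)
        if score is None:
            # Assumed rule: a precursor without a score is not quantified.
            if only_quantified:
                toremove.add(precursor)
        elif score > fdr_cutoff:
            toremove.add(precursor)
    return toremove


scores = {"PEPTIDER/2": 0.001, "ELVISK/2": 0.2}
precursors = ["PEPTIDER/2", "ELVISK/2", "LIVESK/3"]
strict = precursors_to_remove(scores, precursors)
lenient = precursors_to_remove(scores, precursors, only_quantified=False)
```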

class openswathgui.models.SingleChromatogramFile.SingleChromatogramFile(run, filename, load_in_memory=False, precursor_mapping=None, sequences_mapping=None, protein_mapping={})

Data Model for a single file from one run.

One run may contain multiple mzML files

runid

Current run id

Private Attributes:
  • _run: A pymzml.run.Reader object
  • _filename: Original filename
  • _basename: Original filename basename
  • _precursor_mapping: Dictionary { FullPrecursorName : [transition_id, transition_id] }
  • _sequences_mapping: Dictionary { StrippedSequence : [FullPrecursorName, FullPrecursorName]}
getTransitionCount()

Get total number of transitions

get_all_peptide_sequences()

Get all (stripped) sequences

get_all_precursor_ids()

Get all precursor ids (full sequence + charge)

get_data_for_precursor(precursor)

Retrieve raw data for a specific precursor - data will be as list of pairs (timearray, intensityarray)

get_data_for_transition(transition_id)

Retrieve raw data for a specific transition

get_id()
get_precursors_for_sequence(sequence)

Get all precursors mapping to one stripped sequence

get_sequence_for_protein(protein)
get_transitions_for_precursor(precursor)

Return the transition names for a specific precursor

get_transitions_with_mass_for_precursor(precursor)

Return the transition names prepended with the mass for a specific precursor

ChromatogramTransition Module

ChromatogramTransition

class openswathgui.models.ChromatogramTransition.ChromatogramTransition(name, charge, subelements, peptideSequence=None, fullName=None, datatype='Precursor')

Bases: object

Internal tree structure object representing one row in the left side tree.

This is the bridge between the view and the data model

Pointers to ChromatogramTransition objects are passed to callback functions when the selection in the left side tree changes. The object needs to store information about all the columns present in the rows (PeptideSequence, Charge, Name), which are requested by the PeptideTree model.

Also it needs to know how to access the raw data as well as meta-data for a certain transition. This is done through getData, getLabel etc.
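
The nesting of rows can be illustrated with a simplified stand-in class. SketchRow is hypothetical and omits all data access. Of the datatype strings used below, only "Precursor" appears in the documented signature; "Peptide" and "Transition" are assumed for the example, as is passing None as the charge of a peptide row.

```python
class SketchRow:
    """Simplified, hypothetical stand-in for one row of the left side tree."""

    def __init__(self, name, charge, subelements, peptideSequence=None,
                 fullName=None, datatype="Precursor"):
        self._name = name
        self._charge = charge
        self._subelements = subelements
        self._peptideSequence = peptideSequence
        self._fullName = fullName
        self._datatype = datatype

    def getName(self):
        return self._name

    def getCharge(self):
        return self._charge

    def getPeptideSequence(self):
        return self._peptideSequence

    def getSubelements(self):
        return self._subelements

    def getType(self):
        return self._datatype


def flatten(rows, depth=0):
    """Yield (depth, type, name) for every row, parents before children."""
    for row in rows:
        yield depth, row.getType(), row.getName()
        yield from flatten(row.getSubelements(), depth + 1)


transitions = [SketchRow("y7", 1, [], datatype="Transition"),
               SketchRow("b4", 1, [], datatype="Transition")]
precursor = SketchRow("PEPTIDER/2", 2, transitions,
                      peptideSequence="PEPTIDER", fullName="PEPTIDER/2")
peptide = SketchRow("PEPTIDER", None, [precursor],
                    peptideSequence="PEPTIDER", datatype="Peptide")
rows = list(flatten([peptide]))
```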

getAssayRT(run)

Get the assay retention time (RT) for a specific run and current precursor

Parameters:run (SwathRun) – SwathRun object which will be used to retrieve data
Returns:The assay retention time for a specific run and current precursor
Return type:float
getCharge()

Get charge of precursor

Returns:Charge
Return type:int
getData(run)

Get raw data for a certain object

If we have a single precursor or a peptide with only one precursor, we show the same data as for the precursor itself. For a peptide with multiple precursors, we show all precursors as individual curves. For a single transition, we simply plot that transition.

Parameters:run (SwathRun or SqlSwathRun) – SwathRun object which will be used to retrieve data
Returns:The raw data of the chromatograms for a given run. The data format is a list of transitions, where each transition is a pair of (timearray, intensityarray)
Return type:list of pairs (timearray, intensityarray)
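
To make the return format concrete, the sketch below walks a list of (timearray, intensityarray) pairs and reports the apex of each curve. Plain Python lists stand in for whatever array type the real method returns, and the helper name apex_per_curve is hypothetical.

```python
def apex_per_curve(data):
    """Return (time, intensity) at the highest point of each curve."""
    apexes = []
    for times, intensities in data:
        if len(intensities) == 0:
            apexes.append(None)  # an empty curve has no apex
            continue
        best = max(range(len(intensities)), key=lambda i: intensities[i])
        apexes.append((times[best], intensities[best]))
    return apexes


data = [
    ([10.0, 11.0, 12.0], [5.0, 80.0, 20.0]),
    ([10.0, 11.0, 12.0], [1.0, 30.0, 45.0]),
]
apexes = apex_per_curve(data)
```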
getIntensity(run)

Get the intensity for a specific run and current precursor

Parameters:run (SwathRun) – SwathRun object which will be used to retrieve data
Returns:The intensity for a specific run and current precursor
Return type:float
getLabel(run)

Get the labels for a curve (corresponding to the raw data from getData call) for a certain object.

If we have a single precursor or a peptide with only one precursor, we show the same data as for the precursor itself. For a peptide with multiple precursors, we show all precursors as individual curves. For a single transition, we simply plot that transition.

Parameters:run (SwathRun) – SwathRun object which will be used to retrieve data
Returns:The labels to display for each line in the graph
Return type:list of str
getName()

Get name of precursor

Returns:Name of precursor
Return type:str
getPeptideSequence()
getProbScore(run)

Get the probabilistic score for a specific run and current precursor

Parameters:run (SwathRun) – SwathRun object which will be used to retrieve data
Returns:The probabilistic score for a specific run and current precursor
Return type:float
getRange(run)

Get the data range (leftWidth/rightWidth) for a specific run

Parameters:run (SwathRun) – SwathRun object which will be used to retrieve data
Returns:A pair of floats representing the data range (leftWidth/rightWidth) for a specific run
Return type:list of float
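
A typical use of the returned pair is to restrict a chromatogram to the peak boundaries. The sketch below assumes the range is given in the same time unit as the time array and treats both boundaries as inclusive. The helper name clip_to_range is hypothetical.

```python
def clip_to_range(times, intensities, peak_range):
    """Keep only the (time, intensity) points inside [leftWidth, rightWidth]."""
    left, right = peak_range
    return [(t, i) for t, i in zip(times, intensities) if left <= t <= right]


times = [10.0, 11.0, 12.0, 13.0, 14.0]
intensities = [2.0, 40.0, 90.0, 35.0, 3.0]
inside = clip_to_range(times, intensities, [11.0, 13.0])
```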
getSubelements()
getType()