The mdt.features Python module

MDT features.

Copyright 1989-2020 Andrej Sali.

MDT is free software: you can redistribute it and/or modify it under the terms of version 2 of the GNU General Public License as published by the Free Software Foundation.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with MDT. If not, see <http://www.gnu.org/licenses/>.

Protein features

These features yield a single value for each protein in the alignment. Each feature takes some common arguments:

  • mlib: the mdt.Library to create the feature in.
  • bins: list of bins (see Specification of bins).
  • protein: the protein index on which to evaluate the feature from each group of proteins (individual protein, pairs, triples) selected from the alignment (0 for the first, 1 for the second, 2 for the third). See mdt.Table.add_alignment() for more details.
class mdt.features.XRayResolution(mlib, bins, protein=0, nmr=0.45)[source]

Protein X-ray resolution in angstroms. Proteins with a resolution of -1.00 (generally NMR structures) are actually reported as having a resolution of nmr. This decreases the number of bins required to hold all defined resolutions while still separating NMR from X-ray structures.
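
The substitution described above can be sketched in a few lines. The function name is hypothetical, and the exact comparison against -1.00 is an assumption; this illustrates the idea and is not MDT's own code.

```python
def reported_resolution(resolution, nmr=0.45):
    """Map the -1.00 placeholder onto the nmr value; pass other values through."""
    # Assumption: the placeholder is stored exactly as -1.00
    return nmr if resolution == -1.00 else resolution
```

With the default, an NMR structure lands just below typical X-ray resolutions, so a single contiguous set of bins can cover both cases.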

class mdt.features.RadiusOfGyration(mlib, bins, protein=0)[source]

Protein radius of gyration in angstroms. The calculation of the center of mass used for this feature is not mass weighted.
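
To illustrate what "not mass weighted" means, the sketch below computes a radius of gyration in which every point has equal weight. The function name and the use of plain (x, y, z) tuples are assumptions made for illustration; which atoms MDT includes is not stated here.

```python
import math

def radius_of_gyration(coords):
    """Unweighted radius of gyration of a non-empty list of (x, y, z) tuples."""
    n = len(coords)
    # Geometric center: every atom counts equally (no atomic masses)
    cx = sum(c[0] for c in coords) / n
    cy = sum(c[1] for c in coords) / n
    cz = sum(c[2] for c in coords) / n
    # Root mean square distance of the atoms from that center
    msd = sum((x - cx) ** 2 + (y - cy) ** 2 + (z - cz) ** 2
              for x, y, z in coords) / n
    return math.sqrt(msd)
```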

class mdt.features.SequenceLength(mlib, bins, protein=0)[source]

Protein sequence length (number of residues).

class mdt.features.HydrogenBondSatisfaction(mlib, bins, protein=0)[source]

Hydrogen bond satisfaction index for a protein. This is the average difference, over all atoms in the protein, between the HydrogenBondDonor value and the atom’s donor valency plus the same for the acceptor, as defined in the hydrogen bond file (see mdt.Library.hbond_classes).

class mdt.features.AlphaContent(mlib, bins, protein=0)[source]

Alpha content of the protein. This is simply the fraction, between 0 and 1, of residues in the first mainchain conformation class (see MainchainConformation).

Protein pair features

These features yield a single value for each pair of proteins in the alignment. Each feature takes some common arguments:

  • mlib: the mdt.Library to create the feature in.
  • bins: list of bins (see Specification of bins).
  • protein1 and protein2: the indexes of proteins in each group of proteins selected from the alignment to evaluate the feature on; each can range from 0 to 2 inclusive. See mdt.Table.add_alignment() for more details.
class mdt.features.SequenceIdentity(mlib, bins, protein1=0, protein2=1)[source]

Fractional sequence identity, between 0 and 1, between two sequences. This is the number of identical aligned residues divided by the length of the shorter sequence.
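
As a rough sketch of this definition, the function below works on two aligned sequences given as equal-length strings. The function name, the "-" gap character, and the return value for empty sequences are assumptions for illustration only.

```python
def sequence_identity(seq1, seq2, gap="-"):
    """Fractional identity of two aligned sequences (equal alignment length)."""
    if len(seq1) != len(seq2):
        raise ValueError("aligned sequences must have the same length")
    # Count alignment positions where both sequences hold the same residue
    identical = sum(1 for a, b in zip(seq1, seq2) if a == b and a != gap)
    # Sequence lengths exclude gap characters
    len1 = sum(1 for a in seq1 if a != gap)
    len2 = sum(1 for b in seq2 if b != gap)
    shorter = min(len1, len2)
    # Assumption: return 0.0 rather than dividing by zero for empty input
    return identical / shorter if shorter else 0.0
```

For example, "ACDE-" aligned with "ACDFG" has three identical positions and a shorter sequence of four residues.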

Residue features

These features yield a single value for each residue in each sequence in the alignment. Each feature takes some common arguments:

  • delta: if non-zero, don’t calculate the feature for the residue position returned by the residue scan - instead, offset it by delta residues in the sequence. Applied before align_delta.
  • align_delta: if non-zero, don’t calculate the feature for the alignment position returned by the residue scan - instead, offset it by align_delta alignment positions. Applied after delta.
  • pos2: if True, force a residue pair scan, and evaluate the feature on the second residue in each pair.
  • mlib, bins, protein: see Protein features. Note that some residue features do not use the bins argument, because they have a fixed number of bins.
class mdt.features.ResidueType(mlib, protein=0, delta=0, align_delta=0, pos2=False)[source]

Residue type (20 standard amino acids, gap, undefined).

class mdt.features.ResidueAccessibility(mlib, bins, protein=0, delta=0, align_delta=0, pos2=False)[source]

Residue solvent accessibility. This is derived from the atomic solvent accessibility; see AtomAccessibility.

class mdt.features.Chi1Dihedral(mlib, protein=0, delta=0, align_delta=0, pos2=False)[source]
class mdt.features.Chi2Dihedral
class mdt.features.Chi3Dihedral
class mdt.features.Chi4Dihedral
class mdt.features.PhiDihedral
class mdt.features.PsiDihedral
class mdt.features.OmegaDihedral
class mdt.features.AlphaDihedral

Residue dihedral angle, from -180 to 180 degrees.

class mdt.features.Chi1Class(mlib, protein=0, delta=0, align_delta=0, pos2=False)[source]
class mdt.features.Chi2Class
class mdt.features.Chi3Class
class mdt.features.Chi4Class
class mdt.features.Chi5Class
class mdt.features.PhiClass
class mdt.features.PsiClass
class mdt.features.OmegaClass

Residue dihedral class. These classes are defined by MODELLER to group common regions of dihedral space for each residue type.

class mdt.features.MainchainConformation(mlib, protein=0, delta=0, align_delta=0, pos2=False)[source]

Residue mainchain conformation (Ramachandran) class. This is a classification of the residue’s phi/psi angles into classes as defined in Modeller’s modlib/af_mnchdef.lib file and described in Sali and Blundell, JMB (1993) 234, p785. The default classes are A (right-handed alpha-helix), P (poly-proline conformation), B (idealized beta-strand), L (left-handed alpha-helix), and E (extended conformation).

class mdt.features.ResidueGroup(mlib, protein=0, delta=0, align_delta=0, pos2=False, residue_grouping=0)[source]

Residue group.

class mdt.features.SidechainBiso(mlib, bins, protein=0, delta=0, align_delta=0, pos2=False)[source]

Residue average sidechain Biso. A zero average Biso is treated as undefined. If the average of these values over the whole protein is less than 2, each residue’s value is multiplied by 4π².

Residue pair features

These features yield a single value for each pair of residues in each sequence in the alignment. See Protein features for a description of the common arguments.

class mdt.features.ResidueDistance(mlib, bins, protein=0)[source]

Distance between a pair of residues. This is defined as the distance between the ‘special’ atoms in each residue. The type of this special atom can be specified by the distance_atoms argument when creating an mdt.Library object. The feature is considered undefined if any of the atom coordinates are equal to the Modeller ‘undefined’ value (-999.0).

class mdt.features.AverageResidueAccessibility(mlib, bins, protein=0)[source]

Average solvent accessibility of a pair of residues. See ResidueAccessibility.

class mdt.features.ResidueIndexDifference(mlib, bins, protein=0, absolute=False)[source]

Difference in sequence index between a pair of residues. This can either be the simple difference (if absolute is False) in which case the feature is asymmetric, or the absolute value (if absolute is True) which gives a symmetric feature.
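
The effect of the absolute argument can be sketched as follows. The function name is hypothetical, and the sign convention (second index minus first) is an assumption, since the text does not state which residue is subtracted from which.

```python
def residue_index_difference(i, j, absolute=False):
    """Difference in sequence index between residues i and j."""
    diff = j - i  # assumed sign convention: second minus first
    return abs(diff) if absolute else diff
```

Swapping the two residues flips the sign of the simple difference (asymmetric) but leaves the absolute value unchanged (symmetric).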

Aligned residue features

These features yield a single value for residues aligned between two proteins. For each pair of proteins, every alignment position is scanned, and the feature is evaluated for each pair of aligned residues. See Protein pair features for a description of the common arguments.

class mdt.features.PhiDihedralDifference(mlib, bins, protein1=0, protein2=1)[source]
class mdt.features.PsiDihedralDifference
class mdt.features.OmegaDihedralDifference

Shortest difference in dihedral angle (in degrees) between a pair of aligned residues.
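
Because dihedral angles are periodic, the shortest difference has to wrap around at ±180 degrees. The sketch below shows one common way to do this. The function name is hypothetical, and returning a signed value is an assumption, since the text does not say whether MDT reports a signed or unsigned difference.

```python
def shortest_dihedral_difference(angle1, angle2):
    """Signed shortest difference angle2 - angle1, in degrees, in [-180, 180)."""
    # Shift by 180, wrap into [0, 360), then shift back
    return (angle2 - angle1 + 180.0) % 360.0 - 180.0
```

For instance, 170 and -170 degrees differ by 20 degrees along the shortest arc, not by 340.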

class mdt.features.NeighborhoodDifference(mlib, bins, protein1=0, protein2=1)[source]

Residue neighborhood difference. This is the average of the distance scores (from a residue-residue scoring matrix) of all aligned residues where the residue in the first sequence is within a cutoff distance of the scanned residue. (This cutoff is set by the distngh argument to mdt.Table.add_alignment().)

class mdt.features.GapDistance(mlib, bins, protein1=0, protein2=1)[source]

Distance, in alignment positions, to the nearest gap. Note that positions which are gapped in both sequences are ignored for the purposes of this calculation (a ‘gap’ is defined as a gap in one sequence aligned with a residue in the other).
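
One way to read this definition is sketched below for two aligned sequences given as equal-length strings. The function name, the "-" gap character, the value 0 for a position that is itself a gap, and the None result when the alignment has no gaps are all assumptions made for illustration.

```python
def gap_distances(seq1, seq2, gap="-"):
    """Distance from each retained alignment position to the nearest gap."""
    # Positions gapped in both sequences are ignored entirely
    cols = [(a, b) for a, b in zip(seq1, seq2)
            if not (a == gap and b == gap)]
    # A 'gap' is a gap in one sequence aligned with a residue in the other
    gap_cols = [i for i, (a, b) in enumerate(cols)
                if (a == gap) != (b == gap)]
    if not gap_cols:
        # Assumption: no gap anywhere means the distance is undefined
        return [None] * len(cols)
    return [min(abs(i - g) for g in gap_cols) for i in range(len(cols))]
```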

Aligned residue pair features

These features yield a single value for each pair of residues aligned between two proteins. For each pair of proteins, each pair of alignment positions is scanned, and the feature is evaluated for each pair of pairs of aligned residues. See Protein pair features for a description of the common arguments.

class mdt.features.ResidueDistanceDifference(mlib, bins, protein1=0, protein2=1)[source]

Distance between two residues in the second protein, minus the distance between the equivalent residues in the first protein. See ResidueDistance. The feature is considered undefined if any of the atom coordinates are equal to the Modeller ‘undefined’ value (-999.0).
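
The arithmetic, including the handling of undefined coordinates, can be sketched as follows. The function names are hypothetical, each residue is represented by the (x, y, z) position of its 'special' atom, and None stands in for the undefined value.

```python
import math

UNDEFINED = -999.0  # Modeller 'undefined' coordinate value quoted in the text

def distance(a, b):
    """Distance between two (x, y, z) points, or None if any coordinate is undefined."""
    if UNDEFINED in a or UNDEFINED in b:
        return None
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))

def residue_distance_difference(pair1, pair2):
    """Distance in the second protein minus the distance in the first."""
    d1 = distance(*pair1)
    d2 = distance(*pair2)
    if d1 is None or d2 is None:
        return None
    return d2 - d1
```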

class mdt.features.AverageNeighborhoodDifference(mlib, bins, protein1=0, protein2=1)[source]

Average residue neighborhood difference for a pair of alignment positions. See NeighborhoodDifference.

class mdt.features.AverageGapDistance(mlib, bins, protein1=0, protein2=1)[source]

Average distance to a gap from a pair of alignment positions. See GapDistance.

Atom features

These features yield a single value for each atom in the first protein in each group of proteins selected from the alignment. Each feature takes some common arguments:

  • pos2: if True, force an atom pair scan, and evaluate the feature on the second atom in each pair.
  • mlib, bins: see Protein features. Note that some atom features do not use the bins argument, because they have a fixed number of bins.
class mdt.features.AtomAccessibility(mlib, bins, pos2=False)[source]

Atom solvent accessibility. This is calculated by the PSA algorithm, and controlled by the surftyp and accessibility_type arguments to mdt.Table.add_alignment(). The feature is considered undefined if the atom’s Cartesian coordinates are equal to the Modeller ‘undefined’ value (-999.0).

class mdt.features.FractionalAtomAccessibility(mlib, bins, pos2=False)[source]

Fractional atom solvent accessibility, from 0 to 1. This is the atom solvent accessibility (see AtomAccessibility) divided by the volume of the atom, derived from its van der Waals radius. The feature is considered undefined if the atom’s Cartesian coordinates are equal to the Modeller ‘undefined’ value (-999.0).

class mdt.features.AtomType(mlib, pos2=False)[source]

Type of an atom, as classified by the atom class file. See mdt.Library.atom_classes.

class mdt.features.HydrogenBondDonor(mlib, bins, pos2=False)[source]

Number of hydrogen bond donors. It is defined as the sum, over all atoms within hbond_cutoff (see mdt.Library) of the atom, of their donor valencies as defined in the hydrogen bond file (see mdt.Library.hbond_classes). The feature is considered undefined if the atom’s Cartesian coordinates are equal to the Modeller ‘undefined’ value (-999.0).

class mdt.features.HydrogenBondAcceptor(mlib, bins, pos2=False)[source]

Number of hydrogen bond acceptors. It is defined as the sum, over all atoms within hbond_cutoff (see mdt.Library) of the atom, of their acceptor valencies as defined in the hydrogen bond file (see mdt.Library.hbond_classes). The feature is considered undefined if the atom’s Cartesian coordinates are equal to the Modeller ‘undefined’ value (-999.0).

class mdt.features.HydrogenBondCharge(mlib, bins, pos2=False)[source]

Hydrogen bond charge. It is defined as the sum, over all atoms within hbond_cutoff (see mdt.Library) of the atom, of their charges as defined in the hydrogen bond file (see mdt.Library.hbond_classes).
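
The three hydrogen bond features above share the same shape: a per-atom quantity summed over the neighbors of an atom. A minimal sketch of that pattern follows. The function name is hypothetical, and both the exclusion of the central atom itself and the inclusive cutoff comparison are assumptions that the text does not settle.

```python
import math

def neighbor_sum(index, coords, values, cutoff):
    """Sum values[j] over all atoms j within cutoff of atom index."""
    center = coords[index]
    total = 0
    for j, other in enumerate(coords):
        # Assumption: the atom itself does not contribute to its own sum
        if j != index and math.dist(center, other) <= cutoff:
            total += values[j]
    return total
```

Here values would hold the donor valencies, acceptor valencies, or charges taken from the hydrogen bond file, and cutoff plays the role of hbond_cutoff.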

class mdt.features.AtomTable(mlib, bins, table_name, func, pos2=False)[source]

A tabulated atom feature. The feature is simply a table of N floating-point numbers, where N is the number of atoms in the system. This table is provided by a Python function, so it can be used to implement user-defined features or to pass in features from other software. A simple example that uses the x coordinate as a feature:

import mdt.features

def func(aln, struc, mlib, libs):
    # Return one floating-point value per atom (here, its x coordinate)
    return [a.x for a in struc.atoms]
f = mdt.features.AtomTable(mlib, bins, "x coordinate", func)
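
Because the callback only has to return one number per atom, it can be exercised without MDT by passing in a stand-in structure. The sketch below assumes, as in the example above, that struc.atoms is iterable and that each atom exposes x, y and z attributes. The SimpleNamespace objects are hypothetical test doubles, not Modeller classes.

```python
import math
from types import SimpleNamespace

def distance_from_origin(aln, struc, mlib, libs):
    """AtomTable-style callback: one float per atom in struc."""
    return [math.sqrt(a.x ** 2 + a.y ** 2 + a.z ** 2) for a in struc.atoms]

# Stand-in structure with two atoms, used only to check the callback
fake_struc = SimpleNamespace(atoms=[SimpleNamespace(x=3.0, y=4.0, z=0.0),
                                    SimpleNamespace(x=0.0, y=0.0, z=2.0)])
values = distance_from_origin(None, fake_struc, None, None)
```

Checking that the returned list has exactly one entry per atom before handing the function to AtomTable is a cheap way to catch mistakes early.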

Atom pair features

These features yield a single value for each pair of atoms in the first protein in each group of proteins selected from the alignment. See Protein features for a description of the common arguments.

class mdt.features.AtomDistance(mlib, bins)[source]

Distance in angstroms between a pair of atoms. The feature is considered undefined if any of the atom coordinates are equal to the Modeller ‘undefined’ value (-999.0).

class mdt.features.AtomBondSeparation(mlib, bins, disulfide=False)[source]

Number of bonds between a pair of atoms. For example, two atoms that are directly bonded return ‘1’, while two at opposite ends of an angle return ‘2’. The bonds between atoms in each standard amino acid are derived from the bond class file, so this must be read in first (see mdt.Library.bond_classes). For atoms in different residues, the residues are assumed to be linked by a peptide backbone, and the number of bonds is calculated accordingly. Atoms in different chains, or atoms of types not referenced in the bond class file, are not connected. If disulfide is set to True, disulfide bridges are also considered (if two residues have SG atoms within 2.5 angstroms, they are counted as bonded). If disulfide is set to False (the default) any disulfide bridges are ignored. Either way, no account is taken of patches and other modifications such as terminal oxygens (unless bonds to OXT are explicitly listed in the bond class file). If a pair of atoms is not connected it is placed in the ‘undefined’ bin.
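
Counting the bonds between two atoms amounts to a shortest-path search on the bond graph. The sketch below uses a breadth-first search over a list of bonded atom pairs. The function name and the atom labels are hypothetical, None stands in for the 'undefined' bin, and the code does not reproduce MDT's handling of peptide links, chains or disulfide bridges.

```python
from collections import deque

def bond_separation(bonds, start, end):
    """Number of bonds on the shortest path from start to end, or None."""
    # Build an undirected adjacency map from the list of bonded pairs
    graph = {}
    for a, b in bonds:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    if start not in graph or end not in graph:
        return None  # atom not referenced by any bond: not connected
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        atom, nbonds = queue.popleft()
        if atom == end:
            return nbonds
        for neighbor in graph[atom]:
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, nbonds + 1))
    return None  # no path between the two atoms
```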

Tuple features

These features yield a single value for each tuple of atoms in the first protein in each group of proteins selected from the alignment. (The set of tuples must first be read into the mdt.Library.) Each feature takes some common arguments:

  • mlib: the mdt.Library to create the feature in.
  • pos2: if True, force a tuple pair scan, and evaluate the feature on the second tuple in each pair.
class mdt.features.TupleType(mlib, pos2=False)[source]

Type of an atom tuple, as classified by the tuple class file. See mdt.Library.tuple_classes.

Tuple pair features

These features yield a single value for each pair of tuples of atoms in the first protein in each group of proteins selected from the alignment. (The set of tuples must first be read into the mdt.Library.) See Protein features for a description of the common arguments.

class mdt.features.TupleDistance(mlib, bins)[source]

Distance in angstroms between the first atom in each of two tuples. The feature is considered undefined if any of the atom coordinates are equal to the Modeller ‘undefined’ value (-999.0).

class mdt.features.TupleAngle1(mlib, bins)[source]

Angle (0-180) between the first atom in the first tuple, the first atom in the second tuple, and the second atom in the second tuple. The feature is considered undefined if any of the atom coordinates are equal to the Modeller ‘undefined’ value (-999.0).

class mdt.features.TupleAngle2(mlib, bins)[source]

Angle (0-180) between the second atom in the first tuple, the first atom in the first tuple, and the first atom in the second tuple. The feature is considered undefined if any of the atom coordinates are equal to the Modeller ‘undefined’ value (-999.0).

class mdt.features.TupleDihedral1(mlib, bins)[source]

Dihedral (-180-180) between the second atom in the first tuple, the first atom in the first tuple, the first atom in the second tuple, and the second atom in the second tuple. The feature is considered undefined if any of the atom coordinates are equal to the Modeller ‘undefined’ value (-999.0).

class mdt.features.TupleDihedral2(mlib, bins)[source]

Dihedral (-180-180) between the third atom in the first tuple, the second atom in the first tuple, the first atom in the first tuple, and the first atom in the second tuple. Only works for atom triplets. The feature is considered undefined if any of the atom coordinates are equal to the Modeller ‘undefined’ value (-999.0).

class mdt.features.TupleDihedral3(mlib, bins)[source]

Dihedral (-180-180) between the first atom in the first tuple, the first atom in the second tuple, the second atom in the second tuple, and the third atom in the second tuple. Only works for atom triplets. The feature is considered undefined if any of the atom coordinates are equal to the Modeller ‘undefined’ value (-999.0).
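
The angle and dihedral features in this section all reduce to the same geometry on three or four atom positions; only the choice of atoms from the two tuples differs. The sketch below shows that geometry for plain (x, y, z) tuples. The function names are hypothetical, the dihedral sign follows the usual atan2 formulation (the convention MDT uses is not stated here), and undefined coordinates are not handled.

```python
import math

def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def angle(p1, p2, p3):
    """Angle p1-p2-p3 in degrees, from 0 to 180."""
    v1, v2 = _sub(p1, p2), _sub(p3, p2)
    cosine = _dot(v1, v2) / math.sqrt(_dot(v1, v1) * _dot(v2, v2))
    # Clamp to guard against rounding slightly outside [-1, 1]
    return math.degrees(math.acos(max(-1.0, min(1.0, cosine))))

def dihedral(p1, p2, p3, p4):
    """Dihedral p1-p2-p3-p4 in degrees, from -180 to 180."""
    b1, b2, b3 = _sub(p2, p1), _sub(p3, p2), _sub(p4, p3)
    # Normals to the planes (p1, p2, p3) and (p2, p3, p4)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))
```

TupleAngle1, for example, would correspond to calling angle() with the first atom of the first tuple, the first atom of the second tuple, and the second atom of the second tuple, in that order.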

Chemical bond features

These features yield a single value for each defined chemical bond, angle or dihedral in the first protein in each group of proteins selected from the alignment. (The definitions of the chemical connectivity must first be read from a bond class file; see the bond_classes, angle_classes and dihedral_classes attributes in mdt.Library.) See Protein features for a description of the common arguments.

class mdt.features.BondType(mlib)[source]

Type of a bond, as classified by the bond class file. See mdt.Library.bond_classes.

class mdt.features.AngleType(mlib)[source]

Type of an angle, as classified by the angle class file. See mdt.Library.angle_classes.

class mdt.features.DihedralType(mlib)[source]

Type of a dihedral, as classified by the dihedral class file. See mdt.Library.dihedral_classes.

class mdt.features.BondLength(mlib, bins)[source]

Length of a bond in angstroms. See mdt.Library.bond_classes. The feature is considered undefined if any of the atom coordinates are equal to the Modeller ‘undefined’ value (-999.0).

class mdt.features.Angle(mlib, bins)[source]

Angle (0-180). See mdt.Library.angle_classes. The feature is considered undefined if any of the atom coordinates are equal to the Modeller ‘undefined’ value (-999.0).

class mdt.features.Dihedral(mlib, bins)[source]

Dihedral angle (-180-180). See mdt.Library.dihedral_classes. The feature is considered undefined if any of the atom coordinates are equal to the Modeller ‘undefined’ value (-999.0).

Group features

These features are used to make combinations of other features. Each feature takes some common arguments:

  • mlib: the mdt.Library to create the feature in.
  • feat1: an existing feature object that will be included in this group.
  • feat2: another existing feature object to include.
  • nbins: the number of bins in this feature.
class mdt.features.Cluster(mlib, feat1, feat2, nbins)[source]

Cluster feature. When evaluated, it evaluates the two other features grouped in this feature, and converts the pair of bin indices for those features into a single bin index, which is returned. Use the add() method to control this conversion.

add(child_bins, bin_index)[source]

Add a single mapping from a pair of child feature bin indices into this feature’s bin index (all indexes start at 0). For example, calling add((1,2), 3) would cause this Cluster feature to return bin index 3 if the child features were in bins 1 and 2 respectively. This method can be called multiple times (even for the same bin_index) to add additional mappings from child bin indices to bin index. If no mapping from a given pair of child indices is present, the undefined bin index is returned.
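
The mapping behavior described for add() can be modeled with a dictionary keyed on the pair of child bin indices. The class below is a toy stand-in written for illustration; it is not the mdt.features.Cluster class, and the lookup() method and its undefined argument are hypothetical.

```python
class ClusterMapping:
    """Toy model of the child-bin to bin-index mapping described above."""

    def __init__(self):
        self._map = {}

    def add(self, child_bins, bin_index):
        # Several child pairs may map onto the same bin_index
        self._map[tuple(child_bins)] = bin_index

    def lookup(self, child_bins, undefined=None):
        # Unmapped pairs fall through to the 'undefined' value
        return self._map.get(tuple(child_bins), undefined)

mapping = ClusterMapping()
mapping.add((1, 2), 3)
mapping.add((0, 0), 3)   # a second pair sharing bin 3
```

Here mapping.lookup((1, 2)) gives 3, matching the add((1,2), 3) example in the text, while a pair that was never added yields the undefined value.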