The mdt.features Python module¶
MDT features.
Copyright 1989-2020 Andrej Sali.
MDT is free software: you can redistribute it and/or modify it under the terms of version 2 of the GNU General Public License as published by the Free Software Foundation.
This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
You should have received a copy of the GNU General Public License along with MDT. If not, see <http://www.gnu.org/licenses/>.
Protein features¶
These features yield a single value for each protein in the alignment. Each feature takes some common arguments:
- mlib: the mdt.Library to create the feature in.
- bins: list of bins (see Specification of bins).
- protein: the protein index on which to evaluate the feature from each group of proteins (individual protein, pairs, triples) selected from the alignment (0 for the first, 1 for the second, 2 for the third). See mdt.Table.add_alignment() for more details.
-
class
mdt.features.
XRayResolution
(mlib, bins, protein=0, nmr=0.45)[source]¶ Protein X-ray resolution in angstroms. Proteins with a resolution of -1.00 (generally NMR structures) are actually reported as having a resolution of nmr. This decreases the number of bins required to hold all defined resolutions while still separating NMR from X-ray structures.
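The substitution described above can be illustrated in plain Python. This is a minimal sketch, not MDT's implementation; the function name binned_resolution is hypothetical, and only the -1.00 marker and the nmr default come from the description.

```python
def binned_resolution(resolution, nmr=0.45):
    """Return the value that would be placed into a bin for a resolution."""
    # A resolution of -1.00 marks a structure with no X-ray resolution
    # (generally NMR); report the nmr placeholder value instead.
    if resolution == -1.0:
        return nmr
    return resolution
```

With the default, an NMR structure lands at 0.45, below any realistic X-ray resolution, so the bins only need to start just under that value rather than at -1.00.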
-
class
mdt.features.
RadiusOfGyration
(mlib, bins, protein=0)[source]¶ Protein radius of gyration in angstroms. The calculation of the center of mass used for this feature is not mass weighted.
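As a rough illustration of a radius of gyration computed about an unweighted center, the sketch below uses plain coordinate tuples. It assumes the usual root-mean-square definition; the function name and the input format are hypothetical and MDT's own calculation may differ in detail.

```python
import math


def radius_of_gyration(coords):
    """Unweighted radius of gyration of a list of (x, y, z) tuples."""
    n = len(coords)
    # Center is the plain average of the coordinates (no atomic masses).
    cx = sum(c[0] for c in coords) / n
    cy = sum(c[1] for c in coords) / n
    cz = sum(c[2] for c in coords) / n
    # Root-mean-square distance of the atoms from that center.
    mean_sq = sum((x - cx) ** 2 + (y - cy) ** 2 + (z - cz) ** 2
                  for x, y, z in coords) / n
    return math.sqrt(mean_sq)
```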
-
class
mdt.features.
SequenceLength
(mlib, bins, protein=0)[source]¶ Protein sequence length (number of residues).
-
class
mdt.features.
HydrogenBondSatisfaction
(mlib, bins, protein=0)[source]¶ Hydrogen bond satisfaction index for a protein. This is the average difference, over all atoms in the protein, between the HydrogenBondDonor value and the atom’s donor valency plus the same for the acceptor, as defined in the hydrogen bond file (see
mdt.Library.hbond_classes
).
-
class
mdt.features.
AlphaContent
(mlib, bins, protein=0)[source]¶ Alpha content of the protein. This is simply the fraction, between 0 and 1, of residues in the first mainchain conformation class (see
MainchainConformation
).
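The fraction can be sketched in a few lines of plain Python. The function name and the list-of-letters input are hypothetical; the class letter "A" is taken from the default classes listed under MainchainConformation.

```python
def alpha_content(conformation_classes, alpha_class="A"):
    """Fraction (0 to 1) of residues whose mainchain class is alpha_class."""
    if not conformation_classes:
        # Assumption: an empty protein is reported as 0 rather than an error.
        return 0.0
    n_alpha = sum(1 for c in conformation_classes if c == alpha_class)
    return n_alpha / len(conformation_classes)
```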
Protein pair features¶
These features yield a single value for each pair of proteins in the alignment. Each feature takes some common arguments:
- mlib: the mdt.Library to create the feature in.
- bins: list of bins (see Specification of bins).
- protein1 and protein2: the indexes of proteins in each group of proteins selected from the alignment to evaluate the feature on; each can range from 0 to 2 inclusive. See mdt.Table.add_alignment() for more details.
Residue features¶
These features yield a single value for each residue in each sequence in the alignment. Each feature takes some common arguments:
- delta: if non-zero, don’t calculate the feature for the residue position returned by the residue scan - instead, offset it by delta residues in the sequence. Applied before align_delta.
- align_delta: if non-zero, don’t calculate the feature for the alignment position returned by the residue scan - instead, offset it by align_delta alignment positions. Applied after delta.
- pos2: if True, force a residue pair scan, and evaluate the feature on the second residue in each pair.
- mlib, bins, protein: see Protein features. Note that some residue features do not use the bins argument, because they have a fixed number of bins.
-
class
mdt.features.
ResidueType
(mlib, protein=0, delta=0, align_delta=0, pos2=False)[source]¶ Residue type (20 standard amino acids, gap, undefined).
-
class
mdt.features.
ResidueAccessibility
(mlib, bins, protein=0, delta=0, align_delta=0, pos2=False)[source]¶ Residue solvent accessibility. This is derived from the atomic solvent accessibility; see
AtomAccessibility
.
-
class
mdt.features.
Chi1Dihedral
(mlib, protein=0, delta=0, align_delta=0, pos2=False)[source]¶ -
class
mdt.features.
Chi2Dihedral
¶ -
class
mdt.features.
Chi3Dihedral
¶ -
class
mdt.features.
Chi4Dihedral
¶ -
class
mdt.features.
PhiDihedral
¶ -
class
mdt.features.
PsiDihedral
¶ -
class
mdt.features.
OmegaDihedral
¶ -
class
mdt.features.
AlphaDihedral
¶ Residue dihedral angle, from -180 to 180 degrees.
-
class
mdt.features.
Chi1Class
(mlib, protein=0, delta=0, align_delta=0, pos2=False)[source]¶ -
class
mdt.features.
Chi2Class
¶ -
class
mdt.features.
Chi3Class
¶ -
class
mdt.features.
Chi4Class
¶ -
class
mdt.features.
Chi5Class
¶ -
class
mdt.features.
PhiClass
¶ -
class
mdt.features.
PsiClass
¶ -
class
mdt.features.
OmegaClass
¶ Residue dihedral class. These classes are defined by MODELLER to group common regions of dihedral space for each residue type.
-
class
mdt.features.
MainchainConformation
(mlib, protein=0, delta=0, align_delta=0, pos2=False)[source]¶ Residue mainchain conformation (Ramachandran) class. This is a classification of the residue’s phi/psi angles into classes as defined in Modeller’s modlib/af_mnchdef.lib file and described in Sali and Blundell, JMB (1993) 234, p785. The default classes are A (right-handed alpha-helix), P (poly-proline conformation), B (idealized beta-strand), L (left-handed alpha-helix), and E (extended conformation).
Residue pair features¶
These features yield a single value for each pair of residues in each sequence in the alignment. See Protein features for a description of the common arguments.
-
class
mdt.features.
ResidueDistance
(mlib, bins, protein=0)[source]¶ Distance between a pair of residues. This is defined as the distance between the ‘special’ atoms in each residue. The type of this special atom can be specified by the distance_atoms argument when creating a
mdt.Library
object. The feature is considered undefined if any of the atom coordinates are equal to the Modeller ‘undefined’ value (-999.0).
-
class
mdt.features.
AverageResidueAccessibility
(mlib, bins, protein=0)[source]¶ Average solvent accessibility of a pair of residues. See
ResidueAccessibility
.
-
class
mdt.features.
ResidueIndexDifference
(mlib, bins, protein=0, absolute=False)[source]¶ Difference in sequence index between a pair of residues. This can either be the simple difference (if absolute is False) in which case the feature is asymmetric, or the absolute value (if absolute is True) which gives a symmetric feature.
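The effect of the absolute argument can be sketched as follows. The function name is hypothetical, and the sign convention (second index minus first) is an assumption, since the description does not state it.

```python
def residue_index_difference(index1, index2, absolute=False):
    """Difference in sequence index between two residues."""
    # Assumption: the signed difference is taken as second minus first.
    diff = index2 - index1
    # The absolute value is the same for (i, j) and (j, i), hence symmetric.
    return abs(diff) if absolute else diff
```

Swapping the two residues flips the sign of the simple difference but leaves the absolute value unchanged, which is why only the latter gives a symmetric feature.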
Aligned residue features¶
These features yield a single value for residues aligned between two proteins. For each pair of proteins, every alignment position is scanned, and the feature is evaluated for each pair of aligned residues. See Protein pair features for a description of the common arguments.
-
class
mdt.features.
PhiDihedralDifference
(mlib, bins, protein1=0, protein2=1)[source]¶ -
class
mdt.features.
PsiDihedralDifference
¶ -
class
mdt.features.
OmegaDihedralDifference
¶ Shortest difference in dihedral angle (in degrees) between a pair of aligned residues.
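Because dihedral angles are periodic, the shortest difference has to wrap around at 180 degrees. The sketch below shows one common way to do this; the function name, the signed result, and the direction (second angle minus first) are assumptions rather than MDT's documented behavior.

```python
def shortest_dihedral_difference(angle1, angle2):
    """Smallest signed rotation, in degrees, that takes angle1 to angle2.

    The result lies in the half-open range [-180, 180).
    """
    # Shift by 180, wrap into [0, 360), then shift back.
    return (angle2 - angle1 + 180.0) % 360.0 - 180.0
```

For example, 170 and -170 degrees are 20 degrees apart on the circle, not 340.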
-
class
mdt.features.
NeighborhoodDifference
(mlib, bins, protein1=0, protein2=1)[source]¶ Residue neighborhood difference. This is the average of the distance scores (from a residue-residue scoring matrix) of all aligned residues where the residue in the first sequence is within a cutoff distance of the scanned residue. (This cutoff is set by the distngh argument to
mdt.Table.add_alignment()
.)
-
class
mdt.features.
GapDistance
(mlib, bins, protein1=0, protein2=1)[source]¶ Distance, in alignment positions, to the nearest gap. Note that positions which are gapped in both sequences are ignored for the purposes of this calculation (a ‘gap’ is defined as a gap in one sequence aligned with a residue in the other).
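One possible reading of this definition is sketched below on two aligned strings. It is an interpretation, not MDT's code: the function name is hypothetical, columns gapped in both sequences are assumed to be dropped before distances are counted, and None is used as a stand-in when no gap exists.

```python
def gap_distances(seq1, seq2, gap="-"):
    """Distance from each retained alignment column to the nearest gap."""
    # Assumption: columns gapped in both sequences are removed entirely.
    columns = [(a, b) for a, b in zip(seq1, seq2)
               if not (a == gap and b == gap)]
    # A 'gap' is a gap in exactly one sequence, aligned with a residue.
    gap_positions = [i for i, (a, b) in enumerate(columns)
                     if (a == gap) != (b == gap)]
    if not gap_positions:
        # Stand-in for an undefined value when the alignment has no gaps.
        return [None] * len(columns)
    return [min(abs(i - g) for g in gap_positions)
            for i in range(len(columns))]
```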
Aligned residue pair features¶
These features yield a single value for each pair of residues aligned between two proteins. For each pair of proteins, each pair of alignment positions is scanned, and the feature is evaluated for each pair of pairs of aligned residues. See Protein pair features for a description of the common arguments.
-
class
mdt.features.
ResidueDistanceDifference
(mlib, bins, protein1=0, protein2=1)[source]¶ Distance between two residues in the second protein, minus the distance between the equivalent residues in the first protein. See
ResidueDistance
. The feature is considered undefined if any of the atom coordinates are equal to the Modeller ‘undefined’ value (-999.0).
-
class
mdt.features.
AverageNeighborhoodDifference
(mlib, bins, protein1=0, protein2=1)[source]¶ Average residue neighborhood difference for a pair of alignment positions. See
NeighborhoodDifference
.
-
class
mdt.features.
AverageGapDistance
(mlib, bins, protein1=0, protein2=1)[source]¶ Average distance to a gap from a pair of alignment positions. See
GapDistance
.
Atom features¶
These features yield a single value for each atom in the first protein in each group of proteins selected from the alignment. Each feature takes some common arguments:
- pos2: if True, force an atom pair scan, and evaluate the feature on the second atom in each pair.
- mlib, bins: see Protein features. Note that some atom features do not use the bins argument, because they have a fixed number of bins.
-
class
mdt.features.
AtomAccessibility
(mlib, bins, pos2=False)[source]¶ Atom solvent accessibility. This is calculated by the PSA algorithm, and controlled by the surftyp and accessibility_type arguments to
mdt.Table.add_alignment()
. The feature is considered undefined if the atom’s Cartesian coordinates are equal to the Modeller ‘undefined’ value (-999.0).
-
class
mdt.features.
FractionalAtomAccessibility
(mlib, bins, pos2=False)[source]¶ Fractional atom solvent accessibility, from 0 to 1. This is the atom solvent accessibility (see
AtomAccessibility
) divided by the volume of the atom, derived from its van der Waals radius. The feature is considered undefined if the atom’s Cartesian coordinates are equal to the Modeller ‘undefined’ value (-999.0).
-
class
mdt.features.
AtomType
(mlib, pos2=False)[source]¶ Type of an atom, as classified by the atom class file. See
mdt.Library.atom_classes
.
-
class
mdt.features.
HydrogenBondDonor
(mlib, bins, pos2=False)[source]¶ Number of hydrogen bond donors. It is defined as the sum, over all atoms within hbond_cutoff (see
mdt.Library
) of the atom, of their donor valencies as defined in the hydrogen bond file (see mdt.Library.hbond_classes
). The feature is considered undefined if the atom’s Cartesian coordinates are equal to the Modeller ‘undefined’ value (-999.0).
-
class
mdt.features.
HydrogenBondAcceptor
(mlib, bins, pos2=False)[source]¶ Number of hydrogen bond acceptors. It is defined as the sum, over all atoms within hbond_cutoff (see
mdt.Library
) of the atom, of their acceptor valencies as defined in the hydrogen bond file (see mdt.Library.hbond_classes
). The feature is considered undefined if the atom’s Cartesian coordinates are equal to the Modeller ‘undefined’ value (-999.0).
-
class
mdt.features.
HydrogenBondCharge
(mlib, bins, pos2=False)[source]¶ Hydrogen bond charge. It is defined as the sum, over all atoms within hbond_cutoff (see
mdt.Library
) of the atom, of their charges as defined in the hydrogen bond file (see mdt.Library.hbond_classes
).
-
class
mdt.features.
AtomTable
(mlib, bins, table_name, func, pos2=False)[source]¶ A tabulated atom feature. The feature is simply a table of N floating-point numbers, where N is the number of atoms in the system. This table is provided by a Python function, so can be used to implement user-defined features or to pass in features from other software. A simple example to use the x coordinate as a feature:
    def func(aln, struc, mlib, libs):
        return [a.x for a in struc.atoms]

    f = mdt.features.AtomTable(mlib, bins, "x coordinate", func)
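A slightly larger callback with the same four-argument signature might look like the sketch below, which returns each atom's distance from the unweighted centroid. The callback name and the stand-in structure are hypothetical; the code assumes atoms expose y and z attributes alongside x. The stand-in objects exist only so the callback can be tried without MDT or Modeller installed.

```python
import math
from types import SimpleNamespace


def distance_from_centroid(aln, struc, mlib, libs):
    """Return one float per atom: its distance from the unweighted centroid."""
    atoms = list(struc.atoms)
    n = len(atoms)
    cx = sum(a.x for a in atoms) / n
    cy = sum(a.y for a in atoms) / n
    cz = sum(a.z for a in atoms) / n
    return [math.sqrt((a.x - cx) ** 2 + (a.y - cy) ** 2 + (a.z - cz) ** 2)
            for a in atoms]


# Stand-in structure with two atoms, used here in place of a real structure.
fake_struc = SimpleNamespace(atoms=[
    SimpleNamespace(x=0.0, y=0.0, z=0.0),
    SimpleNamespace(x=2.0, y=0.0, z=0.0),
])
values = distance_from_centroid(None, fake_struc, None, None)
```

With MDT available, such a function would be passed in the same way as func above, for example as the last argument to mdt.features.AtomTable(mlib, bins, "centroid distance", distance_from_centroid).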
Atom pair features¶
These features yield a single value for each pair of atoms in the first protein in each group of proteins selected from the alignment. See Protein features for a description of the common arguments.
-
class
mdt.features.
AtomDistance
(mlib, bins)[source]¶ Distance in angstroms between a pair of atoms. The feature is considered undefined if any of the atom coordinates are equal to the Modeller ‘undefined’ value (-999.0).
-
class
mdt.features.
AtomBondSeparation
(mlib, bins, disulfide=False)[source]¶ Number of bonds between a pair of atoms. For example, two atoms that are directly bonded return ‘1’, while two at opposite ends of an angle return ‘2’. The bonds between atoms in each standard amino acid are derived from the bond class file, so this must be read in first (see
mdt.Library.bond_classes
). For atoms in different residues, the residues are assumed to be linked by a peptide backbone, and the number of bonds is calculated accordingly. Atoms in different chains, or atoms of types not referenced in the bond class file, are not connected. If disulfide is set to True, disulfide bridges are also considered (if two residues have SG atoms within 2.5 angstroms, they are counted as bonded). If disulfide is set to False (the default) any disulfide bridges are ignored. Either way, no account is taken of patches and other modifications such as terminal oxygens (unless bonds to OXT are explicitly listed in the bond class file). If a pair of atoms is not connected it is placed in the ‘undefined’ bin.
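Counting bonds between two atoms amounts to a shortest-path search over the bond graph. The breadth-first sketch below illustrates the idea on a hand-written bond list; it is not MDT's implementation, the function name is hypothetical, and None stands in for the 'undefined' bin.

```python
from collections import deque


def bond_separation(bonds, atom1, atom2):
    """Number of bonds on the shortest path between two atoms, or None."""
    # Build an undirected adjacency map from the list of bonded pairs.
    graph = {}
    for a, b in bonds:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    if atom1 == atom2:
        return 0
    seen = {atom1}
    queue = deque([(atom1, 0)])
    while queue:
        atom, dist = queue.popleft()
        for neighbor in graph.get(atom, ()):
            if neighbor == atom2:
                return dist + 1
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, dist + 1))
    # Not connected: corresponds to the 'undefined' bin in the text above.
    return None
```

In this picture, setting disulfide to True would simply add an extra SG-SG edge to the graph before the search.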
Tuple features¶
These features yield a single value for each tuple of atoms in the first protein in each group of proteins selected from the alignment. (The set of tuples must first be read into the mdt.Library.)
Each feature takes some common arguments:
- mlib: the mdt.Library to create the feature in.
- pos2: if True, force a tuple pair scan, and evaluate the feature on the second tuple in each pair.
-
class
mdt.features.
TupleType
(mlib, pos2=False)[source]¶ Type of an atom tuple, as classified by the tuple class file. See
mdt.Library.tuple_classes
.
Tuple pair features¶
These features yield a single value for each pair of tuples of atoms in the first protein in each group of proteins selected from the alignment. (The set of tuples must first be read into the mdt.Library.)
See Protein features for a description of the common arguments.
-
class
mdt.features.
TupleType
(mlib, pos2=False)[source] Type of an atom tuple, as classified by the tuple class file. See
mdt.Library.tuple_classes
.
-
class
mdt.features.
TupleDistance
(mlib, bins)[source]¶ Distance in angstroms between the first atom in each of two tuples. The feature is considered undefined if any of the atom coordinates are equal to the Modeller ‘undefined’ value (-999.0).
-
class
mdt.features.
TupleAngle1
(mlib, bins)[source]¶ Angle (0-180) between the first atom in the first tuple, the first atom in the second tuple, and the second atom in the second tuple. The feature is considered undefined if any of the atom coordinates are equal to the Modeller ‘undefined’ value (-999.0).
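The angle between three atoms, with the middle atom as the vertex, can be computed as in the sketch below. The function name and the coordinate tuples are hypothetical; for TupleAngle1 the vertex would be the first atom in the second tuple.

```python
import math


def angle_degrees(a, b, c):
    """Angle at vertex b formed by points a, b and c, in degrees (0-180)."""
    v1 = [a[i] - b[i] for i in range(3)]
    v2 = [c[i] - b[i] for i in range(3)]
    dot = sum(p * q for p, q in zip(v1, v2))
    norm = (math.sqrt(sum(p * p for p in v1))
            * math.sqrt(sum(q * q for q in v2)))
    # Clamp to guard against rounding slightly outside [-1, 1].
    cosine = max(-1.0, min(1.0, dot / norm))
    return math.degrees(math.acos(cosine))
```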
-
class
mdt.features.
TupleAngle2
(mlib, bins)[source]¶ Angle (0-180) between the second atom in the first tuple, the first atom in the first tuple, and the first atom in the second tuple. The feature is considered undefined if any of the atom coordinates are equal to the Modeller ‘undefined’ value (-999.0).
-
class
mdt.features.
TupleDihedral1
(mlib, bins)[source]¶ Dihedral (-180-180) between the second atom in the first tuple, the first atom in the first tuple, the first atom in the second tuple, and the second atom in the second tuple. The feature is considered undefined if any of the atom coordinates are equal to the Modeller ‘undefined’ value (-999.0).
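A dihedral over four points, given in the order listed above, can be sketched with the standard atan2 formulation. This is an illustration only: the function names are hypothetical, and the sign convention is whatever this formula yields, which has not been checked against MDT's.

```python
import math


def _sub(p, q):
    return (p[0] - q[0], p[1] - q[1], p[2] - q[2])


def _cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


def _dot(u, v):
    return u[0] * v[0] + u[1] * v[1] + u[2] * v[2]


def dihedral_degrees(p0, p1, p2, p3):
    """Dihedral angle in degrees, between -180 and 180, for four points."""
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)  # normals of the two planes
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))
```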
-
class
mdt.features.
TupleDihedral2
(mlib, bins)[source]¶ Dihedral (-180-180) between the third atom in the first tuple, the second atom in the first tuple, the first atom in the first tuple, and the first atom in the second tuple. Only works for atom triplets. The feature is considered undefined if any of the atom coordinates are equal to the Modeller ‘undefined’ value (-999.0).
-
class
mdt.features.
TupleDihedral3
(mlib, bins)[source]¶ Dihedral (-180-180) between the first atom in the first tuple, the first atom in the second tuple, the second atom in the second tuple, and the third atom in the second tuple. Only works for atom triplets. The feature is considered undefined if any of the atom coordinates are equal to the Modeller ‘undefined’ value (-999.0).
Chemical bond features¶
These features yield a single value for each defined chemical bond, angle or dihedral in the first protein in each group of proteins selected from the alignment. (The definitions of the chemical connectivity must first be read from a bond class file; see the bond_classes, angle_classes and dihedral_classes attributes in mdt.Library.)
See Protein features for a description of the common arguments.
-
class
mdt.features.
BondType
(mlib)[source]¶ Type of a bond, as classified by the bond class file. See
mdt.Library.bond_classes
.
-
class
mdt.features.
AngleType
(mlib)[source]¶ Type of an angle, as classified by the angle class file. See
mdt.Library.angle_classes
.
-
class
mdt.features.
DihedralType
(mlib)[source]¶ Type of a dihedral, as classified by the dihedral class file. See
mdt.Library.dihedral_classes
.
-
class
mdt.features.
BondLength
(mlib, bins)[source]¶ Length of a bond in angstroms. See
mdt.Library.bond_classes
. The feature is considered undefined if any of the atom coordinates are equal to the Modeller ‘undefined’ value (-999.0).
-
class
mdt.features.
Angle
(mlib, bins)[source]¶ Angle (0-180). See
mdt.Library.angle_classes
. The feature is considered undefined if any of the atom coordinates are equal to the Modeller ‘undefined’ value (-999.0).
-
class
mdt.features.
Dihedral
(mlib, bins)[source]¶ Dihedral angle (-180-180). See
mdt.Library.dihedral_classes
. The feature is considered undefined if any of the atom coordinates are equal to the Modeller ‘undefined’ value (-999.0).
Group features¶
These features are used to make combinations of other features. Each feature takes some common arguments:
- mlib: the mdt.Library to create the feature in.
- feat1: an existing feature object that will be included in this group.
- feat2: another existing feature object to include.
- nbins: the number of bins in this feature.
-
class
mdt.features.
Cluster
(mlib, feat1, feat2, nbins)[source]¶ Cluster feature. When evaluated, it evaluates the two other features grouped in this feature, and converts the pair of bin indices for those features into a single bin index, which is returned. Use the
add()
method to control this conversion.
-
add
(child_bins, bin_index)[source]¶ Add a single mapping from a pair of child feature bin indices into this feature’s bin index (all indexes start at 0). For example, calling add((1,2), 3) would cause this Cluster feature to return bin index 3 if the child features were in bins 1 and 2 respectively. This method can be called multiple times (even for the same bin_index) to add additional mappings from child bin indices to bin index. If no mapping from a given pair of child indices is present, the undefined bin index is returned.
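The mapping behavior described for add() can be modeled with a dictionary keyed on the pair of child bin indices. The class below is a hypothetical stand-in written for illustration, not the mdt.features.Cluster class; UNDEFINED is a placeholder for the undefined bin index, and the evaluate method is invented for the sketch.

```python
UNDEFINED = None  # placeholder for the undefined bin index


class ClusterSketch:
    """Toy model of mapping pairs of child bin indices to one bin index."""

    def __init__(self):
        self._mapping = {}

    def add(self, child_bins, bin_index):
        # Several child pairs may map to the same bin_index.
        self._mapping[tuple(child_bins)] = bin_index

    def evaluate(self, bin1, bin2):
        # Pairs with no mapping fall through to the undefined bin.
        return self._mapping.get((bin1, bin2), UNDEFINED)


cluster = ClusterSketch()
cluster.add((1, 2), 3)
cluster.add((0, 0), 3)
```

Here both (1, 2) and (0, 0) resolve to bin 3, while an unmapped pair such as (2, 1) resolves to the undefined placeholder.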