Google

BEAST (CCP4: Supported Program)

NAME

Beast - Likelihood-based molecular replacement (version 1.0)

SYNOPSIS

beast HKLIN foo.mtz
[Keyworded input]

DESCRIPTION

Beast carries out molecular replacement calculations using maximum likelihood targets for the rotation and translation functions. If there are multiple alternative models for a molecule, they can all be supplied to the program, which will construct from them a statistically-weighted set of averaged structure factors. In addition to a moving molecule being used for the search, additional molecules can be fixed in just orientation or in orientation and position.

INPUT FILES

Input

HKLIN
Input MTZ file containing Fobs for the unknown crystal. (Sigma(Fobs) is not yet used.)
PDB
The models are specified by PDB files, named in the keyworded input. If there are multiple models for a particular molecule, all the models must be superimposed in a common orientation but presented to the program in separate PDB files.

KEYWORDED INPUT

Compulsory keywords: LABIn, MOLEcule, MODEl, SEARch

Additional keywords: TRIAl, FIX, RLISt, TLISt, SOLUtion[R/T/F], SOL_[RF/TF], RESOlution, BEST, CLUSter, AVERage, PIVOt

Expert keywords: SOLPar, OUTLier, SHANfac, BOXScale, REPOrt, VERBose

LABIn <program label>=<file label>

Only F need be defined. SIGF is not yet used.

Example:

LABIN F=Fobs

MOLEcule <molecule name> <molecule fraction>

One MOLEcule command should be given for each chemically-different molecular object that will be sought in the search. A molecule can be anything from a single domain of a protein with hinges to a multimeric assembly, as long as it will be adjusted as a rigid body in the molecular replacement search. Multiple copies of the same molecule can be placed in the asymmetric unit through a combination of FIX and SEARCH commands.

The molecule name should be a unique four-character identifier. This is used to associate PDB files with each molecule in the MODEl command, and to specify FIXed and moving (SEARch) molecules.

The molecule fraction is the fraction of the ordered content of the asymmetric unit comprised by each of the models of that molecule. To work this out, you may have to consider the composition of any complexes expected in the crystal and the degree of non-crystallographic symmetry (NCS) expected. This information is used in defining the Sigma(A) curve that calibrates the likelihood functions, so it is important to give the right numbers. If there is uncertainty about the degree of NCS, separate jobs should be run with different assumed molecule fractions.

Examples:

Crystal of a complex of a 40kDa protease and a 10kDa protein inhibitor with room for one complex in the asymmetric unit, looking for the protease with a complete model.

MOLECULE prot 0.8

Same complex, but room for 2 or 3 complexes. Try two runs, with either

MOLECULE prot 0.4

or

MOLECULE prot 0.2667

Same complex, one in a.u., but model of protease comprises only 30kDa. For variety, assume we've placed the protease and are now looking for the inhibitor, which means we'll need two MOLEcule commands and two (sets of) MODEl commands.

MOLECULE prot 0.6
MOLECULE inhi 0.2

MODEl <molecule name> <PDB filename> { IDENt <sequence identity> | RMS <expected rms error> }

One or more MODEl commands should be given for each MOLEcule. The corresponding MOLEcule must have been defined before the MODEl command. If multiple MODEls are given, they must all comprise essentially the same part of the structure, and they must all be placed in a common orientation. Each model must be put in a separate PDB file.

For the Sigma(A) curve used to calibrate the likelihood function, there must be an estimate of the effective RMS error of the model. This can either be given explicitly with the RMS subkeyword, or implicitly with the IDENt subkey. The sequence identity (given either as percent or a fraction) is used to estimate the RMS error from an equation of Chothia & Lesk (1986):

RMS = 0.4A * exp(1.87*(1-IDENT))

For models derived from well-refined crystal structures at medium to high resolution, the IDENT subkey should give a reasonable first estimate of the RMS error. However, tests with NMR-derived models suggest that it is usually better to supply a larger estimate of RMS, with a value of 1.5A having worked in several test cases. For an NMR ensemble, the same RMS error should be given for each member of the ensemble, and it should not be small compared to the RMS deviations among members of the ensemble.

If a molecular replacement search does not yield significantly positive LLG scores, this may mean that your model is not as good as you think it is. In that case it would be worthwhile to increase the estimate of the effective RMS error and repeat the search. Of course, if the molecule comprises too small a fraction of the asymmetric unit or your model is too inaccurate, the signal may be too weak to be found in the noise.

If several possible choices of model are available, it is generally best to use all of them simultaneously. The relative influence of the models, as a function of resolution, will depend on their relative quality, as specified through the RMS or IDENt subkey. Take care, when using the RMS subkey, to assign similar RMS values to similar models. If not, assumptions made in generating an ensemble-average structure factor may be violated!

If you are using an ensemble of NMR structures, Beast is not currently able to interpret MODEL and ENDMDL cards in PDB files, so it is necessary to put the individual models into separate PDB files and give a MODEL command for each. Tests indicate that 10 to 20 models from an ensemble will be sufficient, with relatively little gained in signal-to-noise from adding further models.

Examples:

MODEL prot trypsin.pdb IDENT 0.32
MODEL prot chymotrypsin.pdb IDENT 33
MODEL inhi ovomucoid.pdb RMS 0.6

SEARch <molecule name> [ ROTAte { <arguments...> } [ TRANslate { <arguments...> } ] ]

In a run of Beast, one copy of one molecule can be moved in a search involving rotation and/or translation. (As discussed below, it is also possible to FIX copies of the same or other molecules.) The molecule name must have been defined by a prior MOLEcule command.

If only the molecule name is given, this is taken to specify the moving molecule for subsequent TRIAl, SOLUtionT/F or SOL_TF commands.

Subkeywords:
ROTAte { <angles> | LIST | FULL [<step size>] | REGIon <a1,a2,b1,b2,g1,g2> [<step size>] | AROUnd { <angles> | LIST } <radius> [<step size>] }

The ROTAte subkey has the following options:

<angles>:
Euler angles in degrees (defined by the Crowther z-y-z convention that is also used in the fast rotation function and AMoRe). These angles will be used either for a single rotation evaluation or (more commonly) as a fixed orientation for a TRANslation search. Note that any Euler angles (or translation vectors) used in Beast apply to the molecule in the original orientation given in the model PDB file(s).

LIST:
One or more sets of Euler angles will be given subsequently with the RLISt command.

FULL [<step size>]:
Perform search of all unique orientations. The default step size is set according to the dimensions of the model and the resolution of the data. The geometric mean radial extent of the model is computed, and the rotation step is set to the value that would move atoms at this distance by dmin/2. The search is carried out over a pseudo-hexagonal-close-packed grid in Lattman angles, with the step size expressing the rotational distance to nearest neighbours in the rotation search. Lattman angles are defined as theta+(=alpha+gamma), beta and theta-(=alpha-gamma). These angles are locally orthogonal so they provide a better search space than the original Euler angles.

REGIon <a1,a2,b1,b2,g1,g2> [<step size>]:
As for FULL search, but instead of Beast working out the search limits for the Euler angles from the crystal symmetry, the search is carried out for the Euler angle alpha varying from a1 to a2, beta from b1 to b2, and gamma from g1 to g2. Again, the search is really carried out in terms of Lattman angles on a pseudo-hexagonal-close-packed grid.

AROUnd { <angles> | LIST } <radius> [<step size>]:
Search over neighbouring orientations around (and including) the specified (sets of) Euler angles, within a radius giving the maximal rotational distance in degrees. A single set of Euler angles can be given here; the LIST option indicates that Euler angles will be given with subsequent RLISt commands. The default step size is 1/3 of the search radius. This option should be used to refine orientations from a FULL search prior to carrying out a translation search.

TRANslate { NONE | <vector> | LIST | REGIon <x1,x2,y1,y2,z1,z2> [<step size>] | FRACtional <x1,x2,xstep, y1,y2,ystep, z1,z2,zstep> | AROUnd { <vector> | LIST } <radius> [<step size>] }

The TRANslate subkey is optional. The options are:

NONE:
Same as omitting TRANslate subkey: perform pure rotation search.

<vector>:
Fixed translation vector in fractional coordinates.

LIST:
One or more translation vectors will be given subsequently with the TLISt command.

REGIon <x1,x2,y1,y2,z1,z2> [<step size>]:
Search translation vectors over volume from x1 to x2, y1 to y2 and z1 to z2 in fractional coordinates. The search is carried out on a hexagonal-close-packed grid, with the distance from one point to its nearest neighbour specified by the step size. Use a step size at least as fine as dmin/4. The default step size is dmin/5. If you are searching for the first molecule (i.e. no other molecules with FIXed orientation and position), the search volume should cover the Cheshire cell, i.e. the unique volume relative to any possible choice of origin. For instance, for orthorhombic unit cells, the origin can be shifted by 1/2 in x, y, or z, so the Cheshire cell ranges from 0 to 1/2 in each direction, covering 1/8 of the unit cell volume. (Note, as in this example, that the Cheshire cell is not the same as the asymmetric unit.) As another example, in a monoclinic cell with b unique, the origin can be shifted by 1/2 in x or z but anything in y, so that the Cheshire cell is a plane at y=0, ranging from 0 to 1/2 in x and z. It is important to realise that once you have FIXed the position of one molecule you have defined the origin, so in searching for a second molecule it is necessary to search the whole volume relative to a possible lattice point, i.e. the whole cell for a primitive space group, half the cell (or less) for a space group with a centered lattice.

FRACtional <x1,x2,xstep, y1,y2,ystep, z1,z2,zstep>:
As for REGIon, search translation vectors over volume from x1 to x2, y1 to y2 and z1 to z2 in fractional coordinates, but carry out the search on a grid specified in fractional coordinates with separate fractional step sizes for x, y and z. This option is useful to search over possible origins when putting together partial solutions.

AROUnd { <vector> | LIST } [<radius> [<step size>]]:
Search in a spherical volume around one or more starting vectors (specified in fractional coordinates), with a radius and step size specified in Angstroms. A single vector can be given here; the LIST option indicates that starting vectors will be given with subsequent TLISt commands. Use a step size at least as fine as dmin/4, or finer if you are optimising the solution from a coarser search. The default radius is dmin/2 and the default step size is dmin/5.

Examples:

Carry out rotation search:

SEARCH prot ROTATE FULL

Carry out translation search for specified orientation:

SEARCH prot ROTATE 10. 20. 30. \
            TRANSLATE REGION 0. 0.5   0. 0.5   0. 0.5   0.8

After finding two molecules independently, search over possible choices of origin to place the second molecule relative to the same origin as the first one (which has been previously FIXed). This example applies to space group P2, where possible choices of origin are related by half-unit-cell shifts along x and z and any shift along y.

SEARCH inhi ROTATE 20. 30. 40. \
            TRANSLATE FRACTIONAL 0.22 0.72 0.5   0.0 1.0 0.01   0.44 0.94 0.5

Limited 6D search to refine solution (would be better if a REFINE command were implemented...)

SEARCH prot ROTATE AROUND 10. 20. 30.  3.  0.5 \
            TRANSLATE AROUND 0.11 0.22 0.33   1.  0.25

Evaluate a single possible molecular replacement solution (perhaps obtained from a different program)

SEARCH prot ROTATE 10. 20. 30. TRANSLATE 0.11 0.22 0.33

TRIAl <molecule name> <angles> [ <vector> ]

As an alternative to specifying systematic searches with the ROTAte and TRANslate options of the SEARch command, a list of trial orientations and/or translation vectors can be given. The molecule must have been defined by a MOLEcule command and specified by a SEARch <molecule name> command before the first TRIAl command is given. The molecule name given in the TRIAl command is checked for consistency.

Each TRIAl command adds one trial to a list. If there are fewer than 6 numeric arguments, the first three are interpreted as Euler angles and any others (such as the LLG score from the TRIAL line from output from a previous run) are ignored. If there are six or more numeric arguments, the second set of three are interpreted as a translation vector in fractional coordinates. The TRIAl command is useful for rescoring possible solutions from other programs, or for testing possible solutions from a prior run of Beast, but using new resolution limits or new assumptions about the model quality.

Note that different types of trials (i.e. rotation vs. translation) cannot be mixed in different TRIAl commands.

Examples:

Test two possible orientations.

TRIAL prot 10. 20. 30.
TRIAL prot 20. 30. 40.
Test two possible rotation/translation solutions.

TRIAL prot 10. 20. 30. 0.11 0.22 0.33
TRIAL prot 20. 30. 40. 0.22 0.33 0.44

FIX <molecule name> <angles> [ <vector> ]

Optionally fix a molecule in a known orientation and possibly also position. The orientation is specified in Euler angles and the position as a translation vector in fractional coordinates. The molecule must have been defined before the FIX command is given.

Examples:

Fix orientation; position still unknown.

FIX prot 10. 20. 30.

Fix orientation and position of the protease component from two complexes before looking for the inhibitor components.

FIX prot 10. 20. 30.   0.11 0.22 0.33
FIX prot 20. 30. 40.   0.22 0.33 0.44

RLISt <molecule name> <angles>

Following the LIST option being given in the ROTAte subkey of a SEARch command, add a trial orientation given as Euler angles. The given molecule name is checked for consistency with the SEARch command.

Example:

RLIST prot 10. 20. 30.

TLISt <molecule name> <vector>

Following the LIST option being given in the TRANslate subkey of a SEARch command, add a trial position given as a fractional vector. The given molecule name is checked for consistency with the SEARch command.

Example:

TLIST prot 0.11 0.22 0.33

SOLUtion[R/T/F] <dummy arg> <rotation/[translation]>

Compatibility command, provided to read in intermediate results from AMoRe, in the form of SOLUTIONR (rotation), SOLUTIONT (translation) and SOLUTIONF (fitting) records. For all of these the first argument (model number for AMoRe) is ignored. The molecule name is inferred from a prior SEARCH command.

NB: Make sure that the PDB files are reoriented to the orientation used by AMoRe before using this option to rescore AMoRe results.

SOLUTIONR record: add a trial orientation given as Euler angles in the second, third and fourth arguments. This is an alternative to the RLIST command.

SOLUTIONT or SOLUTIONF record: add a trial rotation/translation solution, given as Euler angles and fractional translation in arguments 2-7. This is an alternative to the TRIAL command.

Example:

Add a set of Euler angles to the list

SOLUTIONRC    1   28.03   49.06   47.17  0.0000  0.0000  0.0000 36.2 60.4 30.9 15.0   1

Add two possible rotation/translation solutions to the list

SOLUTIONF1_1    1   18.82   63.37  225.30  0.3196  0.3388  0.2196 46.2 58.6 35.8   1
SOLUTIONF2_1    1   59.76   30.78  263.49  0.2139  0.1010  0.4568 44.8 58.7 38.2   2

SOL_[RF/TF] <dummy arg> <rotation/[translation]>

Compatibility command, provided to read in intermediate results from Molrep, in the form of SOL_RF (rotation) and SOL_TF (translation) records. For both of these the first argument (solution number for Molrep) is ignored. The molecule name is inferred from a prior SEARCH command.

SOL_RF record: add a trial orientation given as Euler angles in the second, third and fourth arguments. This is an alternative to the RLIST command.

SOL_TF record: add a trial rotation/translation solution, given as Euler angles and fractional translation in arguments 2-7. This is an alternative to the TRIAL command.

Example:

Add a set of Euler angles to the list

Sol_RF  1    85.01   83.84  328.92   63.21  148.05   96.92     4706.      5.05

Add two possible rotation/translation solutions to the list

Sol_TF_1  1   85.01   83.84  -31.08  0.901  0.391  0.115   14.31  0.457  0.339
Sol_TF_1  2   85.01   83.84  -31.08  0.676  0.249  0.943   11.36  0.454  0.341

RESOlution <dmin> [ <dmax> ]

Optionally specify resolution, in Angstroms. By default, the program uses all low resolution data and sets dmin to twice the estimated RMS error of the best model in the SEARch molecule. At a resolution equal to twice the estimated RMS error, Sigma(A) is about 0.2 and dropping rapidly, so that the model predicts the data only poorly and the contribution to the likelihood score is small and noisy. For good models, it is probably sufficient to use data to only 3A resolution, if computing time is a problem.

Examples:

Give high resolution limit only.

RESOLUTION 3.5

Give both limits.

RESOLUTION 3.5 25.0

BEST [ <NBest> [<NBestT> ] ] [ FILE <filename> ]

Optionally change number of best trials to report at end of entire search (NBest) or at end of translation search for each orientation (NBestT). NBest defaults to 20, and must be at least 1. NBestT defaults to NBest, but may be set to zero. If the FILE subkey is given, the NBest results are written to the specified file as either RLIST (rotation search) or TRIAL (search involving translation) commands that can be input to subsequent Beast runs. This is done most conveniently using the standard CCP4 "@filename" command, perhaps after commenting out undesired lines with the "#" symbol. The default is not to save a separate file.

Examples:

In a 6D search, report best 10 trials overall and best 3 for each orientation.

BEST 10 3

Do the same, but save the best 10 trials overall in a file named beast_tf.mr.

BEST 10 3 FILE beast_tf.mr

In a rotation search, save the default number of best trials to a file named beast_rf.mr

BEST FILE beast_rf.mr

CLUSter [ <NClust> ] [ FILE <filename> ]

Optionally change number of best clusters to report at end of entire search. NClust defaults to 20. Each cluster represents a local maximum in the search space. In most cases, the local maximum in an orientation search will give the best orientation for a subsequent translation search. However, if the translation search fails it may be useful to search on a set of orientations giving the best scores, regardless of whether or not they represent local maxima. If the FILE subkey is given, the NClust results are written to the specified file as either RLIST (rotation search) or TRIAL (search involving translation) commands that can be input to subsequent Beast runs. This is done most conveniently using the standard CCP4 "@filename" command, perhaps after commenting out undesired lines with the "#" symbol. The default is not to save a separate file.

Example:

CLUSTER 25 FILE beast_rf_clust.mr

AVERage { STATistical | PRIOr }

Optionally change method by which ensemble average structure factors are computed from multiple models of the same molecule. The default is to use a statistical average (STATistical), in which the relative weights for the different models come from the covariance matrix. However, this can be numerically unstable if several of the models are very similar to each other, and is a particular problem when using ensembles of NMR models. In this case, it is better to fix the relative weights of the models to be proportional to prior values given by the Sigma(A) values computed from the four-parameter functional form (PRIOr).

Examples:

Use statistically-weighted ensemble average for a small number of reasonably different models. (This is the default, so the command need not be given.)

AVERAGE STAT

Use prior relative weights (from Sigma(A) values).

AVERAGE PRIO

PIVOt { CENTre | <vector> }

For TRANslate AROUnd option of SEARCH, optionally change the pivot point for the rotation of the moving molecule. This only comes into play when orientations are changed through the AROUnd option of the ROTAte subkey, and then only when just a single orientation is specified. The default is to pivot around the centre of mass of the moving molecule (CENTre). If a vector is given, this specifies the pivot point in orthogonal Angstroms in terms of the input PDB file for the moving molecule. The effect is that, at the centre of the translation search for each orientation, the pivot point is placed in the same position. This position is defined by applying the starting orientation and starting translation.

If it is desired to fix a pivot point for more exhaustive searches (for which the PIVOt command would not apply), this can be achieved by translating the model to place the desired pivot point at (0,0,0) in the input PDB file.

Example:

Pivot around a presumed hinge point when searching for the second domain of a two domain molecule after positioning the first domain. In the original coordinate set, the hinge point is at (12.,25,-13.), and the SEARch is being carried out around the orientation and position found for the first domain.

SEARCH dom2 ROTATE AROUND 10. 20. 30. 10. 3. \
            TRANSLATE AROUND 0.11 0.22 0.33 3. 0.5
PIVOT 12. 25. -13.

SOLPar <fsol> <Bsol>

Optionally change solvent parameters for Sigma(A) curves from the default values of fsol=0.95, Bsol=150. The arguments fsol and Bsol can be given in either order, with the larger value assumed to be Bsol. The results are not terribly sensitive to these parameters, which affect only lower resolution data, but optimal values will be the subject of further study.

Example:

SOLPAR 0.9 200.

OUTLier <probability threshold>

Optionally change the outlier rejection criterion from its default value of 1 chance in a million of seeing a reflection this large. The argument for the OUTLier command can be given as the probability or (for convenience) its inverse. Only values in the range 0 < p <= 0.001 are allowed. The theory of outlier rejection is described in a paper published in the 1999 CCP4 proceedings.

Examples:

The following commands will all have the effect of setting the probability limit to 1 in 100,000.

OUTLIER 0.00001

OUTLIER 1.E-5

OUTLIER 100000

SHANfac <density oversampling factor>

Optionally change the Shannon density oversampling factor from its default of 1.5. This is used to choose the grid spacing of electron density in the FFT structure factor calculations. With a ShanFac of 1.0 (the theoretical minimum), the density would be sampled on a grid of dmin/2. The default should be fine, but could be reduced if memory were insufficient for a large problem. ShanFac must be at least 1.1, and it is recommended to use at least 1.25.

Example:

SHANFAC 1.25

BOXScale <molecular transform oversampling factor>

Optionally change the molecular transform oversampling factor from its default of 2.0. This is used to choose the size of the cell from which the finely-sampled molecular transform is computed for each molecule. With a BoxScale of 1.0 (the theoretical minimum for a perfect molecular transform interpolation scheme), each cell edge would be twice the size of the molecule in that direction. Note that because linear interpolation is used, the oversampling should be considerably greater than the theoretical minimum.

The default should be fine, but could be reduced if memory were insufficient for a large problem. However, to reduce memory usage it is better to reduce ShanFac first and only reduce BoxScale if necessary. BoxScale must be at least 1.2, and it is recommended to use at least 1.5.

Example:

BOXSCALE 1.5

REPOrt { ALL | BEST }

Optionally change level of reporting of trial scores. The default is to report only the best scores (BEST) as they are encountered and then in a sorted list at the end. If desired, all trials can be reported as scored (ALL). This generates excessive output, particularly for translation searches, but may be desirable to allow detailed statistical analysis of the data.

Example:

REPORT ALL

VERBose

Turn on extra output, most of which is there for debugging purposes and will not be of interest to most users.

PRINTER OUTPUT

The most interesting part of the output from Beast is the log-likelihood-gain scores for different trial orientations and/or positions, which are given in the form:

TRIAL <molecule name> <alpha> <beta> <gamma> <LLG score>

for rotation searches, and

TRIAL <molecule name> <alpha> <beta> <gamma> <x-frac> <y-frac> <z-frac> <LLG score>

for (rotation and) translation searches.

At the end of each translation search for a particular orientation, the NBestT translations are optionally printed in a sorted list, headed by "BESTT", and at the end of the entire search the NBest trials are printed in a sorted list, headed by "BEST". Finally, the NClust best cluster leaders are printed, headed by "CLUST".

At the end of the output, Beast repeats the best score found, the mean and rms deviation, and the corresponding signal-to-noise (score-mean/rms), which can be useful in judging significance.

The CCP4 program pdbset can be used to apply the rotation and translation from Beast to the input PDB file, as shown below. (Note that the CELL command is not necessary if your cell is given in a CRYST1 card in the input PDB file.)

pdbset XYZIN input.pdb XYZOUT output.pdb
CELL 80. 90. 100. 90. 90. 90.
ROTATE EULER 10. 20. 30.
SHIFT FRACTIONAL 0.11 0.22 0.33
END

PROGRAM FUNCTION

Details of the algorithms are presented in a paper published in the CCP4 Proceedings from 2001. That should be consulted by anyone who wants to understand in detail what is happening in Beast.

Very briefly, Beast scores possible molecular replacement solutions by likelihood. There are two types of likelihood score, depending on whether all the structure factor contributions from different molecules have known relative phases or not. If the structure factor contributions do all have known relative phases (e.g. translation search in which FIXed molecules, if any, have been fixed in both orientation and position), then the likelihood function is the same as the one used in maximum likelihood refinement, i.e. the probability distribution of the true structure factor given a calculated structure factor.

However, if the relative phases of the contributions of different molecules are not known, then there is not a single calculated structure factor and a different likelihood function must be used. This likelihood function is the probability that any particular value of the true structure factor could be obtained by adding up different contributions with unknown relative phases. This is a random walk problem similar to the Wilson distribution.

The most common circumstance in which there are unknown relative phases is a rotation search, in which for any orientation the amplitude of the contributions of symmetry-related molecules is known but not their relative phases. For this reason, the likelihood function for unknown relative phases is referred to, more simply, as the rotation likelihood function. However, there are other circumstances in which relative phases are unknown, for instance, when the orientation of one molecule is known while a translation search for the second molecule is being carried out.

Beast chooses the appropriate likelihood function automatically. The user only needs to understand that the additional uncertainty of the rotation likelihood function increases its noise level compared to the translation likelihood function. This means that, for difficult problems, the correct answer may only emerge after carrying out translation searches for a number of the top rotation solutions.

To put the likelihood scores on an absolute scale, they are expressed in terms of log-likelihood-gain (LLG). This is defined as the difference between the log-likelihood of the model and the log-likelihood from the Wilson distribution, i.e. how much better does the oriented and/or translated model predict the data than the Wilson distribution. If the best LLG score in a search is negative, this means that you are being too optimistic about the quality of the model (completeness or RMS error). For a true solution, the LLG should be significantly greater than zero.

As a hypothesis becomes more specific, the probabilities become sharper and the log-likelihood score for a correct solution should increase. In terms of molecular replacement, this happens when the position is added to the orientation, or if more molecules are placed in the unit cell. At every step, the LLG score should increase.

There are relatively few adjustable parameters in Beast. The most important ones are those that tell Beast about the quality and completeness of the model(s): the molecule fraction in the MOLEcule command and the RMS error in the MODEl command. These are used to compute variances in the likelihood functions and thus have a big impact on the quality of the results. If you are over-optimistic about the quality of your model, the probability distributions will be too sharp and the correct answer may get a low score because each observation misses the peak in its assumed distribution. It is probably better to be a bit pessimistic, because the score then will be lowered slightly but not missed completely.

Beast calibrates the likelihood functions by computing a Sigma(A) curve for each model, using a four-parameter functional form. Apart from the quality (RMS) and completeness (fp) parameters, there are two low-resolution terms to account for the lack of a disordered solvent model, which can be adjusted with the SOLPar command. The equation is the following:

Sigma(A) = SQRT{fp * [1 - fsol * exp(-Bsol*(sin(theta)/lambda)**2) ]} *
           exp{-(8*Pi**2/3) * RMS**2 * (sin(theta)/lambda)**2 }

Note that Beast is quite slow, because it computes the likelihood functions reflection by reflection for each orientation or position. Typical run times are measured in hours.

EXAMPLES

Simple rotation function, default step size.

beast HKLIN my.mtz << eof-beast > beast.log
LABIN F=Fobs
MOLECULE mol1 1.0
MODEL mol1 model1.pdb IDENT 0.65
SEARCH mol1 ROTATE FULL
END
eof-beast

Simple translation function, orthorhombic cell, 0.6A step.

beast HKLIN my.mtz << eof-beast > beast.log
LABIN F=Fobs
MOLECULE mol1 1.0
MODEL mol1 model1.pdb IDENT 0.65
SEARCH mol1 ROTATE 10. 20. 30. \
            TRANSLATE REGION 0. 0.5   0. 0.5   0. 0.5   0.6
END
eof-beast

Similar translation search, but trying two possible orientations.

beast HKLIN my.mtz << eof-beast > beast.log
LABIN F=Fobs
MOLECULE mol1 1.0
MODEL mol1 model1.pdb IDENT 0.65
SEARCH mol1 ROTATE LIST \
            TRANSLATE REGION 0. 0.5   0. 0.5   0. 0.5   0.6
RLIST mol1 10. 20. 30.
RLIST mol1 20. 30. 40.
END
eof-beast

Rotation search for second molecule after finding first one. First molecule has two choices of model.

beast HKLIN my.mtz << eof-beast > beast.log
LABIN F=Fobs
MOLECULE prot 0.8
MODEL prot trypsin.pdb IDENT 0.32
MODEL prot chymotrypsin.pdb IDENT 0.33
MOLECULE inhi 0.2
MODEL inhi ovomucoid.pdb IDENT 1.0
FIX prot 10. 20. 30.  0.11 0.22 0.33
SEARCH inhi ROTATE FULL
END
eof-beast

Rescore possible solutions from another program or a previous run of Beast.

beast HKLIN my.mtz << eof-beast > beast.log
LABIN F=Fobs
MOLECULE mol1 1.0
MODEL mol1 model1.pdb IDENT 0.65
SEARCH mol1
TRIAL mol1 10. 20. 30. 0.11 0.22 0.33
TRIAL mol1 20. 30. 40. 0.22 0.33 0.44
TRIAL mol1 30. 40. 50. 0.33 0.44 0.55
END
eof-beast

Unix examples script found in $CEXAM/unix/runnable/

  • beast_6Dref.exam
    Refine orientation and translation.
  • beast_rotate.exam
    Rotation search (full) for toxd using an ensemble of two models.
  • beast_rref.exam
    Refine orientations.
  • beast_translate.exam
    Translation search after orientation refinement.
  • beast_rescore.exam
    Rescore rotation function results from external program

    Compiling Separately

    These instructions are for compiling beast separately from the CCP4 installation.

    Beast requires that you have installed CCP4 version 4.1 and set the environment variables. If you only have version 4.0 installed, you will need to get the new version of rwbrook.f and compile that in. It also requires the LAPACK and BLAS libraries, which are installed under CCP4 4.1 when the --with-lapack option is given in the configure step. LAPACK is also available from www.netlib.org, if it is not already installed on your system and you don't wish to reconfigure your CCP4 installation. Beast should compile under Linux, Irix and Compaq Alpha systems, using one of the supplied Makefiles. Perhaps the biggest complications will be with Irix, where different choices are sometimes made with the compiler flags, e.g. -o32 vs. -n32 or -64. If it doesn't compile, you may have to look at the flags used in the CCP4 Makefiles. It hasn't been tested under other systems, but a variation on one of the Makefiles would probably work for most. To choose your operating system, do one of the following:

    cp Makefile.<system> Makefile
    make
    

    or

    make -f Makefile.<system>
    

    Beast has been tested most extensively under RedHat Linux 7.1 and, if that's your operating system, it should be fine. Under other systems, it's probably a good idea to run one or more of the test jobs in the test directory supplied with the program, and compare your output with the sample output provided. If you find bugs under any operating system, please let me know the details so I can fix them!

    On multi-processor machines with compilers that support OpenMP parallelisation directives, the translation search can be sped up by compiling with the OpenMP flags and then using more than one processor. Depending on the setup of the system and the number of users sharing it, you may wish to limit the number of processors that can be grabbed by Beast at any one time. This is done by setting the environment variable OMP_NUM_THREADS before running Beast. For instance, the following command would limit Beast to four processors:

    setenv OMP_NUM_THREADS 4
    

    REFERENCES

    1. R.J. Read (2001), "Pushing the boundaries of molecular replacement with maximum likelihood." Acta Cryst. D57: 1373-1382.
      Click here to download.
    2. R.J. Read (1999), "Detecting outliers in non-redundant diffraction data." Acta Cryst. D55: 1759-1764.
    3. C. Chothia & A.M. Lesk (1986), "The relation between the divergence of sequence and structure in proteins." EMBO J. 5: 823-826.

    Acknowledgments

    Kay Diederichs provided several code optimisations including parallelisation to speed up the searches, particularly for translations, as well as the code to sort the best results. Airlie McCoy and Raj Pannu provided many good suggestions for the program function and numerical analysis. Anne Baker wrote an interface for the CCP4 GUI that will save most users from reading this documentation, and provided good suggestions for new features.

    AUTHOR

    Written by Randy J. Read, University of Cambridge
    e-mail:rjr27@cam.ac.uk