The DRAGON User Guide
Version 4.18.1
May 2000

András Aszódi
Present address:
Novartis Forschungsinstitut GmbH
Brunnerstrasse 59
A-1235 Vienna, Austria

Contents

  • Introduction
  • Preparing for a Simulation
  • Running a Simulation
  • After the Simulation
  • Command reference
  • Parameters
  • References

    Introduction

    DRAGON is a protein modelling tool using Distance Geometry. It was developed at the Division of Mathematical Biology of the National Institute for Medical Research (NIMR) between 1993 and 1996: the algorithms were designed by Willie R. Taylor and myself, while I am solely responsible for the implementation. DRAGON attempts to predict the tertiary structure of a small soluble protein, given its sequence, the secondary structure and possibly a set of structure-specific restraints. DRAGON communicates with you through a simple command-line interface which is used to specify parameter values and input filenames. During the run, DRAGON keeps you informed by writing a lot to the standard output. Finally, the model structures are written to coordinate files in PDB format.
    Distance restraints are obtained from external sources such as NMR experiments and from estimates based on the conserved hydrophobicity of residues. Background stereochemical knowledge is also included to model the hydrophobic effect, the geometry and handedness of secondary structures and the absence of native tangled conformations.

    Why DRAGON? The acronym allegedly stands for "Distance Regularisation Algorithm for Geometry OptimisatioN", but in fact any other silly acronym would have done, the main purpose of its silliness being that people would remember it.

    Typographical Conventions

    Anything that appears in fixed-width typewriter font is either a command, variable name etc. that must be typed literally or program output. Emphasised words like filename describe general concepts which can take different values depending on context etc. Items enclosed between square brackets [ ] are optional.

    Installation

    Supported platforms

    Currently the following platforms and operating systems are supported: Silicon Graphics (IRIX), PC/Intel X86 (Linux) and Sun (Solaris); each is described below. Please note that there is no support provided or even planned for the operating systems marketed by Microsoft.

    Executable installation

    This distribution contains the executables in the subdirectories of dragon4/bin corresponding to each architecture, and the data (dragon4/data) and documentation (dragon4/doc) subtrees. A C shell script dragon4/bin/startup-<ARCH>.sh is provided that sets the environment variables DRAGON needs. The startup script also adds the following directories to your path:-

    $DRAGON_ROOT/bin/$ABI
    $PVM_ROOT/lib/$PVM_ARCH
    $PVM_ROOT/bin/$PVM_ARCH

    Silicon Graphics

    Since the introduction of IRIX 6.2, SGI provides three different application binary interfaces (ABIs):-
    1. irix-o32: this is the "old" 32-bit ABI which runs on every SGI machine under IRIX 5.3 and above. The o32 version of DRAGON was compiled with the -mips2 flag and consequently will not run on R3000 processors. Use this ABI if you have not upgraded from IRIX 5.3 yet. Support for this ABI will be discontinued soon.
    2. irix-n32: this is the "new" 32-bit ABI which is allegedly faster than the o32 version (your mileage may vary). From my point of view the main advantage of this ABI was that the n32 C++ compiler supported more language features than the old o32 version. The n32 executable of DRAGON was compiled with the -mips3 flag and runs on R4000 processors and above under IRIX 6.2 and above. This is the preferred ABI as far as compatibility is concerned.
    3. irix-n64: this is the 64-bit version of DRAGON compiled with the -mips4 flag that runs on R8000 and R10000 processors. The code was optimised for the R10000 processor. Please note that the n64 ABI is NOT supported on R10000 O2 machines running IRIX 6.3.
    There is no support for the Intel-based SGI workstations running Windows NT.

    PC/Intel X86/Linux

    The Linux executable (ELF format) was compiled with GCC/G++ 2.95.2 running under Linux 2.2.x. You may get dire warnings in some cases where the bugs in G++ cause problems (in particular, the program cannot catch and process SIGINT interrupts (Ctrl-C) properly). OpenGL graphics is supported via Brian Paul's Mesa library. It is recommended to use an accelerated SVGA X server with at least 16-bit colour to get the best graphics results.

    Sun/Solaris

    The Sun executables were compiled under Solaris 2.X using the Gnu GCC/G++ compiler (version 2.8.1). Please note that OpenGL graphics is not supported by the Sun version. Executables are provided for SPARC and X86 architectures.

    Portability

    Apart from the OpenGL graphics option, the C source is pure ANSI, while the C++ source corresponds to the C++ Annotated Reference Manual. It is straightforward to port DRAGON to other architectures supported by GCC. The main obstacle to porting the code with other native compilers is the C++ template instantiation which would require some tinkering with the Makefiles.

    Related software

    A few auxiliary programs (rank, hbassign, sidech, clumsy and secmap) can be used in conjunction with DRAGON, but they are not absolutely necessary for using DRAGON itself. Manual pages are provided for some of these programs, and they also print a few help lines if you invoke them without any command-line arguments.

    DRAGON can run under PVM, a simulated parallel-processing environment. You can obtain PVM from Netlib but DRAGON will run without it anyway if you don't want to install it.

    Licensing

    The executables of DRAGON can be used by anyone; no licence is needed. However, please keep in mind that DRAGON is the intellectual property of William R. Taylor and András Aszódi, and as such both of us would be grateful if you cited our relevant papers when publishing work in which DRAGON was used. We do not distribute the source code.

    Disclaimer...

    ...without which it is impossible to write a good User Guide nowadays. Let's see:-
    Copyright © 1993-2000. András Aszódi, William R. Taylor.
    The DRAGON program suite is distributed free of charge. The copyright holders therefore undertake no warranty of any kind, as detailed in the NO WARRANTY section below, which was taken from the GNU General Public License (Copyright © 1989, 1991 Free Software Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA):-

    NO WARRANTY

    BECAUSE THE PROGRAM IS LICENSED FREE OF CHARGE, THERE IS NO WARRANTY FOR THE PROGRAM, TO THE EXTENT PERMITTED BY APPLICABLE LAW. EXCEPT WHEN OTHERWISE STATED IN WRITING THE COPYRIGHT HOLDERS AND/OR OTHER PARTIES PROVIDE THE PROGRAM "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE ENTIRE RISK AS TO THE QUALITY AND PERFORMANCE OF THE PROGRAM IS WITH YOU. SHOULD THE PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF ALL NECESSARY SERVICING, REPAIR OR CORRECTION.

    IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MAY MODIFY AND/OR REDISTRIBUTE THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES, INCLUDING ANY GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING OUT OF THE USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED TO LOSS OF DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY YOU OR THIRD PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER PROGRAMS), EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.

    Special Novartis Disclaimer

    Please note that the DRAGON package has nothing to do with and is not endorsed in any way by my present employer, the Novartis Forschungsinstitut GmbH, Vienna, Austria.

    Support

    There is none. The development of DRAGON is finished, I no longer work for the NIMR, and I have no time for active maintenance of the software. Sporadic updates may occur from time to time though. Bug reports are welcome but I will not reply to them. Also, I will not provide tutorials: when in doubt, please consult this Guide.

    Acknowledgements

    I would like to express my gratitude to my former boss, Willie Taylor, for those exciting four years at the NIMR during which DRAGON was designed and implemented. Special thanks are due to Robin Munro, the first user of DRAGON, whose help in testing the software proved invaluable. He also supported the good cause by setting up the DRAGON homepage and the anonymous FTP site. Nigel Douglas wrote the Sun ports and helped to isolate some annoying bugs. Jan Hungershöfer added support for running multiple slave processes on multiprocessor PVM nodes.

    Preparing for a Simulation

    Prior to a run, you have to make some preparations. First of all, DRAGON will need the sequence of your protein. Although a single sequence might be sufficient under extremely lucky circumstances, there is so much information to be gained from a multiple alignment that DRAGON expects you to supply one. It is also necessary to prepare a secondary structure assignment file for your target protein. This is the bare minimum. However, the program will appreciate it if you also supply distance restraints which you might have obtained from your friends in the NMR lab, or from other experimental sources. If you know for sure that a given residue is on the surface of the molecule, then you can also prepare an accessibility file. Best of all, if the structures of some of the homologous sequences in the multiple alignment are known, then you can attempt homology modelling by giving DRAGON the corresponding structures in PDB format. The names of all these files have to be specified in the parameter file, together with other options and numerical parameter values.

    The multiple alignment

    Using your favourite database and search method, collect sequences which are homologous to your target. It is usually a good idea to incorporate distant homologs into the multiple alignment since a bunch of almost identical sequences would not provide any extra information. The main aim of the alignment is to pinpoint truly conserved residues: DRAGON relies very heavily on conserved features. Since the program's speed will not be influenced by the number of sequences in the alignment, try to collect as many sequences as possible.

    Once you have selected the sequences, perform the alignment. Unfortunately DRAGON will not do this for you; you have to use a separate sequence alignment program. With due respect, do not take the results for granted, no matter how famous the author of the alignment program might be: silly things can (and will) happen. It is best to use an alignment program with a good graphical interface that lets you edit the raw results afterwards. Experiment with various similarity matrices and gap penalties. If in doubt, prepare a few different alignments.

    Several decades of research into biological sequences provided mankind with a plethora of sequence alignment formats. DRAGON can read a relaxed version of the GCG format (MSF), a PIR alignment format or a horrible vertical format, which is similar to the output of MULTAL. I regret that currently no other formats are acceptable.

    Secondary structure assignment

    DRAGON cannot predict secondary structure at the moment; it used to have a module for that purpose but that did not really work, so until we come up with something truly snazzy, you'll have to do the assignment yourself. NMR experiments can usually provide very good secondary structure assignments. If you have no access to NMR data, then the secondary structure must be predicted in some way. If you do homology modelling, the auxiliary program secmap is provided to map secondary structure information from scaffold structures onto the target sequence using the multiple alignment.

    Using the secmap program

    This is how you invoke the program:-

    secmap alignment_file target_no DSSP_file [ DSSP_file ...]

    where alignment_file is the same alignment you plan to use for modelling, target_no is the number of the target sequence in the alignment, and the DSSP_files (of which at least one is mandatory) contain the secondary structure assignments for the scaffold structures as generated by the DSSP program of Kabsch and Sander (not supplied). secmap then works out the secondary structure assignment for each residue in the target sequence and prints output like this:-

    # >> Align_: Trying MULTAL format...
    # >> Align_: MULTAL parsing successful, seqno=8
    # Alignment file: lact/la.aln
    # Target sequence number = 2
    # Scaffold: "lact/1ALC.dssp", sequence number = 1
    # Scaffold: "lact/1HEL.dssp", sequence number = 5
    - [---] - - -
    ...
    R [ 1]
    I [ 2]
    ...
    I [ 5] h h h
    ...
    L [ 12] h h
    K [ 13] non-cons! 3 h
    ...
    L [ 43] 52aB---- 52aB---- 52aC----
    N [ 44] 51aB---- 51aB---- 51aC----
    Y [ 45] 50aB---- 50aB---- 50aC----
    Y [ 46]
    - [---] - - -
    - [---] - -
    N [ 47]
    G [ 48]
    S [ 49]
    S [ 50] 45aB---- 45aB---- 45aC----
    S [ 51] 44aB 58a 44aB 58a 44aC 58a
    H [ 52] 43aB 57a 43aB 57a 43aC 57a
    G [ 53]
    L [ 54]
    F [ 55]
    Q [ 56]
    I [ 57] 52aB---- 52aB---- 52aC----
    N [ 58] 51aB---- 51aB---- 51aC----
    Q [ 59]
    ...

    The first two columns list the amino acids in the target sequence, with the positions in square brackets. The character '-' indicates that the target is gapped at that position. The next column is the mapped secondary structure assignment; the following columns contain the secondary structure assignments from the given scaffold DSSP files. The characters '3', 'h', 'p' in the middle of the assignment columns indicate 3/10-, alpha- and pi-helices, respectively; any other capital letter stands for a beta-sheet ID (as defined in the DSSP files). For beta assignments, the numbers before and after the sheet ID show the partner amino acids in the neighbouring strands, while 'a' and 'p' indicate antiparallel and parallel orientation, respectively. Note that in some cases the warning "non-cons!" is printed in the mapped assignment, indicating that the mapping is inconsistent, i.e. not all scaffolds have the same secondary structure at that particular position. This is usually an indication of an alignment problem.

    It would be nice if secmap could print the mapped assignment in the format required by the Sstrfnm parameter. However, there are two reasons why this is not so. First, it would be damn complex to teach secmap how to handle bifurcating sheets. Second, you are forced to think about the secondary structure assignment while you are constructing the Sstrfnm file. Getting it right is crucial.

    Automatic secondary structure prediction methods

    These can also be used, but treat the results with even more suspicion than you would the raw multiple alignments. If in doubt, try several assignments. This "semi-combinatorial" approach is necessary if the protein contains beta-sheets and the topology of the sheets is unknown, since DRAGON needs to know which strand is H-bonded to which strand. Note that you may specify weights between 0.0 and 1.0 for each secondary structure element (but not for individual residues) to indicate the uncertainty of the assignment to the program.

    Distance restraints

    The classic source for distance restraints is an NMR experiment. Since DRAGON uses a reduced chain representation internally, distances can be specified only between C-alpha atoms or non-hydrogen atoms in the side chains. This is usually not a serious problem because we are interested in the correct fold rather than the fine details of side-chain conformation.

    If you know the whereabouts of disulfide bridges in your protein, then these can easily be encoded into distance restraints. Also, if someone has determined the distances between lysines on the surface of the molecule by crosslinking with bifunctional diimidates, those data can be used immediately. Surface residue distances can also be deduced from fluorescence energy transfer experiments. For all distance restraints, a lower and upper bound may be specified, together with a weight between 0 and 1 which tells the program how reliable that restraint is (restraints with weight 0 are ignored completely, those with weight 1 are enforced very strictly).
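
    To make the three ingredients of a restraint concrete, here is a minimal Python sketch of a restraint record with the validity checks implied above. The class and field names are hypothetical, the angstrom unit is an assumption, and the record has nothing to do with the layout of DRAGON's own restraint file.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DistanceRestraint:
    """Hypothetical record for one restraint; not DRAGON's file format."""
    res1: int      # first residue number (1-based)
    res2: int      # second residue number (1-based)
    lower: float   # lower distance bound (angstroms assumed)
    upper: float   # upper distance bound (angstroms assumed)
    weight: float  # reliability: 0 = ignored, 1 = enforced very strictly

    def __post_init__(self):
        if self.res1 == self.res2:
            raise ValueError("a restraint needs two different residues")
        if not 0.0 <= self.lower <= self.upper:
            raise ValueError("need 0 <= lower <= upper")
        if not 0.0 <= self.weight <= 1.0:
            raise ValueError("weight must lie between 0 and 1")


def active(restraints):
    """Drop restraints with weight 0, which would be ignored anyway."""
    return [r for r in restraints if r.weight > 0.0]


# Example: a disulfide bridge expressed as a C-alpha distance range.
# Residue numbers and bounds are illustrative, not recommendations.
bridge = DistanceRestraint(res1=6, res2=120, lower=4.0, upper=7.0, weight=1.0)
```

    Checking the bounds and the weight before a run is cheap, and mistakes in the restraint list are otherwise only noticed later, during bound smoothing.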

    Accessibility data

    In the absence of any external information, DRAGON attempts to bury hydrophobic residues in the core of the molecule and moves hydrophilic residues towards the surface. This simple heuristic can be complemented by providing a list of residues which are known to be on the surface or in the interior. A good example is when a residue is known to be glycosylated, which means it has to be on the surface. The same logic applies to hydrophobic residues known to be involved in ligand binding. These residues should be specified in the accessibility file (see the Accfnm parameter).

    Homology modelling

    If there is at least one protein in the multiple alignment whose structure is known, DRAGON will attempt to use it as a scaffold for the target. To this end, the scaffold structures have to be provided in PDB format and the PDB filename has to be specified (see the Homfnm parameter). DRAGON will extract the sequences from the PDB file, automatically locate the sequences in the input alignment and then will deduce appropriate distance restraints for the model from the known structures. Care must be taken to ensure that the sequence of the PDB file, which will be deduced from the ATOM records and NOT from the SEQRES records, matches the sequence in the alignment. If DRAGON notifies you that it could not find the corresponding structure, then check the alignment and the PDB file of the known structure.

    Homology-derived restraints will be selected by specifying two additional parameters, Maxdist and Minsepar. The former is a C-alpha distance cutoff: only residue pairs closer than this threshold will provide restraints. The latter chooses a minimal sequential separation between the residue pairs.
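
    The effect of the two parameters can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name is made up, the separation test is taken to be inclusive (|i - j| >= Minsepar), and how DRAGON turns the selected pairs into distance bounds is not shown.

```python
import math


def select_pairs(ca_coords, maxdist, minsepar):
    """Return (i, j, distance) for C-alpha pairs passing both filters.

    ca_coords: list of (x, y, z) tuples, one per residue, in chain order.
    maxdist:   distance cutoff; only pairs closer than this are kept.
    minsepar:  minimal sequential separation |i - j| of the two residues
               (treated as inclusive here, which is an assumption).
    """
    selected = []
    n = len(ca_coords)
    step = max(1, minsepar)  # never pair a residue with itself
    for i in range(n):
        for j in range(i + step, n):
            d = math.dist(ca_coords[i], ca_coords[j])
            if d < maxdist:
                selected.append((i, j, d))
    return selected


# Four residues on a straight line, 3.8 units apart (toy coordinates).
toy_chain = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
pairs = select_pairs(toy_chain, maxdist=8.0, minsepar=2)
```

    With these toy values only the pairs two residues apart survive: immediate neighbours are excluded by the separation filter and the two chain ends by the distance cutoff.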

    What DRAGON will not do for you

    I really wanted this program to be perfect. However...
    1. DRAGON works with a single protein chain only. No other biopolymers or non-polypeptide molecules can be handled by the program.
    2. DRAGON cannot model fine details because there is no full-atom representation inside it. You can get some reasonable folds, but these have to be refined by a suitable modelling method if atomic details are needed.
    3. It will not be able to model large proteins. The hydrophobic-core building heuristics cannot handle multiple domains or multiple cores at present. You might consider breaking up your big protein and attempting modelling on a domain-by-domain basis.
    4. It will not model membrane proteins. The reason is the core building bit again: it was designed for water-soluble proteins only.
    5. Proteins composed of more than one polypeptide chain (oligomers) cannot be modelled.
    6. Although there is a "homology-modelling" option, DRAGON will not give you the high-quality models produced by the best specialised packages. This option is most useful when the multiple alignment was constructed from a threading output and you need a rough fold rather than a high-precision full-atom model.

    Running a Simulation

    At the beginning of the simulation, DRAGON constructs an internal representation of the target from the multiple alignment, reads all parameters and data files, smooths the distance bounds, then generates a random C-alpha distance matrix. The seed for the random number generator is specified in the parameter Randseed. When this number is 0, then the current system time is used as the seed and consequently it will be different for every run. If Randseed is nonzero, then two repeated runs will give identical results since the same seed will be used for initialisation. This option is mainly for debugging.
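
    The seeding rule can be illustrated with Python's standard generator. This is a sketch of the rule only; make_rng is a hypothetical helper, and DRAGON's own random number generator is not Python's.

```python
import random
import time


def make_rng(randseed):
    """Return (generator, seed actually used), following the Randseed rule."""
    # 0 means "use the current system time"; anything else is used as given.
    seed = randseed if randseed != 0 else int(time.time())
    return random.Random(seed), seed


# Two generators built from the same nonzero seed produce the same numbers.
rng_a, _ = make_rng(117)
rng_b, _ = make_rng(117)
same_start = rng_a.random() == rng_b.random()
```

    This is why a nonzero Randseed is handy for debugging but useless for sampling: every repeat would start from the same random distance matrix.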

    [Figure: Overview of the gradual projection cycle.]

    In the first stage of the iteration, the restraints are applied to the matrix and it is then embedded into a D>3-dimensional Euclidean space. Various refinements are performed here, then the distance matrix is reconstructed and the process is repeated until a 3-dimensional embedding is achieved. In the second part of the simulation, the 3D structure is refined and the best structure found so far is saved. The embedding is repeated once more if no good structures were found in the last 5 iterations. The 3D iterations are performed Maxiter times, then the final result is saved to disk. You get a one-line report on the reason for termination: normally this says

    "EXIT: no further improvement on 3D reprojection".

    As with all Distance Geometry-based methods, a good exploration of the conformational space is essential. Once you have made sure your parameters make sense and have obtained some promising conformations, repeat the runs, starting from a different random distance matrix each time by setting the Randseed parameter to 0. (I tend to do at least 20 parallel simulations.) Go through the results carefully, cluster them into families, and perform more runs if necessary.

    Modes of Operation

    DRAGON can be run in a variety of modes. The standard way of doing things is to invoke the program in interactive mode, when commands can be typed directly at the keyboard and the output is produced on the terminal. The commands can be collected in ASCII command files and the program can be run in command-file mode either by specifying the command file on the command line, or by piping the contents to the standard input of DRAGON, or by invoking the c[ommand] command in interactive mode. Mainly for testing purposes, the program can be started in parameter-run mode, in which case a parameter file has to be specified on the command line. To complicate matters even further, DRAGON may also be run in parallel.

    Interactive mode

    In this mode, DRAGON accepts commands from the keyboard. When the program is started, the prompt DRAGON> is displayed. The commands can then be typed directly. There is no command history facility. A run can be interrupted by pressing Ctrl-C (SGI machines only?) and the program exits on the q[uit] command.

    Command mode

    A command file can be specified by invoking the program as

    dragon -c command_file [other options...]

    where the file command_file may contain comments, commands and parameter name-value pairs, one per line. If the file cannot be found or opened, then the program starts up in interactive mode. Command files may be nested using the c nested_file command up to 16 levels deep. This is especially useful for performing a large number of simulations when a few parameters have to be changed systematically.

    The contents of command files can also be piped or redirected to DRAGON using the standard UNIX mechanisms. This facility is provided for those who are willing to create a graphical user interface for the program.

    Parameter run mode

    The program can be asked to read a parameter file parameter_file and perform run_no simulations by invoking

    dragon -p parameter_file [-r run_no]

    If the -r option is omitted then just one simulation is performed which is mainly useful for testing that the parameter file is OK before attempting a long simulation session with it.

    Parallel Processing

    For a given set of parameters, it is advisable to repeat the simulation runs several times (each time starting with a different random distance matrix) to sample the set of conformations which satisfy the restraints. These runs can be performed in parallel on a multiprocessor machine using the multiprocess option, or on a network of UNIX workstations running PVM.

    Multiple processes

    When invoked with the -m option,

    dragon -m procno [other options...]

    DRAGON spawns procno child processes when the r[un] command is issued. procno should be larger than 1. These child processes then execute in parallel in the background and perform the simulations. The output is sent to logfiles, one file per simulation. The original parent process handles the communication with you in exactly the same way as in a serial run, and kills the child processes once they have finished the calculations. This option, although available on single processors as well, is best used on a multiprocessor machine. There is practically no extra I/O overhead since the internal data structures are fully set up when the parent process spawns the children, and they inherit the data automatically. Note that the code was not optimised for multiprocessor machines, and there is no load balancing.

    PVM support

    PVM, which stands for Parallel Virtual Machine, is free software originally developed at the Oak Ridge National Laboratory. It supports a flexible message-passing protocol on a network of heterogeneous UNIX computers which are linked by PVM so that they appear as a parallel computer to the program linked with the PVM library. If you plan to use PVM, it is a good idea to prepare a hostfile for DRAGON where the working directory is specified as DRAGON's home directory, e.g. for the machine george and user Joe Bloggs you would like to specify something like this:-

    george wd=/usr/people/joebloggs/dragonhome

    otherwise the default data files will not be found. See also the PVM manuals.

    To enable PVM support, invoke DRAGON with the -M flag:-

    dragon -M [other options...]

    This starts the program as the master on the local computer, and then spawns slaves as PVM tasks on all the other nodes in the virtual machine. On multiprocessor nodes, DRAGON checks the number of processors and the average load, and launches as many slaves as necessary to fill the machine up completely :-). E.g. on an 8-processor machine with an average load of 5.0, 3 slaves will be started. The slaves will automatically re-nice themselves to priority 10 to get out of the way as much as possible. The master writes the following information to stdout after each slave has been launched:-

    Slave (1/2) [4022e] started on host machine.domain.edu

    This means that slave 1 out of 2 possible slaves, with task ID 4022e, was launched on "machine.domain.edu". The number of possible slaves may vary for a given machine, depending on the overall load, but will never exceed the number of available CPUs.
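
    The arithmetic behind the slave count can be sketched as follows. The function is hypothetical and only reproduces the single example given above (8 processors, load 5.0, 3 slaves); rounding a fractional load upwards is an assumption, since the guide does not say how fractions are treated.

```python
import math


def slaves_to_launch(n_cpus, avg_load):
    """Estimate how many slaves fit on a node without exceeding its CPUs.

    Rounding the load up is an assumption; the text gives one example only.
    """
    free = n_cpus - math.ceil(avg_load)
    # Never negative, and never more slaves than CPUs.
    return max(0, min(n_cpus, free))


example = slaves_to_launch(8, 5.0)  # the 8-processor example from the text
```

    A node that is already fully loaded gets no slaves at all under this rule, which is consistent with the aim of staying out of other people's way.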

    The master task will communicate with you, and when you request a simulation with the r[un] command, the master will inspect the status of the virtual machine, re-spawn slaves if necessary and then broadcast the parameters to the slaves. Once they have reported back that all data have been received, the master starts assigning the runs to the slaves, one by one. Each run is sent to the first available slave: this way, if your virtual machine is composed of faster and slower computers, the faster machines will do most of the job. Simulation output is redirected to logfiles; however, there is one logfile per slave (as opposed to the multiprocess option, where one logfile per run is generated). Logfile names have the form xxxxx@hostname, where xxxxx is a hexadecimal number (the PVM task ID for the slave) and hostname is the name of the actual computer that slave was running on. The master periodically writes to the standard output to inform you about the status of the slaves. If a slave crashes (this should not happen), its job will be re-sent to another slave. DRAGON is smart enough to take notice of any changes you make to the virtual machine. If nodes are added, new DRAGON slaves will soon be spawned on them; upon the removal of nodes, the half-finished tasks will be re-started on the remaining nodes as soon as possible.
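
    Why "first available slave" automatically favours the faster machines can be seen in a small simulation. This is a toy model in Python, not DRAGON's master code: the slave names and per-run times are invented, and message passing, crashes and re-spawning are left out.

```python
import heapq


def assign_runs(run_count, secs_per_run):
    """Simulate handing each run to the first slave that becomes free.

    secs_per_run maps a (hypothetical) slave name to the time one run
    takes on it. Returns the number of runs each slave completed.
    """
    # Heap entries: (time at which the slave becomes free, slave name).
    free_at = [(0.0, name) for name in sorted(secs_per_run)]
    heapq.heapify(free_at)
    done = {name: 0 for name in secs_per_run}
    for _ in range(run_count):
        t, name = heapq.heappop(free_at)   # earliest available slave
        done[name] += 1
        heapq.heappush(free_at, (t + secs_per_run[name], name))
    return done


# One machine three times faster than the other, 20 runs in total.
share = assign_runs(20, {"fast": 10.0, "slow": 30.0})
```

    No explicit load balancing is needed: the fast machine simply comes back for more work more often, so it ends up with the larger share of the runs.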

    Note that if PVM is not running on your computer, then the following uninformative error message may be printed several times to the standard error:-

    libpvm [pid4138]: /tmp/pvmd.761: No such file or directory

    which may be safely ignored. DRAGON automatically detects the situation and will run in single-processor mode.

    A few tips to avoid disappointment with the PVM support option:-

    1. Do read the PVM manual (Version 3).
    2. Specify the working directory for each node in the virtual machine. DRAGON puts all its output in the current directory. This sometimes does not work with PVM 3.3.10, so do not be surprised if your results end up in your home directory.
    3. Signal processing may not work faultlessly under PVM. While the master DRAGON task still catches SIGINT (Ctrl-C) and sends it to the slaves, surprises cannot be excluded.
    4. Both the master and the slaves sleep for 1-second intervals between checking their message queues. This causes occasional slight delays, e.g. the runs don't seem to start immediately after you issue the r[un] command, but in fact a lot of CPU cycles are saved by this trick when the tasks are idle.
    5. Always make sure you don't slow down other people's machines without their permission by running DRAGON under PVM on them.

    Screen Output

    While running, DRAGON entertains you with a considerable amount of screen output. It is a good idea to redirect both the standard output and standard error to a logfile when doing longer runs. If the program is in parallel mode, then the same listing automatically goes to the logfiles (see above). A brief and by no means exhaustive explanation of the various messages follows below, in the order you will encounter them during a normal run.

    Good morning!
    Welcome to DRAGON 4.17.8-n32 [May 15 1998, 11:39:20]
    Algorithms by William R. Taylor & András Aszódi
    Implementation by András Aszódi
    (C) 1993-1997. All rights reserved.

    PVM: supported
    OpenGL graphics: supported

    The program greets you with a version and copyright information listing. Note that the ABI is indicated right after the version number.

    >>Align_: Trying MULTAL format...
    >>Align_: MULTAL parsing successful, seqno=8

    === THE MODEL CHAIN ===

    # No. of sequences = 8, model = Seq. #1, no. of residues = 75
    # Aa Cons Phob Brad Acdist
    1 K 0.0714 0.03 2.67 4.06
    2 S 0.288 0.49 1.94 2.01
    3 P 0.398 0.18 2.21 1.97
    ...
    === THE MODEL CHAIN ===

    # No. of sequences = 8, model = Seq. #1, no. of residues = 75
    --KSPEE--- -LKGIFEKYA AKEGDPNQLS KEELKLLLQT EFPSLLK--- GPSTLDELFE
    ----SEEMIA EFKAAFDMFD --ADGGGDIS TKELGTVMR- MLG---QNPT KEEL-DAIIE
    ---LAKKSNE ELEAIFKILD --QDKSGFIE DEELELFLQ- NFSAGARTLT KTET-ETFLK
    ---MKETDSE MIREAFRVFD --KDGNGVIT AQEFRYFMV- HMG---MQFS EEEV-DEMIK
    ---LSSKSAD DVKNVFAILD --QDRSGFIE EEELKLFLQ- NFSASARALT DAET-KAFLA
    -PSQMEHAME TMMLTFHRFA ---GEKNYLT KEDLRVLMER EFPGFLENQK DPLAVDKIMK
    -----EAMQE ELREAFRLYD --KQGQGFIN VSDLRDILR- ALD---DKLT EDEL-DEMIA
    MCSSLEQALA VLVTTFHKYS CQEGDKFKLS KGEMKELLHK ELPSFVGEKV DEEGLKKLMG

    ELDKNGDGEV SFEEFQVLVK KISQ------ --------
    EVDEDGSGTI DFEEFLVMMV RQMKEDA--- --------
    AGDSDGDGKI GVDEFQKLVK A--------- --------
    EVDVDGDGEI DYEEFVKMMS NQ-------- --------
    AGDSDGDGKI GVEEFQSLVK P--------- --------
    DLDQCRDGKV GFQSFLSLVA GLIIACNDYF VVHMKQKK
    EIDTDGSGTV DFDEFMEMMT G--------- --------
    NLDENSDQQV DFQEYAVFLA LITVMCNDFF QGCPDRP-
     

    # Target Cons Phob Brad Acdist Alignment
    1 K 0.0714 0.03 2.67 4.06 -------M
    2 S 0.288 0.49 1.94 2.01 -----P-C
    3 P 0.398 0.18 2.21 1.97 K----S-S
    4 E 0.78 0.01 2.47 3.4 S-LMLQ-S
    5 E 0.612 0.01 2.47 3.4 PSAKSM-L
    6 L 0.647 2.56 2.6 2.82 EEKESEEE

    First the multiple alignment is read and parsed, then the model chain parameters are listed. Target is the one-letter amino acid code, Cons is the conservation at the given position, Phob is the average hydrophobicity of the position (which is not the same as the hydrophobicity of the amino acid in the master sequence), Brad is the radius of the fake C-beta atom representing the side chain, Acdist is the distance between the C-alpha atom and the centroid of the side chain. Alignment lists all residues in the same position. The program then lists the restraints, accessibilities and secondary structure assignments in the same format as their corresponding input files (see below). Look for error messages here as they indicate input file formatting problems.

    RUN 1 STARTED: Fri 04-Apr-1997 11:55:41
    # Randseed=117
    nonlin11_reg():.......................................Done
    Q=2.359e-02, Stepno=27, t-stat=2.600e-02
    D=-9.793e+00 * H^6.619e-01 + 2.809e+01
    ...

    The Randseed value is the actual long number used for initialising the random number generator. When in parallel mode, this number gets "spiced" with a combination of the process ID and system time to avoid identical parallel runs. The nonlinear regression is used to calculate the distance distribution for residue pairs with unknown distances; the data are shown for decorative purposes only.

    SMUP: 4
    SMLOW: 3, triangle violations=0

    These two lines show how many cycles were used for upper- and lower-bound restraint smoothing. If the number of violations is larger than zero, you might consider checking your restraint file for mistakes.

    CYCLE: 5 (61%, 42 secs)
    DIST: BD=1.965e-02, NB=1.640e-06, RS=7.094e-03, SC=1.569e+00, AC=1.500e+00
    PROJ: Dim=6, Df=1.003e+00, STR=7.927e-04
    TNGL: 0 (cyc=3)
    EUCL: IN=1.955e-05 ALL=1.408e-03
    EUCL: BD=7.263e-04, NB=2.156e-05, RS=1.137e-01, SC=4.703e+00, AC=1.792e+00

    This is what you see during high-dimensional iteration. The DIST and the second EUCL lines list the scores during distance matrix and Euclidean space adjustments, respectively. BD is the virtual bond score (between first and second neighbours), NB is the non-bond score (bumps between all other pairs), RS is the external restraint score, SC is the secondary structure score, AC is the accessibility score. The first EUCL line lists intermediate adjustment scores (to be ignored). The PROJ line shows the new embedding dimension, the isotropic density adjustment factor (which is usually very close to 1.0 except in the first embedding) and the Spectral Gradient "stress" value between the actual distance matrix of the projected structure and the initial distance matrix (the lower, the better). The TNGL line shows the number of remaining tangles after some detangling cycles. You would like to see 0 here.
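
    To follow the scores over many cycles it can help to turn the progress lines into numbers. The sketch below is a hypothetical log-scraping helper (parse_score_line is not a DRAGON function) and assumes the "NAME=value" layout seen in the sample output; it deliberately skips the Itno field of the END line, which is not a plain number.

```python
import re

# NAME=value pairs; the lookbehind skips the "20=6" inside "Itno:20=6+14"
PAIR_RE = re.compile(
    r"(?<![:\w])(\w+)=([-+]?\d+(?:\.\d*)?(?:[eE][-+]?\d+)?)"
)


def parse_score_line(line):
    """Return (tag, {score_name: value}) for one progress line."""
    tag, _, rest = line.partition(":")
    scores = {name: float(value) for name, value in PAIR_RE.findall(rest)}
    # "** BEST" becomes "BEST"
    return tag.strip("* "), scores


tag, scores = parse_score_line(
    "DIST: BD=1.965e-02, NB=1.640e-06, RS=7.094e-03, SC=1.569e+00, AC=1.500e+00"
)
print(tag, scores["SC"])
```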

    HAND: (secstr) Good:Bad=1:4 (2:32)
    DIST: BD=2.211e-01, NB=7.904e-04, RS=5.051e-02, SC=2.517e+00, AC=7.493e-01
    PROJ: Dim=3, Df=1.019e+00, STR=2.103e-03 , flip
    TNGL: 0 (cyc=0)
    EUCL: IN=2.124e-05 2oSTR=1.061e+00 ALL=2.584e-03
    ** BEST: BD=5.237e-02, NB=0.000e+00, RS=3.433e-02, SC=2.185e+00, AC=7.400e-01

    This is the 3D iteration output. HAND shows the results of the overall handedness checks which are done by inspecting the chirality of the secondary structure elements. If more "bad" than "good" chiralities are found, then the structure is "flipped" after projection (reflected through its centroid) to get the chiralities right. Note that the HAND line looks slightly different during homology modelling because in that case the correct overall chirality is obtained from a comparison to a scaffold structure. The rest of the output is similar to the high-dimensional lines. However, you want to see a ** BEST line here, which indicates that a good conformation has been found and saved.

    EXIT: no further improvement on 3D reprojection
    TIME: 58 secs
    END: BD=2.875e-04, NB=3.286e-04, RS=6.409e-04, SC=1.354e+00, AC=8.160e-01, Itno:20=6+14
    SAVE: 3icb_test_1.pdb
    VIOLS: 3icb_test_1.viol

    When the simulation finishes, the EXIT line gives you the reason for termination. The TIME line prints the total time used for this run. The END line lists the scores of the best conformation once again, and the Itno field gives a summary of the cycles used in the high-dimensional and 3D iterations. SAVE and VIOLS list the names of the result and violation files.

    Occasional warning and error messages, indicating the class and method where the problem occurred, plus a very brief and often uninformative description, may also be printed to the standard error during the run. Warnings are preceded by a question mark `?' and are quite likely to occur at the beginning, if one of the input files is incorrectly formatted. Sometimes you get one or more warnings like this before a projection step:-

    ? centre_dist(): Cdist2[32]=-1.164e+01

    These indicate non-metricity in the distance matrix and can safely be ignored unless there are too many of them. Another quite innocent warning is sometimes printed by the Spectral Gradient optimisation:-

    ? Specgrad_::iterate(Maxiter=30, Eps=2.000e-02): No convergence
    ? Steric_::adjust_xyz(SPECGRAD): no convergence

    Do not worry, DRAGON switches to another, more robust optimisation when Spectral Gradient does not converge.

    What you definitely do not want to see is a fatal error message, preceded by an exclamation mark `!'. Theoretically, they should never occur. If they do, then a coredump usually follows, and even if you get a model in the end, it is best to throw it away.


    After the Simulation

    The output from DRAGON is just the skeleton of your molecule: a C-alpha backbone and dummy C-beta atoms corresponding to the centroids of the side chains. What's more, every run will give you (slightly) different conformations. What to do now? First of all, have a look at your raw models with a suitable visualisation program. The best pattern recognition machine known to us is still the human visual system. Throw away mercilessly any models which look bad. Check tangles carefully.

    Violation files

    These are generated for each output file at the end of the run and contain a listing of constraint violations which fall into four categories. BOND restraints are the distances between first and second-neighbour C-alpha atoms, NONBD restraints are minimal van der Waals distances. External restraint violations (coming from NMR or homology modelling) belong to the RESTR category, and deviations from the ideal secondary structure are shown as SECSTR. The violation files look like this:-

    # Atom pair Type Actual Ideal (Strict) Rel.viol Error
    CA[ 42]: CA[ 41] BOND 3.40 < 3.80 (2.97) 0.31 10.4 %

    where the relative violation column is the error multiplied by the strictness and therefore could look frightening for C-alpha:C-alpha violations which have a high weighting. To calm your nerves, read the last column only. However, you should not see any BOND violations and only a few NONBD ones. If they do occur, then probably some of your external restraints were inconsistent or you have found a horrible new bug. RESTR and SECSTR violations are more common. Check these carefully, too: the deviations from ideal secondary structures may be OK if the model otherwise looks reasonable. Innocent violations below 5 % are not listed at all.
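
    A violation line can be picked apart with a regular expression. This is a sketch under assumptions: Violation and parse_violation are hypothetical names, the pattern is fitted to the single sample line above, and the ">" relation is only a guess at how lower-limit violations might be printed. In the sample, 10.4 % times the strictness 2.97 gives about 0.31, which matches the relative violation column.

```python
import re
from typing import NamedTuple


class Violation(NamedTuple):
    atom1: str
    res1: int
    atom2: str
    res2: int
    kind: str         # violation category, e.g. BOND
    actual: float
    relation: str     # "<" in the sample; ">" is an assumption
    ideal: float
    strict: float
    rel_viol: float
    error_pct: float


VIOL_RE = re.compile(
    r"(\w+)\[\s*(\d+)\]:\s*(\w+)\[\s*(\d+)\]\s+(\w+)\s+"
    r"([\d.]+)\s*([<>])\s*([\d.]+)\s+\(([\d.]+)\)\s+([\d.]+)\s+([\d.]+)\s*%"
)


def parse_violation(line):
    """Return a Violation, or None for header and unrecognised lines."""
    m = VIOL_RE.search(line)
    if m is None:
        return None
    a1, r1, a2, r2, kind, actual, rel, ideal, strict, rv, err = m.groups()
    return Violation(a1, int(r1), a2, int(r2), kind, float(actual), rel,
                     float(ideal), float(strict), float(rv), float(err))


v = parse_violation("CA[ 42]: CA[ 41] BOND 3.40 < 3.80 (2.97) 0.31 10.4 %")
print(v.kind, v.error_pct)
```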

    Score ranking: the rank program

    The final scores (which are the weighted sums of the relative violations in these categories) are stored as REMARKs in the output PDB files and can be used to filter out unacceptable conformations. An auxiliary program, rank, is available which reads these remarks and ranks the model structures according to various scoring criteria. Usage:-

    rank [-bnr] DRAGON_PDB_file(s)

    The flags -b, -n, -r specify that the structures are to be sorted according to their bond, non-bond or restraint scores, respectively. The score flags may be combined, in which case all specified scores will be used in the ranking process.

    Clustering: the clumsy program

    If the raw structures seem to be satisfactory, cluster them. I wrote a program called clumsy for that purpose, which is distributed together with DRAGON, but you are well advised to try other programs if you can, since there are a great many clustering algorithms to choose from and clumsy may not be ideal for your purposes. clumsy performs a pairwise rigid-body superposition on all of its input structures, constructs an RMS distance matrix, and then performs hierarchical clustering using the average similarity criterion to merge low-level clusters. Here is how to use it:-

    clumsy [-as] [-w window_len] [-c smooth_cycno] [-o output] PDB_files...

    The options have the following meaning: -a causes all atoms to be used in the comparison (the default is C-alphas only), -s performs smoothing on the C-alpha trace with a window length and smooth cycle number specified by -w and -c, respectively. -o saves the average structure of the top cluster to the specified output file. The argument PDB files must have identical sequences and only the first chain from each file is used in the comparison. The program prints a dendrogram to the standard output with the coordinate RMS deviations between the clusters.
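
    The clustering step can be illustrated on a ready-made RMS distance matrix. The sketch below is not clumsy's code: average_linkage is a hypothetical function, the superposition step is left out entirely, and "average similarity" is interpreted here as the mean pairwise RMS distance between the members of two clusters. The toy matrix is made up for the illustration.

```python
def average_linkage(dist):
    """Agglomerative clustering; returns the merge history as
    (members_a, members_b, average_distance) tuples."""
    clusters = [[i] for i in range(len(dist))]
    merges = []
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # mean of all pairwise distances between the two clusters
                pairs = [dist[i][j] for i in clusters[a] for j in clusters[b]]
                avg = sum(pairs) / len(pairs)
                if best is None or avg < best[0]:
                    best = (avg, a, b)
        avg, a, b = best
        merges.append((clusters[a], clusters[b], avg))
        merged = clusters[a] + clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)]
        clusters.append(merged)
    return merges


# Toy RMS matrix (in Å) for four models: 0 and 1 are close, 2 and 3 are close
rms = [
    [0.0, 1.0, 6.0, 6.2],
    [1.0, 0.0, 5.8, 6.0],
    [6.0, 5.8, 0.0, 1.5],
    [6.2, 6.0, 1.5, 0.0],
]
for left, right, d in average_linkage(rms):
    print(left, right, round(d, 2))
```

    The merge history plays the role of the dendrogram: small merge distances indicate tight clusters, and a large jump in the last merges suggests distinct fold families.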

    Clustering can detect outliers or the presence of fold families which satisfy the restraints equally well. Generate average structures for the clusters you like, or just pick one structure with the best scores from each cluster and use these as representative conformations.

    Preparing all-atom models: the sidech program

    Finally, put flesh onto the bones: add the missing atoms to the DRAGON structures. This is not a straightforward operation and requires sound judgment. Good molecular modelling packages will do the trick for you, and then you can apply your favourite energy minimisation method to the models (if you believe in the usefulness of energy minimisations, of course).

    Start with building the main-chain from the C-alpha trace. We used the catomain program in Willie Taylor's lab, kindly provided by M. Levitt: this software is not included in the DRAGON suite. Once you have the main-chain complete with N, C and O, then you can use my sidech program to build partial sidechains using the original multiple alignment and the known structures if you did homology modelling. Usage:-

    sidech alignment mainchain homstruct outfile

    where alignment is the original multiple alignment you used for the model construction, mainchain is the PDB file with the main-chain atoms of the model, homstruct contains the scaffold structures used for homology modelling (see Homfnm). The almost-all-atom model with the partial sidechains will be written to outfile. CHARMm can complete these sidechains, other modelling programs can perhaps do the whole story so that you might not need sidech at all.


    Command Reference

    All commands are lowercase strings and the trailing characters enclosed in square brackets can be omitted (currently only the first character of the command name is significant). Command names and optional arguments are separated by whitespace. Commands can be specified in ASCII text files where only one command per line is allowed, or at the "DRAGON>" prompt in interactive mode. Lines in command files beginning with the character # are interpreted as comments and ignored. Commands can also be piped to the standard input of DRAGON in which case care should be taken to separate them by newlines. Here are the commands in alphabetical order:-

    c[ommand] command_file
    Executes the commands in command_file. Command file calls can be nested up to the maximal depth of 16. If command_file is omitted then the program enters interactive mode. The number of ">" characters following the "DRAGON" prompt indicates the current call depth.

    d[efault]
    Resets all parameters to their default values.

    h[elp]
    Prints a short help on all available commands. This command works in interactive mode only and is ignored when issued from within a command file.

    l[ist] Param
    Lists a short description and the value of parameter Param to the standard output. If Param is omitted, then all parameters are listed.

    o[s]
    Invokes an OS shell (your default). You can return to DRAGON by typing "exit" on the shell command line.

    p[aram] parameter_file
    Reads the parameter specifications in parameter_file. It is an error if the file cannot be opened. For the parameter description format, refer to the Parameters section below. A word of warning: parameters not specified in parameter_file will retain their previous values, sometimes causing confusion. You could either specify all parameters in your files, or you could issue a d[efault] command prior to reading in new parameters.

    q[uit]
    Quits DRAGON. If invoked in a nested command file, then execution of the file is terminated and control will be returned to the caller. DRAGON exits only if q[uit] was issued at the topmost level. Since execution automatically terminates at the end of command files anyway, q[uit] is mainly useful in interactive mode. The program politely asks for confirmation before exiting.

    r[un] repetition
    Performs the simulation repetition times using the current parameters but starting with a different random distance matrix each time. If repetition is omitted, then one simulation is carried out. Simulations can be interrupted by typing Ctrl-C. (Note: this feature is not supported when compiled with GCC.)

    s[ave] parameter_file
    Saves the parameters to parameter_file or to the standard output if parameter_file is omitted.
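
    The abbreviation rule (only the first character of a command is significant) can be sketched as follows. This is an illustration, not DRAGON's parser: parse_command and the COMMANDS table are hypothetical, and lines starting with an uppercase letter are simply passed over here because they are parameter specifications.

```python
# first character -> full command name, as listed in the reference above
COMMANDS = {"c": "command", "d": "default", "h": "help", "l": "list",
            "o": "os", "p": "param", "q": "quit", "r": "run", "s": "save"}


def parse_command(line):
    """Return (command, argument), or None for blank lines, comments
    and lines that are not commands."""
    words = line.split()
    if not words or words[0].startswith("#"):
        return None
    first = words[0][0]
    if first not in COMMANDS:
        return None          # e.g. an uppercase parameter name
    arg = words[1] if len(words) > 1 else None
    return COMMANDS[first], arg


for text in ("r 10", "run", "# a comment", "Maxiter 60", "param my.par"):
    print(repr(text), "->", parse_command(text))
```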


    Parameters

    Parameters can be specified in one of the following ways: at the DRAGON> prompt in interactive mode, intermixed with commands in a command file, or in a dedicated parameter file. The names of the parameters always start with uppercase letters (to distinguish them from the commands) and must be specified literally. The format of the parameter specification is always

    Param value

    where Param (the parameter's name) and its value are separated by whitespace. Invalid parameter names and malformed parameter specifications are ignored silently.

    There are two kinds of parameters: numeric parameters and filename parameters. The latter specify various ASCII files which either describe your modelling problem (such as the multiple alignment file or the secondary structure assignment) or they hold generic data necessary for the operation of DRAGON. These data files live in the subdirectory pointed to by the $DRAGON_DATA environment variable (usually the dragon4/data subdirectory).

    For your convenience, all parameters have default values which will be used if value is missing or does not make any sense to the program. The values contained in the default data files are also hardwired into the program so it is possible to perform a run even if the files are missing or inaccessible. In addition to their default values, numeric parameters have a permitted range as well. If the value specified is outside the range, it will be adjusted silently to the closest upper (or lower) limit. All distance measurements are given in Å units.

    Parameter file format

    The parameter file is a simple ASCII text file and contains one parameter specification per line in the format described above. Empty lines are ignored and lines beginning with the character # are interpreted as comments. Comment lines and parameter specification lines can be freely mixed and the order of parameters is irrelevant. What's more, you don't have to specify all parameters: those which are not mentioned in the parameter file will simply retain their previous value.
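
    The reading rules described in this section (comments and empty lines skipped, unknown names ignored, out-of-range numbers moved to the nearest limit, unmentioned parameters left alone) can be summarised in a short sketch. read_params is a hypothetical function and only a handful of parameters from the reference below are included; the defaults and ranges are the ones quoted in this guide.

```python
# name -> (default, lower limit, upper limit), taken from the reference
NUMERIC = {"Density": (0.00636, 0.001, 0.012),
           "Maxiter": (40, 1, 500),
           "Speciter": (30, 10, 100)}
FILENAMES = {"Outfnm", "Restrfnm", "Sstrfnm"}


def read_params(text, params):
    """Return a copy of params updated from the parameter file text."""
    updated = dict(params)
    for line in text.splitlines():
        words = line.split()
        if not words or words[0].startswith("#"):
            continue                      # empty line or comment
        name = words[0]
        if name in FILENAMES and len(words) == 2:
            updated[name] = words[1]
        elif name in NUMERIC:
            default, low, high = NUMERIC[name]
            try:
                value = float(words[1])
            except (IndexError, ValueError):
                value = default           # missing or nonsensical value
            updated[name] = min(max(value, low), high)
        # anything else is silently ignored
    return updated


old = {"Maxiter": 40, "Outfnm": "DRAGON_OUT"}
new = read_params("# test\nMaxiter 1000\n\nDensity 0.005\nBogus 3\n", old)
print(new)
```

    Note that Outfnm keeps its previous value because the file does not mention it, which is exactly the behaviour that can cause confusion when several parameter files are read in succession.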

    Parameter reference

    Below follows the complete list of available parameters in alphabetical order. For those parameters whose values are file names, the format of the corresponding file is also given. All these files are simple ASCII text files, where empty lines are ignored and lines beginning with # are treated as comments.

    Accfnm: Residue accessibility

    Format: Accfnm filename
    Default: none

    Residues which are known to be either on the surface or buried inside may be specified in this file. For these residues, the normal accessibility checks are suspended and DRAGON forces them either to the surface or to the interior.

    The accessibility file consists of lines of the following format:-

    access_code resno [ resno... ]

    where access_code is the letter s or S for surface residues, b or B for buried residues, followed by a whitespace-separated list of residue numbers (resno). More than one line of either kind may be specified in arbitrary order. DRAGON filters out those residues which do not fit into the target molecule or which were specified both as surface and buried and prints appropriate warnings.
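
    A sketch of how such a file could be read and filtered is given below. read_accessibility is a hypothetical helper, residue numbers are assumed to start at 1, and the warnings DRAGON prints are replaced by silent filtering.

```python
def read_accessibility(text, n_res):
    """Return (surface, buried) sets of residue numbers for a chain of
    n_res residues, dropping out-of-range and contradictory entries."""
    surface, buried = set(), set()
    for line in text.splitlines():
        words = line.split()
        if not words or words[0].startswith("#"):
            continue
        code = words[0].lower()
        if code not in ("s", "b"):
            continue
        target = surface if code == "s" else buried
        for word in words[1:]:
            resno = int(word)
            if 1 <= resno <= n_res:       # must fit into the target molecule
                target.add(resno)
    both = surface & buried               # specified as surface AND buried
    return surface - both, buried - both


surface, buried = read_accessibility("s 1 5 9\nB 3 5\nS 200\n", n_res=75)
print(sorted(surface), sorted(buried))
```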

    Adistfnm: Average atom distances

    Format: Adistfnm filename
    Default: $DRAGON_DATA/DEFAULT.acd

    Specifies the average distances of side-chain atoms from the C-alpha atoms and from the centroid of the side chain. The default file contains data derived from the Ponder/Richards rotamer library. The data in this file are used to convert interatomic distance restraints into restraints between C-alpha atoms and/or side-chain centroids to match the reduced representation of residues inside DRAGON. Since it is quite painful to construct this file, I do not give the format here. For all practical purposes the default values should be adequate.

    Alnfnm: the multiple alignment file

    Format: Alnfnm filename
    Default: $DRAGON_DATA/DEFAULT.aln

    This is perhaps the most important parameter because you specify the sequence of your protein to be modelled as one of the sequences contained in the multiple alignment (see the Masterno parameter for details). The default alignment file is provided only as an example. Alignments may be specified in the GCG format (also known as multiple sequence format or MSF), or in MULTAL vertical format (which actually has a few variants), or in PIR format.

    The GCG format acceptable to DRAGON is more relaxed than the original. Here is the specification:-

    <...any number of lines containing anything...>
    Name: first_seq_name Len: XXX
    Name: second_seq_name Len: YYY
    ...
    Name: last_seq_name Len: ZZZ
    <...any number of lines containing anything...>
    first_seq_name ..ALIG nM---eNT ...
    second_seq_name ..ALIG nM...DNT ...
    ...
    last_seq_name ..ALIG nM-X-eNT...
    <... any number of lines containing anything ...>
    first_seq_name ..ALIG nM---eNT ...
    second_seq_name ..ALIG nM...DNT ...
    ...
    last_seq_name ..ALIG nM-X-eNT...
    <... any number of lines containing anything ...>

    As you can see, both "." and "-" are acceptable as gap characters, whitespaces are ignored and the amino acid codes may be lower- or uppercase. Each line can be at least as long as the maximal alignment length (2048 chars) if I counted the bytes correctly. The length specifications (XXX, YYY, ... ,ZZZ) need not be equal: DRAGON will use the largest as the alignment length.

    The only snag with the MULTAL format is that there is no such thing as the MULTAL format. There are subtle differences in the first few lines where the number and names of sequences are specified. Currently the following variants are recognised:-

    1. DRAGON format: The first non-comment line of the file should contain the line:

      Seqno number_of_sequences

      and the sequence names are in general unspecified.

    2. MSAP format: The first non-comment lines of the file should look like this:

      Block 0
      number_of_sequences seqs
      USER>BS_HYDRO = Bean soup hydrolase
      USER>NAC_DX = Nicotine deoxygenase
      USER>ANOTH_SEQ = another sequence

      where the sequence abbreviations after the USER> keyword can be anything. This is what comes out from Willie Taylor's multiple sequence/structure alignment program MSAP. Note that currently there is a bug in some versions of MSAP which sometimes causes the loss of the last few amino acids from the aligned sequences.

    3. CAMELEON format:

      block 1 = number_of_sequences seqs
      -----USER>BS_HYDRO = Bean soup hydrolase
      -----USER>NAC_DX = Nicotine deoxygenase
      -----USER>ANOTH_SEQ = another sequence

      This is the "MULTAL output" of CAMELEON, the commercial implementation of MULTAL (by Oxford Molecular).

    To compensate for the confusion, the input method is not too pedantic: the capitalization and the number after the Block or block keyword are ignored and sequence names are not obligatory.

    Once we got past this mess, the rest of the non-comment lines are alignment positions containing a string of 1-letter amino acid codes (upper- or lowercase) or the gap character "-". Warnings are printed if invalid characters are encountered and they will be replaced by "X" (meaning any amino acid). Here is a sample alignment file:-

    Seqno 6
    -AAa-G
    LLLIIL
    RRE--K
    ...
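
    Because each line of the vertical format is one alignment position, the sequences are obtained by transposing the lines. The sketch below handles the DRAGON variant only. read_vertical_alignment is a hypothetical function; converting to uppercase, the set of accepted letters, and raising an error on lines of the wrong length are assumptions of this illustration.

```python
# accepted characters: 20 amino acids, B, Z, X and the gap (an assumption)
VALID = set("ACDEFGHIKLMNPQRSTVWYBZX-")


def read_vertical_alignment(text):
    """Return the list of aligned sequences from a 'Seqno' style file."""
    lines = [ln.strip() for ln in text.splitlines()]
    lines = [ln for ln in lines if ln and not ln.startswith("#")]
    keyword, count = lines[0].split()
    if keyword.lower() != "seqno":
        raise ValueError("first non-comment line must be 'Seqno n'")
    seqno = int(count)
    columns = []
    for ln in lines[1:]:
        if len(ln) != seqno:
            raise ValueError("position %r does not have %d residues" % (ln, seqno))
        # invalid characters are replaced by X (any amino acid)
        columns.append("".join(ch if ch in VALID else "X" for ch in ln.upper()))
    # transpose: k-th character of every position forms the k-th sequence
    return ["".join(col[k] for col in columns) for k in range(seqno)]


seqs = read_vertical_alignment("Seqno 6\n-AAa-G\nLLLIIL\nRRE--K\n")
print(seqs)
```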

    The PIR format is relatively simple. All aligned sequences are listed after each other in PIR format, with gaps inserted in the appropriate places. The first line should contain the ">P1;" thing and the sequence name, the second line is a description which is ignored (but must be present), then follow the sequence lines, terminated with an asterisk. Again, lowercase letters are allowed, gaps can be '-' or '.' characters. Comment lines beginning with "#" are also allowed. If some of the sequences happen to have different aligned lengths, then you get a warning and the ends of the offending sequences will be padded up by gaps. Here is an example:-

    >P1;BS_HYDRO
    Bean soup hydrolase
    LFSR--GtHrS--QWETPY
    THRSRLLK--*

    >P1;NAC_DX
    Nicotine deoxygenase
    TTLPTR-VVMFhASLK
    LLYKHLDNNLaLA---WQD*
    .....
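
    The PIR rules above translate into a small state machine. This is a sketch, not the DRAGON reader: read_pir is a hypothetical function, "." gaps are normalised to "-", and short sequences are padded without the warning that DRAGON prints.

```python
def read_pir(text):
    """Return {name: aligned_sequence}, padded with gaps to equal length."""
    entries = []                      # [name, sequence] pairs in file order
    state = "header"
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("#"):
            continue                  # comment line
        if state == "header":
            if line.startswith(">P1;"):
                entries.append([line[4:].strip(), ""])
                state = "description"
        elif state == "description":
            state = "sequence"        # description must be present, is ignored
        elif line:
            finished = line.endswith("*")
            residues = line.rstrip("*").replace(".", "-").replace(" ", "")
            entries[-1][1] += residues
            if finished:
                state = "header"
    width = max((len(seq) for _, seq in entries), default=0)
    return {name: seq.ljust(width, "-") for name, seq in entries}


sample = """>P1;BS_HYDRO
Bean soup hydrolase
LFSR--GtHrS--QWETPY
THRSRLLK--*

>P1;NAC_DX
Nicotine deoxygenase
TTLPTR-VVMFhASLK
LLYKHLDNNLaLA---WQD*
"""
for name, seq in read_pir(sample).items():
    print(name, seq)
```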

    I despise hard-coded limits. However, there is an upper limit of 256 sequences and 2048 positions built into the alignment module. In practice you should refrain from modelling proteins larger than about 300 residues, mainly because DRAGON cannot yet handle multidomain structures.

    Density: Residue density

    Format: Density float
    Default: 0.00636
    Range: 0.001 ... 0.012

    This parameter specifies the number of C-alpha atoms per cubic Å. The default value is an average calculated from a non-homologous set of well-resolved cytosolic proteins which is surprisingly constant: you may use the default value with confidence if no better guess is available.
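
    To get a feeling for what this density means, one can estimate the size of a compact chain from it. The sketch below assumes a perfectly spherical molecule, which is an idealisation made for this illustration only; expected_radius is a hypothetical helper and this guide does not say that DRAGON performs this particular calculation.

```python
import math


def expected_radius(n_res, density=0.00636):
    """Radius (in Å) of a sphere that holds n_res C-alpha atoms
    at the given number density (atoms per cubic Å)."""
    volume = n_res / density                        # cubic Å
    return (3.0 * volume / (4.0 * math.pi)) ** (1.0 / 3.0)


# the 75-residue example chain used earlier in this guide
print(round(expected_radius(75), 1))
```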

    Evfract: fraction of retained eigenvalues

    Format: Evfract float
    Default: 0.999
    Range: 0.00 ... 1.00

    This parameter specifies the fraction of eigenvalues to be retained in each stage of the gradual projection. A low value means larger jumps in dimensionality towards 3D but embedding accuracy is reduced.

    This parameter affects the run time needed for the first part of the simulations when DRAGON wanders around in high-dimensional spaces. See Maxiter to get an idea how to change the speed and precision in the second stage of the simulations.

    Graph: toggle graphical output

    Format: Graph integer
    Default: 0
    Range: 0,1

    This option is ignored on architectures not supporting OpenGL and in non-interactive mode. When set to 1, then the actual distance matrices before ("Dist") and after ("Eucl") the embedding are displayed in fancy graphics windows, and the 3D iterations can be monitored in a little molecular movie. This option slows down the calculations slightly and therefore should be switched off when not needed (but it is very nice to watch if you are not in a hurry).

    Screen snapshot of DRAGON running with graphical output enabled.

    Homfnm: homologous structures

    Format: Homfnm filename
    Default: none

    This file, if specified, contains the 3D structure of one or more of the sequences in the alignment in PDB format. Only monomeric structures are considered: they may be separated by TER cards or enclosed between MODEL/ENDMDL cards. Chain identifier characters are ignored for the ATOM cards. It is sufficient to provide the C-alpha coordinates only since all other PDB information will be ignored.

    The sequences belonging to the structures are automatically deduced from the ATOM cards (the SEQRES cards are ignored!) and then the structures are used as scaffolds for homology modelling. Structures whose sequences cannot be found in the alignment will be ignored. A common problem is to submit slightly different sequences to the multiple alignment program but this results in disaster since DRAGON demands an exact string match. If you wanted to do homology modelling and DRAGON tells you that no homology-derived restraints were generated, then check the sequences carefully.

    Masterno: target sequence selection

    Format: Masterno integer
    Default: 0
    Range: all non-negative integers

    Specifies which sequence in the multiple alignment should serve as the "master sequence", i.e. the model chain's sequence. If set to 0 (the default), then the consensus sequence of the alignment will be the model sequence.

    Maxdist: maximal length of homology-derived restraints

    Format: Maxdist float
    Default: 5.0 Å
    Range: >= 5.0 Å

    This parameter specifies the maximal C-alpha distance between two residues in the known structure(s) which are used as homology-derived restraints. The default value roughly corresponds to the radius of the first coordination sphere in protein interiors. Increased Maxdist values give better accuracy but the larger number of restraints might result in slightly longer simulation times. This parameter is ignored if Homfnm is not specified (no homology modelling).
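
    The selection rule can be sketched as a distance filter over the C-alpha atoms of a known structure. This is a simplified illustration: close_pairs is a hypothetical function, the sequential separation cutoff corresponds to the Minsepar parameter, and how DRAGON actually turns the selected pairs into lower/upper limits is not covered. The coordinates are made up.

```python
import math


def close_pairs(ca, maxdist=5.0, minsepar=2):
    """Return (res1, res2, distance) for C-alpha pairs that are at least
    minsepar apart in sequence and at most maxdist Å apart in space.
    Residue numbers start at 1."""
    pairs = []
    for i in range(len(ca)):
        for j in range(i + minsepar, len(ca)):
            d = math.dist(ca[i], ca[j])
            if d <= maxdist:
                pairs.append((i + 1, j + 1, d))
    return pairs


# four made-up C-alpha positions (Å)
ca = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (3.8, 3.8, 0.0), (20.0, 0.0, 0.0)]
print(close_pairs(ca))
print(close_pairs(ca, maxdist=6.0))
```

    Raising the cutoff in the second call admits an additional pair, which mirrors the trade-off described above: more restraints, at the cost of more work per iteration.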

    Maxiter: 3D refinement iterations

    Format: Maxiter integer
    Default: 40
    Range: 1 ... 500

    DRAGON handles 3D iterations in a special way because you are interested in 3D models only. Untangled 3D structures are saved during the 3D iterations if their scores are better than those of the previously saved "best" structure. If no acceptable structures are found, then DRAGON repeats the 3D embedding step every five or so cycles to get into a new local minimum. If no acceptable structures were found in Maxiter iterations, then DRAGON repeats the whole simulation, starting afresh from high dimensions.

    The choice of Maxiter affects the CPU time requirements to a large extent. The default value is a good starting point but you should experiment with different values to get a good tradeoff between model quality and simulation time. In general, larger structures would need higher Maxiter values. Note that in my experience it is probably a better idea to run more rough simulations rather than refining a few to the extreme.

    Minchange: minimal score change

    Format: Minchange float
    Default: 0.0
    Range: all non-negative floating point numbers

    The minimal relative change of the steric violation and distance scores between two iterations. Serves as an exit criterion.

    Minscore: minimal score

    Format: Minscore float
    Default: 0.0
    Range: all non-negative floating point numbers

    The minimal value of the steric violation and distance scores. The simulation exits when the scores fall below this value.

    Minsepar: minimal sequential separation

    Format: Minsepar integer
    Default: 2
    Range: all integers >=2

    The minimal sequential separation between two residues for which a homology restraint will be generated. This parameter has to be larger than or equal to 2 (the default value) and will be ignored when no homology file is specified. It does not make much sense to vary this parameter, and it will probably not be supported in the next release.

    Outfnm: output file name

    Format: Outfnm [dir_path/]filename
    Default: DRAGON_OUT

    Specifies the name from which the result filenames and various logfile names are derived. If the optional directory path dir_path is given, then the program attempts to create the necessary subdirectories in the path if they do not exist already. Should the directory creation fail for whatever reason, then the output files are created in the current working directory. Note that environment variables like "$HOME" and other shell-dependent things like "~" will NOT be expanded.

    The best simulation result is saved in PDB format, listing the C-alpha atoms and the fake sidechain centroids as C-beta atoms, as well as the sequence and secondary structure assignment. The result of the k-th run will be saved as "filename_k.pdb". If a valid 3D embedding was found, then a restraint violation file will also be generated with the name "filename_k.viol". In rare circumstances it might happen that no untangled models could be found: in this case the last horrible structure is saved anyway under the name "filename_TEMPORARY_k.pdb". A desperate attempt is also made to untangle the structure and the result will be saved as "filename_UNTANGLED_k.pdb". These files are saved only to frighten you and should be discarded.
    In parallel mode (-m option) the child processes generate log files for each run called "filename_k.log".

    Phobfnm: residue hydrophobicity file

    Format: Phobfnm filename
    Default: $DRAGON_DATA/DEFAULT.pho

    Specifies the amino acid hydrophobicity values. There is normally no need to change it. Every non-comment line in the file lists an amino acid (with one-letter code) and its hydrophobicity value separated by whitespace, like this:-

    # Membrane hydrophobicity data
    A 1.73
    B 0.02
    ...
    Z 0.02

    Randseed: seed for the random number generator

    Format: Randseed integer
    Default: 0
    Range: all non-negative integers

    This number serves as the seed for the random number generator used to fill up the initial distance matrix. If it is 0 (the default), then the random number generator will be seeded with the system time, otherwise with the specified integer. If multiple runs are specified (with the -r command-line option or via the r[un] command) then the program assumes that Randseed=0.

    Restrfnm: distance restraint file name

    Format: Restrfnm filename
    Default: none

    Contains the list of external distance restraints. Restraints may be specified between C-alpha, side-chain atoms or a pseudo-atom called "SCC" (side chain centroid) in the form of lower/upper-limit pairs with a "strictness" value. Atom names should follow the PDB conventions. No file is specified as the default, meaning that no external distance restraints are available. The format of a line in the file is:-

    res1 res2 lowlim uplim strict atom1 atom2

    where res1, res2 are the residue numbers (>=1), lowlim and uplim are the lower and upper distance limits in Å units, 0.0<=strict<=1.0 is the strictness value reflecting the reliability of the restraint (0.0 means totally unreliable, 1.0 is absolutely certain), and atom1, atom2 are the atoms linked by the restraint. Restraints within residues can be specified if res1=res2. Here is an example:-

    # Example restraint file
    6 9 4.89 5.69 0.986 CA CA
    12 15 4.89 7.11 0.627 SCC SCC
    15 17 3.83 4.15 0.635 CB SG
    ...
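
    A restraint file in this format can be read as sketched below. Restraint and read_restraints are hypothetical names, and skipping lines that fail the basic sanity checks is a choice made for this illustration; what DRAGON does with such lines is not described here.

```python
from typing import NamedTuple


class Restraint(NamedTuple):
    res1: int
    res2: int
    lowlim: float    # lower distance limit, Å
    uplim: float     # upper distance limit, Å
    strict: float    # reliability, 0.0 ... 1.0
    atom1: str
    atom2: str


def read_restraints(text):
    """Return the list of well-formed restraints in the file text."""
    restraints = []
    for line in text.splitlines():
        words = line.split()
        if len(words) != 7 or words[0].startswith("#"):
            continue                      # comment, empty or malformed line
        r = Restraint(int(words[0]), int(words[1]), float(words[2]),
                      float(words[3]), float(words[4]), words[5], words[6])
        if r.res1 < 1 or r.res2 < 1:
            continue
        if not 0.0 <= r.strict <= 1.0 or r.lowlim > r.uplim:
            continue
        restraints.append(r)
    return restraints


example = """# Example restraint file
6 9 4.89 5.69 0.986 CA CA
12 15 4.89 7.11 0.627 SCC SCC
15 17 3.83 4.15 0.635 CB SG
"""
for r in read_restraints(example):
    print(r.res1, r.res2, r.atom1, r.atom2, r.uplim - r.lowlim)
```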

    Simfnm: similarity matrix file

    Format: Simfnm filename
    Default: $DRAGON_DATA/DEFAULT.sim

    Specifies the amino acid similarity matrix. The default file contains Dayhoff's PAM250 matrix. A variety of other similarity matrices are also available in $DRAGON_DATA/*.sim files. You can also specify your own; here is the format:-

    # Mutation Data Matrix (250 PAMs) DRAGON 4.x default
    ARNDCQEGHILKMFPSTWYVBZX
    2 -2 0 0 -2 0 0 1 -1 -1 -2 -1 -1 -4 1 1 1 -6 -3 0 0 0 0
    -2 6 0 -1 -4 1 -1 -3 2 -2 -3 3 0 -4 0 0 -1 2 -4 -2 -1 0 0
    ...

    The first non-comment line should be a string that specifies the order of the amino acids in the columns and the rows. There must be exactly as many rows and columns as the number of characters in the order string. The matrix elements are floating-point values separated by whitespaces.
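
    The format lends itself to a lookup table keyed by residue pairs. The following sketch uses a hypothetical read_similarity function and a made-up 3x3 toy matrix instead of a full file; the squareness check mirrors the rule stated above.

```python
def read_similarity(text):
    """Return {(aa1, aa2): score} from a similarity matrix file text."""
    lines = [ln.strip() for ln in text.splitlines()]
    lines = [ln for ln in lines if ln and not ln.startswith("#")]
    order = lines[0]                      # amino acid order of rows/columns
    rows = [[float(x) for x in ln.split()] for ln in lines[1:]]
    if len(rows) != len(order) or any(len(r) != len(order) for r in rows):
        raise ValueError("matrix must be square and match the order string")
    return {(a, b): rows[i][j]
            for i, a in enumerate(order)
            for j, b in enumerate(order)}


toy = """# toy matrix for three residue types
ARN
2 -2 0
-2 6 0
0 0 2
"""
sim = read_similarity(toy)
print(sim[("A", "R")], sim[("R", "R")])
```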

    Speceps: precision for Spectral Gradient iterations

    Format: Speceps float
    Default: 0.02
    Range: 0.0001...0.1

    Spectral Gradient is an iterative optimisation method used to move a set of points in Euclidean space so that their distances correspond to a prescribed distance matrix (see Wells et al, J. Mol. Struct. 308: 263-271 (1994) for a detailed description). This parameter sets the precision for the iteration: when the relative "stress" change is less than Speceps, then the iteration is terminated. Lower values mean more iterations. See also Speciter below.

    Speciter: maximal number of Spectral Gradient iterations

    Format: Speciter integer
    Default: 30
    Range: 10...100

    This parameter controls the maximal number of Spectral Gradient iterations used in Euclidean adjustments. Sometimes the method does not converge; in these cases DRAGON performs a less elegant but more robust steepest descent-like optimisation.
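
    The interplay of Speceps and Speciter can be pictured as a loop with two exits. This is a sketch of the stopping rule only, under assumptions: the real Spectral Gradient update and stress function are not shown, step and stress are hypothetical callables, and the relative change is taken here as |old - new| / old, which is one plausible reading of the description above.

```python
def iterate_until_converged(step, stress, coords, speceps=0.02, speciter=30):
    """Apply `step` until the relative stress change drops below `speceps`
    or `speciter` iterations have been done.

    Returns (coords, iterations_done, converged).
    """
    old = stress(coords)
    for iteration in range(1, speciter + 1):
        coords = step(coords)
        new = stress(coords)
        # Assumed definition of the relative stress change
        if old == 0.0 or abs(old - new) / old < speceps:
            return coords, iteration, True
        old = new
    return coords, speciter, False


# Toy one-dimensional stand-ins, purely for illustration
def toy_stress(x):
    return (x - 3.0) ** 2 + 1.0


def toy_step(x):
    return x + 0.5 * (3.0 - x)


if __name__ == "__main__":
    print(iterate_until_converged(toy_step, toy_stress, 0.0))
    print(iterate_until_converged(toy_step, toy_stress, 0.0, speciter=2))
```

    When the third element of the result is false, the iteration limit was reached without convergence; in the sketch it is left to the caller to switch to a fallback optimiser.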

    Sstrfnm: secondary structure file

    Format: Sstrfnm filename
    Default: none

    This file holds the secondary structure assignments. Currently 3/10-, alpha- and pi-helices and beta-sheets are implemented. You must supply the alignment information for strands in a beta-sheet. Bifurcated sheets may be specified as overlapping "normal" sheets following the PDB convention. A warning is issued when overlapping sheets are encountered; all other overlapping secondary structure elements are ignored. An optional "strictness" value between 0.0 and 1.0 may be specified for each secondary structure element in the file; it regulates the extent to which ideal secondary structure is enforced on the model. A value of 1.0 corresponds to full adjustment, while 0.0 means that the ideal geometry is not enforced at all. In some cases it is worthwhile to specify a medium strictness, especially for long helices (which are sometimes bent, as opposed to the ideal straight helices generated by DRAGON) and for curved sheets.

    Before the runs, DRAGON lists the accepted secondary structure specification on standard output, which you can use to verify that you supplied a correct assignment. No file is specified as default but if you fail to provide one, then you are on your own: DRAGON cannot predict secondary structure yet and since the detangling relies on the assignment, the results will be of dubious value.

    A few words about the file format. Secondary structure elements may be specified in any order, with comment lines in between. Helix specifications have the format:-

    helixtype beg end [strict]

    where helixtype=ALPHA or HELIX for alpha-helices, HX310 for 3/10-helices or HXPI for pi-helices, beginning at residue beg and ending at end, with the optional strict value between 0.0 and 1.0.

    The sheet description spans multiple lines. It starts with a line that contains the keyword SHEET, optionally followed by a strictness value for the whole sheet. Then comes the first strand with the format:-

    STRAND beg end

    The rest of the strands in the sheet are described like this:-

    STRAND beg end sense this_pos prev_pos

    where sense=PAR or ANTI indicates whether the current strand is parallel or anti-parallel with respect to the previous strand. The last two numbers describe the phasing of the strand: the residue indicated by this_pos is hydrogen-bonded to the residue prev_pos on the previous strand. The sheet description ends with a line containing the keyword END. Easy, isn't it? Here is a full example:-

    # Example secstr file
    # an alpha-helix
    ALPHA 12 25
    # another alpha-helix
    HELIX 39 45
    # a 3/10 helix
    HX310 69 76
    # a pi-helix
    HXPI 104 116
    # helix we're not sure about (strictness 0.5)
    ALPHA 156 172 0.5
    # the main sheet has a bulge at 30 and the last strand is bifurcated
    SHEET
    STRAND 27 29
    STRAND 1 7 PAR 1 27
    STRAND 47 54 PAR 47 1
    STRAND 86 92 PAR 86 48
    STRAND 119 121 PAR 119 86
    END
    # note that most strand descriptions are just repeated
    SHEET
    STRAND 27 29
    STRAND 1 7 PAR 1 27
    STRAND 47 54 PAR 47 1
    STRAND 86 92 PAR 86 48
    STRAND 145 147 PAR 145 90
    END
    # little extra antiparallel sheet at strictness=0.7
    SHEET 0.7
    STRAND 137 138
    STRAND 141 142 ANTI 142 137
    STRAND 124 125 ANTI 124 142
    END
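
    A quick syntax check of such a file can be scripted along the following lines. This is a hedged sketch, not DRAGON's own parser: parse_secstr and the dictionary layout are hypothetical, an omitted strictness is stored as None because the default is not assumed here, and overlaps between elements are not checked.

```python
HELIX_TYPES = {"ALPHA": "alpha", "HELIX": "alpha", "HX310": "3/10", "HXPI": "pi"}


def parse_secstr(text):
    """Return (helices, sheets) parsed from secondary structure file text."""
    helices, sheets, sheet = [], [], None
    for lineno, raw in enumerate(text.splitlines(), start=1):
        fields = raw.split()
        if not fields or fields[0].startswith("#"):
            continue
        key, args = fields[0], fields[1:]
        if key in HELIX_TYPES and sheet is None:
            if len(args) not in (2, 3):
                raise ValueError(f"line {lineno}: bad helix specification")
            strict = float(args[2]) if len(args) == 3 else None
            helices.append({"type": HELIX_TYPES[key], "beg": int(args[0]),
                            "end": int(args[1]), "strict": strict})
        elif key == "SHEET" and sheet is None:
            sheet = {"strict": float(args[0]) if args else None, "strands": []}
        elif key == "STRAND" and sheet is not None:
            # The first strand has 2 numbers, later strands carry 5 fields
            expected = 5 if sheet["strands"] else 2
            if len(args) != expected:
                raise ValueError(f"line {lineno}: expected {expected} strand fields")
            strand = {"beg": int(args[0]), "end": int(args[1])}
            if expected == 5:
                if args[2] not in ("PAR", "ANTI"):
                    raise ValueError(f"line {lineno}: sense must be PAR or ANTI")
                strand.update(sense=args[2], this_pos=int(args[3]),
                              prev_pos=int(args[4]))
            sheet["strands"].append(strand)
        elif key == "END" and sheet is not None:
            sheets.append(sheet)
            sheet = None
        else:
            raise ValueError(f"line {lineno}: unexpected keyword {key!r}")
    if sheet is not None:
        raise ValueError("SHEET without a closing END")
    return helices, sheets


EXAMPLE_SECSTR = """# shortened version of the example above
ALPHA 12 25
HX310 69 76
ALPHA 156 172 0.5
SHEET 0.7
STRAND 137 138
STRAND 141 142 ANTI 142 137
STRAND 124 125 ANTI 124 142
END
"""

if __name__ == "__main__":
    helices, sheets = parse_secstr(EXAMPLE_SECSTR)
    print(len(helices), "helices,", len(sheets), "sheet(s)")
```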

    You may also consult the PDB Format guide because the sheet representation in this file closely follows the PDB conventions. Be careful when specifying beta-barrels, though: I haven't tried that yet. The PDB convention of specifying the first strand as the last would probably not work.

    Let me give you some tactical advice about secondary structure assignment. It is relatively straightforward to write the assignment file if you perform homology modelling: all you have to do is to map the secondary structure elements in the template structure(s) onto the target sequence. If you attempt ab initio modelling then the assignments usually come from secondary structure predictions. Since it is not possible to assign a secondary structure strictness value to every residue based on the probabilities generated by most prediction programs, the workaround is to use your judgment and assign an average strictness to the secondary structure elements. The conformation adjustment routine does not like very short elements, i.e. 3-residue "helices" or 2-residue strands: if you need these, then you are probably better off supplying some distance restraints in the Restrfnm restraint file.

    Beta-sheets pose another problem. Prediction programs usually generate the strands only but DRAGON needs the sheet topology as well. In most cases you have to generate a few plausible topologies by hand and then compare the results obtained from runs done with each assignment. This approach is feasible for small sheets only.
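
    When trying several hand-made topologies, it can help to keep each candidate as a small data structure and generate the SHEET blocks from it, so that every run gets a consistent file. The following is a sketch under assumptions: format_sheet is a hypothetical helper, and the second topology is invented purely to show the mechanics, not a recommendation.

```python
def format_sheet(strands, strict=None):
    """Return the text of one SHEET ... END block.

    strands[0] is (beg, end); every later entry is
    (beg, end, sense, this_pos, prev_pos) with sense 'PAR' or 'ANTI'.
    """
    if not strands:
        raise ValueError("a sheet needs at least one strand")
    lines = ["SHEET" if strict is None else f"SHEET {strict}"]
    beg, end = strands[0]
    lines.append(f"STRAND {beg} {end}")
    for beg, end, sense, this_pos, prev_pos in strands[1:]:
        if sense not in ("PAR", "ANTI"):
            raise ValueError(f"sense must be PAR or ANTI, got {sense!r}")
        lines.append(f"STRAND {beg} {end} {sense} {this_pos} {prev_pos}")
    lines.append("END")
    return "\n".join(lines)


# Candidate topologies for the same three strands
CANDIDATES = {
    # strand order and phasing of the small sheet in the example file
    "as_in_example": [(137, 138),
                      (141, 142, "ANTI", 142, 137),
                      (124, 125, "ANTI", 124, 142)],
    # hypothetical alternative, made up for illustration
    "hypothetical_alternative": [(141, 142),
                                 (137, 138, "ANTI", 137, 142),
                                 (124, 125, "ANTI", 124, 137)],
}

if __name__ == "__main__":
    for name, strands in CANDIDATES.items():
        print(f"# candidate topology: {name}")
        print(format_sheet(strands, strict=0.7))
```

    Each generated block would then go into its own secondary structure file, one file per run.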

    Tangiter: detangling iterations

    Format: Tangiter integer
    Default: 5
    Range: 1...100

    Maximal number of detangling iterations. The detangling tries to get rid of the tangled conformations which are an annoying artefact of Distance Geometry projections. The default iteration number is probably a safe compromise between speed and efficiency. Note that detangling cannot be carried out if no secondary structure was specified.

    Volfnm: amino acid side chain volume file

    Format: Volfnm filename
    Default: $DRAGON_DATA/DEFAULT.vol

    Specifies the average amino acid side-chain volumes. There is normally no need to change it. The default file looks like this:-

    # Amino acid volume data for DRAGON 4.x (default)
    A 22.7
    B 50.2
    ...


    References

    The following publications by us contain a detailed description of the theory behind DRAGON, together with some test cases. Please refer to these if you are interested in how the program works. Additionally, if you have achieved a major breakthrough using DRAGON, please share your enjoyment with us and cite some of these publications in your paper.

    Aszódi, A. and Taylor, W. R. (1994):
    Folding polypeptide alpha-carbon backbones by distance geometry methods.
    Biopolymers 34, 489-505.

    Taylor, W. R. and Aszódi, A. (1994):
    Building protein folds using distance geometry: Towards a general modelling and prediction method.
    In: Merz, K. M., Jr. and LeGrand, S. M. (eds): The Protein Folding Problem and Tertiary Structure Prediction, 165-192.
    Birkhäuser, Boston. (Book chapter)

    Aszódi, A. and Taylor, W. R. (1994):
    Secondary structure formation in model polypeptide chains.
    Protein Engng. 7, 633-644.

    Aszódi, A., Gradwell, M. J. and Taylor, W. R. (1995):
    Global fold determination from a small number of distance restraints.
    J. Mol. Biol. 251, 308-326.

    Aszódi, A. and Taylor, W. R. (1995):
    Estimating polypeptide alpha-carbon distances from multiple sequence alignments.
    J. Math. Chem. 17, 167-184.

    Aszódi, A. and Taylor, W. R. (1996):
    Homology modelling by distance geometry.
    Folding & Design 1, 325-334.

    Aszódi, A. and Taylor, W. R. (1997):
    Hierarchic inertial projection: A fast distance matrix embedding algorithm.
    Computers & Chemistry 21, 13-23.