In this tutorial, we will perform a molecular dynamics simulation of a protein (Esculentin-1A, from Pelophylax esculentus) using Gromacs version 2018.1.
Apart from the actual molecular dynamics program, Gromacs contains a long list of specific tools for the preparation, analysis and post-processing of MD simulations. Since it is open source software, it is highly advisable to install Gromacs in your local machine to perform these auxiliary operations. |
Schematic secondary structure of the protein we will be simulating.
We start by downloading the pdb file from the RCSB Protein Data Bank. This file format, designed for peptides, contains the aminoacid sequence of esculentin and additional structural information. However, this format differs from Gromacs structure inputs. We will use the Gromacs tool pdb2gmx to convert the file to Gromacs-readable format.
In our local machine, we move to a directory containing the pdb file and type:
gmx pdb2gmx -f 2n6m.pdb -o esculentin.gro -p esculentin.top -ignh |
Since we haven't specified the force field or water model, the program will demand these interactively. We will choose GROMOS96 54a7 as the force field and the less accurate but inexpensive SPC water model. We have also instructed pdb2gmx to ignore hydrogen atoms in the pdb file (-ignh option). More information about gmx pdb2gmx here.
This command will generate three files in the current directory.
ESCULENTIN-1A 201 1GLY N 1 0.133 0.000 0.000 1GLY H1 2 0.118 -0.094 0.030 1GLY H2 3 0.185 0.049 0.070 1GLY H3 4 0.045 0.045 -0.014 1GLY CA 5 0.207 0.000 -0.125 1GLY C 6 0.234 0.140 -0.176 1GLY O 7 0.196 0.174 -0.288 2ILE N 8 0.300 0.221 -0.094 2ILE H 9 0.328 0.188 -0.004 2ILE CA 10 0.332 0.359 -0.132 (...) 21GLY C 199 -1.436 0.231 -2.356 21GLY O1 200 -1.539 0.169 -2.328 21GLY O2 201 -1.386 0.228 -2.482 0.10000 0.10000 0.10000 |
The structure file contains information about the atoms that conform the system we are simulating.
More information about the .gro file can be found here.
[ position_restraints ] 1 1 1000 1000 1000 5 1 1000 1000 1000 6 1 1000 1000 1000 7 1 1000 1000 1000 (...) |
Include topology files are auxiliary files containing additional topology information. They can be called into the main topology file via the #include mechanism, thus allowing for a modular approach to topology definitions (for instance, for special solvent parameters). File format is identical to .top files.
In our case, pdb2gmx generates a .itp file containing position restraints for all heavy (i.e. non-H) atoms in our protein. This will become handy later.
Now we will build the simulation box and add water molecules to the system. We use the Gromacs program editconf:
gmx editconf -f esculentin.gro -o esculenbox.gro -box 5 5 5 |
This will generate a cubic box, 5 nm in length, and center the protein inside it. The new structure file is esculenbox.gro. More information about gmx editconf here.
Next, we will add water molecules to solvate the protein. We use the Gromacs program solvate:
gmx solvate -cp esculenbox.gro -o solvated.gro -p esculentin.top |
This will add about 4000 water molecules to our box. The final structure file (including both the solvent and the solute) is solvated.gro, and the topology file has been updated to include the solvent information - the older version is backed up as #esculentin.top.1#. More information about gmx solvate here.
So far, we have correctly set up the molecular structures in our system, the simulation box and the interaction parameters that will determine the dynamics of the system. One last step before running the simulation is to configure the simulation itself - length, step time, temperature/pressure coupling, etc. We do this via another input file, the .mdp file.
Since default values for most options are fine for this tutorial, we will use a simple file, options.mdp.
integrator = md nsteps = 50000000 nstxout = 50000 coulombtype = PME fourierspacing = 0.15 tcoupl = v-rescale tau-t = 0.2 0.2 ref-t = 298.15 298.15 tc-grps = Protein Non-protein pcoupl = berendsen compressibility = 4.5e-5 tau-p = 0.2 ref-p = 1.0 |
In summary, we will run a molecular dynamics simulation, using a leap-frog integrator, with a time step of 0.001 ps, for a total simulation length of 50 ns, and will handle electrostatics using a Particle-Mesh Ewald calculation. Temperature and pressure will be fixed at 298 K, 1 bar.
Please consult the extensive Gromacs documentation for details on the implementation.
Finally, all these files (.gro, .top, .mdp) need to be transformed into a run input file (.tpr). The .tpr is a single file containing all the relevant information (about the system, about the force field, and about the simulation) to run. We use the Gromacs program grompp to generate it:
gmx grompp -f options.mdp -c solvated.gro -p esculentin.top -o esculentin.tpr |
The resulting esculentin.tpr input file is the only file we need to feed to Gromacs to run our simulation.
In order to submit the job, we need to compose a shell script including module preparation and the Gaussian command. We will use the script esculentin.slm:
#!/bin/bash #SBATCH -J esculentin #SBATCH -e esculentin.err #SBATCH -o esculentin.out #SBATCH --ntasks=24 modulue purge module load apps/gromacs/2018.1 INPUT_DIR=${PWD} OUTPUT_DIR=${PWD} date cd $TMPDIR cp -r $INPUT_DIR $TMPDIR mpirun -np 24 gmx_mpi mdrun -s esculentin.trp cp ./* $OUTPUT_DIR date |
A brief reminder of LSF files structure:
We need to upload the LSF script and run input file onto the HPC storage. From our local directory holding the files, we will use this command:
scp -P 2122 esculentin.lsf esculentin.tpr youruser@hpc.csuc.cat:/home/youruser/ |
If you want to send the files to a different path, just make sure you create the appropriate directory in advance. This path should match the cd instruction in your shell script.
scp -P 2122 [input file(s)] youruser@hpc.csuc.cat:/home/youruser/your_chosen_path/ |
To launch the job, we remotely log in to the HPC facilities. From any terminal:
ssh -p2122 youruser@hpc.csuc.cat |
We will arrive at our /home/youruser/ directory. If we have uploaded the files to a different directory, we should move to it.
cd /your_chosen_path/ |
To launch the job, we send the esculentin.slm script to sbatch:
sbatch esculentin.slm |
SLURM will automatically send the job to a queue that meets its requirements. If we want to submit it to a specific queue, we use -p. For instance:
sbatch esculentin.slm -p std |
For an overview of the queues and their limitations:
sinfo |
To check on the job status:
squeue |
The output of this command looks like this:
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON) 1933 std nematic user_name R 1:40:15 1 pirineus1 |
We can cancel a pending or running job:
scancel yourJOBID |
We can add the line "#SBATCH --mail-user=user@mail.com" to our slm file to receive an email notification when the job is complete. |
To retrieve the output files, we run the relevant commands in terminal (from the local directory where we want to download the files):
scp -P 2122 youruser@hpc.csuc.cat:/your_chosen_path/md.log . scp -P 2122 youruser@hpc.csuc.cat:/your_chosen_path/traj.trr . scp -P 2122 youruser@hpc.csuc.cat:/your_chosen_path/topol.tpr . scp -P 2122 youruser@hpc.csuc.cat:/your_chosen_path/ener.edr . |
The protocol sftp provides a friendlier alternative that allows real time file management over a ssh connection.
Each of these files contains specific information about the run.
Gromacs incorporates a large number of programs to analyse the simulation. As an example, we will check the α-helix structure presented by esculentin.
First, we need to transform the trajectory file to remove periodic boundary conditions:
gmx trjconv -pbc mol -center |
When prompted for a centring group and an output group, we choose "System".
Then we generate an index file containing the protein:
gmx make_ndx -f solvated.gro |
Since default groups are enough for this example, when in interactive mode press q and enter to finish.
And finally, to generate α-helix statistics we use
gmx helix |
This will generate a number of plots (in Grace format) and other output files.