Diels-Alder reactions
This example workflow shows how to:
Generate different conformers of molecules and noncovalent complexes using CREST
Generate the inputs for Gaussian geometry optimizations and frequency calcs (B3LYP/def2TZVP)
Fix errors and imaginary frequencies in the output LOG files
Generate ORCA inputs for single-point energy corrections (SPC) using DLPNO-CCSD(T)/def2TZVPP
Calculate the Boltzmann-weighted thermochemistry with GoodVibes at 298.15 K
Specifically, in this workflow we will calculate the free energy profile of the Diels-Alder reaction for the three pairs of reactants shown below:
Reactants 1 | Reactants 2 | Reactants 3
C1=CC=CC1.C1=CC1 | C1=CC=CC1.C1=CCC1 | C1=CC=CC1.C1=CCCC1
Note
A Jupyter notebook containing all the steps shown in this example can be found in the AQME repository on GitHub or on Figshare.
Step 1: Importing AQME and other python modules
import os, glob, subprocess
import shutil
from pathlib import Path
from aqme.csearch import csearch
from aqme.qprep import qprep
from aqme.qcorr import qcorr
from rdkit import Chem
import pandas as pd
Step 2: Determining distance and angle constraints for TSs
We visualize the first pair of reactants to be able to set up the constraints.
smi1 = 'C1=CC=CC1.C1=CC1'
mol1 = Chem.MolFromSmiles(smi1)
mol1 = Chem.AddHs(mol1)
for i,atom in enumerate(mol1.GetAtoms()):
    atom.SetAtomMapNum(i)
smi_new1 = Chem.MolToSmiles(mol1)
print('The new mapped smiles for checking numbers used in constraints is:', smi_new1)
mol1
Based on the atom numbering in the image, we will add the following constraints to the
constraints_dist column of the CSV: [[3,5,2.35],[0,6,2.35]]
Note
This step assumes that the code is being executed in a Jupyter notebook. If it is run from the Python console instead, the following lines save the image with the atom mapping to a file:
from rdkit.Chem import Draw
Draw.MolToFile(mol1,'mapping_image.png')
We visualize the second pair of reactants to be able to set up the constraints.
smi1 = 'C1=CC=CC1.C1=CCC1'
mol1 = Chem.MolFromSmiles(smi1)
mol1 = Chem.AddHs(mol1)
for i,atom in enumerate(mol1.GetAtoms()):
    atom.SetAtomMapNum(i)
smi_new1 = Chem.MolToSmiles(mol1)
print('The new mapped smiles for checking numbers used in constraints is:', smi_new1)
mol1
Based on the atom numbering in the image, we will add the following constraints to the
constraints_dist column of the CSV: [[3,6,2.35],[0,5,2.35]]
Warning
Although atoms 5 and 6 are equivalent, we have observed that the TS is not
found when the constraints use the same ordering as in the previous reaction
(i.e. [[3,5,2.35],[0,6,2.35]]), whereas it is found with the constraints
shown in this example.
We visualize the third pair of reactants to be able to set up the constraints.
smi1 = 'C1=CC=CC1.C1=CCCC1'
mol1 = Chem.MolFromSmiles(smi1)
mol1 = Chem.AddHs(mol1)
for i,atom in enumerate(mol1.GetAtoms()):
    atom.SetAtomMapNum(i)
smi_new1 = Chem.MolToSmiles(mol1)
print('The new mapped smiles for checking numbers used in constraints is:', smi_new1)
mol1
Based on the atom numbering in the image, we will add the following constraints to the
constraints_dist column of the CSV: [[3,5,2.35],[0,6,2.35]]
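The constraint strings above all follow the same pattern, a list of [atom_i, atom_j, distance] triples. As a minimal sketch (the helper name `format_dist_constraints` is hypothetical and not part of AQME), they could be generated from atom pairs instead of being typed by hand:

```python
def format_dist_constraints(pairs, distance):
    """Build a constraints_dist string such as [[3,5,2.35],[0,6,2.35]].

    pairs    -- iterable of (atom_i, atom_j) index pairs read from the mapped image
    distance -- target distance applied to every pair
    """
    for i, j in pairs:
        # An atom cannot be constrained to itself, and indices start at 0
        if i == j or i < 0 or j < 0:
            raise ValueError(f"invalid atom pair: ({i}, {j})")
    items = [f"[{i},{j},{distance}]" for i, j in pairs]
    return "[" + ",".join(items) + "]"


# Constraints used in this example for the first and second pairs of reactants
ts1 = format_dist_constraints([(3, 5), (0, 6)], 2.35)
ts2 = format_dist_constraints([(3, 6), (0, 5)], 2.35)
print(ts1)
print(ts2)
```

The atom indices still have to be read from the mapped image of each pair; the helper only keeps the formatting consistent.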
Step 3: CSEARCH conformational sampling
With the constraints from the previous step, we can now create a CSV file (example2.csv in the code below) containing all the molecules and noncovalent complexes to calculate, with the following contents:
SMILES,code_name,constraints_dist
C1=CC=CC1,Diene,
C1=CC1,Do1,
C1=CCC1,Do2,
C1=CCCC1,Do3,
C1([H:8])=[C:1]([H:9])[C:2]([H:10])=[C:3]([H:11])[C:4]1([H:12])[H:13].[C:5]1([H:14])=[C:6]([H:15])[C:7]1([H:16])[H:17],TS1,"[[3,5,2.35],[0,6,2.35]]"
C1([H:9])=[C:1]([H:10])[C:2]([H:11])=[C:3]([H:12])[C:4]1([H:13])[H:14].[C:5]1([H:15])=[C:6]([H:16])[C:7]([H:17])([H:18])[C:8]1([H:19])[H:20],TS2,"[[3,6,2.35],[0,5,2.35]]"
C1([H:10])=[C:1]([H:11])[C:2]([H:12])=[C:3]([H:13])[C:4]1([H:14])[H:15].[C:5]1([H:16])=[C:6]([H:17])[C:7]([H:18])([H:19])[C:8]([H:20])([H:21])[C:9]1([H:22])[H:23],TS3,"[[3,5,2.35],[0,6,2.35]]"
[C@H]1(C2C=CC3C2)[C@@H]3C1,P1,
[C@H]12[C@@H](C3C=CC2C3)CC1,P2,
[C@H]1(C2C=CC3C2)[C@@H]3CCC1,P3,
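Because the constraints are stored as text inside a quoted CSV field, a quick sanity check before launching CSEARCH can catch malformed entries. The following is a sketch using only the standard library; the function name `read_constraints` is hypothetical, and the sample rows use shortened (unmapped) SMILES purely for illustration:

```python
import csv
import io
import json


def read_constraints(csv_text):
    """Map each code_name to its list of [atom_i, atom_j, distance] constraints."""
    result = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        raw = (row.get("constraints_dist") or "").strip()
        # Rows without constraints (reactants, products) have an empty field
        constraints = json.loads(raw) if raw else []
        for entry in constraints:
            if len(entry) != 3:
                raise ValueError(f"{row['code_name']}: expected 3 values, got {entry}")
        result[row["code_name"]] = constraints
    return result


# Illustrative subset of the CSV; the real file uses the mapped SMILES for the TSs
sample = '''SMILES,code_name,constraints_dist
C1=CC=CC1,Diene,
C1=CC1,Do1,
C1=CC=CC1.C1=CC1,TS1,"[[3,5,2.35],[0,6,2.35]]"
'''

parsed = read_constraints(sample)
for name, constraints in parsed.items():
    print(name, constraints)
```

To check the real file, the same function can be given the text of example2.csv, for instance via `Path('example2.csv').read_text()`.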
Now we can proceed to the conformer generation:
# read the CSV file with SMILES strings and constraints for TSs (from Step 2)
data = pd.read_csv('example2.csv')
csearch(input='example2.csv',
        program='crest',
        nprocs=12,
        cregen=True,
        cregen_keywords='--ethr 0.1 --rthr 0.2 --bthr 0.3 --ewin 1')
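After the conformer search finishes, it can be useful to see how many conformers survived the CREGEN filtering for each species. The sketch below relies only on the SDF convention that every record ends with a `$$$$` line; the helper names are hypothetical, and the CSEARCH folder name is taken from the glob patterns used in Step 4:

```python
from pathlib import Path


def count_sdf_records(sdf_path):
    """Count the records in an SDF file (each record ends with a '$$$$' line)."""
    with open(sdf_path) as handle:
        return sum(1 for line in handle if line.strip() == "$$$$")


def summarize_conformers(folder="CSEARCH"):
    """Return {file name: number of conformers} for every SDF file in folder."""
    return {p.name: count_sdf_records(p) for p in sorted(Path(folder).glob("*.sdf"))}


# Prints nothing if the folder does not exist or contains no SDF files
for name, n_conf in summarize_conformers().items():
    print(f"{name}: {n_conf} conformer(s)")
```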
Step 4: Creating Gaussian input files for optimization and frequency with QPREP
program = 'gaussian'
mem='32GB'
nprocs=16
sdf_TS_files = glob.glob('CSEARCH/TS*crest.sdf')
# COM files for the TSs
qm_input_TS = 'B3LYP/def2tzvp opt=(ts,calcfc,noeigen,maxstep=5) freq=noraman'
qprep(files=sdf_TS_files,
      program=program,
      qm_input=qm_input_TS,
      mem=mem,
      nprocs=nprocs)
sdf_INT_files = glob.glob('CSEARCH/D*.sdf') + glob.glob('CSEARCH/P*.sdf')
# COM files for intermediates, reagents and products
qm_input_INT = 'B3LYP/def2tzvp opt freq=noraman'
qprep(files=sdf_INT_files,
      program=program,
      qm_input=qm_input_INT,
      mem=mem,
      nprocs=nprocs)
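The two glob patterns above are what separates the TS inputs (which get the `opt=(ts,...)` keywords) from the minima. A small sketch with `fnmatch` shows how the patterns split a set of file names; the names listed here are hypothetical examples, since the real ones depend on the CSEARCH run:

```python
from fnmatch import fnmatchcase

# Hypothetical SDF names for illustration only
names = [
    "Diene_crest.sdf",
    "Do1_crest.sdf",
    "TS1_crest.sdf",
    "P1_crest.sdf",
]

# Same patterns as in the glob calls of this step, without the folder prefix
ts_files = [n for n in names if fnmatchcase(n, "TS*crest.sdf")]
min_files = [n for n in names if fnmatchcase(n, "D*.sdf") or fnmatchcase(n, "P*.sdf")]

# A file should never receive both the TS and the minimum keyword line
overlap = set(ts_files) & set(min_files)

print("TS inputs:    ", ts_files)
print("Minima inputs:", min_files)
print("Overlap:      ", sorted(overlap))
```

Note that `D*.sdf` matches both the diene and the dienophiles (Do1, Do2, Do3), so code names starting with other letters would need their own pattern.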
Step 5: Running Gaussian inputs for optimization and frequency calcs externally
Now that we have generated our Gaussian input files (in the QCALC folder created in Step 4), we need to run the Gaussian calculations. If you do not know how to run Gaussian on your HPC cluster, please ask your HPC administrator.
As an example, for a single calculation in Gaussian 16 through the terminal we would run the following command on a Linux-based system:
g16 myfile.com
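For more than a handful of inputs, the same command can be issued in a loop. The following is only a sketch under stated assumptions: it presumes that `g16` is on the PATH and that the jobs run serially on the local machine, whereas most HPC clusters require submission through a job scheduler instead. The function names are hypothetical:

```python
import subprocess
from pathlib import Path


def gaussian_commands(folder="QCALC", executable="g16"):
    """Return one [executable, input file] command per COM file in folder."""
    return [[executable, p.name] for p in sorted(Path(folder).glob("*.com"))]


def run_all(folder="QCALC", dry_run=True):
    """Print the commands (dry run) or execute them one after another."""
    for cmd in gaussian_commands(folder):
        if dry_run:
            print(" ".join(cmd))
        else:
            # Run inside the folder so the LOG files are written next to the inputs
            subprocess.run(cmd, cwd=folder, check=False)


# Dry run by default: only shows what would be executed
run_all(dry_run=True)
```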
Step 6: QCORR analysis
qcorr(files='QCALC/*.log',
      freq_conv='opt=(calcfc,maxstep=5)',
      mem=mem,
      nprocs=nprocs)
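Once QCORR has finished, a short summary helps decide whether Step 7 is needed at all. This sketch assumes the folder layout referenced elsewhere in this example (QCALC/success for good LOG files and QCALC/failed/run_1/fixed_QM_inputs for regenerated COM files); the function name is hypothetical:

```python
from pathlib import Path


def qcorr_summary(qcalc="QCALC", run="run_1"):
    """List the successful LOG files and the COM files waiting for resubmission."""
    root = Path(qcalc)
    success = sorted(p.name for p in (root / "success").glob("*.log"))
    fixed_dir = root / "failed" / run / "fixed_QM_inputs"
    to_resubmit = sorted(p.name for p in fixed_dir.glob("*.com"))
    return {"success": success, "to_resubmit": to_resubmit}


summary = qcorr_summary()
print(f"{len(summary['success'])} calculation(s) passed the QCORR checks")
print(f"{len(summary['to_resubmit'])} input(s) need to be resubmitted")
```

If the second number is zero, Step 7 can be skipped.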
Step 7: Resubmission of unsuccessful calculations (if any) with suggestions from AQME
Now we need to run the generated COM files (in fixed_QM_inputs) with Gaussian, as we did in Step 5.
After the calculations finish, we check the files again using QCORR:
new_log_files = "QCALC/failed/run_1/fixed_QM_inputs/*.log"
qcorr(files=new_log_files,
      isom_type='com',
      isom_inputs='QCALC/failed/run_1/fixed_QM_inputs',
      nprocs=16,
      mem='32GB')
Step 8: Creating DLPNO input files for ORCA single-point energy calculations
program = 'orca'
mem='16GB'
nprocs=8
qm_files = os.getcwd()+'/QCALC/success/*.log' # LOG files from Steps 6 and 7
destination = os.getcwd()+'/SP' # folder where the ORCA input files are generated
# keyword lines for ORCA inputs
qm_input = r'''
DLPNO-CCSD(T) def2-tzvpp def2-tzvpp/C
%scf maxiter 500
end
% mdci
Density None
end
% elprop
Dipole False
end'''.lstrip()
qprep(destination=destination,
      files=qm_files,
      program=program,
      qm_input=qm_input,
      mem=mem,
      nprocs=nprocs,
      suffix='DLPNO')
Step 9: Running ORCA inputs for single point energy calcs externally
Now we need to run the generated INP files (in the SP folder set as destination in Step 8) with ORCA, similarly to how we ran the Gaussian inputs in Step 5.
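Before moving on to the thermochemistry, it is worth confirming that every optimized structure has a matching single-point output. The sketch below assumes that the ORCA outputs are named `<name>_DLPNO.out`, that is, the LOG file name plus the suffix passed to QPREP in Step 8; this naming is an assumption and should be checked against the files actually produced. The function name is hypothetical:

```python
from pathlib import Path


def missing_spc(opt_folder="QCALC/success", spc_folder="SP", suffix="DLPNO"):
    """Return the LOG files that have no <name>_<suffix>.out file in spc_folder."""
    spc_dir = Path(spc_folder)
    missing = []
    for log in sorted(Path(opt_folder).glob("*.log")):
        # Assumed naming convention: TS1_conf_1.log -> TS1_conf_1_DLPNO.out
        expected = spc_dir / f"{log.stem}_{suffix}.out"
        if not expected.is_file():
            missing.append(log.name)
    return missing


for name in missing_spc():
    print(f"No single-point output found for {name}")
```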
Step 10: Calculating PES with GoodVibes
For this step we need a YAML file to use as input for GoodVibes. The contents of the YAML file are:
--- # PES
# Diels-Alder reactions
Reaction1: [Diene+Do1, TS1, P1]
Reaction2: [Diene+Do2, TS2, P2]
Reaction3: [Diene+Do3, TS3, P3]
--- # SPECIES
Diene : Diene*
Do1 : Do1*
TS1 : TS1*
P1 : P1*
Do2 : Do2*
TS2 : TS2*
P2 : P2*
Do3 : Do3*
TS3 : TS3*
P3 : P3*
--- # FORMAT
dec : 1
units: kcal/mol
dpi : 300
color : #1b8bb9,#e5783d,#386e30
With this file we can now generate the profile.
# folder where the OUT files from Step 9 are generated
orca_files = os.getcwd()+'/SP/*.out'
# copy all the Gaussian LOG files and the ORCA OUT files into a new folder
# called GoodVibes_analysis (necessary to apply SPC corrections)
opt_files = glob.glob(qm_files)
spc_files = glob.glob(orca_files)
all_files = opt_files + spc_files
w_dir_main = Path(os.getcwd())
GV_folder = w_dir_main.joinpath('GoodVibes_analysis')
GV_folder.mkdir(exist_ok=True, parents=True)
for file in all_files:
    shutil.copy(file, GV_folder)
# run GoodVibes
os.chdir(GV_folder)
command = 'python -m goodvibes --xyz --pes ../pes.yaml --graph ../pes.yaml -c 1 --spc DLPNO *.log'
subprocess.run(command.split())
os.chdir(w_dir_main)
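To illustrate what Boltzmann weighting at 298.15 K means for a set of conformers, the sketch below computes conformer populations from relative free energies in kcal/mol. It is a generic textbook calculation, not GoodVibes' own implementation, and the energies in the usage example are made-up values:

```python
import math

R_KCAL = 0.0019872043  # gas constant in kcal/(mol*K)


def boltzmann_weights(free_energies, temperature=298.15):
    """Return the Boltzmann population of each conformer.

    free_energies -- free energies in kcal/mol (absolute or relative)
    temperature   -- temperature in K
    """
    g_min = min(free_energies)
    # Shifting by the minimum keeps the exponentials numerically well behaved
    factors = [math.exp(-(g - g_min) / (R_KCAL * temperature)) for g in free_energies]
    total = sum(factors)
    return [f / total for f in factors]


# Hypothetical relative free energies of three conformers (kcal/mol)
example_energies = [0.0, 0.5, 1.0]
for g, w in zip(example_energies, boltzmann_weights(example_energies)):
    print(f"dG = {g:.1f} kcal/mol -> population {100 * w:.1f} %")
```

Conformers lying more than a couple of kcal/mol above the minimum contribute very little at room temperature, which is consistent with the narrow energy window used during the conformer search.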






