Default and Required Parameters
This document details the default parameters used in the AQME program.
CSEARCH
Parameters
General
- input : str, default=''
(If smi is None) Optionally, a file containing the SMILES strings and names of the molecules. Current file extensions: .smi, .sdf, .cdx, .csv, .com, .gjf, .mol, .mol2, .xyz, .txt, .yaml, .yml, .rtf. For .csv files (i.e. FILENAME.csv), two columns are required: 'code_name' with the names and 'SMILES' with the SMILES strings
- program : str, default=None
Program required in the conformational sampling. Current options: 'rdkit', 'summ', 'fullmonte', 'crest'
- smi : str, default=None
Optionally, define a SMILES string as input
- name : str, default=None
(If smi is defined) optionally, define a name for the system
- w_dir_main : str, default=os.getcwd()
Working directory
- destination : str, default=None
Directory to create the output file(s)
- varfile : str, default=None
Option to parse the variables using a yaml file (specify the filename)
- charge : int, default=None
Charge of the calculations used in the following input files. If charge isn't defined, AQME automatically reads the charge from the SMILES string
- mult : int, default=None
Multiplicity of the calculations used in the following input files. If mult isn't defined, AQME automatically reads the multiplicity from the mol object created with the SMILES string. Be careful with the automated calculation of mult from mol objects when using metals!
- prefix : str, default=''
Prefix added to all the names
- suffix : str, default=''
Suffix added to all the names
- stacksize : str, default='1G'
Controls the stack size used (especially relevant for xTB/CREST calculations of large systems, where high stack sizes are needed)
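The CSV layout described for the input option can be sketched with the standard library. This is only an illustration: the molecule names and SMILES strings are made up, and only the two column names ('code_name' and 'SMILES') come from the text above.

```python
import csv
import io

# Hypothetical molecules; only the column names are taken from the documentation.
rows = [
    {"code_name": "ethanol", "SMILES": "CCO"},
    {"code_name": "benzene", "SMILES": "c1ccccc1"},
]


def build_csv(rows):
    """Return CSV text with the two required columns, 'code_name' and 'SMILES'."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=["code_name", "SMILES"], lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


csv_text = build_csv(rows)
print(csv_text)
```

Saving csv_text as FILENAME.csv would give a file with the column layout that the input option expects.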
General RDKit-based
- sample : int, default='auto'
Number of conformers used initially in the RDKit sampling. If this option isn't specified, AQME automatically calculates an approximate, previously benchmarked number based on the number of rotatable bonds, XH (e.g. OH) groups, saturated cycles, etc. (see the auto_sampling() function in csearch.py for more information)
- auto_sample : int, default=20
Base multiplier used in the sample option
- ff : str, default='MMFF'
Force field used in RDKit optimizations and energy calculations. Current options: MMFF and UFF (if MMFF fails, AQME tries to use UFF automatically)
- ewin_csearch : float, default=5.0
Energy window in kcal/mol to discard conformers (i.e. conformers whose energy exceeds that of the most stable conformer by more than the E window are discarded)
- initial_energy_threshold : float, default=0.0001
Energy difference in kcal/mol between unique conformers for the first filter (E only)
- energy_threshold : float, default=0.25
Energy difference in kcal/mol between unique conformers for the second filter (E + RMS)
- rms_threshold : float, default=0.25
RMS difference between unique conformers for the second filter (E + RMS)
- opt_steps_rdkit : int, default=1000
Max cycles used in RDKit optimizations
- heavyonly : bool, default=True
Only consider heavy atoms during RMS calculations for filtering (in the Chem.rdMolAlign.GetBestRMS() RDKit function)
- max_matches_rmsd : int, default=1000
Max matches during RMS calculations for filtering (maxMatches option in the Chem.rdMolAlign.GetBestRMS() RDKit function)
- max_mol_wt : int, default=0
Discard systems with molecular weights higher than this parameter (in g/mol). If 0 is set, this filter is off
- max_torsions : int, default=0
Discard systems with more than this many torsions (relevant to avoid molecules with many rotatable bonds). If 0 is set, this filter is off
- seed : int, default=62609
Random seed used during RDKit embedding (in the Chem.rdDistGeom.EmbedMultipleConfs() RDKit function)
- geom : list, default=[]
Geometry rule that the systems must pass. Format: [SMARTS,VALUE]. Geometry rules can refer to atoms, bonds, angles and dihedrals. For example, a rule to keep only molecules with C-Pd-C atoms at 180 degrees: ['[C][Pd][C]',180]. Special rules (--geom ['RULE_NAME']): ['Ir_squareplanar']
- bond_thres : float, default=0.2
Threshold used to discard bonds in the geom option (±0.2 Å)
- angle_thres : float, default=30
Threshold used to discard angles in the geom option (±30 degrees)
- dihedral_thres : float, default=30
Threshold used to discard dihedral angles in the geom option (±30 degrees)
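The energy window and the first (energy-only) duplicate filter can be pictured with a short sketch. This is not AQME's implementation: the function name filter_conformers and the exact comparison operators are assumptions. Only the parameter names and default values are taken from the list above, and the RMS-based second filter is not reproduced.

```python
def filter_conformers(energies, ewin=5.0, initial_energy_threshold=0.0001):
    """Return indices of the conformers kept, sorted by energy (kcal/mol).

    Hypothetical helper: drops conformers outside the energy window and
    conformers whose energy is within the threshold of one already kept.
    """
    if not energies:
        return []
    order = sorted(range(len(energies)), key=lambda i: energies[i])
    e_min = energies[order[0]]
    kept = []
    for i in order:
        if energies[i] - e_min > ewin:
            break  # energies are sorted, so the rest are outside the window too
        is_duplicate = any(
            abs(energies[i] - energies[j]) < initial_energy_threshold for j in kept
        )
        if not is_duplicate:
            kept.append(i)
    return kept


energies = [0.0, 0.00005, 1.2, 4.9, 5.3]
kept = filter_conformers(energies)
print(kept)
```

In this example the second conformer is treated as a duplicate of the first, and the last one falls outside the 5.0 kcal/mol window.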
Only organometallic molecules
- auto_metal_atoms : bool, default=True
Automatically detect metal atoms for the RDKit conformer generation. Charge and mult should be specified as well since the automatic charge and mult detection might not be precise.
- complex_type : str, default=''
Forces the metal complexes to adopt a predefined geometry. This option is especially relevant when RDKit predicts wrong complex geometries or gives a mixture of geometries. Current options: squareplanar, squarepyramidal, linear, trigonalplanar
SUMM only
- degree : float, default=120.0
Interval of degrees to rotate dihedral angles during SUMM sampling (e.g. 120.0 would create 3 conformers for each dihedral, at 0, 120 and 240 degrees)
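The relation between degree and the rotamers generated per dihedral can be sketched as follows. The helper name summ_angles is hypothetical, and combining the angles of several dihedrals with a full Cartesian product is an assumption made for illustration, not a statement about how SUMM enumerates conformers internally.

```python
from itertools import product


def summ_angles(degree=120.0):
    """Return the dihedral angles sampled for a given rotation interval."""
    n_steps = int(round(360.0 / degree))
    return [i * degree for i in range(n_steps)]


angles = summ_angles(120.0)
print(angles)  # [0.0, 120.0, 240.0]

# Assumption: two rotatable dihedrals sampled exhaustively give 3 * 3 combinations.
combinations = list(product(angles, repeat=2))
print(len(combinations))  # 9
```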
Fullmonte only
- ewin_fullmonte : float, default=5.0
Energy window in kcal/mol to discard conformers (i.e. conformers whose energy exceeds that of the most stable conformer by more than the E window are discarded)
- ewin_sample_fullmonte : float, default=2.0
Energy window in kcal/mol to use conformers during the Fullmonte sampling (i.e. conformers inside the E window compared to the most stable conformer are considered as unique in each step of the sampling)
- nsteps_fullmonte : int, default=100
Number of steps (or conformer batches) to carry out during the Fullmonte sampling
- nrot_fullmonte : int, default=3
Number of dihedrals to rotate simultaneously (picked at random) during each step of the Fullmonte sampling
- ang_fullmonte : float, default=30
Available angle interval to use in the Fullmonte sampling. For example, if the angle is 120.0, the program chooses randomly between 120 and 240 degrees during each step of the sampling
CREST only
- nprocs : int, default=8
Number of processors used in CREST optimizations
- constraints_atoms : list, default=[]
Specify constrained atoms as [AT1,AT2,AT3]. An example of multiple constraints with atoms 1, 2 and 5 frozen: [1,2,5]
- constraints_dist : list of lists, default=[]
Specify distance constraints as [AT1,AT2,DIST]. An example of multiple constraints with atoms 1 and 2 frozen at a distance of 1.8 Å, and atoms 4 and 5 with distance of 2.0 Å: [[1,2,1.8],[4,5,2.0]]
- constraints_angle : list of lists, default=[]
Specify angle constraints as [AT1,AT2,AT3,ANGLE]. An example of multiple constraints with atoms 1, 2 and 3 frozen at an angle of 180 degrees, and atoms 4, 5 and 6 with an angle of 120: [[1,2,3,180],[4,5,6,120]]
- constraints_dihedral : list of lists, default=[]
Specify dihedral constraints as [AT1,AT2,AT3,AT4,DIHEDRAL]. An example of multiple constraints with atoms 1, 2, 3 and 4 frozen at a dihedral angle of 180 degrees, and atoms 4, 5, 6 and 7 with a dihedral angle of 120: [[1,2,3,4,180],[4,5,6,7,120]]
- crest_force : float, default=0.5
Force constant for constraints in the .xcontrol.sample file for CREST jobs
- crest_keywords : str, default=None
Define additional keywords to use in CREST that are not included in --chrg, --uhf, -T and -cinp. For example: '--alpb ch2cl2 --nci --cbonds 0.5'
- cregen : bool, default=False
If True, perform a CREGEN analysis after CREST (filtering options below)
- cregen_keywords : str, default=None
Additional keywords for CREGEN (e.g. cregen_keywords='--ethr 0.02')
- xtb_keywords : str, default=None
Define additional keywords to use in the xTB pre-optimization that are not included in -c, --uhf, -P and --input. For example: '--alpb ch2cl2 --gfn 1'
- crest_nrun : int, default=1
Number of runs to perform when multiple starting points from RDKit are required
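The four constraint options share one shape: a list of atom indices followed, for distances, angles and dihedrals, by a target value. The sketch below builds the examples given above and checks their shape. The helper check_constraints is hypothetical and is not part of AQME; it only illustrates the expected list layout.

```python
# Example values copied from the parameter descriptions above.
constraints_atoms = [1, 2, 5]
constraints_dist = [[1, 2, 1.8], [4, 5, 2.0]]
constraints_angle = [[1, 2, 3, 180], [4, 5, 6, 120]]
constraints_dihedral = [[1, 2, 3, 4, 180], [4, 5, 6, 7, 120]]


def check_constraints(constraints, n_atoms):
    """Hypothetical check: each entry holds n_atoms integer indices plus one value."""
    for entry in constraints:
        if len(entry) != n_atoms + 1:
            return False
        *atoms, value = entry
        if not all(isinstance(atom, int) for atom in atoms):
            return False
        if not isinstance(value, (int, float)):
            return False
    return True


print(check_constraints(constraints_dist, 2))      # True
print(check_constraints(constraints_angle, 3))     # True
print(check_constraints(constraints_dihedral, 4))  # True
```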
CMIN
Parameters
General
- files : str or list of str, default=None
Input files. Formats accepted: XYZ, SDF, GJF, COM and PDB. Also, lists can be used (i.e. [FILE1.sdf, FILE2.sdf] or *.FORMAT such as *.sdf).
- program : str, default=None
Program required in the conformational refining. Current options: 'xtb', 'ani'
- w_dir_main : str, default=os.getcwd()
Working directory
- destination : str, default=None
Directory to create the output file(s)
- varfile : str, default=None
Option to parse the variables using a yaml file (specify the filename)
- nprocs : int, default=2
Number of processors used in the xTB optimizations
- charge : int, default=None
Charge of the calculations used in the xTB calculations. If charge isn't defined, it automatically reads the charge from the input SDF files (if the files come from CSEARCH, which adds the property "Real charge") or calculates it from the generated mol object
- mult : int, default=None
Multiplicity of the calculations used in the xTB calculations. If mult isn't defined, it automatically reads the multiplicity from the input SDF files (if the files come from CSEARCH, which adds the property "Mult") or calculates it from the generated mol object. Be careful with the automated calculation of mult from mol objects when using metals!
- ewin_cmin : float, default=5.0
Energy window in kcal/mol to discard conformers (i.e. conformers whose energy exceeds that of the most stable conformer by more than the E window are discarded)
- initial_energy_threshold : float, default=0.0001
Energy difference in kcal/mol between unique conformers for the first filter (E only)
- energy_threshold : float, default=0.25
Energy difference in kcal/mol between unique conformers for the second filter (E + RMS)
- rms_threshold : float, default=0.25
RMS difference between unique conformers for the second filter (E + RMS)
- stacksize : str, default='1G'
Controls the stack size used (especially relevant for xTB/CREST calculations of large systems, where high stack sizes are needed)
- prefix : str, default=''
Prefix added to all the names
- suffix : str, default=''
Suffix added to all the names
xTB only
- xtb_keywords : str, default=None
Define additional keywords to use in xTB that are not included in -c, --uhf, -P and --input. For example: '--alpb ch2cl2 --gfn 1'
- constraints_atoms : list, default=[]
Specify constrained atoms as [AT1,AT2,AT3]. An example of multiple constraints with atoms 1, 2 and 5 frozen: [1,2,5]
- constraints_dist : list of lists, default=[]
Specify distance constraints as [AT1,AT2,DIST]. An example of multiple constraints with atoms 1 and 2 frozen at a distance of 1.8 Å, and atoms 4 and 5 with distance of 2.0 Å: [[1,2,1.8],[4,5,2.0]]
- constraints_angle : list of lists, default=[]
Specify angle constraints as [AT1,AT2,AT3,ANGLE]. An example of multiple constraints with atoms 1, 2 and 3 frozen at an angle of 180 degrees, and atoms 4, 5 and 6 with an angle of 120: [[1,2,3,180],[4,5,6,120]]
- constraints_dihedral : list of lists, default=[]
Specify dihedral constraints as [AT1,AT2,AT3,AT4,DIHEDRAL]. An example of multiple constraints with atoms 1, 2, 3 and 4 frozen at a dihedral angle of 180 degrees, and atoms 4, 5, 6 and 7 with a dihedral angle of 120: [[1,2,3,4,180],[4,5,6,7,120]]
ANI only
- opt_steps : int, default=1000
Maximum number of steps used in the ase.optimize.BFGS optimizer.
- opt_fmax : float, default=0.05
Maximum force value to determine convergence in the ase.optimize.BFGS optimizer.
- ani_method : str, default='ANI2x'
ANI model used in the ase.optimize.BFGS optimizer.
QPREP
Parameters
- files : mol object, str or list of str, default=None
This module prepares input QM file(s). Formats accepted: mol object(s), Gaussian or ORCA LOG/OUT output files, JSON, XYZ, SDF, PDB. Also, lists can be used (i.e. [FILE1.log, FILE2.log] or *.FORMAT such as *.json).
- atom_types : list of str, default=[]
(If files is None) List containing the atoms of the system
- cartesians : list of str, default=[]
(If files is None) Cartesian coordinates used for further processing
- w_dir_main : str, default=os.getcwd()
Working directory
- destination : str, default=None
Directory to create the input file(s)
- varfile : str, default=None
Option to parse the variables using a yaml file (specify the filename)
- program : str, default=None
Program required to create the new input files. Current options: 'gaussian', 'orca'
- qm_input : str, default=''
Keywords line for new input files (i.e. 'B3LYP/6-31G opt freq')
- qm_end : str, default=''
Final line(s) in the new input files
- charge : int, default=None
Charge of the calculations used in the following input files. If charge isn't defined, it defaults to 0
- mult : int, default=None
Multiplicity of the calculations used in the following input files. If mult isn't defined, it defaults to 1
- suffix : str, default=''
Suffix for the new input files (i.e. FILENAME_SUFFIX.com for FILENAME.log)
- prefix : str, default=''
Prefix added to all the names
- chk : bool, default=False
Include the chk input line in new input files for Gaussian calculations
- oldchk : bool, default=False
Include the oldchk input line in new input files for Gaussian calculations
- chk_path : str, default=''
PATH to store CHK files. For example, if chk_path='root/user/FILENAME.chk', the chk line of the input file would be %chk=root/user/FILENAME.chk
- oldchk_path : str, default=''
PATH to read CHK files with %oldchk. For example, if oldchk_path='root/user/FILENAME.chk', the oldchk line of the input file would be %oldchk=root/user/FILENAME.chk
- mem : str, default='4GB'
Memory for the QM calculations: (i) Gaussian: total memory; (ii) ORCA: memory per processor
- nprocs : int, default=2
Number of processors used in the QM calculations
- gen_atoms : list of str, default=[]
Atoms included in the gen(ECP) basis set (i.e. ['I','Pd'])
- bs_gen : str, default=''
Basis set used for gen(ECP) atoms (i.e. 'def2svp')
- bs_nogen : str, default=''
Basis set used for non gen(ECP) atoms in gen(ECP) calculations (i.e. '6-31G*')
- lowest_only : bool, default=False
Only create input for the conformer with lowest energy of the SDF file
- lowest_n : int, default=None
Only create inputs for the n conformers with lowest energy of the SDF file
- e_threshold_qprep : float, default=None
Only create inputs for conformers below the energy threshold (to the lowest conformer) of the SDF file
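The three conformer-selection options (lowest_only, lowest_n and e_threshold_qprep) can be pictured with a small sketch. The function select_conformers is hypothetical, and the order in which the options take precedence when several are set is an assumption; the text above does not specify it.

```python
def select_conformers(energies, lowest_only=False, lowest_n=None,
                      e_threshold_qprep=None):
    """Return indices of the conformers that would get an input file.

    Hypothetical helper; the precedence lowest_only > lowest_n >
    e_threshold_qprep is an assumption made for this illustration.
    """
    order = sorted(range(len(energies)), key=lambda i: energies[i])
    if not order:
        return []
    if lowest_only:
        return order[:1]
    if lowest_n is not None:
        return order[:lowest_n]
    if e_threshold_qprep is not None:
        e_min = energies[order[0]]
        return [i for i in order if energies[i] - e_min <= e_threshold_qprep]
    return order


energies = [1.5, 0.0, 0.4, 3.2]  # made-up relative energies
print(select_conformers(energies, lowest_only=True))        # [1]
print(select_conformers(energies, lowest_n=2))              # [1, 2]
print(select_conformers(energies, e_threshold_qprep=2.0))   # [1, 2, 0]
```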
QCORR
Parameters
- files : list of str, default=''
Filenames of QM output files to analyze. If *.log (or other strings that are not lists such as *.out) are specified, the program will look for all the log files in the working directory through glob.glob(*.log)
- w_dir_main : str, default=os.getcwd()
Working directory
- fullcheck : bool, default=True
Perform an analysis to detect whether the calculations were done homogeneously (i.e. same level of theory, solvent, grid size, etc)
- varfile : str, default=None
Option to parse the variables using a yaml file (specify the filename)
- ifreq_cutoff : float, default=0.0
Cutoff used to decide whether a frequency is imaginary (the absolute value of the specified number is used)
- amplitude_ifreq : float, default=0.2
Amplitude used to displace the imaginary frequencies to fix
- freq_conv : str, default=None
If a string is defined, it will remove calculations that converged during optimization but did not converge in the subsequent frequency calculation. Options: opt keyword as string (i.e. 'opt=(calcfc,maxstep=5)'). If readfc is specified in the string, the chk option must be included as well.
- im_freq_input : str, default='opt=(calcfc,maxstep=5)' (Gaussian), '\n%geom\nCalc_Hess true\nMaxStep 0.05\nend' (ORCA)
When extra imaginary frequencies are detected by QCORR, it automatically adds Hessian calculations before starting geometry optimizations. This option can be disabled using im_freq_input=None.
- s2_threshold : float, default=10.0
Cutoff for spin contamination during analysis, in % of the expected value (e.g. multiplicity 3 has an expected <S**2> of 2.0; if s2_threshold = 10, the <S**2> value is allowed to be 2.0 ± 0.2). Set s2_threshold = 0 to deactivate this option.
- dup_threshold : float, default=0.0001
Energy (in hartree) used as the energy difference in E, H and G to detect duplicates
- ro_threshold : float, default=0.1
Rotational constant value used as the threshold to detect duplicates
- isom_type : str, default=None
Check for isomerization from the initial input file to the resulting output files. It requires the extension of the initial input files (i.e. isom_type='com' or 'gjf') and the folder of the input files must be added in the isom_inputs option
- isom_inputs : str, default=os.getcwd()
Folder containing the initial input files to check for isomerization
- vdwfrac : float, default=0.50
Fraction of the summed VDW radii that constitutes a bond between two atoms in the isomerization filter
- covfrac : float, default=1.10
Fraction of the summed covalent radii that constitutes a bond between two atoms in the isomerization filter
- nodup_check : bool, default=False
If True, the duplicate filter is disabled
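The arithmetic behind s2_threshold can be worked through in a few lines. The expected <S**2> follows from S(S+1) with S = (mult - 1)/2, which reproduces the value of 2.0 quoted above for a triplet. The function names are hypothetical, and treating the threshold as a symmetric tolerance around the expected value is an assumption based on the "2.0 ± 0.2" example; this is a sketch, not QCORR's code.

```python
def expected_s2(mult):
    """Expected <S**2> for a given multiplicity: S(S+1) with S = (mult - 1) / 2."""
    s = (mult - 1) / 2
    return s * (s + 1)


def s2_is_acceptable(s2_value, mult, s2_threshold=10.0):
    """Hypothetical check of spin contamination against a % tolerance."""
    if s2_threshold == 0:
        return True  # the option is deactivated
    expected = expected_s2(mult)
    tolerance = expected * s2_threshold / 100
    return abs(s2_value - expected) <= tolerance


print(expected_s2(3))             # 2.0
print(s2_is_acceptable(2.15, 3))  # True, inside 2.0 ± 0.2
print(s2_is_acceptable(2.25, 3))  # False, outside 2.0 ± 0.2
```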
Note
New input files are generated through the QPREP module and, therefore, all QPREP arguments can be used when calling QCORR and will overwrite default options. For example, if the user specifies qm_input='wb97xd/def2svp', all the new input files generated to fix issues will contain this keywords line. See examples in the 'Example_workflows' folder for more information.
QDESCP
Parameters
General
- w_dir_main : str, default=os.getcwd()
Working directory
- destination : str, default=None
Directory to create the JSON file(s)
- program : str, default=None
Program required to create the new descriptors. Current options: 'xtb', 'nmr'
- qdescp_atoms : list of str, default=[]
Type of atom or group to calculate atomic properties. This option admits atoms (i.e., qdescp_atoms=['P']) and SMARTS patterns (i.e., qdescp_atoms=['C=O'])
- robert : bool, default=True
Creates a database ready to use in an AQME-ROBERT machine learning workflow, combining the input CSV with SMILES/code_name and the calculated xTB/DBSTEP descriptors
xTB descriptors
- files : list of str, default=''
Filenames of SDF/PDB/XYZ files to calculate xTB descriptors. If *.sdf (or other strings that are not lists such as *.pdb) are specified, the program will look for all the SDF files in the working directory through glob.glob(*.sdf)
- charge : int, default=None
Charge of the calculations used in the following input files (charges from SDF files generated in CSEARCH are read automatically).
- mult : int, default=None
Multiplicity of the calculations used in the following input files (multiplicities from SDF files generated in CSEARCH are read automatically).
- qdescp_solvent : str, default=None
Solvent used in the xTB property calculations (ALPB model)
- qdescp_temp : float, default=300
Temperature required for the xTB property calculations
- qdescp_acc : float, default=0.2
Accuracy required for the xTB property calculations
- qdescp_opt : str, default='normal'
Convergence criteria required for the xTB property calculations
- boltz : bool, default=True
Calculation of Boltzmann averaged xTB properties and addition of RDKit molecular descriptors
- xtb_opt : bool, default=True
Performs an initial xTB geometry optimization before calculating descriptors
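Boltzmann averaging, as enabled by the boltz option, weights each conformer's property by exp(-E/RT). The sketch below is a generic illustration: the function name, the use of kcal/mol for the relative energies and the temperature of 298.15 K are assumptions, and the text above does not state which energies or temperature AQME uses for the weights.

```python
import math

R_KCAL = 0.0019872041  # gas constant in kcal/(mol*K)


def boltzmann_average(energies, values, temp=298.15):
    """Boltzmann-weighted average of a property over conformers.

    energies: conformer energies in kcal/mol (assumed unit)
    values:   property value of each conformer
    temp:     temperature in K (assumed value)
    """
    e_min = min(energies)
    weights = [math.exp(-(e - e_min) / (R_KCAL * temp)) for e in energies]
    total = sum(weights)
    return sum(w * v for w, v in zip(weights, values)) / total


# Two isoenergetic conformers contribute equally.
print(boltzmann_average([0.0, 0.0], [1.0, 3.0]))
# A conformer 10 kcal/mol higher in energy barely contributes.
print(boltzmann_average([0.0, 10.0], [1.0, 3.0]))
```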
DBSTEP descriptors
- dbstep_calc : bool, default=False
Whether to add a DBSTEP calculation of buried volume when generating atomic descriptors with qdescp_atoms. To activate it, add --dbstep_calc to the command line
- dbstep_r : float, default=3.5
Radius used in the DBSTEP calculations (in Å)
NMR simulation
- files : list of str, default=''
Filenames of LOG files to retrieve NMR shifts from Gaussian calculations (*.log can be used to include all the log files in the working directory)
- boltz : bool, default=True
Calculation of Boltzmann averaged NMR shifts
- nmr_atoms : list of str, default=[6, 1]
List containing the atom types (as atomic numbers) to consider. For example, if the user wants to retrieve NMR shifts from C and H atoms, nmr_atoms=[6, 1]
- nmr_slope : list of float, default=[-1.0537, -1.0784]
List containing the slope to apply for the raw NMR shifts calculated with Gaussian. A slope needs to be provided for each atom type in the analysis (i.e., for C and H atoms, the nmr_slope=[-1.0537, -1.0784]). These values can be adjusted using the CHESHIRE repository.
- nmr_intercept : list of float, default=[181.7815, 31.8723]
List containing the intercept to apply for the raw NMR shifts calculated with Gaussian. An intercept needs to be provided for each atom type in the analysis (i.e., for C and H atoms, the nmr_intercept=[181.7815, 31.8723]). These values can be adjusted using the CHESHIRE repository.
- nmr_experim : str, default=None
Filename of a CSV containing the experimental NMR shifts. Two columns are needed: A) 'atom_idx' should contain the indexes of the atoms to study as seen in GaussView or other molecular visualizers (i.e., the first atom of the coordinates has index 1); B) 'experimental_ppm' should contain the experimental NMR shifts in ppm observed for the atoms.
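The slope and intercept lists are paired with nmr_atoms by position. The sketch below applies the linear scaling commonly used with CHESHIRE parameters, shift = (intercept - shielding) / (-slope). Whether AQME applies exactly this expression is an assumption, and the function name and the raw shielding value are made up for illustration.

```python
# Default values taken from the parameter list above.
nmr_atoms = [6, 1]
nmr_slope = [-1.0537, -1.0784]
nmr_intercept = [181.7815, 31.8723]

# Pair each atomic number with its (slope, intercept) by list position.
scaling = {
    atom: (slope, intercept)
    for atom, slope, intercept in zip(nmr_atoms, nmr_slope, nmr_intercept)
}


def scale_shift(raw_shielding, atomic_number):
    """Convert a raw isotropic shielding (ppm) into a scaled chemical shift.

    Assumed formula (CHESHIRE-style): (intercept - shielding) / (-slope).
    """
    slope, intercept = scaling[atomic_number]
    return (intercept - raw_shielding) / (-slope)


# Hypothetical raw shielding of a carbon atom.
print(round(scale_shift(100.0, 6), 2))
```

With this convention, a shielding equal to the intercept maps to a shift of 0 ppm, and lower shieldings give larger shifts.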