Default and Required Parameters

This document details the default parameters used in the AQME program.

CSEARCH

Parameters

General

input : str, default=''

(If smi is None) Optionally, file containing the SMILES strings and names of the molecules. Current file extensions: .smi, .sdf, .cdx, .csv, .com, .gjf, .mol, .mol2, .xyz, .txt, .yaml, .yml, .rtf. For .csv files (i.e. FILENAME.csv), two columns are required: 'code_name' with the names and 'SMILES' with the SMILES strings.
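As an illustration of the CSV layout described above, the following sketch writes a two-column file with the standard library. The molecule names, the SMILES strings and the helper name write_csearch_csv are hypothetical; only the 'code_name' and 'SMILES' column names come from this documentation.

```python
import csv

# Hypothetical molecules; only the two column names come from the text above.
molecules = [
    {"code_name": "ethanol", "SMILES": "CCO"},
    {"code_name": "benzene", "SMILES": "c1ccccc1"},
]

def write_csearch_csv(path, rows):
    """Write a CSV containing the 'code_name' and 'SMILES' columns."""
    with open(path, "w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=["code_name", "SMILES"])
        writer.writeheader()
        writer.writerows(rows)

write_csearch_csv("molecules.csv", molecules)
```

The resulting molecules.csv could then be passed through the input option.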

program : str, default=None

Program required in the conformational sampling. Current options: 'rdkit', 'summ', 'fullmonte', 'crest'

smi : str, default=None

Optionally, define a SMILES string as input

name : str, default=None

(If smi is defined) optionally, define a name for the system

w_dir_main : str, default=os.getcwd()

Working directory

destination : str, default=None

Directory to create the output file(s)

varfile : str, default=None

Option to parse the variables using a yaml file (specify the filename)

max_workers : int, default=4

Number of simultaneous RDKit jobs run with multiprocessing (WARNING! More than 12 simultaneous jobs might collapse your computer!)

charge : int, default=None

Charge of the calculations used in the following input files. If charge isn't defined, it automatically reads the charge of the SMILES string

mult : int, default=None

Multiplicity of the calculations used in the following input files. If mult isn't defined, it automatically reads the multiplicity of the mol object created with the SMILES string. Be careful with the automated calculation of mult from mol objects when using metals!

prefix : str, default=''

Prefix added to all the names

suffix : str, default=''

Suffix added to all the names

stacksize : str, default='1G'

Controls the stack size used (especially relevant for xTB/CREST calculations of large systems, where high stack sizes are needed)

General RDKit-based

sample : int, default='auto'

Number of conformers used initially in the RDKit sampling. If this option isn't specified, AQME automatically calculates an approximate (previously benchmarked) number based on the number of rotatable bonds, XH (i.e. OH) groups, saturated cycles, etc. (see the auto_sampling() function in csearch.py for more information)

auto_sample : int, default=20

Base multiplier used in the sample option

ff : str, default='MMFF'

Force field used in RDKit optimizations and energy calculations. Current options: MMFF and UFF (if MMFF fails, AQME tries to use UFF automatically)

ewin_csearch : float, default=5.0

Energy window in kcal/mol used to discard conformers (i.e. a conformer is discarded if its energy relative to the most stable conformer exceeds the window)
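A minimal sketch of how such an energy window can be applied is given below. This is not the AQME implementation: the function name apply_energy_window and the example energies are hypothetical, and treating the window boundary as inclusive is an assumption.

```python
def apply_energy_window(energies, ewin=5.0):
    """Return the indices of conformers within ewin (kcal/mol) of the lowest energy.

    Assumption: a conformer exactly at the window limit is kept.
    """
    lowest = min(energies)
    return [i for i, energy in enumerate(energies) if energy - lowest <= ewin]

# Hypothetical conformer energies in kcal/mol
energies = [12.3, 10.0, 15.1, 14.9]
kept = apply_energy_window(energies, ewin=5.0)
```

Here the conformer at 15.1 kcal/mol lies 5.1 kcal/mol above the most stable one (10.0 kcal/mol) and is therefore discarded.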

initial_energy_threshold : float, default=0.0001

Energy difference in kcal/mol between unique conformers for the first filter of only E

energy_threshold : float, default=0.25

Energy difference in kcal/mol between unique conformers for the second filter of E + RMS

rms_threshold : float, default=0.25

RMS difference between unique conformers for the second filter of E + RMS

opt_steps_rdkit : int, default=1000

Max cycles used in RDKit optimizations

heavyonly : bool, default=True

Only consider heavy atoms during RMS calculations for filtering (in the Chem.rdMolAlign.GetBestRMS() RDKit function)

max_matches_rmsd : int, default=1000

Max matches during RMS calculations for filtering (maxMatches option in the Chem.rdMolAlign.GetBestRMS() RDKit function)

max_mol_wt : int, default=0

Discard systems with molecular weights higher than this parameter (in g/mol). If 0 is set, this filter is off

max_torsions : int, default=0

Discard systems with more than this many torsions (relevant to avoid molecules with many rotatable bonds). If 0 is set, this filter is off

seed : int, default=62609

Random seed used during RDKit embedding (in the Chem.rdDistGeom.EmbedMultipleConfs() RDKit function)

geom : list, default=[]

Geometry rule to pass for the systems. Format: [SMARTS,VALUE]. Geometry rules can apply to atoms, bonds, angles and dihedrals. For example, a rule to keep only molecules with C-Pd-C atoms at 180 degrees: ['[C][Pd][C]',180]. Special rules (--geom ['RULE_NAME']):

  1. ['Ir_squareplanar']

bond_thres : float, default=0.2

Threshold used to discard bonds in the geom option (+-0.2 A)

angle_thres : float, default=30

Threshold used to discard angles in the geom option (+-30 degrees)

dihedral_thres : float, default=30

Threshold used to discard dihedral angles in the geom option (+-30 degrees)
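The three thresholds above can be read as a tolerance around the VALUE of a geometry rule. The sketch below shows that reading only; it is not the AQME code, the helper name passes_geom_rule and the measured values are hypothetical, and dihedral periodicity is deliberately ignored.

```python
def passes_geom_rule(measured, target, threshold):
    """Return True if the measured value lies within target +- threshold."""
    return abs(measured - target) <= threshold

# Hypothetical C-Pd-C angle of 172 degrees checked against a 180-degree rule
angle_ok = passes_geom_rule(172.0, 180.0, threshold=30)

# Hypothetical bond of 2.35 A checked against a 2.0 A rule
bond_ok = passes_geom_rule(2.35, 2.0, threshold=0.2)
```

With the default thresholds, the angle passes (8 degrees off) while the bond is rejected (0.35 A off).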

Only organometallic molecules

auto_metal_atoms : bool, default=True

Automatically detect metal atoms for the RDKit conformer generation. Charge and mult should be specified as well since the automatic charge and mult detection might not be precise.

complex_type : str, default=''

Forces the metal complexes to adopt a predefined geometry. This option is especially relevant when RDKit predicts wrong complex geometries or gives a mixture of geometries. Current options: squareplanar, squarepyramidal, linear, trigonalplanar

SUMM only

degree : float, default=120.0

Interval of degrees to rotate dihedral angles during SUMM sampling (i.e. 120.0 would create 3 conformers for each dihedral, at 0, 120 and 240 degrees)
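The relationship between the degree interval and the rotations generated can be sketched as follows. The function names are hypothetical, and enumerating every combination of angles across several dihedrals is an assumption about how a systematic search proceeds, not a description of the AQME source.

```python
from itertools import product

def rotation_angles(degree=120.0):
    """List the rotation angles (in degrees) in [0, 360) for a given interval."""
    if degree <= 0:
        raise ValueError("degree must be positive")
    angles = []
    angle = 0.0
    while angle < 360.0:
        angles.append(angle)
        angle += degree
    return angles

def dihedral_combinations(n_dihedrals, degree=120.0):
    """Enumerate every combination of angles over n_dihedrals rotatable bonds."""
    return list(product(rotation_angles(degree), repeat=n_dihedrals))

angles = rotation_angles(120.0)           # three angles: 0, 120 and 240 degrees
combos = dihedral_combinations(2, 120.0)  # 3**2 = 9 combinations
```

The number of combinations grows as (360/degree) to the power of the number of dihedrals, which is why smaller intervals quickly become expensive.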

Fullmonte only

ewin_fullmonte : float, default=5.0

Energy window in kcal/mol used to discard conformers (i.e. a conformer is discarded if its energy relative to the most stable conformer exceeds the window)

ewin_sample_fullmonte : float, default=2.0

Energy window in kcal/mol to use conformers during the Fullmonte sampling (i.e. conformers inside the E window compared to the most stable conformer are considered as unique in each step of the sampling)

nsteps_fullmonte : int, default=100

Number of steps (or conformer batches) to carry out during the Fullmonte sampling

nrot_fullmonte : int, default=3

Number of dihedrals to rotate simultaneously (picked at random) during each step of the Fullmonte sampling

ang_fullmonte : float, default=30

Angle interval used in the Fullmonte sampling. For example, if the angle is 120.0, the program picks at random between 120 and 240 degrees during each step of the sampling

CREST only

nprocs : int, default=2

Number of processors used in CREST optimizations

constraints_atoms : list, default=[]

Specify constrained atoms as [AT1,AT2,AT3]. An example of multiple constraints with atoms 1, 2 and 5 frozen: [1,2,5]

constraints_dist : list of lists, default=[]

Specify distance constraints as [AT1,AT2,DIST]. An example of multiple constraints with atoms 1 and 2 frozen at a distance of 1.8 Å, and atoms 4 and 5 with distance of 2.0 Å: [[1,2,1.8],[4,5,2.0]]

constraints_angle : list of lists, default=[]

Specify angle constraints as [AT1,AT2,AT3,ANGLE]. An example of multiple constraints with atoms 1, 2 and 3 frozen at an angle of 180 degrees, and atoms 4, 5 and 6 with an angle of 120: [[1,2,3,180],[4,5,6,120]]

constraints_dihedral : list of lists, default=[]

Specify dihedral constraints as [AT1,AT2,AT3,AT4,DIHEDRAL]. An example of multiple constraints with atoms 1, 2, 3 and 4 frozen at a dihedral angle of 180 degrees, and atoms 4, 5, 6 and 7 with a dihedral angle of 120: [[1,2,3,4,180],[4,5,6,7,120]]
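Since each constraint type expects entries of a fixed length, a quick sanity check before launching a job can catch malformed lists. The helper below is a hypothetical sketch and not part of AQME; the expected lengths are taken from the formats described above.

```python
# Number of elements per entry, following the formats documented above
EXPECTED_LENGTH = {
    "constraints_dist": 3,      # [AT1, AT2, DIST]
    "constraints_angle": 4,     # [AT1, AT2, AT3, ANGLE]
    "constraints_dihedral": 5,  # [AT1, AT2, AT3, AT4, DIHEDRAL]
}

def check_constraints(name, constraints):
    """Raise ValueError if any entry has the wrong number of elements."""
    expected = EXPECTED_LENGTH[name]
    for entry in constraints:
        if len(entry) != expected:
            raise ValueError(
                f"{name}: expected {expected} elements, got {len(entry)} in {entry}"
            )
    return True

dist_ok = check_constraints("constraints_dist", [[1, 2, 1.8], [4, 5, 2.0]])
dihedral_ok = check_constraints(
    "constraints_dihedral", [[1, 2, 3, 4, 180], [4, 5, 6, 7, 120]]
)
```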

crest_force : float, default=0.5

Force constant for constraints in the .xcontrol.sample file for CREST jobs

crest_keywords : str, default=None

Define additional keywords to use in CREST that are not included in --chrg, --uhf, -T and -cinp. For example: '--alpb ch2cl2 --nci --cbonds 0.5'

cregen : bool, default=False

If True, perform a CREGEN analysis after CREST (filtering options below)

cregen_keywords : str, default=None

Additional keywords for CREGEN (i.e. cregen_keywords='--ethr 0.02')

xtb_keywords : str, default=None

Define additional keywords to use in the xTB pre-optimization that are not included in -c, --uhf, -P and --input. For example: '--alpb ch2cl2 --gfn 1'

crest_nrun : int, default=1

Number of CREST runs to perform when multiple starting points generated by RDKit are required.

CMIN

Parameters

General

files : str or list of str, default=None

Input files. Formats accepted: XYZ, SDF, GJF, COM and PDB. Also, lists can be used (i.e. [FILE1.sdf, FILE2.sdf] or *.FORMAT such as *.sdf).

program : str, default=None

Program required in the conformational refining. Current options: 'xtb', 'ani'

w_dir_main : str, default=os.getcwd()

Working directory

destination : str, default=None

Directory to create the output file(s)

varfile : str, default=None

Option to parse the variables using a yaml file (specify the filename)

nprocs : int, default=2

Number of processors used in the xTB optimizations

charge : int, default=None

Charge of the calculations used in the xTB calculations. If charge isn't defined, it automatically reads the charge from the input SDF files (if the files come from CSEARCH, which adds the property "Real charge") or calculates it from the generated mol object

mult : int, default=None

Multiplicity of the calculations used in the xTB calculations. If mult isn't defined, it automatically reads the multiplicity from the input SDF files (if the files come from CSEARCH, which adds the property "Mult") or calculates it from the generated mol object. Be careful with the automated calculation of mult from mol objects when using metals!

ewin_cmin : float, default=5.0

Energy window in kcal/mol used to discard conformers (i.e. a conformer is discarded if its energy relative to the most stable conformer exceeds the window)

initial_energy_threshold : float, default=0.0001

Energy difference in kcal/mol between unique conformers for the first filter of only E

energy_threshold : float, default=0.25

Energy difference in kcal/mol between unique conformers for the second filter of E + RMS

rms_threshold : float, default=0.25

RMS difference between unique conformers for the second filter of E + RMS

stacksize : str, default='1G'

Controls the stack size used (especially relevant for xTB/CREST calculations of large systems, where high stack sizes are needed)

prefix : str, default=''

Prefix added to all the names

suffix : str, default=''

Suffix added to all the names

xTB only

xtb_keywords : str, default=None

Define additional keywords to use in xTB that are not included in -c, --uhf, -P and --input. For example: '--alpb ch2cl2 --gfn 1'

constraints_atoms : list, default=[]

Specify constrained atoms as [AT1,AT2,AT3]. An example of multiple constraints with atoms 1, 2 and 5 frozen: [1,2,5]

constraints_dist : list of lists, default=[]

Specify distance constraints as [AT1,AT2,DIST]. An example of multiple constraints with atoms 1 and 2 frozen at a distance of 1.8 Å, and atoms 4 and 5 with distance of 2.0 Å: [[1,2,1.8],[4,5,2.0]]

constraints_angle : list of lists, default=[]

Specify angle constraints as [AT1,AT2,AT3,ANGLE]. An example of multiple constraints with atoms 1, 2 and 3 frozen at an angle of 180 degrees, and atoms 4, 5 and 6 with an angle of 120: [[1,2,3,180],[4,5,6,120]]

constraints_dihedral : list of lists, default=[]

Specify dihedral constraints as [AT1,AT2,AT3,AT4,DIHEDRAL]. An example of multiple constraints with atoms 1, 2, 3 and 4 frozen at a dihedral angle of 180 degrees, and atoms 4, 5, 6 and 7 with a dihedral angle of 120: [[1,2,3,4,180],[4,5,6,7,120]]

ANI only

opt_steps : int, default=1000

Maximum number of steps used in the ase.optimize.BFGS optimizer.

opt_fmax : float, default=0.05

Maximum force value to determine convergence in the ase.optimize.BFGS optimizer.

ani_method : str, default='ANI2x'

ANI model used in the ase.optimize.BFGS optimizer.

QPREP

Parameters

files : mol object, str or list of str, default=None

This module prepares input QM file(s). Formats accepted: mol object(s), Gaussian or ORCA LOG/OUT output files, JSON, XYZ, SDF, PDB. Also, lists can be used (i.e. [FILE1.log, FILE2.log] or *.FORMAT such as *.json).

atom_types : list of str, default=[]

(If files is None) List containing the atoms of the system

cartesians : list of str, default=[]

(If files is None) Cartesian coordinates used for further processing

w_dir_main : str, default=os.getcwd()

Working directory

destination : str, default=None

Directory to create the input file(s)

varfile : str, default=None

Option to parse the variables using a yaml file (specify the filename)

program : str, default=None

Program required to create the new input files. Current options: 'gaussian', 'orca'

qm_input : str, default=''

Keywords line for new input files (i.e. 'B3LYP/6-31G opt freq')

qm_end : str, default=''

Final line(s) in the new input files

charge : int, default=None

Charge of the calculations used in the following input files. If charge isn't defined, it defaults to 0

mult : int, default=None

Multiplicity of the calculations used in the following input files. If mult isn't defined, it defaults to 1

suffix : str, default=''

Suffix for the new input files (i.e. FILENAME_SUFFIX.com for FILENAME.log)

prefix : str, default=''

Prefix added to all the names

chk : bool, default=False

Include the chk input line in new input files for Gaussian calculations

chk_path : str, default=''

PATH to store CHK files. For example, if chk_path='root/user', the chk line of the input file would be %chk=root/user/FILENAME.chk
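The way chk_path combines with the file name can be sketched in a few lines. The helper chk_line is hypothetical, and the behaviour for an empty chk_path (a bare FILENAME.chk) is an assumption rather than something stated in this documentation.

```python
def chk_line(filename, chk_path=""):
    """Build the %chk line for a Gaussian input file.

    Assumption: with an empty chk_path, only FILENAME.chk is written.
    """
    chk_file = f"{filename}.chk"
    if chk_path:
        # Avoid a doubled slash if the user passes a trailing '/'
        chk_file = f"{chk_path.rstrip('/')}/{chk_file}"
    return f"%chk={chk_file}"

line = chk_line("FILENAME", chk_path="root/user")
```

For the example in the text, line holds '%chk=root/user/FILENAME.chk'.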

mem : str, default='4GB'

Memory for the QM calculations: (i) Gaussian: total memory; (ii) ORCA: memory per processor

nprocs : int, default=2

Number of processors used in the QM calculations

gen_atoms : list of str, default=[]

Atoms included in the gen(ECP) basis set (i.e. ['I','Pd'])

bs_gen : str, default=''

Basis set used for gen(ECP) atoms (i.e. 'def2svp')

bs_nogen : str, default=''

Basis set used for non gen(ECP) atoms in gen(ECP) calculations (i.e. '6-31G*')

lowest_only : bool, default=False

Only create input for the conformer with lowest energy of the SDF file

lowest_n : int, default=None

Only create inputs for the n conformers with lowest energy of the SDF file

e_threshold_qprep : float, default=None

Only create inputs for conformers below the energy threshold (to the lowest conformer) of the SDF file
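The three conformer-selection options (lowest_only, lowest_n and e_threshold_qprep) can be pictured with the sketch below. It is illustrative only: the function select_conformers is hypothetical, and the order of precedence when several options are set at once is an assumption.

```python
def select_conformers(energies, lowest_only=False, lowest_n=None, e_threshold=None):
    """Return conformer indices, sorted by energy, that would get an input file.

    Assumed precedence: lowest_only, then lowest_n, then e_threshold.
    """
    order = sorted(range(len(energies)), key=lambda i: energies[i])
    if lowest_only:
        return order[:1]
    if lowest_n is not None:
        return order[:lowest_n]
    if e_threshold is not None:
        lowest = energies[order[0]]
        return [i for i in order if energies[i] - lowest <= e_threshold]
    return order

# Hypothetical relative energies of four conformers in an SDF file
energies = [1.5, 0.0, 3.2, 0.4]
only_lowest = select_conformers(energies, lowest_only=True)
two_lowest = select_conformers(energies, lowest_n=2)
below_2 = select_conformers(energies, e_threshold=2.0)
```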

QCORR

Parameters

files : list of str, default=''

Filenames of QM output files to analyze. If *.log (or other strings that are not lists such as *.out) are specified, the program will look for all the log files in the working directory through glob.glob(*.log)

w_dir_main : str, default=os.getcwd()

Working directory

fullcheck : bool, default=True

Perform an analysis to detect whether the calculations were done homogeneously (i.e. same level of theory, solvent, grid size, etc)

varfile : str, default=None

Option to parse the variables using a yaml file (specify the filename)

ifreq_cutoff : float, default=0.0

Cutoff used to decide whether a frequency is imaginary (the absolute value of the specified number is used)
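One way to read the cutoff is sketched below, assuming that imaginary frequencies are reported as negative numbers (as in Gaussian outputs) and that a frequency counts as imaginary only when it lies below minus the absolute cutoff. The function name and the example frequencies are hypothetical.

```python
def imaginary_frequencies(frequencies, ifreq_cutoff=0.0):
    """Return the frequencies (cm-1) treated as imaginary for a given cutoff.

    Assumption: imaginary modes appear as negative values, and the sign of
    the cutoff is ignored.
    """
    limit = -abs(ifreq_cutoff)
    return [freq for freq in frequencies if freq < limit]

# Hypothetical frequencies in cm-1
freqs = [-350.2, -12.0, 45.0, 1650.3]
all_negative = imaginary_frequencies(freqs)                 # default cutoff
only_large = imaginary_frequencies(freqs, ifreq_cutoff=50)  # ignores small ones
```

With a cutoff of 50, the small negative frequency at -12.0 cm-1 is no longer flagged.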

amplitude_ifreq : float, default=0.2

Amplitude used to displace the imaginary frequencies to fix

freq_conv : str, default=None

If a string is defined, it will remove calculations that converged during optimization but did not converge in the subsequent frequency calculation. Options: opt keyword as string (i.e. 'opt=(calcfc,maxstep=5)'). If readfc is specified in the string, the chk option must be included as well.

im_freq_input : str, default='opt=(calcfc,maxstep=5)' (Gaussian), '\n%geom Calc_Hess true MaxStep 0.05 end' (ORCA)

When extra imaginary frequencies are detected by QCORR, it automatically adds Hessian calculations before starting geometry optimizations. This option can be disabled using im_freq_input=None.

s2_threshold : float, default=10.0

Cutoff for spin contamination during analysis, in % of the expected value (i.e. multiplicity 3 has an expected <S**2> of 2.0; if s2_threshold = 10, the <S**2> value is allowed to be 2.0 +- 0.2). Set s2_threshold = 0 to deactivate this option.
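The allowed range can be worked out from the multiplicity using the standard relation <S**2> = S(S+1) with S = (mult - 1)/2. The helper names below are hypothetical, and the sketch only reproduces the arithmetic of the example above, not the QCORR code.

```python
def expected_s2(mult):
    """Expected <S**2> for a pure spin state: S(S+1) with S = (mult - 1) / 2."""
    spin = (mult - 1) / 2
    return spin * (spin + 1)

def s2_window(mult, s2_threshold=10.0):
    """Return the (lower, upper) bounds of <S**2> allowed by the threshold in %."""
    expected = expected_s2(mult)
    tolerance = expected * s2_threshold / 100
    return expected - tolerance, expected + tolerance

# Triplet (mult=3): expected <S**2> of 2.0, allowed range of roughly 1.8 to 2.2
lower, upper = s2_window(3, s2_threshold=10.0)
```

For a doublet (mult=2) the expected value is 0.75, so the same 10% threshold gives a narrower absolute range.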

dup_threshold : float, default=0.0001

Energy (in hartree) used as the energy difference in E, H and G to detect duplicates
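A minimal sketch of an energy-based duplicate check is shown below. It assumes that two calculations count as duplicates when E, H and G all differ by less than dup_threshold; the dictionary layout and function name are hypothetical, and the rotational-constant criterion is left out.

```python
def is_duplicate(calc_a, calc_b, dup_threshold=0.0001):
    """True if E, H and G (in hartree) all differ by less than dup_threshold."""
    return all(
        abs(calc_a[key] - calc_b[key]) < dup_threshold for key in ("E", "H", "G")
    )

# Hypothetical thermochemistry values in hartree
calc_1 = {"E": -154.123456, "H": -154.050000, "G": -154.080000}
calc_2 = {"E": -154.123470, "H": -154.050020, "G": -154.080010}
calc_3 = {"E": -154.124900, "H": -154.051500, "G": -154.081400}

same = is_duplicate(calc_1, calc_2)
different = is_duplicate(calc_1, calc_3)
```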

ro_threshold : float, default=0.1

Rotational constant value used as the threshold to detect duplicates

isom_type : str, default=None

Check for isomerization from the initial input file to the resulting output files. It requires the extension of the initial input files (i.e. isom_type='com' or 'gjf') and the folder of the input files must be added in the isom_inputs option

isom_inputs : str, default=os.getcwd()

Folder containing the initial input files to check for isomerization

vdwfrac : float, default=0.50

Fraction of the summed VDW radii that constitutes a bond between two atoms in the isomerization filter

covfrac : float, default=1.10

Fraction of the summed covalent radii that constitutes a bond between two atoms in the isomerization filter

nodup_check : bool, default=False

If True, the duplicate filter is disabled

Note

New input files are generated through the QPREP module and, therefore, all QPREP arguments can be used when calling QCORR and will overwrite default options. For example, if the user specifies qm_input='wb97xd/def2svp', all the new input files generated to fix issues will contain this keywords line. See examples in the 'Example_workflows' folder for more information.

QDESCP

Parameters

General

w_dir_main : str, default=os.getcwd()

Working directory

destination : str, default=None

Directory to create the JSON file(s)

program : str, default=None

Program required to create the new descriptors. Current options: 'xtb', 'nmr'

qdescp_atoms : list of str, default=[]

Type of atom or group to calculate atomic properties. This option admits atoms (i.e., qdescp_atoms=['P']) and SMARTS patterns (i.e., qdescp_atoms=['C=O'])

robert : bool, default=True

Creates a database ready to use in an AQME-ROBERT machine learning workflow, combining the input CSV with SMILES/code_name and the calculated xTB/DBSTEP descriptors

xTB descriptors

files : list of str, default=''

Filenames of SDF/PDB/XYZ files to calculate xTB descriptors. If *.sdf (or other strings that are not lists such as *.pdb) are specified, the program will look for all the SDF files in the working directory through glob.glob(*.sdf)

charge : int, default=None

Charge of the calculations used in the following input files (charges from SDF files generated in CSEARCH are read automatically).

mult : int, default=None

Multiplicity of the calculations used in the following input files (multiplicities from SDF files generated in CSEARCH are read automatically).

qdescp_solvent : str, default=None

Solvent used in the xTB property calculations (ALPB model)

qdescp_temp : float, default=300

Temperature required for the xTB property calculations

qdescp_acc : float, default=0.2

Accuracy required for the xTB property calculations

qdescp_opt : str, default='normal'

Convergence criteria required for the xTB property calculations

boltz : bool, default=True

Calculation of Boltzmann averaged xTB properties and addition of RDKit molecular descriptors
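For reference, Boltzmann averaging of a property over a conformer ensemble can be sketched as below. This is the textbook formula, not the AQME implementation: the function names, the use of relative energies in kcal/mol and the temperature of 298.15 K are all assumptions made for the example.

```python
import math

R_KCAL = 0.0019872041  # gas constant in kcal/(mol*K)

def boltzmann_weights(energies_kcal, temperature=298.15):
    """Normalized Boltzmann populations from conformer energies in kcal/mol."""
    lowest = min(energies_kcal)
    factors = [
        math.exp(-(energy - lowest) / (R_KCAL * temperature))
        for energy in energies_kcal
    ]
    total = sum(factors)
    return [factor / total for factor in factors]

def boltzmann_average(values, energies_kcal, temperature=298.15):
    """Population-weighted average of a per-conformer property."""
    weights = boltzmann_weights(energies_kcal, temperature)
    return sum(weight * value for weight, value in zip(weights, values))

# Hypothetical conformer energies (kcal/mol) and a per-conformer descriptor
energies = [0.0, 0.5, 2.0]
descriptor = [1.20, 1.35, 1.90]
weights = boltzmann_weights(energies)
averaged = boltzmann_average(descriptor, energies)
```

The most stable conformer receives the largest weight, so the averaged value stays close to its descriptor.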

xtb_opt : bool, default=True

Performs an initial xTB geometry optimization before calculating descriptors

DBSTEP descriptors

dbstep_calc : bool, default=False

Whether to add a DBSTEP calculation of buried volume when generating atomic descriptors with qdescp_atoms. To activate it, add --dbstep_calc to the command line

dbstep_r : float, default=3.5

Radius used in the DBSTEP calculations (in A)

NMR simulation

files : list of str, default=''

Filenames of LOG files to retrieve NMR shifts from Gaussian calculations (*.log can be used to include all the log files in the working directory)

boltz : bool, default=True

Calculation of Boltzmann averaged NMR shifts

nmr_atoms : list of str, default=[6, 1]

List containing the atom types (as atomic numbers) to consider. For example, if the user wants to retrieve NMR shifts from C and H atoms nmr_atoms=[6, 1]

nmr_slope : list of float, default=[-1.0537, -1.0784]

List containing the slope to apply for the raw NMR shifts calculated with Gaussian. A slope needs to be provided for each atom type in the analysis (i.e., for C and H atoms, the nmr_slope=[-1.0537, -1.0784]). These values can be adjusted using the CHESHIRE repository.

nmr_intercept : list of float, default=[181.7815, 31.8723]

List containing the intercept to apply for the raw NMR shifts calculated with Gaussian. An intercept needs to be provided for each atom type in the analysis (i.e., for C and H atoms, the nmr_intercept=[181.7815, 31.8723]). These values can be adjusted using the CHESHIRE repository.
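The slope and intercept are linear-scaling factors. Assuming the convention used by the CHESHIRE repository, where the scaled shift is (intercept - isotropic shielding) / (-slope), the conversion can be sketched as follows. The function name and the shielding value are hypothetical, and whether AQME applies exactly this expression is an assumption.

```python
# Default scaling factors for C and H, as listed in this section
NMR_SLOPE = {6: -1.0537, 1: -1.0784}
NMR_INTERCEPT = {6: 181.7815, 1: 31.8723}

def scale_shift(shielding, atomic_number):
    """Convert a computed isotropic shielding (ppm) into a scaled chemical shift.

    Assumed convention: shift = (intercept - shielding) / (-slope).
    """
    slope = NMR_SLOPE[atomic_number]
    intercept = NMR_INTERCEPT[atomic_number]
    return (intercept - shielding) / (-slope)

# Hypothetical carbon shielding of 76.4115 ppm
carbon_shift = scale_shift(76.4115, atomic_number=6)
```

With these factors, a carbon shielding equal to the intercept maps to a shift of 0 ppm, and lower shieldings map to larger shifts.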

nmr_experim : str, default=None

Filename of a CSV containing the experimental NMR shifts. Two columns are needed: A) 'atom_idx' should contain the indexes of the atoms to study as seen in GaussView or other molecular visualizers (i.e., the first atom of the coordinates has index 1); B) 'experimental_ppm' should contain the experimental NMR shifts in ppm observed for the atoms.

VISMOL

Parameters

files : list of str, default=''

Filenames of SDF/PDB/XYZ files used to visualize conformers. If *.sdf (or other strings that are not lists such as *.pdb) are specified, the program will look for all the SDF files in the working directory through glob.glob(*.sdf). The internal options "line", "stick" and "sphere" are incorporated. Code reference from: [https://iwatobipen.wordpress.com]