CSEARCH

class csearch(**kwargs)

Class abstracting the geometry generation and conformational search procedure. For further details on the currently accepted keyword arguments (kwargs), see the Parameters section (in the module documentation).

auto_sampling(mol, metal_atoms, metal_idx)

Automatically detects the initial number of conformers for the sampling

compute_confs(job_input, bar, csearch_nprocs)

Function to start conformer generation

conformer_generation(mol, name, constraints_atoms, constraints_dist, constraints_angle, constraints_dihedral, complex_ts, charge, mult, smi, geom, metal_atoms, metal_idx, metal_sym, csearch_nprocs, coord_Map=None, alg_Map=None, mol_template=None, original_atn=None)

Function to load mol objects and create 3D conformers

dihedral_filter_and_sdf(name, csearch_file, coord_Map, alg_Map, mol_template, ff, metal_atoms, metal_idx, metal_sym)

Filters the conformers after the dihedral scan and writes them to SDF

embed_conf(mol, initial_confs, coord_Map, alg_Map, mol_template, csearch_nprocs, name)

Function to embed conformers

genConformer_r(mol, conf, i, matches, name, sdwriter, update_to_rdkit, coord_Map, alg_Map, mol_template, original_atn, geom, metal_atoms, metal_idx, metal_sym, ff)

If program = 'rdkit', this function replaces the iodine atoms back with the metal atoms (if needed) and writes the RDKit SDF files. With program = 'summ', it optimizes the rotamers

load_jobs(csearch_file)

Load information of the different molecules for conformer generation

min_after_embed(mol, cids, name, csearch_file, rotmatches, update_to_rdkit, coord_Map, alg_Map, mol_template, charge, mult, ff, smi, geom, original_atn, metal_atoms, metal_idx, metal_sym)

Minimizes, gets the energy and filters RDKit conformers after embedding

min_and_E_calc(mol, cids, coord_Map, alg_Map, mol_template, ff, geom, metal_atoms, metal_idx, metal_sym)

Minimization and energy calculation with RDKit after embedding

rdkit_to_sdf(mol, name, csearch_file, charge, mult, coord_Map, alg_Map, mol_template, smi, geom, original_atn, metal_atoms, metal_idx, metal_sym, csearch_nprocs)

Conversion from RDKit to SDF

Embeds, optimizes and filters RDKit conformers

Parameters

General

input : str, default=''

(If smi is None) Optionally, a file containing the SMILES strings and names of the molecules. Current file extensions: .smi, .sdf, .cdx, .csv, .com, .gjf, .mol, .mol2, .xyz, .txt, .yaml, .yml, .rtf. For .csv files (e.g. FILENAME.csv), two columns are required: 'code_name' with the names and 'SMILES' with the SMILES strings
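The following is a minimal sketch of how a .csv input with the two required columns could be read and checked using Python's standard csv module. The helper read_csv_input is hypothetical and is not part of AQME. It only illustrates the expected layout: one row per molecule, with 'code_name' and 'SMILES' headers.

```python
import csv
import io

REQUIRED_COLUMNS = ("code_name", "SMILES")  # columns required for .csv inputs


def read_csv_input(text):
    """Hypothetical helper: return (name, smiles) pairs from CSV text."""
    reader = csv.DictReader(io.StringIO(text))
    missing = [col for col in REQUIRED_COLUMNS if col not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"Missing required column(s): {', '.join(missing)}")
    # Each row describes one molecule: its name and its SMILES string
    return [(row["code_name"], row["SMILES"]) for row in reader]


example = "code_name,SMILES\nethanol,CCO\nbenzene,c1ccccc1\n"
print(read_csv_input(example))  # [('ethanol', 'CCO'), ('benzene', 'c1ccccc1')]
```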

program : str, default=None

Program used for the conformational sampling. Current options: 'rdkit', 'summ', 'fullmonte', 'crest'

smi : str, default=None

Optionally, define a SMILES string as input

name : str, default=None

(If smi is defined) optionally, define a name for the system

w_dir_main : str, default=os.getcwd()

Working directory

destination : str, default=None

Directory to create the output file(s)

varfile : str, default=None

Option to parse the variables using a YAML file (specify the filename)

charge : int, default=None

Charge of the calculations used in the following input files. If charge isn't defined, it is read automatically from the SMILES string

mult : int, default=None

Multiplicity of the calculations used in the following input files. If mult isn't defined, it is calculated automatically from the mol object created with the SMILES string. Be careful with the automated calculation of mult from mol objects when using metals!

prefix : str, default=''

Prefix added to all the names

suffix : str, default=''

Suffix added to all the names

stacksize : str, default='1G'

Controls the stack size used (especially relevant for xTB/CREST calculations of large systems, where high stack sizes are needed)

General RDKit-based

sample : int, default=25

Number of conformers to keep after the initial RDKit sampling. They are selected using a combination of RDKit energies and Butina clustering
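The sketch below shows one way that energies and Butina clustering could be combined when reducing the initial pool to `sample` conformers. It implements the standard (non-reordering) Butina algorithm in plain Python on a precomputed RMSD matrix, then keeps the lowest-energy member of each cluster. The function names and this particular selection rule are illustrative assumptions, and AQME's own procedure may differ. RDKit provides its own implementation as `Butina.ClusterData` in `rdkit.ML.Cluster`.

```python
def butina_clusters(dist, cutoff):
    """Butina clustering on a symmetric distance matrix (list of lists)."""
    n = len(dist)
    # Neighbors of each point: every other point within the cutoff
    neighbors = [{j for j in range(n) if j != i and dist[i][j] <= cutoff} for i in range(n)]
    # Visit candidate centroids from most to fewest neighbors (ties: lower index first)
    order = sorted(range(n), key=lambda i: (-len(neighbors[i]), i))
    unassigned = set(range(n))
    clusters = []
    for i in order:
        if i not in unassigned:
            continue
        members = [i] + sorted(neighbors[i] & unassigned)
        unassigned.difference_update(members)
        clusters.append(members)
    return clusters


def select_conformers(energies, dist, cutoff, sample):
    """Hypothetical selection: lowest-energy member per cluster, sorted by energy."""
    reps = [min(cluster, key=lambda k: energies[k]) for cluster in butina_clusters(dist, cutoff)]
    return sorted(reps, key=lambda k: energies[k])[:sample]


# Conformers 0/1 and 2/3 are near-duplicates (RMSD 0.1); other pairs differ (RMSD 1.0)
rmsd = [
    [0.0, 0.1, 1.0, 1.0],
    [0.1, 0.0, 1.0, 1.0],
    [1.0, 1.0, 0.0, 0.1],
    [1.0, 1.0, 0.1, 0.0],
]
energies = [0.5, 0.0, 1.0, 2.0]  # relative energies in kcal/mol
print(butina_clusters(rmsd, 0.25))                         # [[0, 1], [2, 3]]
print(select_conformers(energies, rmsd, 0.25, sample=25))  # [1, 2]
```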

auto_sample : str, default='mid' in CSEARCH, 'low' in QDESCP

Apply an automatic calculation of the number of conformers generated initially with RDKit. This number of conformers is generated first and then reduced to the number specified in --sample with different filters. A sampling factor (base multiplier) is multiplied by the number of rotatable bonds, and the result is capped at a maximum number of conformers allowed to pass to the filters. Options:

  1. Low: good for descriptor generation in machine learning. Base multiplier = 5, max number of confs = 100
  2. Mid: standard, good compromise between number of conformers and computing time. Base multiplier = 10, max number of confs = 250
  3. High: demanding method, more conformers and time. Base multiplier = 20, max number of confs = 500
  4. False: use the number of conformers specified in --sample
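Here is a minimal sketch of the arithmetic described for auto_sample: multiply the level's base multiplier by the number of rotatable bonds, then cap the result at the level's maximum. The function initial_conformers is hypothetical. AQME's auto_sampling method may also count other structural features, such as metal centers, and the floor of one multiplier for rigid molecules is an assumption added here so that the result is never zero.

```python
# Base multiplier and cap for each auto_sample level, as listed in the options
AUTO_SAMPLE_LEVELS = {
    "low": (5, 100),
    "mid": (10, 250),
    "high": (20, 500),
}


def initial_conformers(n_rot_bonds, auto_sample="mid", sample=25):
    """Hypothetical estimate of how many conformers RDKit embeds first."""
    if str(auto_sample).lower() == "false":
        return sample  # auto sampling off: use --sample directly
    multiplier, max_confs = AUTO_SAMPLE_LEVELS[str(auto_sample).lower()]
    # Assumption: rigid molecules (0 rotatable bonds) still get one multiplier's worth
    return min(multiplier * max(n_rot_bonds, 1), max_confs)


print(initial_conformers(4, "mid"))   # 40
print(initial_conformers(40, "mid"))  # 250 (capped)
print(initial_conformers(4, False))   # 25
```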

ff : str, default='MMFF'

Force field used in RDKit optimizations and energy calculations. Current options: MMFF and UFF (if MMFF fails, AQME tries to use UFF automatically)

ewin_csearch : float, default=5.0

Energy window in kcal/mol used to discard conformers (i.e. conformers lying more than this window above the most stable conformer are discarded)
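An energy window filter reduces to a single comparison against the lowest energy, as this small sketch shows. The same idea applies to ewin_fullmonte. The function name is hypothetical, and whether a conformer lying exactly on the window edge is kept is an assumption.

```python
def energy_window_filter(energies, ewin=5.0):
    """Indices of conformers lying within ewin kcal/mol of the most stable one."""
    e_min = min(energies)
    # Assumption: a conformer exactly at the window edge is kept
    return [i for i, e in enumerate(energies) if e - e_min <= ewin]


energies = [-10.0, -7.5, -5.0, -4.0]  # kcal/mol
print(energy_window_filter(energies))            # [0, 1, 2]
print(energy_window_filter(energies, ewin=2.0))  # [0]
```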

initial_energy_threshold : float, default=0.0001

Energy difference in kcal/mol between unique conformers for the first filter, which only considers energies

energy_threshold : float, default=0.25

Energy difference in kcal/mol between unique conformers for the second filter, which combines energies and RMS values

rms_threshold : float, default=0.25

RMS difference (in Å) between unique conformers for the second filter, which combines energies and RMS values
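This sketch shows how initial_energy_threshold, energy_threshold and rms_threshold could drive a two-stage duplicate filter. It assumes an RMS matrix has already been computed (for example with Chem.rdMolAlign.GetBestRMS()). It also assumes that each conformer is compared only against the conformers already kept, and that strict inequalities mark duplicates. These are illustrative choices, not necessarily AQME's exact logic.

```python
def remove_duplicates(energies, rms, initial_energy_threshold=0.0001,
                      energy_threshold=0.25, rms_threshold=0.25):
    """Two-stage duplicate filter over conformers sorted by energy (sketch)."""
    order = sorted(range(len(energies)), key=lambda i: energies[i])

    # First filter: energy only
    stage1 = []
    for i in order:
        if all(abs(energies[i] - energies[j]) >= initial_energy_threshold for j in stage1):
            stage1.append(i)

    # Second filter: a duplicate must be close in BOTH energy and RMS
    stage2 = []
    for i in stage1:
        duplicate = any(
            abs(energies[i] - energies[j]) < energy_threshold and rms[i][j] < rms_threshold
            for j in stage2
        )
        if not duplicate:
            stage2.append(i)
    return stage2


energies = [0.0, 0.00005, 0.1, 0.2]  # kcal/mol
rms = [
    [0.0, 0.0, 0.1, 0.6],
    [0.0, 0.0, 0.1, 0.6],
    [0.1, 0.1, 0.0, 0.5],
    [0.6, 0.6, 0.5, 0.0],
]
print(remove_duplicates(energies, rms))  # [0, 3]
```

Here conformer 1 is removed by the energy-only filter. Conformer 2 is removed because it is close to conformer 0 in both energy and RMS. Conformer 3 is also close in energy but differs structurally, so it survives.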

opt_steps_rdkit : int, default=1000

Max cycles used in RDKit optimizations

heavyonly : bool, default=True

Only consider heavy atoms during RMS calculations for filtering (in the Chem.rdMolAlign.GetBestRMS() RDKit function)

max_matches_rmsd : int, default=1000

Max matches during RMS calculations for filtering (maxMatches option in the Chem.rdMolAlign.GetBestRMS() RDKit function)

max_mol_wt : int, default=0

Discard systems with molecular weights higher than this parameter (in g/mol). If 0 is set, this filter is off

max_torsions : int, default=0

Discard systems with more than this many torsions (relevant to avoid molecules with many rotatable bonds). If 0 is set, this filter is off
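The "0 turns the filter off" convention for max_mol_wt and max_torsions can be sketched as shown below. The function passes_size_filters is hypothetical. In practice, the molecular weight and torsion count could come from RDKit (for example Descriptors.MolWt and rdMolDescriptors.CalcNumRotatableBonds), although AQME may count torsions differently.

```python
def passes_size_filters(mol_wt, n_torsions, max_mol_wt=0, max_torsions=0):
    """Return True if the system survives both filters (0 disables a filter)."""
    if max_mol_wt and mol_wt > max_mol_wt:
        return False
    if max_torsions and n_torsions > max_torsions:
        return False
    return True


print(passes_size_filters(850.0, 12))                   # True (both filters off)
print(passes_size_filters(850.0, 12, max_mol_wt=500))   # False
print(passes_size_filters(350.0, 12, max_torsions=10))  # False
```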

seed : int, default=62609

Random seed used during RDKit embedding (in the Chem.rdDistGeom.EmbedMultipleConfs() RDKit function)

geom : list, default=[]

Geometry rule that the systems must pass. Format: [SMARTS,VALUE]. Rules can be defined for atoms, bonds, angles and dihedrals. For example, a rule to keep only molecules with C-Pd-C atoms at 180 degrees: ['[C][Pd][C]',180]. Special rules (--geom ['RULE_NAME']):

  1. ['Ir_squareplanar']

bond_thres : float, default=0.2

Threshold used to discard bonds in the geom option (±0.2 Å)

angle_thres : float, default=30

Threshold used to discard angles in the geom option (±30 degrees)

dihedral_thres : float, default=30

Threshold used to discard dihedral angles in the geom option (±30 degrees)
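The sketch below shows how a geom rule could be checked against bond_thres, angle_thres and dihedral_thres once the coordinates of the atoms matched by the SMARTS pattern are known. The SMARTS matching itself (typically done with RDKit) is not shown, and all function names are hypothetical. Dihedrals are compared periodically, so values such as 179 and -179 degrees count as 2 degrees apart. Whether AQME wraps dihedrals this way is an assumption.

```python
import math


def _sub(a, b):
    return [a[k] - b[k] for k in range(3)]


def _dot(a, b):
    return sum(a[k] * b[k] for k in range(3))


def _cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]


def angle_deg(p1, p2, p3):
    """Angle p1-p2-p3 in degrees, with p2 as the vertex."""
    v1, v2 = _sub(p1, p2), _sub(p3, p2)
    cos_t = _dot(v1, v2) / math.sqrt(_dot(v1, v1) * _dot(v2, v2))
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_t))))  # clamp rounding noise


def dihedral_deg(p1, p2, p3, p4):
    """Signed dihedral p1-p2-p3-p4 in degrees (atan2 form, range -180 to 180)."""
    b1, b2, b3 = _sub(p2, p1), _sub(p3, p2), _sub(p4, p3)
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, _cross(b2, b3))
    x = _dot(_cross(b1, b2), _cross(b2, b3))
    return math.degrees(math.atan2(y, x))


def within_threshold(value, target, thres, periodic=False):
    """True if value is within target +/- thres (dihedrals wrap around 360 degrees)."""
    diff = abs(value - target)
    if periodic:
        diff = min(diff, 360.0 - diff)
    return diff <= thres


# Hypothetical coordinates for the atoms matched by the rule ['[C][Pd][C]',180]
c1, pd, c2 = (-2.0, 0.0, 0.0), (0.0, 0.0, 0.0), (1.9, 0.3, 0.0)
print(within_threshold(angle_deg(c1, pd, c2), 180, 30))  # True (angle_thres = 30)
```

Bond rules would use math.dist(p1, p2) with bond_thres in the same within_threshold check.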

Only organometallic molecules

auto_metal_atoms : bool, default=True

Automatically detect metal atoms for the RDKit conformer generation. Charge and mult should be specified as well since the automatic charge and mult detection might not be precise.

complex_type : str, default=''

Forces the metal complexes to adopt a predefined geometry. This option is especially relevant when RDKit predicts wrong complex geometries or gives a mixture of geometries. Current options: squareplanar, squarepyramidal, linear, trigonalplanar

SUMM only

degree : float, default=120.0

Interval of degrees used to rotate dihedral angles during SUMM sampling (e.g. 120.0 creates 3 conformers for each dihedral, at 0, 120 and 240 degrees)
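The degree interval sets how many rotamers each dihedral contributes. As this sketch illustrates, the total number of combinations grows as (360/degree) raised to the number of rotated dihedrals. The helper names are hypothetical, and the sketch assumes that degree divides 360 evenly.

```python
import itertools


def summ_angles(degree=120.0):
    """Rotations applied to one dihedral: 0, degree, 2*degree, ... below 360."""
    n_steps = int(round(360.0 / degree))
    return [i * degree for i in range(n_steps)]


def summ_combinations(n_dihedrals, degree=120.0):
    """Every combination of rotations over all rotated dihedrals."""
    return list(itertools.product(summ_angles(degree), repeat=n_dihedrals))


print(summ_angles(120.0))                # [0.0, 120.0, 240.0]
print(len(summ_combinations(2, 120.0)))  # 9
```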

Fullmonte only

ewin_fullmonte : float, default=5.0

Energy window in kcal/mol used to discard conformers (i.e. conformers lying more than this window above the most stable conformer are discarded)

ewin_sample_fullmonte : float, default=2.0

Energy window in kcal/mol used to select conformers during the Fullmonte sampling (i.e. conformers within this window of the most stable conformer are considered as unique in each step of the sampling)

nsteps_fullmonte : int, default=100

Number of steps (or conformer batches) to carry out during the Fullmonte sampling

nrot_fullmonte : int, default=3

Number of dihedrals to rotate simultaneously (picked at random) during each step of the Fullmonte sampling

ang_fullmonte : float, default=30

Available angle interval to use in the Fullmonte sampling. For example, if the angle is 120.0, the program randomly chooses between 120 and 240 degrees during each step of the sampling
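This is a rough sketch of a single Fullmonte move, based on the parameter descriptions above. It picks nrot_fullmonte dihedrals at random and rotates each one by a random multiple of ang_fullmonte, so 120.0 gives a choice between 120 and 240 degrees. It also keeps the conformers within ewin_sample_fullmonte as seeds for the next step. The function names and the data layout (dihedral index mapped to rotation) are hypothetical, and AQME's actual implementation may differ in detail.

```python
import random


def fullmonte_step(n_dihedrals, nrot=3, ang=30.0, rng=random):
    """One hypothetical move: rotate nrot random dihedrals by random multiples of ang."""
    allowed = [k * ang for k in range(1, int(round(360.0 / ang)))]  # e.g. 30, 60, ..., 330
    chosen = rng.sample(range(n_dihedrals), min(nrot, n_dihedrals))
    return {d: rng.choice(allowed) for d in chosen}


def fullmonte_pool(energies, ewin_sample=2.0):
    """Conformers within ewin_sample kcal/mol of the minimum seed the next step."""
    e_min = min(energies)
    return [i for i, e in enumerate(energies) if e - e_min <= ewin_sample]


rng = random.Random(62609)  # seeded for reproducibility
print(fullmonte_step(n_dihedrals=6, nrot=3, ang=30.0, rng=rng))
print(fullmonte_pool([0.0, 1.5, 2.5]))  # [0, 1]
```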

CREST only

nprocs : int, default=8

Number of processors used in CREST optimizations

constraints_atoms : list, default=[]

Specify constrained atoms as [AT1,AT2,AT3]. An example of multiple constraints with atoms 1, 2 and 5 frozen: [1,2,5]

constraints_dist : list of lists, default=[]

Specify distance constraints as [AT1,AT2,DIST]. An example of multiple constraints with atoms 1 and 2 frozen at a distance of 1.8 Å, and atoms 4 and 5 with distance of 2.0 Å: [[1,2,1.8],[4,5,2.0]]

constraints_angle : list of lists, default=[]

Specify angle constraints as [AT1,AT2,AT3,ANGLE]. An example of multiple constraints with atoms 1, 2 and 3 frozen at an angle of 180 degrees, and atoms 4, 5 and 6 with an angle of 120 degrees: [[1,2,3,180],[4,5,6,120]]

constraints_dihedral : list of lists, default=[]

Specify dihedral constraints as [AT1,AT2,AT3,AT4,DIHEDRAL]. An example of multiple constraints with atoms 1, 2, 3 and 4 frozen at a dihedral angle of 180 degrees, and atoms 4, 5, 6 and 7 with a dihedral angle of 120: [[1,2,3,4,180],[4,5,6,7,120]]

crest_force : float, default=0.5

Force constant for constraints in the .xcontrol.sample file for CREST jobs
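The constraint lists and crest_force map naturally onto an xTB-style `$constrain` block, which uses `force constant=`, `atoms:`, `distance:`, `angle:` and `dihedral:` entries. The hypothetical helper below builds such a block. The exact text that AQME writes to the .xcontrol.sample file may differ, so treat this as an illustration of the format rather than AQME's output.

```python
def xcontrol_constraints(atoms=(), dists=(), angles=(), dihedrals=(), force=0.5):
    """Build an xTB-style $constrain block from AQME-like constraint lists (sketch)."""
    lines = ["$constrain", f"  force constant={force}"]
    if atoms:
        lines.append("  atoms: " + ",".join(str(a) for a in atoms))
    for keyword, entries in (("distance", dists), ("angle", angles), ("dihedral", dihedrals)):
        for entry in entries:
            lines.append(f"  {keyword}: " + ",".join(str(x) for x in entry))
    lines.append("$end")
    return "\n".join(lines) + "\n"


print(xcontrol_constraints(atoms=[1, 2, 5], dists=[[1, 2, 1.8], [4, 5, 2.0]], force=0.5))
```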

crest_keywords : str, default=None

Define additional keywords to use in CREST that are not included in --chrg, --uhf, -T and -cinp. For example: '--alpb ch2cl2 --nci --cbonds 0.5'

cregen : bool, default=True

If True, perform a CREGEN analysis after CREST (filtering options below)

cregen_keywords : str, default=None

Additional keywords for CREGEN (e.g. cregen_keywords='--ethr 0.02')

xtb_keywords : str, default=None

Define additional keywords to use in the xTB pre-optimization that are not included in -c, --uhf, -P and --input. For example: '--alpb ch2cl2 --gfn 1'
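The sketch below shows how crest_keywords and xtb_keywords could be appended to the flags that are already handled (--chrg, --uhf, -T and -cinp for CREST; -c, --uhf, -P and --input for xTB). It uses shlex.split so that quoted arguments survive. The file names, the `--opt` flag for the pre-optimization and the helper names are assumptions. Note that --uhf takes the number of unpaired electrons, i.e. mult - 1. The lists could be passed to subprocess.run, but the sketch only builds them.

```python
import shlex


def crest_command(xyz_file, charge, mult, nprocs=8, cinp=".xcontrol.sample", crest_keywords=None):
    """CREST command with --chrg, --uhf, -T and -cinp, plus any extra keywords."""
    cmd = ["crest", xyz_file, "--chrg", str(charge), "--uhf", str(mult - 1),
           "-T", str(nprocs), "-cinp", cinp]
    if crest_keywords:
        cmd += shlex.split(crest_keywords)  # e.g. '--alpb ch2cl2 --nci --cbonds 0.5'
    return cmd


def xtb_command(xyz_file, charge, mult, nprocs=8, xcontrol=".xcontrol.sample", xtb_keywords=None):
    """xTB pre-optimization command with -c, --uhf, -P and --input, plus extra keywords."""
    cmd = ["xtb", xyz_file, "--opt", "-c", str(charge), "--uhf", str(mult - 1),
           "-P", str(nprocs), "--input", xcontrol]
    if xtb_keywords:
        cmd += shlex.split(xtb_keywords)  # e.g. '--alpb ch2cl2 --gfn 1'
    return cmd


print(" ".join(crest_command("mol.xyz", 0, 1, crest_keywords="--alpb ch2cl2 --nci")))
# crest mol.xyz --chrg 0 --uhf 0 -T 8 -cinp .xcontrol.sample --alpb ch2cl2 --nci
```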

crest_runs : int, default=1

Number of CREST runs to perform when multiple starting points (RDKit conformers) are required.