CSEARCH
- class csearch(**kwargs)
Handles geometry generation and conformational search.
This class provides functionality for:
1. Multiple conformer generation using RDKit or CREST
2. Structure optimization and refinement
3. Conformer filtering and clustering
4. Support for various input formats (SMILES, SDF, MOL2, etc.)
5. Parallel processing capabilities
For detailed parameter documentation, see module documentation.
- auto_sampling(mol, metal_atoms, metal_idx)
Automatically determine number of conformers for sampling.
This method calculates the appropriate number of conformers based on:
1. Molecule complexity (rotatable bonds, rings, etc.)
2. Presence of metal atoms and their coordination
3. User-specified sampling level (low/mid/high)
- Args:
mol: RDKit molecule object
metal_atoms (list): Metal atoms present
metal_idx (list): Indices of metal atoms
- Returns:
int: Number of conformers to generate
- Raises:
SystemExit: If invalid sampling level specified
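As an illustration of the sampling-level logic, the sketch below computes an initial conformer count from a base multiplier and a cap. It is a simplified, hypothetical version: it uses only the number of rotatable bonds (not rings or metal coordination) together with the base multipliers and maximum conformer counts listed under the auto_sample parameter. The name `auto_sample_count` and the floor of one rotatable bond are assumptions, not AQME's implementation.

```python
# Hypothetical sketch: (base multiplier, max conformers) per sampling level,
# taken from the auto_sample parameter description.
AUTO_SAMPLE_LEVELS = {
    "low": (5, 100),
    "mid": (10, 250),
    "high": (20, 500),
}


def auto_sample_count(n_rot_bonds: int, level: str = "mid") -> int:
    """Return the initial number of RDKit conformers for a sampling level.

    Simplified: only rotatable bonds are considered, whereas the documented
    method also accounts for rings and metal coordination.
    """
    level = level.lower()
    if level not in AUTO_SAMPLE_LEVELS:
        # Mirrors the documented behaviour: exit on an invalid level
        raise SystemExit(f"Invalid sampling level: {level!r}")
    multiplier, max_confs = AUTO_SAMPLE_LEVELS[level]
    # Assumption: rigid molecules still get one multiplier's worth of conformers
    return min(multiplier * max(n_rot_bonds, 1), max_confs)


print(auto_sample_count(4, "mid"))    # 40
print(auto_sample_count(40, "high"))  # 500 (capped)
```

With RDKit, the rotatable-bond count could be obtained from `rdkit.Chem.rdMolDescriptors.CalcNumRotatableBonds(mol)`.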
- compute_confs(job_input, progress_bar, nprocs)
Generate conformers for a single molecule.
This method:
1. Processes input molecule data
2. Converts to RDKit molecule if needed
3. Handles 3D input formats
4. Sets up conformer generation
- Args:
job_input (tuple): Job configuration parameters
progress_bar (IncrementalBar): Progress tracking
nprocs (int): Number of processors to use
- Returns:
None: Updates are made to the filesystem
- conformer_generation(mol, name, constraints_atoms, constraints_dist, constraints_angle, constraints_dihedral, complex_ts, charge, mult, smi, geom, metal_atoms, metal_idx, metal_sym, csearch_nprocs, sample, coord_Map=None, alg_Map=None, mol_template=None, original_atn=None)
Generate 3D conformers for a molecule.
This method handles conformer generation using either CREST or RDKit depending on the input type and program selection.
- Args:
mol: RDKit molecule object
name (str): Molecule identifier
constraints_atoms (list): Constrained atom indices
constraints_dist (list): Distance constraints
constraints_angle (list): Angle constraints
constraints_dihedral (list): Dihedral angle constraints
complex_ts (bool): Whether the molecule is a complex/transition state
charge (int): Molecular charge
mult (int): Multiplicity
smi (str): SMILES string or 3D structure
geom (list): Geometry constraints
metal_atoms (list): Metal atoms in molecule
metal_idx (list): Metal atom indices
metal_sym (list): Metal atom symbols
csearch_nprocs (int): Number of processors
sample (int): Number of conformers to generate
coord_Map: Coordinate mapping (optional)
alg_Map: Alignment mapping (optional)
mol_template: Template molecule (optional)
original_atn: Original atomic numbers (optional)
- Returns:
None: Results written to files
- embed_conf(mol, initial_confs, coord_Map, alg_Map, mol_template, csearch_nprocs, name)
Embed multiple conformers using RDKit's distance geometry.
This method:
1. Handles 3D input structures appropriately
2. Sets up conformer embedding parameters
3. Attempts fallback options if initial embedding fails
- Args:
mol: RDKit molecule object
initial_confs (int): Number of conformers to generate
coord_Map: Coordinate mapping for template alignment
alg_Map: Atom mapping for template alignment
mol_template: Template molecule for alignment
csearch_nprocs (int): Number of processors to use
name (str): Molecule name for logging
- Returns:
list: Generated conformer IDs
- find_metal_atom(mol, charge, mult, name)
Detect transition metal atoms in molecule.
This method scans through all atoms in the molecule to identify transition metals and warns about potential charge/multiplicity issues for metal complexes.
- Args:
mol: RDKit molecule object to analyze
charge (int): Molecular charge (can be None)
mult (int): Multiplicity (can be None)
name (str): Molecule name for logging
- Returns:
list: Symbols of detected metal atoms
- Note:
Issues warnings if charge/multiplicity are not explicitly specified for metal-containing molecules
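The scan-and-warn behaviour described for find_metal_atom can be sketched in plain Python. The sketch works on a list of element symbols (with RDKit these could be collected with `[atom.GetSymbol() for atom in mol.GetAtoms()]`). The function name and the abbreviated d-block symbol set are hypothetical, and AQME's own list of metals may differ.

```python
import warnings

# Hypothetical, abbreviated set of d-block metal symbols.
D_BLOCK_METALS = {
    "Sc", "Ti", "V", "Cr", "Mn", "Fe", "Co", "Ni", "Cu", "Zn",
    "Y", "Zr", "Nb", "Mo", "Tc", "Ru", "Rh", "Pd", "Ag", "Cd",
    "Hf", "Ta", "W", "Re", "Os", "Ir", "Pt", "Au", "Hg",
}


def find_metal_symbols(symbols, charge=None, mult=None, name="mol"):
    """Return the metal symbols found in a list of atom symbols.

    Warns when metals are present but charge or multiplicity is missing,
    since automatic values are unreliable for metal complexes.
    """
    metals = [s for s in symbols if s in D_BLOCK_METALS]
    if metals and (charge is None or mult is None):
        warnings.warn(
            f"{name}: metal atoms {sorted(set(metals))} detected but charge "
            "and/or multiplicity were not specified; automatic values may be wrong."
        )
    return metals


print(find_metal_symbols(["C", "Pd", "Cl", "Cl"], charge=0, mult=1))  # ['Pd']
```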
- genConformer_r(mol, conf, sdwriter, update_to_rdkit, coord_Map, alg_Map, mol_template, original_atn, metal_atoms, metal_idx, metal_sym, ff)
Process and write conformer to SDF file.
This method handles:
1. Metal atom restoration from iodine placeholders
2. Energy minimization for metal-containing molecules
3. Conformer writing to SDF file
- Args:
mol: RDKit molecule object
conf (int): Conformer ID
sdwriter: RDKit SDWriter object
update_to_rdkit (bool): Whether to update coordinates
coord_Map: Coordinate mapping
alg_Map: Alignment mapping
mol_template: Template molecule
original_atn: Original atomic numbers
geom (list): Geometry constraints
metal_atoms (list): Metal atoms present
metal_idx (list): Metal atom indices
metal_sym (list): Metal atom symbols
ff (str): Force field to use
- Returns:
int: Status code (1 for success)
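Restoring metal atoms from iodine placeholders relies on remembering which atoms were swapped and what they originally were. A minimal sketch of that bookkeeping on a plain list of atomic numbers is shown below. The helper names are hypothetical; in RDKit the same swap would be applied per atom with `Atom.SetAtomicNum()`, possibly alongside charge and valence adjustments that this sketch ignores.

```python
IODINE = 53  # atomic number used as the placeholder element


def swap_metals_for_iodine(atomic_nums, metal_idx):
    """Replace metal atomic numbers with iodine.

    Returns the modified list and the original atomic numbers, in the
    same order as metal_idx (playing the role of original_atn).
    """
    swapped = list(atomic_nums)
    original_atn = []
    for idx in metal_idx:
        original_atn.append(swapped[idx])
        swapped[idx] = IODINE
    return swapped, original_atn


def restore_metals(atomic_nums, metal_idx, original_atn):
    """Undo swap_metals_for_iodine using the stored atomic numbers."""
    restored = list(atomic_nums)
    for idx, atn in zip(metal_idx, original_atn):
        restored[idx] = atn
    return restored


# Toy atom list: C, Pd, Cl, Cl (Pd has atomic number 46)
swapped, original = swap_metals_for_iodine([6, 46, 17, 17], metal_idx=[1])
print(swapped, original)                       # [6, 53, 17, 17] [46]
print(restore_metals(swapped, [1], original))  # [6, 46, 17, 17]
```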
- load_jobs(csearch_file)
Load molecular information for conformer generation.
This method:
1. Validates input file format
2. Maps file extension to appropriate handler
3. Prepares job inputs for conformer generation
- Args:
csearch_file (str or Path): Path to input file
- Returns:
list: List of job inputs for conformer generation
- Raises:
SystemExit: If file format is unsupported or file is not found
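The validation part of load_jobs can be illustrated with a small pathlib-based check. This is a hypothetical sketch: the supported extensions come from the `input` parameter documentation, while the function name, the case-insensitive comparison and the order of the checks are assumptions. Dispatching each extension to its reader is left out.

```python
from pathlib import Path

# Extensions accepted by the `input` parameter (see the parameter list).
SUPPORTED_EXTENSIONS = {
    ".smi", ".sdf", ".cdx", ".csv", ".com", ".gjf", ".mol",
    ".mol2", ".xyz", ".txt", ".yaml", ".yml", ".rtf",
}


def check_input_file(csearch_file):
    """Validate an input path and return its lower-case extension.

    Follows the documented load_jobs behaviour: exit if the format is
    unsupported or the file does not exist.
    """
    path = Path(csearch_file)
    ext = path.suffix.lower()
    if ext not in SUPPORTED_EXTENSIONS:
        raise SystemExit(f"Unsupported input format: {ext or '(no extension)'}")
    if not path.exists():
        raise SystemExit(f"Input file not found: {path}")
    return ext


# check_input_file("molecules.csv") returns ".csv" when the file exists
```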
- min_after_embed(mol, cids, name, csearch_file, update_to_rdkit, coord_Map, alg_Map, mol_template, charge, mult, ff, smi, geom, original_atn, metal_atoms, metal_idx, metal_sym, sample)
Process embedded conformers including minimization and filtering.
This method:
1. Minimizes and filters conformers
2. Applies geometry constraints
3. Sorts by energy
4. Writes selected conformers to file
5. Optionally clusters similar conformers
- Args:
mol: RDKit molecule object
cids (list): Conformer IDs
name (str): Molecule name
csearch_file (Path): Output file path
update_to_rdkit (bool): Update coordinates flag
coord_Map: Coordinate mapping
alg_Map: Alignment mapping
mol_template: Template molecule
charge (int): Molecular charge
mult (int): Multiplicity
ff (str): Force field
smi (str): SMILES string
geom (list): Geometry constraints
original_atn: Original atomic numbers
metal_atoms (list): Metal atoms
metal_idx (list): Metal indices
metal_sym (list): Metal symbols
sample (int): Number of conformers to keep
- Returns:
tuple: (status, output molecules)
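The energy sorting and selection steps of min_after_embed, combined with the ewin_csearch and sample parameters, can be sketched as follows. This is a simplification under stated assumptions: it starts from already minimized energies in kcal/mol, applies only the energy window and the `sample` cap, and skips the RMS-based filtering and Butina clustering that AQME also uses. The function name is hypothetical.

```python
def select_conformers(energies, ewin=5.0, sample=25):
    """Return conformer IDs sorted by energy and kept within the energy window.

    energies: dict mapping conformer ID -> minimized energy (kcal/mol).
    ewin: energy window relative to the most stable conformer (kcal/mol).
    sample: maximum number of conformers to keep.
    """
    if not energies:
        return []
    ranked = sorted(energies, key=energies.get)  # lowest energy first
    e_min = energies[ranked[0]]
    within = [cid for cid in ranked if energies[cid] - e_min <= ewin]
    return within[:sample]


print(select_conformers({0: 3.0, 1: 0.0, 2: 7.5, 3: 1.2}, ewin=5.0, sample=2))  # [1, 3]
```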
- min_and_E_calc(mol, cids, coord_Map, alg_Map, mol_template, ff, geom, metal_atoms, metal_idx, metal_sym)
Energy minimization and geometry filtering of conformers.
This method:
1. Minimizes each conformer using RDKit force fields
2. Applies geometry filters and constraints
3. Collects passing conformers and energies
- Args:
mol: RDKit molecule object
cids (list): Conformer IDs to process
coord_Map: Coordinate mapping
alg_Map: Alignment mapping
mol_template: Template molecule
ff (str): Force field to use
geom (list): Geometry constraints
metal_atoms (list): Metal atoms
metal_idx (list): Metal atom indices
metal_sym (list): Metal atom symbols
- Returns:
tuple: (outmols, passing_cids, cenergy)
outmols: List of passing molecule objects
passing_cids: List of passing conformer IDs
cenergy: List of conformer energies
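Geometry filtering in min_and_E_calc compares measured values against the `geom` rules within tolerances such as `angle_thres`. The sketch below shows only the angle test on raw 3D coordinates; finding the atoms that match a SMARTS pattern (which requires RDKit) is omitted, and the function names are hypothetical.

```python
import math


def angle_deg(a, b, c):
    """Angle a-b-c in degrees for three 3D points, with b as the vertex."""
    ba = [ai - bi for ai, bi in zip(a, b)]
    bc = [ci - bi for ci, bi in zip(c, b)]
    dot = sum(x * y for x, y in zip(ba, bc))
    norm = math.hypot(*ba) * math.hypot(*bc)
    # Clamp to [-1, 1] so rounding noise cannot break math.acos
    cos_theta = max(-1.0, min(1.0, dot / norm))
    return math.degrees(math.acos(cos_theta))


def passes_angle_rule(a, b, c, target, thres=30.0):
    """True if the a-b-c angle lies within +-thres degrees of target."""
    return abs(angle_deg(a, b, c) - target) <= thres


# Linear C-Pd-C arrangement checked against a 180-degree rule
pd = (0.0, 0.0, 0.0)
print(passes_angle_rule((2.0, 0.0, 0.0), pd, (-2.0, 0.0, 0.0), 180))  # True
print(passes_angle_rule((2.0, 0.0, 0.0), pd, (0.0, 2.0, 0.0), 180))   # False
```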
- rdkit_search(mol, name, csearch_file, charge, mult, constraints_atoms, constraints_dist, constraints_angle, constraints_dihedral, complex_ts, coord_Map, alg_Map, mol_template, smi, geom, original_atn, metal_atoms, metal_idx, metal_sym, csearch_nprocs, sample)
Generate and optimize conformers using RDKit and optionally CREST.
This method handles:
1. Initial RDKit conformer generation
2. Optional CREST optimization
3. Conformer filtering and clustering
- Args:
mol: RDKit molecule object
name (str): Molecule name
csearch_file (Path): Output file path
charge (int): Molecular charge
mult (int): Multiplicity
constraints_*: Various constraint parameters
complex_ts (bool): Whether molecule is complex/TS
coord_Map: Coordinate mapping
alg_Map: Alignment mapping
mol_template: Template molecule
smi (str): SMILES string
geom (list): Geometry constraints
original_atn: Original atomic numbers
metal_*: Metal atom information
csearch_nprocs (int): Number of processors
sample (int): Number of conformers
- Returns:
int: Status code (0 for success, -1 for failure)
- rdkit_to_sdf(mol, name, csearch_file, charge, mult, coord_Map, alg_Map, mol_template, smi, geom, original_atn, metal_atoms, metal_idx, metal_sym, csearch_nprocs, sample)
Convert RDKit conformers to SDF format.
- run_csearch(job_inputs)
Run conformer search on all input jobs.
This method handles parallel processing of conformer generation jobs:
1. Sets up progress tracking
2. Executes jobs in parallel if possible
3. Handles job exceptions gracefully
- Args:
job_inputs (list): List of job configurations
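The parallel execution with graceful error handling described for run_csearch can be sketched with `concurrent.futures`. This is a hypothetical sketch, not AQME's implementation: it uses a thread pool for simplicity and omits the progress bar. For CPU-bound RDKit work, `ProcessPoolExecutor` is a drop-in alternative as long as the worker function and its arguments are picklable.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def run_jobs(job_inputs, worker, max_workers=4):
    """Run worker(job) for every job; collect results and failures by index.

    A failing job is recorded instead of aborting the whole search.
    """
    results, failures = {}, {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(worker, job): i for i, job in enumerate(job_inputs)}
        for future in as_completed(futures):
            i = futures[future]
            try:
                results[i] = future.result()
            except Exception as exc:  # keep going on per-job errors
                failures[i] = repr(exc)
    return results, failures


def square(x):
    """Toy worker that fails on negative input."""
    if x < 0:
        raise ValueError("negative input")
    return x * x


results, failures = run_jobs([1, 2, -3], square)
print(sorted(results.items()))  # [(0, 1), (1, 4)]
print(sorted(failures))         # [2]
```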
Parameters
General
- input : str, default=''
(If smi is None) Optionally, a file containing the SMILES strings and names of the molecules. Current file extensions: .smi, .sdf, .cdx, .csv, .com, .gjf, .mol, .mol2, .xyz, .txt, .yaml, .yml, .rtf. For .csv files (e.g., FILENAME.csv), two columns are required: 'code_name' with the names and 'SMILES' with the SMILES strings.
- program : str, default=None
Program used for the conformational sampling. Current options: 'rdkit', 'crest'
- smi : str, default=None
Optionally, define a SMILES string as input
- name : str, default=None
(If smi is defined) Optionally, define a name for the system
- w_dir_main : str, default=os.getcwd()
Working directory
- destination : str, default=None
Directory to create the output file(s)
- varfile : str, default=None
Option to parse the variables using a yaml file (specify the filename)
- charge : int, default=None
Charge of the calculations used in the following input files. If charge isn't defined, it is read automatically from the SMILES string.
- mult : int, default=None
Multiplicity of the calculations used in the following input files. If mult isn't defined, it is calculated automatically from the mol object created from the SMILES string. Be careful with the automated calculation of mult from mol objects when using metals!
- prefix : str, default=''
Prefix added to all the names
- suffix : str, default=''
Suffix added to all the names
- stacksize : str, default='1G'
Controls the stack size used (especially relevant for xTB/CREST calculations of large systems, where high stack sizes are needed)
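The CSV layout required by the `input` parameter (a 'code_name' column and a 'SMILES' column) can be read with Python's standard csv module. The sketch below is a hypothetical helper that parses CSV text from a string so it stays self-contained; with a real file one would open it with `open(path, newline='')`. How AQME itself reads and names these molecules (including any prefix or suffix handling) may differ.

```python
import csv
import io


def read_csv_jobs(text):
    """Parse CSV text into (name, SMILES) pairs.

    Requires the 'code_name' and 'SMILES' columns described for `input`.
    """
    reader = csv.DictReader(io.StringIO(text))
    missing = {"code_name", "SMILES"} - set(reader.fieldnames or [])
    if missing:
        raise SystemExit(f"CSV input is missing column(s): {sorted(missing)}")
    return [(row["code_name"], row["SMILES"]) for row in reader]


print(read_csv_jobs("code_name,SMILES\nethanol,CCO\nbenzene,c1ccccc1\n"))
# [('ethanol', 'CCO'), ('benzene', 'c1ccccc1')]
```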
General RDKit-based
- sample : int, default=25
Number of conformers to keep after the initial RDKit sampling. They are selected using a combination of RDKit energies and Butina clustering
- auto_sample : str, default='mid' in CSEARCH, 'low' in QDESCP
Apply automatic calculation of the number of conformers generated initially with RDKit. This number of conformers is generated first and then reduced to the number specified in --sample by different filters. The initial number is a sampling factor (base multiplier) multiplied by the number of rotatable bonds, limited by a maximum number of conformers allowed to pass to the filters. Options:
1. low: good for descriptor generation in machine learning. Base multiplier = 5, max number of confs = 100
2. mid: standard, good compromise between number of conformers and computing time. Base multiplier = 10, max number of confs = 250
3. high: demanding method, more conformers and time. Base multiplier = 20, max number of confs = 500
4. False: use the number of conformers specified in --sample
- ff : str, default='MMFF'
Force field used in RDKit optimizations and energy calculations. Current options: MMFF, UFF (if MMFF fails, AQME tries to use UFF automatically), and NO FF (works well with metals when UFF doesn't work)
- ewin_csearch : float, default=5.0
Energy window in kcal/mol used to discard conformers (i.e., conformers more than this amount above the most stable conformer are discarded)
- initial_energy_threshold : float, default=0.0001
Energy difference in kcal/mol below which two conformers are considered duplicates in the first filter, which uses only energies
- energy_threshold : float, default=0.25
Energy difference in kcal/mol below which two conformers are considered duplicates in the second filter, which uses both energies and RMS
- rms_threshold : float, default=0.25
RMS difference in Å below which two conformers are considered duplicates in the second filter, which uses both energies and RMS
- opt_steps_rdkit : int, default=1000
Maximum number of cycles used in RDKit optimizations
- heavyonly : bool, default=True
Only consider heavy atoms during RMS calculations for filtering (in the Chem.rdMolAlign.GetBestRMS() RDKit function)
- max_matches_rmsd : int, default=1000
Max matches during RMS calculations for filtering (maxMatches option in the Chem.rdMolAlign.GetBestRMS() RDKit function)
- max_mol_wt : int, default=0
Discard systems with molecular weights higher than this parameter (in g/mol). If 0 is set, this filter is off
- max_torsions : int, default=0
Discard systems with more than this many torsions (relevant to avoid molecules with many rotatable bonds). If 0 is set, this filter is off
- seed : int, default=62609
Random seed used during RDKit embedding (in the Chem.rdDistGeom.EmbedMultipleConfs() RDKit function)
- geom : list, default=[]
Geometry rules that the systems must pass. Format: [SMARTS,VALUE]. Rules can be defined for atoms, bonds, angles and dihedrals. For example, a rule to keep only molecules with C-Pd-C atoms at 180 degrees: ['[C][Pd][C]',180]. Multiple rules can be used at the same time (['C[Pd]C',180,'C[Pd]N',90]). Special rules (--geom ['RULE_NAME']): ['Ir_squareplanar']
- bond_thres : float, default=0.2
Threshold used to discard bonds in the geom option (±0.2 Å)
- angle_thres : float, default=30
Threshold used to discard angles in the geom option (±30 degrees)
- dihedral_thres : float, default=30
Threshold used to discard dihedral angles in the geom option (±30 degrees)
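The two uniqueness filters controlled by `initial_energy_threshold`, `energy_threshold` and `rms_threshold` can be sketched as below. This is an illustrative assumption of how such filters are typically applied, not AQME's exact code: conformers are processed from lowest to highest energy, the first filter compares each conformer with the previously kept one by energy only, and the second discards a conformer only if it is close to an already kept one in both energy and RMSD. The RMSD is passed in as a callable (in RDKit it could come from `Chem.rdMolAlign.GetBestRMS()`, as mentioned for `heavyonly`), and the function name is hypothetical.

```python
def filter_duplicates(energies, rmsd, initial_energy_threshold=0.0001,
                      energy_threshold=0.25, rms_threshold=0.25):
    """Return indices of unique conformers, lowest energy first.

    energies: list of conformer energies in kcal/mol.
    rmsd: callable rmsd(i, j) returning the RMSD (Å) between conformers i and j.
    """
    order = sorted(range(len(energies)), key=energies.__getitem__)

    # Filter 1 (energy only): drop a conformer whose energy is within
    # initial_energy_threshold of the last conformer kept.
    stage1 = []
    for i in order:
        if stage1 and abs(energies[i] - energies[stage1[-1]]) < initial_energy_threshold:
            continue
        stage1.append(i)

    # Filter 2 (E + RMS): a conformer is a duplicate only if some kept
    # conformer is close to it in energy AND in RMSD.
    kept = []
    for i in stage1:
        is_duplicate = any(
            abs(energies[i] - energies[j]) < energy_threshold
            and rmsd(i, j) < rms_threshold
            for j in kept
        )
        if not is_duplicate:
            kept.append(i)
    return kept


# Toy data: conformers 0 and 2 share a geometry (RMSD 0.1 Å)
toy_rmsd = {frozenset((0, 2)): 0.1}
print(filter_duplicates([0.0, 0.00005, 0.1, 3.0],
                        lambda i, j: toy_rmsd.get(frozenset((i, j)), 1.0)))  # [0, 3]
```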
Only organometallic molecules
- auto_metal_atoms : bool, default=True
Automatically detect metal atoms for the RDKit conformer generation. Charge and mult should also be specified, since the automatic charge and mult detection might not be precise.
- complex_type : str, default=''
Forces the metal complexes to adopt a predefined geometry. This option is especially relevant when RDKit predicts wrong complex geometries or gives a mixture of geometries. Current options: squareplanar, squarepyramidal, linear, trigonalplanar
- single_system : bool, default=False
When using complex_type templates in CSEARCH, keep only one system of all the options. This option is useful to avoid repetition when the complex has two identical ligands (e.g., two Cl substituents).
CREST only
- nprocs : int, default=8
Number of processors used in CREST optimizations
- constraints_atoms : list, default=[]
Specify constrained atoms as [AT1,AT2,AT3]. An example of multiple constraints with atoms 1, 2 and 5 frozen: [1,2,5]
- constraints_dist : list of lists, default=[]
Specify distance constraints as [AT1,AT2,DIST]. An example of multiple constraints with atoms 1 and 2 frozen at a distance of 1.8 Å, and atoms 4 and 5 at a distance of 2.0 Å: [[1,2,1.8],[4,5,2.0]]
- constraints_angle : list of lists, default=[]
Specify angle constraints as [AT1,AT2,AT3,ANGLE]. An example of multiple constraints with atoms 1, 2 and 3 frozen at an angle of 180 degrees, and atoms 4, 5 and 6 at an angle of 120 degrees: [[1,2,3,180],[4,5,6,120]]
- constraints_dihedral : list of lists, default=[]
Specify dihedral constraints as [AT1,AT2,AT3,AT4,DIHEDRAL]. An example of multiple constraints with atoms 1, 2, 3 and 4 frozen at a dihedral angle of 180 degrees, and atoms 4, 5, 6 and 7 at a dihedral angle of 120 degrees: [[1,2,3,4,180],[4,5,6,7,120]]
- crest_force : float, default=0.5
Force constant for constraints in the .xcontrol.sample file for CREST jobs
- crest_keywords : str, default=None
Define additional keywords to use in CREST that are not included in --chrg, --uhf, -T and -cinp. For example: '--alpb ch2cl2 --nci --cbonds 0.5'
- cregen : bool, default=True
If True, perform a CREGEN analysis after CREST (filtering options below)
- cregen_keywords : str, default=None
Additional keywords for CREGEN (e.g., cregen_keywords='--ethr 0.02')
- xtb_keywords : str, default=None
Define additional keywords to use in the xTB pre-optimization that are not included in -c, --uhf, -P and --input. For example: '--alpb ch2cl2 --gfn 1'
- crest_runs : int, default=1
Number of CREST runs, used when multiple RDKit conformers are required as starting points.
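For CREST jobs, the constraint parameters above end up in a .xcontrol.sample file together with `crest_force`. The sketch below builds a block in the `$constrain` syntax of xTB's detailed input, using the same 1-based atom numbering as the examples. It is a hypothetical helper: the exact content AQME writes (for example, a reference geometry line or extra sections) may differ.

```python
def write_xcontrol(constraints_atoms=(), constraints_dist=(), constraints_angle=(),
                   constraints_dihedral=(), crest_force=0.5):
    """Build an xTB-style $constrain block from AQME-like constraint lists."""

    def fmt(values):
        # Comma-separated values, e.g. [1, 2, 1.8] -> "1,2,1.8"
        return ",".join(str(v) for v in values)

    lines = ["$constrain", f"  force constant={crest_force}"]
    if constraints_atoms:
        lines.append(f"  atoms: {fmt(constraints_atoms)}")
    for dist in constraints_dist:          # [AT1, AT2, DIST]
        lines.append(f"  distance: {fmt(dist)}")
    for angle in constraints_angle:        # [AT1, AT2, AT3, ANGLE]
        lines.append(f"  angle: {fmt(angle)}")
    for dihedral in constraints_dihedral:  # [AT1, AT2, AT3, AT4, DIHEDRAL]
        lines.append(f"  dihedral: {fmt(dihedral)}")
    lines.append("$end")
    return "\n".join(lines) + "\n"


print(write_xcontrol(constraints_atoms=[1, 2, 5],
                     constraints_dist=[[1, 2, 1.8]]), end="")
```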