filter
- add_stable_conformers(mols, cluster_mols, allenergy, sample)
Fill the remaining slots with the most stable conformers.
After clustering, add the lowest-energy conformers that were not selected as centroids until the target sample size is reached.
- Args:
mols (list): All molecules sorted by energy
cluster_mols (list): Currently selected cluster centroids
allenergy (list): Energies of selected molecules
sample (int): Target total number of conformers
- Returns:
tuple: (updated_cluster_mols, updated_allenergy)
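The top-up step can be sketched as follows. This is a minimal sketch, not AQME's code: the `Conformer` dataclass is a hypothetical stand-in for an RDKit molecule, and centroids are recognized by object identity (as RDKit `Mol` objects compare by identity).

```python
from dataclasses import dataclass


@dataclass
class Conformer:
    """Hypothetical stand-in for an RDKit molecule holding one conformer."""
    name: str
    energy: float  # kcal/mol


def add_stable_conformers(mols, cluster_mols, allenergy, sample):
    """Append the lowest-energy non-centroid conformers until `sample` is reached.

    `mols` is assumed to be sorted by ascending energy, as documented above.
    """
    cluster_mols = list(cluster_mols)
    allenergy = list(allenergy)
    chosen = {id(m) for m in cluster_mols}  # centroids already selected
    for mol in mols:
        if len(cluster_mols) >= sample:
            break
        if id(mol) not in chosen:
            cluster_mols.append(mol)
            allenergy.append(mol.energy)
            chosen.add(id(mol))
    return cluster_mols, allenergy
```

Because `mols` is energy-sorted, the first non-centroid conformers encountered are the most stable ones.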
- adjust_cluster_threshold(dists, num_mols, cluster_points)
Automatically adjust clustering threshold to reach target cluster count.
Uses the Butina clustering algorithm and iteratively adjusts the threshold until the number of clusters matches the target. A higher threshold produces fewer clusters.
- Args:
dists (list): Pairwise RMS distances
num_mols (int): Number of molecules to cluster
cluster_points (int): Target number of clusters
- Returns:
tuple: (clusters, final_threshold)
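The threshold search can be sketched as below. RDKit ships a Butina implementation (`Butina.ClusterData` in `rdkit.ML.Cluster`). To keep the sketch dependency-free, a minimal pure-Python Butina clustering stands in for it. The bisection update and `max_iter` are assumptions, not necessarily the rule AQME uses. Distances are assumed to be a flattened lower triangle ordered d(1,0), d(2,0), d(2,1), d(3,0), ..., the layout RDKit's `ClusterData` expects with `isDistData=True`.

```python
def butina_cluster(dists, num_mols, threshold):
    """Minimal Butina clustering on a flattened lower-triangle distance list."""
    # Neighbor lists; every point counts itself as a neighbor.
    neighbors = [[i] for i in range(num_mols)]
    k = 0
    for i in range(1, num_mols):
        for j in range(i):
            if dists[k] <= threshold:
                neighbors[i].append(j)
                neighbors[j].append(i)
            k += 1
    # Visit points by decreasing neighbor count; each unassigned point becomes
    # a centroid (first element) and absorbs its still-unassigned neighbors.
    order = sorted(range(num_mols), key=lambda i: len(neighbors[i]), reverse=True)
    assigned = [False] * num_mols
    clusters = []
    for idx in order:
        if assigned[idx]:
            continue
        cluster = [idx] + [n for n in neighbors[idx] if n != idx and not assigned[n]]
        for n in cluster:
            assigned[n] = True
        clusters.append(tuple(cluster))
    return clusters


def adjust_cluster_threshold(dists, num_mols, cluster_points, max_iter=50):
    """Bisect the threshold until Butina yields `cluster_points` clusters."""
    lo, hi = 0.0, max(dists, default=0.0)
    threshold, clusters = hi, butina_cluster(dists, num_mols, hi)
    for _ in range(max_iter):
        threshold = (lo + hi) / 2
        clusters = butina_cluster(dists, num_mols, threshold)
        if len(clusters) == cluster_points:
            break
        if len(clusters) > cluster_points:
            lo = threshold  # too many clusters: raise the threshold
        else:
            hi = threshold  # too few clusters: lower the threshold
    return clusters, threshold
```

If the target count is unreachable, the loop stops after `max_iter` steps and returns the last clustering tried.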
- apply_energy_window_filter(sorted_all_cids, cenergy, energy_window)
Filter conformers by an energy window relative to the lowest-energy conformer.
Discards conformers whose energy exceeds that of the lowest-energy conformer by more than energy_window.
- Args:
sorted_all_cids (list): Conformer IDs sorted by energy
cenergy (dict/list): Conformer energies indexed by ID
energy_window (float): Maximum energy difference from minimum (kcal/mol)
- Returns:
list: Conformer IDs within the energy window
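A minimal sketch of the window filter, assuming energies in kcal/mol and a window boundary that is inclusive (a conformer exactly `energy_window` above the minimum is kept):

```python
def apply_energy_window_filter(sorted_all_cids, cenergy, energy_window):
    """Keep conformers within `energy_window` of the lowest-energy conformer."""
    if not sorted_all_cids:
        return []
    # The IDs are sorted by energy, so the first one is the global minimum.
    e_min = cenergy[sorted_all_cids[0]]
    return [cid for cid in sorted_all_cids if cenergy[cid] - e_min <= energy_window]
```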
- apply_pre_energy_filter(sortedcids, cenergy, threshold)
Pre-filter conformers based solely on energy differences.
Keeps a conformer only if its energy differs by at least the threshold from every previously selected conformer. This reduces the number of conformers before the more expensive RMSD filtering.
- Args:
sortedcids (list): Conformer IDs sorted by energy
cenergy (dict/list): Conformer energies indexed by ID
threshold (float): Minimum energy difference to consider unique (kcal/mol)
- Returns:
list: Selected conformer IDs passing energy pre-filter
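The greedy pre-filter described above can be sketched like this (a minimal version, assuming the lowest-energy conformer is always kept because there is nothing to compare it with yet):

```python
def apply_pre_energy_filter(sortedcids, cenergy, threshold):
    """Greedy energy-only deduplication over energy-sorted conformer IDs."""
    selected = []
    for cid in sortedcids:
        # Keep cid only if it differs by >= threshold from every kept conformer.
        # all() over an empty list is True, so the first conformer is kept.
        if all(abs(cenergy[cid] - cenergy[seen]) >= threshold for seen in selected):
            selected.append(cid)
    return selected
```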
- apply_rmsd_and_energy_filter(outmols, selectedcids_initial, cenergy, args)
Filter conformers based on combined energy and RMSD criteria.
Each conformer is compared with the conformers already selected. If, for any selected conformer, both the energy difference and the RMSD fall below their thresholds, the new conformer is rejected as a duplicate.
- Args:
outmols (dict/list): Conformer molecule objects indexed by ID
selectedcids_initial (list): Pre-filtered conformer IDs
cenergy (dict/list): Conformer energies indexed by ID
args: Arguments object with thresholds:
energy_threshold: Energy similarity threshold (kcal/mol)
rms_threshold: RMSD similarity threshold (Angstroms)
heavyonly: Use only heavy atoms for RMSD
max_matches_rmsd: Maximum atom matches for RMSD calculation
- Returns:
list: Final selected conformer IDs
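The duplicate test can be sketched as below. This is a sketch under stated assumptions: `rms_fn` is a hypothetical parameter, and its default `plain_rmsd` computes a simple RMSD on coordinate lists that are assumed to be pre-aligned. A real implementation would use an aligned RMSD and honor `heavyonly` and `max_matches_rmsd`, which this sketch ignores.

```python
import math
from types import SimpleNamespace


def plain_rmsd(coords_a, coords_b):
    """Hypothetical RMSD between two pre-aligned lists of (x, y, z) tuples."""
    sq = sum((a - b) ** 2 for pa, pb in zip(coords_a, coords_b) for a, b in zip(pa, pb))
    return math.sqrt(sq / len(coords_a))


def apply_rmsd_and_energy_filter(outmols, selectedcids_initial, cenergy, args,
                                 rms_fn=plain_rmsd):
    """Reject a conformer if it is close in BOTH energy and RMSD to a kept one."""
    selected = []
    for cid in selectedcids_initial:
        duplicate = False
        for seen in selected:
            close_in_energy = abs(cenergy[cid] - cenergy[seen]) < args.energy_threshold
            # Only compute the (more expensive) RMSD when energies are similar.
            if close_in_energy and rms_fn(outmols[cid], outmols[seen]) < args.rms_threshold:
                duplicate = True
                break
        if not duplicate:
            selected.append(cid)
    return selected


# Example thresholds (illustrative values only).
example_args = SimpleNamespace(energy_threshold=0.25, rms_threshold=0.25)
```

Checking energy first means the RMSD is computed only for pairs that are already energetically similar.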
- check_geometric_match(self, mol_ensemb, mol_geom, match_type, geom)
Check whether a molecule matches geometric constraints.
Validates geometric parameters (bonds, angles, dihedrals) against specified thresholds using SMARTS pattern matching.
- Args:
self: AQME instance with threshold arguments
mol_ensemb (rdkit.Chem.Mol): Molecule ensemble for SMARTS matching
mol_geom (rdkit.Chem.Mol): Specific conformer for geometry calculation
match_type (str): Type of matching ('regular_rule' or 'Ir_squareplanar')
geom (list): Geometry specification
For regular_rule: [SMARTS, threshold_value]
For Ir_squareplanar: [atom1, atom2, atom3, angle_value]
- Returns:
bool: True if geometry passes the constraint, False otherwise
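For the Ir_squareplanar form `[atom1, atom2, atom3, angle_value]`, the core check is an angle comparison. The sketch below computes the atom1-atom2-atom3 angle from raw coordinates and compares it with the target. `tolerance` is a hypothetical parameter standing in for whatever threshold argument the AQME instance provides. SMARTS matching (which would need RDKit) is omitted.

```python
import math


def bond_angle(p1, p2, p3):
    """Angle p1-p2-p3 in degrees, with p2 as the vertex."""
    v1 = [a - b for a, b in zip(p1, p2)]
    v2 = [a - b for a, b in zip(p3, p2)]
    dot = sum(a * b for a, b in zip(v1, v2))
    norm = math.hypot(*v1) * math.hypot(*v2)
    cos_theta = max(-1.0, min(1.0, dot / norm))  # clamp against rounding error
    return math.degrees(math.acos(cos_theta))


def passes_angle_rule(coords, geom, tolerance):
    """geom = [atom1, atom2, atom3, angle_value]; tolerance is hypothetical."""
    a1, a2, a3, target = geom
    return abs(bond_angle(coords[a1], coords[a2], coords[a3]) - target) <= tolerance
```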
- cluster_conformers(self, mols, program, csearch_file, name, sample)
Performs Butina clustering based on RMS differences of conformers.
Uses a two-step approach for RDKit:
1. Cluster conformers with the RMS-based Butina algorithm (80% of sample)
2. Fill the remaining slots with the lowest-energy conformers (20% of sample)
For CREST, selects conformers as starting points for CREST searches.
- Args:
mols (list): List of RDKit molecule objects with conformers
program (str): Program name ('rdkit' or 'crest')
csearch_file (str): Path to SDF file
name (str): Molecule name for logging
sample (int): Total number of conformers to select
- Returns:
list: Selected and sorted molecule objects
- compute_pairwise_rms_distances(self, mols)
Compute pairwise RMS distances for all conformers.
Creates a distance matrix of RMS values between all pairs of conformers. Atom matching is limited to 100 matches, since all conformers share the same atom numbering.
- Args:
mols (list): List of RDKit molecule objects with conformers
- Returns:
list: Flattened upper triangular distance matrix
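Building the flattened distance list can be sketched as follows. RDKit provides `rdMolAlign.GetBestRMS`, which accepts a `maxMatches` argument, but to stay dependency-free this sketch takes a hypothetical `rms_fn` whose default is a plain RMS on pre-aligned coordinates. The entry order shown (d(1,0), d(2,0), d(2,1), ...) is the layout RDKit's Butina clustering expects with `isDistData=True`; whether AQME uses exactly this order is an assumption.

```python
import math


def plain_rms(coords_a, coords_b):
    """Hypothetical RMS between two pre-aligned coordinate lists (no fitting)."""
    sq = sum((a - b) ** 2 for pa, pb in zip(coords_a, coords_b) for a, b in zip(pa, pb))
    return math.sqrt(sq / len(coords_a))


def compute_pairwise_rms_distances(confs, rms_fn=plain_rms):
    """Return n*(n-1)/2 pairwise distances, one per unordered pair."""
    dists = []
    for i in range(1, len(confs)):
        for j in range(i):
            dists.append(rms_fn(confs[i], confs[j]))
    return dists
```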
- conformer_filters(self, sorted_all_cids, cenergy, outmols)
Apply sequential energy and RMSD-based conformer filters.
Three-stage filtering process:
1. Energy window filter (ewin_cmin)
2. Pre-filter based on energy differences
3. Combined energy and RMSD filter
- Args:
self: AQME instance with filter arguments
sorted_all_cids (list): List of conformer IDs sorted by energy
cenergy (dict/list): Conformer energies indexed by ID
outmols (dict/list): Conformer molecule objects indexed by ID
- Returns:
list: Selected conformer IDs that pass all filters
- determine_cluster_points(self, program, sample, name)
Determine target number of cluster points based on program and settings.
For RDKit: uses 80% of the sample size for clustering and reserves 20% for stable conformers.
For CREST: uses the number of CREST runs as cluster points.
- Args:
program (str): Program name ('rdkit' or 'crest')
sample (int): Total number of conformers to select
name (str): Molecule name for logging
- Returns:
int: Target number of cluster points
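The 80/20 split reduces to simple integer arithmetic. In this sketch, rounding down (`sample * 4 // 5`) is an assumption, `crest_runs` is a hypothetical stand-in for the number of CREST runs, and the `name` logging argument is dropped.

```python
def determine_cluster_points(program, sample, crest_runs=None):
    """Number of slots filled by clustering (the rest go to stable conformers)."""
    if program == "rdkit":
        # 80% of the slots go to Butina centroids (integer floor, no floats);
        # the remainder is later filled with the lowest-energy conformers.
        return sample * 4 // 5
    if program == "crest":
        # crest_runs is a hypothetical stand-in for the number of CREST runs.
        return crest_runs
    raise ValueError(f"unsupported program: {program}")
```

For example, with `sample=25` RDKit gets 20 cluster points and 5 stable-conformer slots.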
- extract_cluster_centroids(mols, clusts, cluster_points)
Extract centroid conformers from each cluster.
Gets the first element (centroid) from each cluster and retrieves the corresponding molecule objects.
- Args:
mols (list): List of all molecule objects
clusts (list): List of clusters from Butina clustering
cluster_points (int): Maximum number of centroids to extract
- Returns:
tuple: (cluster_mols, allenergy) - selected molecules and their energies
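Centroid extraction relies on Butina clusters listing their centroid first. In the sketch below, molecules are hypothetical dicts with an `"energy"` key standing in for RDKit molecules with a stored energy.

```python
def extract_cluster_centroids(mols, clusts, cluster_points):
    """Take the first index (the centroid) of up to `cluster_points` clusters."""
    cluster_mols, allenergy = [], []
    for clust in clusts[:cluster_points]:
        centroid = mols[clust[0]]  # Butina puts the centroid first in each cluster
        cluster_mols.append(centroid)
        allenergy.append(centroid["energy"])  # hypothetical energy storage
    return cluster_mols, allenergy
```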
- filters(mol, log, molwt_cutoff)
Apply basic molecular filters based on molecular weight and atom types.
Rejects molecules that exceed the weight cutoff or contain unsupported atoms.
- Args:
mol (rdkit.Chem.Mol): Molecule to filter
log: Logger object for writing messages
molwt_cutoff (float): Maximum allowed molecular weight (0 = no limit)
- Returns:
bool: True if molecule passes all filters, False otherwise
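The two checks can be sketched without RDKit (where `rdkit.Chem.Descriptors.MolWt` would give the weight). Here the molecule is a hypothetical list of element symbols. The supported-element set is hypothetical and is simply the keys of a small mass table; AQME's actual list of supported atoms is not reproduced.

```python
# Standard atomic weights for a few elements; the supported set is hypothetical.
ATOMIC_MASS = {"H": 1.008, "C": 12.011, "N": 14.007, "O": 15.999}


def passes_basic_filters(symbols, molwt_cutoff, supported=frozenset(ATOMIC_MASS)):
    """True if all atoms are supported and the weight is within the cutoff."""
    if any(s not in supported for s in symbols):
        return False  # unsupported atom type
    molwt = sum(ATOMIC_MASS[s] for s in symbols)
    # molwt_cutoff == 0 means "no limit", as documented above.
    return molwt_cutoff == 0 or molwt <= molwt_cutoff
```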
- geom_filter(self, mol_ensemb, mol_geom, geom)
Check if a molecule passes geometric filtering rules.
Applies geometry-based filters to conformers, including specialized rules for specific metal complexes (e.g., Ir squareplanar).
- Args:
self: AQME instance with arguments
mol_ensemb (rdkit.Chem.Mol): Molecule ensemble with all conformers
mol_geom (rdkit.Chem.Mol): Specific conformer to test
geom (list): Geometry rule specification
Empty list [] means no filtering
['Ir_squareplanar'] for Ir complex special rule
[SMARTS, THRESHOLD] for custom geometric constraints
- Returns:
bool: True if molecule passes all geometric rules, False otherwise
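The three `geom` forms map naturally onto a small dispatch. In this sketch, `check_ir_rule` and `check_smarts_rule` are hypothetical callables standing in for the Ir squareplanar and SMARTS-based checks, and `geom` is assumed to hold a single rule.

```python
def geom_filter(geom, check_ir_rule, check_smarts_rule):
    """Dispatch on the geometry rule specification described above."""
    if not geom:
        return True  # [] means no geometric filtering
    if geom == ["Ir_squareplanar"]:
        return check_ir_rule()  # specialized Ir complex rule
    smarts, threshold = geom  # custom [SMARTS, THRESHOLD] constraint
    return check_smarts_rule(smarts, threshold)
```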
- get_ir_squareplanar_geometry(mol)
Extract geometry parameters for Ir squareplanar complexes.
Identifies the two type-A ligands in trans configuration in Ir squareplanar complexes, based on the ligands from DOI: https://doi.org/10.1039/D0SC00445F
- Args:
mol (rdkit.Chem.Mol): Molecule to analyze
- Returns:
list: [L_atom_1, Ir_idx, L_atom_2, 180] if a valid geometry is found, empty list otherwise
- write_clustered_sdf(self, cluster_mols_sorted, csearch_file)
Write clustered conformers to SDF file.
Replaces the original SDF file with the filtered conformers. In pytest mode, the original file is moved instead of removed.
- Args:
cluster_mols_sorted (list): Molecules sorted by energy
csearch_file (str): Path to SDF file to write
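The file handling can be sketched with the standard library alone (with RDKit, `Chem.SDWriter` would write the records). Molecules are passed as hypothetical pre-rendered mol-block strings. `pytest_mode` and `backup_dir` are hypothetical parameters illustrating "move instead of remove"; where AQME actually moves the original file is not specified here.

```python
import os
import shutil


def write_clustered_sdf(mol_blocks, csearch_file, pytest_mode=False, backup_dir=None):
    """Replace `csearch_file` with the given mol blocks, one SDF record each."""
    if os.path.exists(csearch_file):
        if pytest_mode and backup_dir is not None:
            # Keep the original for inspection instead of deleting it.
            os.makedirs(backup_dir, exist_ok=True)
            shutil.move(csearch_file, os.path.join(backup_dir, os.path.basename(csearch_file)))
        else:
            os.remove(csearch_file)
    with open(csearch_file, "w") as fh:
        for block in mol_blocks:
            fh.write(block.rstrip("\n") + "\n$$$$\n")  # "$$$$" ends each SDF record
```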