filter

add_stable_conformers(mols, cluster_mols, allenergy, sample)

Fill remaining slots with most stable conformers.

After clustering, add the lowest energy conformers that weren't selected as centroids to reach the target sample size.

Args:

mols (list): All molecules sorted by energy

cluster_mols (list): Currently selected cluster centroids

allenergy (list): Energies of selected molecules

sample (int): Target total number of conformers

Returns:

tuple: (updated_cluster_mols, updated_allenergy)
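The fill step can be illustrated with a minimal sketch. It assumes molecules are plain (name, energy) tuples rather than RDKit objects, and the function name fill_with_stable is hypothetical; it is not the AQME implementation.

```python
def fill_with_stable(mols, cluster_mols, allenergy, sample):
    """Top up the selection with the lowest-energy conformers not yet chosen."""
    selected = list(cluster_mols)
    energies = list(allenergy)
    for mol in mols:  # mols is assumed to be sorted by ascending energy
        if len(selected) >= sample:
            break
        if mol not in selected:
            selected.append(mol)
            energies.append(mol[1])  # assumption: energy stored as second item
    return selected, energies


# Toy data: five conformers, two already picked as cluster centroids.
mols = [("c0", 0.0), ("c1", 0.4), ("c2", 0.9), ("c3", 1.5), ("c4", 2.2)]
centroids = [("c0", 0.0), ("c3", 1.5)]
selected, energies = fill_with_stable(mols, centroids, [0.0, 1.5], sample=4)
print([name for name, _ in selected])
```

With a target of four conformers, the two centroids are kept and the two most stable remaining conformers are appended.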

adjust_cluster_threshold(dists, num_mols, cluster_points)

Automatically adjust clustering threshold to reach target cluster count.

Uses the Butina clustering algorithm and iteratively adjusts the threshold until the number of clusters matches the target. A higher threshold yields fewer clusters.

Args:

dists (list): Pairwise RMS distances

num_mols (int): Number of molecules to cluster

cluster_points (int): Target number of clusters

Returns:

tuple: (clusters, final_threshold)
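The threshold search can be sketched in pure Python. Everything here is an assumption made for illustration: butina_sketch is a simplified stand-in for RDKit's Butina clustering, and the starting threshold, step size and iteration cap are hypothetical values, not the ones AQME uses.

```python
def butina_sketch(dists, n, threshold):
    """Minimal Taylor-Butina clustering on a flattened triangular matrix."""

    def dist(i, j):
        # Distances are stored in the order (1,0), (2,0), (2,1), (3,0), ...
        if i < j:
            i, j = j, i
        return dists[i * (i - 1) // 2 + j]

    # Neighbours of each point within the threshold.
    neighbours = [
        [j for j in range(n) if j != i and dist(i, j) <= threshold]
        for i in range(n)
    ]
    # Points with the most neighbours become centroids first.
    order = sorted(range(n), key=lambda i: len(neighbours[i]), reverse=True)

    assigned, clusters = set(), []
    for i in order:
        if i in assigned:
            continue
        members = [i] + [j for j in neighbours[i] if j not in assigned]
        assigned.update(members)
        clusters.append(tuple(members))  # centroid is the first element
    return clusters


def adjust_threshold_sketch(dists, n, target, start=0.05, step=0.1, max_iter=100):
    """Raise or lower the threshold until the cluster count hits the target."""
    threshold = start
    clusters = butina_sketch(dists, n, threshold)
    for _ in range(max_iter):
        if len(clusters) == target:
            break
        if len(clusters) > target:
            threshold += step  # too many clusters: loosen the threshold
        else:
            threshold = max(threshold - step, 0.0)  # too few: tighten it
        clusters = butina_sketch(dists, n, threshold)
    return clusters, threshold


# Toy data: six points on a line forming three obvious groups.
points = [0.0, 0.1, 0.2, 5.0, 5.1, 10.0]
dists = [abs(points[i] - points[j]) for i in range(len(points)) for j in range(i)]
clusters, threshold = adjust_threshold_sketch(dists, len(points), target=3)
print(clusters, round(threshold, 2))
```

The iteration cap guards against targets that no threshold can reach; whether the real function has a similar safeguard is not stated in this reference.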

apply_energy_window_filter(sorted_all_cids, cenergy, energy_window)

Filter conformers by energy window from the lowest energy conformer.

Discards conformers with energy higher than energy_window relative to the lowest energy conformer.

Args:

sorted_all_cids (list): Conformer IDs sorted by energy

cenergy (dict/list): Conformer energies indexed by ID

energy_window (float): Maximum energy difference from minimum (kcal/mol)

Returns:

list: Conformer IDs within the energy window
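A minimal sketch of the energy window idea follows. The function name energy_window_filter and the toy energies are hypothetical, and treating the window boundary as inclusive is an assumption.

```python
def energy_window_filter(sorted_cids, energies, energy_window):
    """Keep conformer IDs whose energy lies within the window above the minimum."""
    if not sorted_cids:
        return []
    e_min = energies[sorted_cids[0]]  # IDs are sorted, so the first is the lowest
    return [cid for cid in sorted_cids if energies[cid] - e_min <= energy_window]


# Toy energies in kcal/mol; relative energies are 0.0, 0.8, 3.5 and 6.0.
energies = {0: -10.0, 1: -9.2, 2: -6.5, 3: -4.0}
kept = energy_window_filter([0, 1, 2, 3], energies, energy_window=5.0)
print(kept)
```

Conformer 3 lies 6.0 kcal/mol above the minimum and is discarded with a 5.0 kcal/mol window.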

apply_pre_energy_filter(sortedcids, cenergy, threshold)

Pre-filter conformers based solely on energy differences.

Selects conformers with energy differences greater than or equal to the threshold compared to previously selected conformers. This reduces the number of conformers before the more expensive RMSD filtering.

Args:

sortedcids (list): Conformer IDs sorted by energy

cenergy (dict/list): Conformer energies indexed by ID

threshold (float): Minimum energy difference to consider unique (kcal/mol)

Returns:

list: Selected conformer IDs passing energy pre-filter
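The pre-filter logic described above might look like the following sketch. The name pre_energy_filter and the toy values are hypothetical; the comparison against every previously selected conformer follows the description, but the actual implementation may differ in detail.

```python
def pre_energy_filter(sorted_cids, energies, threshold):
    """Keep a conformer only if it differs in energy from all kept conformers."""
    selected = []
    for cid in sorted_cids:
        too_close = any(
            abs(energies[cid] - energies[seen]) < threshold for seen in selected
        )
        if not too_close:
            selected.append(cid)
    return selected


# Toy energies in kcal/mol with two near-degenerate pairs.
energies = {0: 0.0, 1: 0.05, 2: 0.3, 3: 0.32, 4: 1.0}
selected = pre_energy_filter([0, 1, 2, 3, 4], energies, threshold=0.1)
print(selected)
```

Conformers 1 and 3 sit within 0.1 kcal/mol of an already selected conformer, so they never reach the more expensive RMSD stage.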

apply_rmsd_and_energy_filter(outmols, selectedcids_initial, cenergy, args)

Filter conformers based on combined energy and RMSD criteria.

Each conformer is compared with the conformers already selected. If the energy difference is below the energy threshold AND the RMSD is below the RMSD threshold with respect to any selected conformer, the new conformer is rejected as a duplicate.

Args:

outmols (dict/list): Conformer molecule objects indexed by ID

selectedcids_initial (list): Pre-filtered conformer IDs

cenergy (dict/list): Conformer energies indexed by ID

args: Arguments object with thresholds:

    energy_threshold: Energy similarity threshold (kcal/mol)

    rms_threshold: RMSD similarity threshold (Angstroms)

    heavyonly: Use only heavy atoms for RMSD

    max_matches_rmsd: Maximum atom matches for RMSD calculation

Returns:

list: Final selected conformer IDs
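The duplicate test can be sketched as follows. This is an illustration under stated assumptions: conformers are plain coordinate lists, plain_rmsd is a hypothetical unaligned RMSD standing in for whatever aligned RMSD routine the real function calls, and the heavyonly and max_matches_rmsd options are ignored.

```python
import math


def plain_rmsd(a, b):
    """RMSD between two equally ordered coordinate lists, without alignment."""
    squared = sum((x - y) ** 2 for p, q in zip(a, b) for x, y in zip(p, q))
    return math.sqrt(squared / len(a))


def rmsd_energy_filter(coords, candidate_ids, energies, energy_threshold, rms_threshold):
    """Reject a conformer when it is close in BOTH energy and geometry."""
    selected = []
    for cid in candidate_ids:
        duplicate = any(
            abs(energies[cid] - energies[seen]) < energy_threshold
            and plain_rmsd(coords[cid], coords[seen]) < rms_threshold
            for seen in selected
        )
        if not duplicate:
            selected.append(cid)
    return selected


coords = {
    0: [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    1: [(0.0, 0.0, 0.0), (1.05, 0.0, 0.0)],  # nearly identical to conformer 0
    2: [(0.0, 0.0, 0.0), (0.0, 2.0, 0.0)],   # similar energy, different geometry
}
energies = {0: 0.0, 1: 0.1, 2: 0.15}
result = rmsd_energy_filter(coords, [0, 1, 2], energies, 0.25, 0.25)
print(result)
```

Conformer 2 survives even though its energy is close to conformer 0, because its geometry differs; only conformer 1 fails both criteria.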

check_geometric_match(self, mol_ensemb, mol_geom, match_type, geom)

Check if molecule matches geometric constraints.

Validates geometric parameters (bonds, angles, dihedrals) against specified thresholds using SMARTS pattern matching.

Args:

self: AQME instance with threshold arguments

mol_ensemb (rdkit.Chem.Mol): Molecule ensemble for SMARTS matching

mol_geom (rdkit.Chem.Mol): Specific conformer for geometry calculation

match_type (str): Type of matching ('regular_rule' or 'Ir_squareplanar')

geom (list): Geometry specification

    For regular_rule: [SMARTS, threshold_value]

    For Ir_squareplanar: [atom1, atom2, atom3, angle_value]

Returns:

bool: True if geometry passes the constraint, False otherwise

cluster_conformers(self, mols, program, csearch_file, name, sample)

Performs Butina clustering based on RMS differences of conformers.

Uses a two-step approach for RDKit:

1. Cluster conformers using RMS-based Butina algorithm (80% of sample)

2. Fill remaining slots with lowest energy conformers (20% of sample)

For CREST, selects conformers as starting points for CREST searches.

Args:

mols (list): List of RDKit molecule objects with conformers

program (str): Program name ('rdkit' or 'crest')

csearch_file (str): Path to SDF file

name (str): Molecule name for logging

sample (int): Total number of conformers to select

Returns:

list: Selected and sorted molecule objects

compute_pairwise_rms_distances(self, mols)

Compute pairwise RMS distances for all conformers.

Creates a distance matrix of RMS values between all pairs of conformers. Limits the calculation to 100 atom matches, since the molecules are aligned and share the same atom numbering.

Args:

mols (list): List of RDKit molecule objects with conformers

Returns:

list: Flattened upper triangular distance matrix
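A flattened triangular matrix stores each pair of conformers once, giving n(n-1)/2 entries for n conformers. The sketch below shows one common layout, the order in which RDKit's Butina clustering expects precomputed distances; that this is the exact order produced by this function is an assumption. The helper names and the toy one-coordinate "conformers" are hypothetical.

```python
import math


def toy_rms(a, b):
    """RMS difference between two equally long coordinate vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))


def pairwise_flat(items, dist):
    """Return distances in the order (1,0), (2,0), (2,1), (3,0), ..."""
    return [dist(items[i], items[j]) for i in range(len(items)) for j in range(i)]


conformers = [[0.0], [1.0], [3.0]]  # stand-ins for real conformer coordinates
dists = pairwise_flat(conformers, toy_rms)
print(dists)
```

For three conformers the list holds three values: the distance between conformers 1 and 0, then 2 and 0, then 2 and 1.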

conformer_filters(self, sorted_all_cids, cenergy, outmols)

Apply sequential energy and RMSD-based conformer filters.

Three-stage filtering process:

1. Energy window filter (ewin_cmin)

2. Pre-filter based on energy differences

3. Combined energy and RMSD filter

Args:

self: AQME instance with filter arguments

sorted_all_cids (list): List of conformer IDs sorted by energy

cenergy (dict/list): Conformer energies indexed by ID

outmols (dict/list): Conformer molecule objects indexed by ID

Returns:

list: Selected conformer IDs that pass all filters

determine_cluster_points(self, program, sample, name)

Determine target number of cluster points based on program and settings.

For RDKit: Uses 80% of sample size for clustering, reserves 20% for stable conformers.

For CREST: Uses number of CREST runs as cluster points.

Args:

program (str): Program name ('rdkit' or 'crest')

sample (int): Total number of conformers to select

name (str): Molecule name for logging

Returns:

int: Target number of cluster points
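The 80/20 split can be shown with a short sketch. The function name, the crest_runs parameter and the rounding rule are all assumptions made for illustration; this reference does not state how a fractional result is handled.

```python
def cluster_points_sketch(program, sample, crest_runs=1):
    """Return the number of clusters to aim for, given the program in use."""
    if program == "rdkit":
        # 80% of the slots go to cluster centroids; the rest are filled later
        # with the most stable conformers. Rounding is an assumption.
        return int(round(sample * 0.8))
    if program == "crest":
        return crest_runs  # hypothetical parameter: one cluster per CREST run
    raise ValueError(f"unsupported program: {program}")


rdkit_points = cluster_points_sketch("rdkit", sample=25)
crest_points = cluster_points_sketch("crest", sample=25, crest_runs=3)
print(rdkit_points, crest_points)
```

With a sample of 25, the RDKit path targets 20 clusters and leaves 5 slots for the lowest-energy conformers.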

extract_cluster_centroids(mols, clusts, cluster_points)

Extract centroid conformers from each cluster.

Gets the first element (centroid) from each cluster and retrieves the corresponding molecule objects.

Args:

mols (list): List of all molecule objects

clusts (list): List of clusters from Butina clustering

cluster_points (int): Maximum number of centroids to extract

Returns:

tuple: (cluster_mols, allenergy) - selected molecules and their energies
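Centroid extraction can be sketched as below. Molecules are represented as plain dictionaries with a hypothetical "energy" key instead of RDKit objects, and the function name is illustrative only.

```python
def centroids_from_clusters(mols, clusters, cluster_points):
    """Take the first member of each cluster, up to cluster_points clusters."""
    picked = [mols[cluster[0]] for cluster in clusters[:cluster_points]]
    energies = [mol["energy"] for mol in picked]
    return picked, energies


mols = [
    {"name": f"conf{i}", "energy": e}
    for i, e in enumerate([0.0, 0.3, 0.7, 1.2, 2.0])
]
clusters = [(1, 0, 2), (3,), (4,)]  # each tuple lists its centroid first
picked, energies = centroids_from_clusters(mols, clusters, cluster_points=2)
print([mol["name"] for mol in picked], energies)
```

Only the first two clusters are used here, so the third centroid is ignored even though it exists.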

filters(mol, log, molwt_cutoff)

Apply basic molecular filters based on molecular weight and atom types.

Filters molecules that exceed weight cutoff or contain unsupported atoms.

Args:

mol (rdkit.Chem.Mol): Molecule to filter

log: Logger object for writing messages

molwt_cutoff (float): Maximum allowed molecular weight (0 = no limit)

Returns:

bool: True if molecule passes all filters, False otherwise

geom_filter(self, mol_ensemb, mol_geom, geom)

Check if a molecule passes geometric filtering rules.

Applies geometry-based filters to conformers, including specialized rules for specific metal complexes (e.g., Ir squareplanar).

Args:

self: AQME instance with arguments

mol_ensemb (rdkit.Chem.Mol): Molecule ensemble with all conformers

mol_geom (rdkit.Chem.Mol): Specific conformer to test

geom (list): Geometry rule specification

    Empty list [] means no filtering

    ['Ir_squareplanar'] for Ir complex special rule

    [SMARTS, THRESHOLD] for custom geometric constraints

Returns:

bool: True if molecule passes all geometric rules, False otherwise

get_ir_squareplanar_geometry(mol)

Extract geometry parameters for Ir squareplanar complexes.

Identifies the two ligands of type A in trans configuration for Ir squareplanar complexes based on the ligands from DOI: https://doi.org/10.1039/D0SC00445F

Args:

mol (rdkit.Chem.Mol): Molecule to analyze

Returns:

list: [L_atom_1, Ir_idx, L_atom_2, 180] if valid geometry found, empty list otherwise

write_clustered_sdf(self, cluster_mols_sorted, csearch_file)

Write clustered conformers to SDF file.

Replaces the original SDF file with the filtered conformers. In pytest mode, moves original file instead of removing it.

Args:

cluster_mols_sorted (list): Molecules sorted by energy

csearch_file (str): Path to SDF file to write