QDESCP
- class qdescp(**kwargs)
Quantum Mechanical Descriptor Calculation and Processing.
This class generates and processes quantum mechanical descriptors using xTB, and also analyzes NMR data. It provides functionality for:
Running xTB calculations for descriptor generation
Processing NMR data from Gaussian calculations
Managing Boltzmann averaging of properties
Creating ROBERT-compatible descriptor databases
The class supports both single molecule calculations and batch processing of multiple structures from various input formats including SDF, PDB, XYZ and CSV files.
- Attributes:
args: Configuration object holding all calculation parameters
start_time: Timestamp of calculation start
- assign_atomic_properties(json_data, name_initial, atom_props, smarts_targets)
Assign atomic properties based on SMARTS pattern matches.
This method:
1. Matches SMARTS patterns to atoms
2. Assigns property prefixes to matched atoms
3. Updates JSON data with atomic properties
- Args:
json_data (dict): Dictionary of molecular data
name_initial (str): Base name for molecule
atom_props (list): Atomic properties to assign
smarts_targets (list): SMARTS patterns to match
- Returns:
dict: Updated JSON data with atomic properties
- Note:
Property prefixes are tracked to avoid duplicates
Properties are assigned only to matched atoms
Pattern matches use RDKit SMARTS matcher
- cleanup(name, destination, xtb_passing, xtb_files_props)
Clean up calculation files and organize results.
This method:
1. Moves successful calculation results to destination
2. Moves failed calculations to failed/ subdirectory
3. Cleans up temporary files and directories
- Args:
name (str): Base name for files
destination (Path): Output directory path
xtb_passing (bool): Whether xTB calculation succeeded
xtb_files_props (dict): Dictionary of file paths
- Note:
Successful calculations: JSON and XYZ files preserved
Failed calculations: All files moved to failed/ directory
Temporary directories are removed
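The clean-up rules above can be illustrated with a minimal standard-library sketch. The helper name tidy_results, the file names and the directory layout are hypothetical and chosen only for illustration; this is not the actual body of cleanup().

```python
import shutil
import tempfile
from pathlib import Path


def tidy_results(work_dir: Path, destination: Path, passed: bool) -> Path:
    """Move results out of work_dir, then delete work_dir.

    Hypothetical helper: on success only .json and .xyz files are kept in
    destination; on failure every file is moved to destination/failed.
    """
    target = destination if passed else destination / "failed"
    target.mkdir(parents=True, exist_ok=True)
    for path in sorted(work_dir.iterdir()):
        if path.is_file() and (not passed or path.suffix in {".json", ".xyz"}):
            shutil.move(str(path), str(target / path.name))
    shutil.rmtree(work_dir)  # remove the temporary directory and any leftovers
    return target


# Demo in a throwaway directory with placeholder files
root = Path(tempfile.mkdtemp())
work = root / "mol_1_tmp"
work.mkdir()
for fname in ("mol_1.json", "mol_1.xyz", "mol_1.out"):
    (work / fname).write_text("placeholder")

kept_dir = tidy_results(work, root / "results", passed=True)
kept = sorted(p.name for p in kept_dir.iterdir())
print(kept)
```

Calling the same helper with passed=False would instead place all three files under results/failed.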
- gather_files_and_run(destination, file, atom_props, smarts_targets, bar)
Process input file(s) through xTB calculation and property collection.
This method handles the complete process for a single input file:
1. Converts input files to XYZ format if needed
2. Extracts charge and multiplicity information
3. Runs xTB calculations for each conformer
4. Collects and processes properties
5. Generates JSON output files
- Args:
destination (Path): Output directory path
file (str): Input file path (XYZ/PDB/SDF format)
atom_props (list): Atomic properties to collect
smarts_targets (list): SMARTS patterns to match
bar (IncrementalBar): Progress bar instance
- Note:
For XYZ files, conformers are processed directly
Other formats are converted to XYZ using OpenBabel
Charge/multiplicity are read from files or use defaults
Progress is tracked via the progress bar
- get_boltz_n_save_csv(destination, qdescp_files, descp_dict, boltz_dir, smarts_targets)
Process molecular properties and generate Boltzmann-averaged data files.
This method:
1. Processes each valid input file to extract molecular properties
2. Generates Boltzmann-weighted JSON files for each molecule
3. Creates final CSV files with combined data
- Args:
destination (Path): Output directory path
qdescp_files (list): List of input structure files
descp_dict (dict): Dictionary of descriptors to calculate
boltz_dir (Path): Directory for Boltzmann-weighted results
smarts_targets (list): SMARTS patterns for atom matching
- get_boltz_props(json_files, name, boltz_dir, calc_type, descp_dict_indiv, smarts_targets, mol, all_prefixes_atoms)
Calculate Boltzmann-weighted properties from conformer results.
This method:
1. Reads properties from JSON files
2. Calculates Boltzmann weights based on energies
3. Averages properties across conformers
4. Adds RDKit molecular descriptors
5. Saves results to new JSON file
- Args:
json_files (list): List of JSON files with conformer data
name (str): Base name for output files
boltz_dir (Path): Output directory for Boltzmann results
calc_type (str): Calculation type ('xtb' or 'nmr')
descp_dict_indiv (dict): Property dictionary for this molecule
smarts_targets (list): SMARTS patterns to match
mol: RDKit molecule object for descriptor calculation
all_prefixes_atoms (list): List of atomic property prefixes
- Returns:
None: Results are saved to JSON file
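As an illustration of steps 2 and 3 (Boltzmann weights from conformer energies, then averaging), here is a minimal sketch. It assumes relative energies in kcal/mol and a temperature of 298.15 K. Neither the units nor the temperature are stated in this reference, and the function names are hypothetical rather than part of the class.

```python
import math

R_KCAL = 0.0019872041  # gas constant in kcal/(mol*K)


def boltzmann_weights(energies_kcal, temperature=298.15):
    """Return normalized Boltzmann weights for a list of conformer energies.

    Assumption: energies are in kcal/mol. They are shifted by the minimum
    so that the exponentials stay numerically well behaved.
    """
    e_min = min(energies_kcal)
    factors = [
        math.exp(-(e - e_min) / (R_KCAL * temperature)) for e in energies_kcal
    ]
    total = sum(factors)
    return [f / total for f in factors]


def boltzmann_average(values, weights):
    """Weighted average of one property across conformers."""
    return sum(v * w for v, w in zip(values, weights))


# Three hypothetical conformers with one made-up property each
energies = [0.0, 0.5, 2.0]
dipoles = [1.20, 1.50, 2.10]

weights = boltzmann_weights(energies)
avg_dipole = boltzmann_average(dipoles, weights)
print(weights, avg_dipole)
```

The lowest-energy conformer always receives the largest weight, so the averaged property is pulled towards its value.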
- get_unique_files()
Filter input files to remove duplicates based on SMILES.
This method:
1. Reads SMILES strings from SDF files
2. Identifies duplicate structures
3. Keeps only unique structures
4. Warns about duplicates
- Returns:
list: Paths to unique input files
- Note:
Duplicates are identified by exact SMILES match
Files without SMILES are kept
Warning is logged for duplicate structures
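The filtering rules in the note above (exact SMILES match, files without SMILES kept) can be sketched as follows. The function name and the (path, SMILES) input format are hypothetical; the real method reads the SMILES from the SDF files itself.

```python
def unique_by_smiles(entries):
    """Split (path, smiles) pairs into unique paths and duplicate paths.

    Hypothetical helper: a smiles value of None means the file carries no
    SMILES, and such files are always kept.
    """
    seen = set()
    unique, duplicates = [], []
    for path, smiles in entries:
        if smiles is None:
            unique.append(path)      # no SMILES available: keep the file
        elif smiles in seen:
            duplicates.append(path)  # exact string match with an earlier file
        else:
            seen.add(smiles)
            unique.append(path)
    return unique, duplicates


entries = [
    ("mol_1.sdf", "CCO"),
    ("mol_2.sdf", "CCO"),
    ("mol_3.sdf", None),
    ("mol_4.sdf", "OCC"),
]
unique, duplicates = unique_by_smiles(entries)
for path in duplicates:
    print(f"Warning: {path} duplicates an earlier structure and is skipped")
```

Because the comparison is an exact string match, "CCO" and "OCC" count as different entries in this sketch even though both describe ethanol.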
- initial_csearch_run(destination, qdescp_files)
Generate conformers from SMILES in CSV input.
This method:
1. Validates CSV input file existence
2. Sets up conformer search parameters
3. Runs RDKit conformer generation
4. Processes and validates generated conformers
- Args:
destination (Path): Output directory path
qdescp_files (list): List of input files (expecting single CSV)
- Returns:
list: Paths to generated conformer files
- Raises:
SystemExit: If CSV file not found or conformer generation fails
- initial_xtb_check()
Validate and process input file selection.
This method:
1. Checks if input is provided via --input or --files
2. Validates file formats and paths
3. Converts input to standardized file list
- Returns:
list: List of validated input file paths
- Raises:
SystemExit: If no valid files are found or formats are invalid
- morfeus_properties(name_initial, atom_props, smarts_targets, xtb_files_props, charge, mult, file)
Calculate and collect MORFEUS molecular descriptors.
This method:
1. Reads molecular geometry from XYZ file
2. Calculates MORFEUS descriptors
3. Assigns atomic properties based on SMARTS patterns
4. Saves results to JSON file
- Args:
name_initial (str): Base name for input/output files
atom_props (list): Atomic properties to calculate
smarts_targets (list): SMARTS patterns for atom matching
xtb_files_props (dict): Dictionary of file paths
charge (int): Molecular charge
mult (int): Molecular multiplicity
file (str): Original input file path
- Returns:
None: Results are saved to JSON file
- Raises:
FileNotFoundError: If XYZ file is missing
Exception: If MORFEUS descriptor calculation fails
- process_aqme_csv(name_db)
Process and update AQME CSV result files.
This method processes three types of CSV files:
full: All descriptors
denovo: De novo descriptors only
interpret: Interpretable descriptors only
For each file:
1. Reads original and generated CSVs
2. Adds missing entries
3. Sorts according to input order
4. Fills missing values
- Args:
name_db (str): Database name prefix for output files
- Note:
Missing entries are filled with group-wise first values
Original file order is preserved
Warns if files are not found
- qdescp_nmr_workflow(boltz_dir)
Run NMR workflow for chemical shift prediction.
This method:
1. Validates input files (must be JSON format)
2. Processes conformer JSON files
3. Calculates Boltzmann-weighted NMR properties
4. Applies empirical corrections if provided
- Args:
boltz_dir (Path): Directory for Boltzmann-averaged results
- Raises:
SystemExit: If input files are not in JSON format
- qdescp_set_up()
Initialize and validate QDESCP run settings.
This method performs initial setup and validation:
1. Sets default program to xTB if not specified
2. Validates program selection (xTB or NMR)
3. Configures parallel processing settings
4. Sets sampling parameters
5. Validates input files and directories
6. Creates required output directories
- Returns:
- tuple: Contains:
self: Updated QDESCP instance
destination (Path): Configured output directory
smarts_targets (list): SMARTS patterns to match
boltz_dir (Path): Directory for Boltzmann calculations
- Raises:
SystemExit: If program selection or input files are invalid
- qdescp_xtb_workflow(boltz_dir, destination, smarts_targets)
Run the complete xTB workflow for descriptor generation and collection.
This method handles the complete process of generating and collecting quantum mechanical descriptors using xTB, including:
Initial input validation and setup
Optional conformer generation for CSV inputs
Automatic SMARTS pattern detection
Parallel descriptor calculation
Results collection and processing
- Args:
boltz_dir (Path): Directory for Boltzmann-averaged results
destination (Path): Main output directory
smarts_targets (list): List of SMARTS patterns to match
- Note:
For CSV inputs, conformers are generated before descriptor calculation
When no SMARTS patterns are provided, they are auto-detected
Invalid SMARTS patterns (< 75% compatibility) are removed
Calculations run in parallel for efficiency, with each job kept on a single processor for reproducibility
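The 75% rule in the note above can be sketched as below. This reference does not define "compatibility"; the sketch assumes it means the fraction of input molecules that a pattern matches. The function name is hypothetical, and a plain substring test stands in for a real RDKit substructure match so that the example stays self-contained.

```python
def filter_patterns(patterns, molecules, matches, threshold=0.75):
    """Keep patterns that match at least `threshold` of the molecules.

    Assumption: "compatibility" is the fraction of molecules matched.
    `matches(molecule, pattern)` is any callable returning True or False.
    """
    kept = []
    for pattern in patterns:
        hits = sum(1 for mol in molecules if matches(mol, pattern))
        if molecules and hits / len(molecules) >= threshold:
            kept.append(pattern)
    return kept


def toy_match(smiles, pattern):
    # Stand-in for a substructure search: plain substring test on the SMILES
    return pattern in smiles


molecules = ["CC(=O)O", "CC(=O)N", "CCO", "CC(=O)C"]
patterns = ["C(=O)", "N"]
kept = filter_patterns(patterns, molecules, toy_match)
print(kept)
```

Here "C(=O)" is found in three of the four strings (exactly 75%) and is kept, while "N" is found in only one and is dropped.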
- run_opt_xtb(file, xyz_file, charge, mult, name, destination)
Run xTB property calculations for a molecule.
- Args:
file (str): Original input file path
xyz_file (str): XYZ format geometry file path
charge (int): Molecular charge
mult (int): Molecular multiplicity
name (str): Base name for output files
destination (Path): Output directory path
- Returns:
tuple: (success status, dict of file paths)
- write_csv_boltz_data(destination)
Generate CSV files containing Boltzmann-averaged descriptor data.
This method combines descriptor data from JSON files with input CSV data and generates multiple output files:
Full descriptor database
De novo descriptor subset
Interpretable descriptor subset
The method handles both AQME-Descriptors and AQME-ROBERT workflows.
- Args:
destination (Path): Directory containing Boltzmann JSON files
Parameters
General
- w_dir_main : str, default=os.getcwd()
Working directory
- destination : str, default=None
Directory to create the JSON file(s)
- program : str, default=xtb
Program required to create the new descriptors. Current options: 'xtb', 'nmr'
- nprocs : int, default=None
Number of xTB jobs run in parallel, each using 1 processor (a single processor per job keeps the results reproducible). Also sets the number of processors used in CSEARCH
- qdescp_atoms : list of str, default=[]
Atom types or groups for which atomic properties are calculated. This option accepts atoms (e.g., qdescp_atoms=['P']) and SMARTS patterns (e.g., qdescp_atoms=['C=O'])
- robert : bool, default=True
Creates a database ready to use in an AQME-ROBERT machine learning workflow, combining the input CSV with SMILES/code_name and the calculated xTB/DBSTEP descriptors
xTB and MORFEUS descriptors
- files or input (both options are valid) : list of str, default=''
Filenames of SDF/PDB/XYZ/CSV files to calculate xTB descriptors. If CSV is selected, a CSV with two columns is required (code_name and SMILES), since AQME will generate conformers from SMILES with CSEARCH before QDESCP generates descriptors.
- charge : int, default=None
Charge of the calculations used in the following input files (charges from SDF files generated in CSEARCH are read automatically).
- mult : int, default=None
Multiplicity of the calculations used in the following input files (multiplicities from SDF files generated in CSEARCH are read automatically).
- gfn_version : int, default="2"
GFN version used in QDESCP to calculate descriptors.
- qdescp_solvent : str, default=None
Solvent used in the xTB property calculations (ALPB model)
- qdescp_temp : float, default=300
Temperature required for the xTB property calculations
- qdescp_acc : float, default=0.2
Accuracy required for the xTB property calculations
- qdescp_opt : str, default='normal'
Convergence criteria required for the xTB property calculations
- boltz : bool, default=True
Calculation of Boltzmann averaged xTB properties and addition of RDKit molecular descriptors
- xtb_opt : bool, default=True
Performs an initial xTB geometry optimization before calculating descriptors
- vbur_radius : float, default=3.5
Adjusts the radius in the buried volume calculations of MORFEUS
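For CSV input, the files/input description asks for two columns, code_name and SMILES. A minimal sketch of building such a table with the standard csv module follows; the molecule names and SMILES strings are made-up examples, and the table is written to an in-memory buffer rather than to disk.

```python
import csv
import io

# Made-up example entries; only the column names come from the description
rows = [
    {"code_name": "ethanol", "SMILES": "CCO"},
    {"code_name": "acetone", "SMILES": "CC(=O)C"},
]

buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["code_name", "SMILES"])
writer.writeheader()
writer.writerows(rows)
csv_text = buffer.getvalue()

# Read the table back to confirm the layout
parsed = list(csv.DictReader(io.StringIO(csv_text)))
print(csv_text)
```

Writing csv_text to a file gives a table with the expected header row, which could then be passed through the files or input option.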
NMR simulation
- files : list of str, default=''
Filenames of LOG files to retrieve NMR shifts from Gaussian calculations (*.log can be used to include all the log files in the working directory)
- boltz : bool, default=True
Calculation of Boltzmann averaged NMR shifts
- nmr_atoms : list of str, default=[6, 1]
List containing the atom types (as atomic numbers) to consider. For example, if the user wants to retrieve NMR shifts from C and H atoms nmr_atoms=[6, 1]
- nmr_slope : list of float, default=[-1.0537, -1.0784]
List containing the slopes to apply to the raw NMR shifts calculated with Gaussian. A slope needs to be provided for each atom type in the analysis (e.g., for C and H atoms, nmr_slope=[-1.0537, -1.0784]). These values can be adjusted using the CHESHIRE repository.
- nmr_intercept : list of float, default=[181.7815, 31.8723]
List containing the intercepts to apply to the raw NMR shifts calculated with Gaussian. An intercept needs to be provided for each atom type in the analysis (e.g., for C and H atoms, nmr_intercept=[181.7815, 31.8723]). These values can be adjusted using the CHESHIRE repository.
- nmr_experim : str, default=None
Filename of a CSV containing the experimental NMR shifts. Two columns are needed: A) 'atom_idx' should contain the indexes of the atoms to study as seen in GaussView or other molecular visualizers (i.e., the first atom of the coordinates has index 1); B) 'experimental_ppm' should contain the experimental NMR shifts in ppm observed for the atoms.
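To show how nmr_atoms, nmr_slope and nmr_intercept fit together, the sketch below applies a linear scaling to raw values, one slope and intercept per atom type. It assumes the common linear-scaling form shift = (raw - intercept) / slope. The exact expression used internally is not stated in this reference, and the function name and input numbers are hypothetical.

```python
def scale_shifts(raw_by_atom, nmr_atoms, nmr_slope, nmr_intercept):
    """Apply a per-element linear scaling to raw NMR values.

    Assumed formula (not confirmed by this reference):
        scaled = (raw - intercept) / slope
    raw_by_atom maps an atomic number to a list of raw values.
    """
    if not (len(nmr_atoms) == len(nmr_slope) == len(nmr_intercept)):
        raise ValueError("one slope and one intercept are needed per atom type")
    params = {
        z: (slope, intercept)
        for z, slope, intercept in zip(nmr_atoms, nmr_slope, nmr_intercept)
    }
    return {
        z: [(raw - params[z][1]) / params[z][0] for raw in values]
        for z, values in raw_by_atom.items()
        if z in params
    }


# Defaults listed above for C (6) and H (1); the raw values are invented
nmr_atoms = [6, 1]
nmr_slope = [-1.0537, -1.0784]
nmr_intercept = [181.7815, 31.8723]
raw = {6: [181.7815, 76.4115], 1: [31.8723]}

scaled = scale_shifts(raw, nmr_atoms, nmr_slope, nmr_intercept)
print(scaled)
```

Passing lists of different lengths raises a ValueError, which mirrors the requirement that every atom type has its own slope and intercept.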