QDESCP

class qdescp(**kwargs)

Quantum Mechanical Descriptor Calculation and Processing.

This class handles the generation and processing of quantum mechanical descriptors using xTB, as well as the analysis of NMR data. It provides functionality for:

  • Running xTB calculations for descriptor generation

  • Processing NMR data from Gaussian calculations

  • Managing Boltzmann averaging of properties

  • Creating ROBERT-compatible descriptor databases

The class supports both single molecule calculations and batch processing of multiple structures from various input formats including SDF, PDB, XYZ and CSV files.
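As a rough illustration of how the class might be called, the sketch below collects a few options from the Parameters section into a dictionary and passes them to the constructor. The import path `aqme.qdescp`, the `'*.sdf'` file pattern, and the helper name `run_qdescp` are assumptions made for this example; they are not confirmed by this page.

```python
# Keyword arguments named after entries in the Parameters section.
# The values are illustrative only.
xtb_kwargs = {
    "program": "xtb",        # documented options: 'xtb' or 'nmr'
    "files": "*.sdf",        # hypothetical input pattern
    "qdescp_atoms": ["P"],   # atom type or SMARTS pattern
    "boltz": True,           # Boltzmann-averaged properties
    "nprocs": 4,             # number of parallel xTB jobs
}


def run_qdescp(kwargs):
    """Instantiate qdescp with the given options (hypothetical helper).

    Requires AQME and xTB to be installed. The import is kept inside the
    function so the dictionary above can be inspected without AQME.
    """
    from aqme.qdescp import qdescp  # assumed import path

    return qdescp(**kwargs)
```

Calling `run_qdescp(xtb_kwargs)` would only work in an environment where AQME and xTB are available.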

Attributes:

  • args: Configuration object holding all calculation parameters

  • start_time: Timestamp of calculation start

assign_atomic_properties(json_data, name_initial, atom_props, smarts_targets)

Assign atomic properties based on SMARTS pattern matches.

This method:

  1. Matches SMARTS patterns to atoms

  2. Assigns property prefixes to matched atoms

  3. Updates JSON data with atomic properties

Args:

  • json_data (dict): Dictionary of molecular data

  • name_initial (str): Base name for molecule

  • atom_props (list): Atomic properties to assign

  • smarts_targets (list): SMARTS patterns to match

Returns:

dict: Updated JSON data with atomic properties

Note:
  • Property prefixes are tracked to avoid duplicates

  • Properties are assigned only to matched atoms

  • Pattern matches use RDKit SMARTS matcher

cleanup(name, destination, xtb_passing, xtb_files_props)

Clean up calculation files and organize results.

This method:

  1. Moves successful calculation results to destination

  2. Moves failed calculations to failed/ subdirectory

  3. Cleans up temporary files and directories

Args:

  • name (str): Base name for files

  • destination (Path): Output directory path

  • xtb_passing (bool): Whether xTB calculation succeeded

  • xtb_files_props (dict): Dictionary of file paths

Note:
  • Successful calculations: JSON and XYZ files preserved

  • Failed calculations: All files moved to failed/ directory

  • Temporary directories are removed
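The file handling described above can be sketched with the standard library. This is a minimal illustration under stated assumptions, not the class's actual code: the function name `tidy_results`, the `<name>.json` / `<name>.xyz` naming, and the single temporary work directory are all hypothetical.

```python
import shutil
import tempfile
from pathlib import Path


def tidy_results(name, work_dir, destination, passed):
    """Move outputs for `name` out of work_dir, then delete work_dir."""
    work_dir, destination = Path(work_dir), Path(destination)
    if passed:
        # Successful run: keep only the JSON and XYZ files.
        target = destination
        keep = [work_dir / f"{name}.json", work_dir / f"{name}.xyz"]
    else:
        # Failed run: move every file for this molecule to failed/.
        target = destination / "failed"
        keep = sorted(work_dir.glob(f"{name}*"))
    target.mkdir(parents=True, exist_ok=True)
    moved = []
    for path in keep:
        if path.is_file():
            shutil.move(str(path), str(target / path.name))
            moved.append(path.name)
    # Remove the temporary directory and anything left in it.
    shutil.rmtree(work_dir, ignore_errors=True)
    return sorted(moved)


# Demonstration in a throwaway directory with placeholder files.
with tempfile.TemporaryDirectory() as tmp:
    work = Path(tmp) / "mol_1_tmp"
    work.mkdir()
    for ext in ("json", "xyz", "out"):
        (work / f"mol_1.{ext}").write_text("placeholder")
    moved_ok = tidy_results("mol_1", work, Path(tmp) / "QDESCP", passed=True)
    kept_in_dest = sorted(p.name for p in (Path(tmp) / "QDESCP").iterdir())
    work_removed = not work.exists()
```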

gather_files_and_run(destination, file, atom_props, smarts_targets, bar)

Process input file(s) through xTB calculation and property collection.

This method handles the complete process for a single input file:

  1. Converts input files to XYZ format if needed

  2. Extracts charge and multiplicity information

  3. Runs xTB calculations for each conformer

  4. Collects and processes properties

  5. Generates JSON output files

Args:

  • destination (Path): Output directory path

  • file (str): Input file path (XYZ/PDB/SDF format)

  • atom_props (list): Atomic properties to collect

  • smarts_targets (list): SMARTS patterns to match

  • bar (IncrementalBar): Progress bar instance

Note:
  • For XYZ files, conformers are processed directly

  • Other formats are converted to XYZ using OpenBabel

  • Charge/multiplicity are read from files or use defaults

  • Progress is tracked via the progress bar

get_boltz_n_save_csv(destination, qdescp_files, descp_dict, boltz_dir, smarts_targets)

Process molecular properties and generate Boltzmann-averaged data files.

This method:

  1. Processes each valid input file to extract molecular properties

  2. Generates Boltzmann-weighted JSON files for each molecule

  3. Creates final CSV files with combined data

Args:

  • destination (Path): Output directory path

  • qdescp_files (list): List of input structure files

  • descp_dict (dict): Dictionary of descriptors to calculate

  • boltz_dir (Path): Directory for Boltzmann-weighted results

  • smarts_targets (list): SMARTS patterns for atom matching

get_boltz_props(json_files, name, boltz_dir, calc_type, descp_dict_indiv, smarts_targets, mol, all_prefixes_atoms)

Calculate Boltzmann-weighted properties from conformer results.

This method:

  1. Reads properties from JSON files

  2. Calculates Boltzmann weights based on energies

  3. Averages properties across conformers

  4. Adds RDKit molecular descriptors

  5. Saves results to new JSON file

Args:

  • json_files (list): List of JSON files with conformer data

  • name (str): Base name for output files

  • boltz_dir (Path): Output directory for Boltzmann results

  • calc_type (str): Calculation type ('xtb' or 'nmr')

  • descp_dict_indiv (dict): Property dictionary for this molecule

  • smarts_targets (list): SMARTS patterns to match

  • mol: RDKit molecule object for descriptor calculation

  • all_prefixes_atoms (list): List of atomic property prefixes

Returns:

None: Results are saved to JSON file
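The weighting and averaging steps can be illustrated with a short, self-contained sketch. It assumes conformer energies in Hartree and a temperature of 298.15 K; the function names, the unit handling, and the sample numbers are assumptions for this example and may differ from what the method does internally.

```python
import math

HARTREE_TO_KCAL = 627.5095  # kcal/mol per Hartree
GAS_CONSTANT = 0.0019872    # kcal/(mol*K)


def boltzmann_weights(energies_hartree, temperature=298.15):
    """Return normalized Boltzmann weights for a list of conformer energies."""
    e_min = min(energies_hartree)
    # Relative energies in kcal/mol, so the lowest conformer is at zero.
    rel = [(e - e_min) * HARTREE_TO_KCAL for e in energies_hartree]
    factors = [math.exp(-de / (GAS_CONSTANT * temperature)) for de in rel]
    total = sum(factors)
    return [f / total for f in factors]


def boltzmann_average(values, weights):
    """Weighted average of one property across conformers."""
    return sum(v * w for v, w in zip(values, weights))


# Made-up data for two conformers of one molecule.
energies = [-10.0000, -9.9990]
dipoles = [1.0, 3.0]
weights = boltzmann_weights(energies)
avg_dipole = boltzmann_average(dipoles, weights)
```

Because the lower-energy conformer receives the larger weight, the averaged value lies closer to that conformer's property.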

get_unique_files()

Filter input files to remove duplicates based on SMILES.

This method:

  1. Reads SMILES strings from SDF files

  2. Identifies duplicate structures

  3. Keeps only unique structures

  4. Warns about duplicates

Returns:

list: Paths to unique input files

Note:
  • Duplicates are identified by exact SMILES match

  • Files without SMILES are kept

  • Warning is logged for duplicate structures
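The filtering rules in the note can be sketched as follows. The function name `unique_by_smiles` and the `(path, smiles)` input format are hypothetical; in the real method the SMILES strings are read from the SDF files.

```python
def unique_by_smiles(records):
    """Keep the first file for each SMILES string.

    `records` is a list of (path, smiles) pairs; smiles may be None.
    Returns (unique_paths, duplicates), where duplicates pairs each
    dropped path with the path it duplicates.
    """
    first_seen = {}
    unique, duplicates = [], []
    for path, smiles in records:
        if smiles is None:
            unique.append(path)  # files without SMILES are kept
        elif smiles in first_seen:
            duplicates.append((path, first_seen[smiles]))
        else:
            first_seen[smiles] = path
            unique.append(path)
    return unique, duplicates


records = [
    ("a.sdf", "CCO"),
    ("b.sdf", "CCO"),   # exact duplicate of a.sdf
    ("c.sdf", None),    # no SMILES available
    ("d.sdf", "OCC"),   # same molecule, but a different string
]
unique, duplicates = unique_by_smiles(records)
for dropped, original in duplicates:
    print(f"Warning: {dropped} duplicates {original} and was skipped")
```

In this sketch, matching on the exact string means two different SMILES spellings of the same molecule are both kept.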

initial_csearch_run(destination, qdescp_files)

Generate conformers from SMILES in CSV input.

This method:

  1. Validates CSV input file existence

  2. Sets up conformer search parameters

  3. Runs RDKit conformer generation

  4. Processes and validates generated conformers

Args:

  • destination (Path): Output directory path

  • qdescp_files (list): List of input files (expecting single CSV)

Returns:

list: Paths to generated conformer files

Raises:

SystemExit: If CSV file not found or conformer generation fails

initial_xtb_check()

Validate and process input file selection.

This method:

  1. Checks if input is provided via --input or --files

  2. Validates file formats and paths

  3. Converts input to standardized file list

Returns:

list: List of validated input file paths

Raises:

SystemExit: If no valid files are found or formats are invalid

morfeus_properties(name_initial, atom_props, smarts_targets, xtb_files_props, charge, mult, file)

Calculate and collect MORFEUS molecular descriptors.

This method:

  1. Reads molecular geometry from XYZ file

  2. Calculates MORFEUS descriptors

  3. Assigns atomic properties based on SMARTS patterns

  4. Saves results to JSON file

Args:

  • name_initial (str): Base name for input/output files

  • atom_props (list): Atomic properties to calculate

  • smarts_targets (list): SMARTS patterns for atom matching

  • xtb_files_props (dict): Dictionary of file paths

  • charge (int): Molecular charge

  • mult (int): Molecular multiplicity

  • file (str): Original input file path

Returns:

None: Results are saved to JSON file

Raises:

  • FileNotFoundError: If XYZ file is missing

  • Exception: If MORFEUS descriptor calculation fails

process_aqme_csv(name_db)

Process and update AQME CSV result files.

This method processes three types of CSV files:

  • full: All descriptors

  • denovo: De novo descriptors only

  • interpret: Interpretable descriptors only

For each file:

  1. Reads original and generated CSVs

  2. Adds missing entries

  3. Sorts according to input order

  4. Fills missing values

Args:

name_db (str): Database name prefix for output files

Note:
  • Missing entries are filled with group-wise first values

  • Original file order is preserved

  • Warns if files are not found

qdescp_nmr_workflow(boltz_dir)

Run NMR workflow for chemical shift prediction.

This method:

  1. Validates input files (must be JSON format)

  2. Processes conformer JSON files

  3. Calculates Boltzmann-weighted NMR properties

  4. Applies empirical corrections if provided

Args:

boltz_dir (Path): Directory for Boltzmann-averaged results

Raises:

SystemExit: If input files are not in JSON format

qdescp_set_up()

Initialize and validate QDESCP run settings.

This method performs initial setup and validation:

  1. Sets default program to xTB if not specified

  2. Validates program selection (xTB or NMR)

  3. Configures parallel processing settings

  4. Sets sampling parameters

  5. Validates input files and directories

  6. Creates required output directories

Returns:
tuple: Contains:
  • self: Updated QDESCP instance

  • destination (Path): Configured output directory

  • smarts_targets (list): SMARTS patterns to match

  • boltz_dir (Path): Directory for Boltzmann calculations

Raises:

SystemExit: If program selection or input files are invalid

qdescp_xtb_workflow(boltz_dir, destination, smarts_targets)

Run the complete xTB workflow for descriptor generation and collection.

This method handles the complete process of generating and collecting quantum mechanical descriptors using xTB, including:

  1. Initial input validation and setup

  2. Optional conformer generation for CSV inputs

  3. Automatic SMARTS pattern detection

  4. Parallel descriptor calculation

  5. Results collection and processing

Args:

  • boltz_dir (Path): Directory for Boltzmann-averaged results

  • destination (Path): Main output directory

  • smarts_targets (list): List of SMARTS patterns to match

Note:
  • For CSV inputs, conformers are generated before descriptor calculation

  • When no SMARTS patterns are provided, they are auto-detected

  • Invalid SMARTS patterns (< 75% compatibility) are removed

  • Runs the descriptor calculations in parallel while keeping the results reproducible

run_opt_xtb(file, xyz_file, charge, mult, name, destination)

Run xTB property calculations for a molecule.

Args:

  • file (str): Original input file path

  • xyz_file (str): XYZ format geometry file path

  • charge (int): Molecular charge

  • mult (int): Molecular multiplicity

  • name (str): Base name for output files

  • destination (Path): Output directory path

Returns:

tuple: (success status, dict of file paths)

write_csv_boltz_data(destination)

Generate CSV files containing Boltzmann-averaged descriptor data.

This method combines descriptor data from JSON files with input CSV data and generates multiple output files:

  • Full descriptor database

  • De novo descriptor subset

  • Interpretable descriptor subset

The method handles both AQME-Descriptors and AQME-ROBERT workflows.

Args:

destination (Path): Directory containing Boltzmann JSON files

Parameters

General

w_dir_main : str, default=os.getcwd()

Working directory

destination : str, default=None

Directory to create the JSON file(s)

program : str, default=xtb

Program required to create the new descriptors. Current options: 'xtb', 'nmr'

nprocs : int, default=None

Number of xTB jobs run in parallel, each using 1 processor (single-processor jobs keep the results reproducible). The same value is used as nprocs in CSEARCH

qdescp_atoms : list of str, default=[]

Type of atom or group to calculate atomic properties. This option accepts atoms (e.g., qdescp_atoms=['P']) and SMARTS patterns (e.g., qdescp_atoms=['C=O'])

robert : bool, default=True

Creates a database ready to use in an AQME-ROBERT machine learning workflow, combining the input CSV with SMILES/code_name and the calculated xTB/DBSTEP descriptors

xTB and MORFEUS descriptors

files or input (both options are valid) : list of str, default=''

Filenames of SDF/PDB/XYZ/CSV files to calculate xTB descriptors. If CSV is selected, a CSV with two columns is required (code_name and SMILES), since AQME will generate conformers from SMILES with CSEARCH before QDESCP generates descriptors.

charge : int, default=None

Charge of the calculations used in the following input files (charges from SDF files generated in CSEARCH are read automatically).

mult : int, default=None

Multiplicity of the calculations used in the following input files (multiplicities from SDF files generated in CSEARCH are read automatically).

gfn_version : int, default="2"

GFN version used in QDESCP to calculate descriptors.

qdescp_solvent : str, default=None

Solvent used in the xTB property calculations (ALPB model)

qdescp_temp : float, default=300

Temperature required for the xTB property calculations

qdescp_acc : float, default=0.2

Accuracy required for the xTB property calculations

qdescp_opt : str, default='normal'

Convergence criteria required for the xTB property calculations

boltz : bool, default=True

Calculation of Boltzmann averaged xTB properties and addition of RDKit molecular descriptors

xtb_opt : bool, default=True

Performs an initial xTB geometry optimization before calculating descriptors

vbur_radius : float, default=3.5

Adjusts the radius in the buried volume calculations of MORFEUS

NMR simulation

files : list of str, default=''

Filenames of LOG files to retrieve NMR shifts from Gaussian calculations (*.log can be used to include all the log files in the working directory)

boltz : bool, default=True

Calculation of Boltzmann averaged NMR shifts

nmr_atoms : list of str, default=[6, 1]

List containing the atom types (as atomic numbers) to consider. For example, to retrieve NMR shifts from C and H atoms, use nmr_atoms=[6, 1]

nmr_slope : list of float, default=[-1.0537, -1.0784]

List containing the slope to apply to the raw NMR shifts calculated with Gaussian. A slope needs to be provided for each atom type in the analysis (e.g., for C and H atoms, nmr_slope=[-1.0537, -1.0784]). These values can be adjusted using the CHESHIRE repository.

nmr_intercept : list of float, default=[181.7815, 31.8723]

List containing the intercept to apply to the raw NMR shifts calculated with Gaussian. An intercept needs to be provided for each atom type in the analysis (e.g., for C and H atoms, nmr_intercept=[181.7815, 31.8723]). These values can be adjusted using the CHESHIRE repository.

nmr_experim : str, default=None

Filename of a CSV containing the experimental NMR shifts. Two columns are needed: A) 'atom_idx' should contain the indexes of the atoms to study as seen in GaussView or other molecular visualizers (i.e., the first atom of the coordinates has index 1); B) 'experimental_ppm' should contain the experimental NMR shifts in ppm observed for the atoms.
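To make the role of nmr_slope, nmr_intercept, and nmr_experim more concrete, the sketch below applies a linear scaling to made-up isotropic shieldings and compares the result with an experimental CSV in the two-column layout described above. The scaling formula, shift = (intercept - shielding) / (-slope), is a common linear-scaling convention and is an assumption here, as are the function names and all numeric inputs other than the documented defaults.

```python
import csv
import io

# Documented defaults: atomic number -> (slope, intercept).
SCALING = {6: (-1.0537, 181.7815), 1: (-1.0784, 31.8723)}


def scale_shielding(shielding, slope, intercept):
    """Convert an isotropic shielding to a scaled shift (assumed formula)."""
    return (intercept - shielding) / (-slope)


def read_experimental(text):
    """Parse a CSV with 'atom_idx' (1-based) and 'experimental_ppm' columns."""
    reader = csv.DictReader(io.StringIO(text))
    return {int(r["atom_idx"]): float(r["experimental_ppm"]) for r in reader}


# Made-up computed data: atom index -> (atomic number, shielding).
computed = {1: (6, 100.0), 2: (1, 24.0)}

# Made-up experimental file content in the documented layout.
EXPERIMENTAL_CSV = """atom_idx,experimental_ppm
1,77.0
2,7.26
"""

predicted = {
    idx: scale_shielding(sigma, *SCALING[z]) for idx, (z, sigma) in computed.items()
}
experimental = read_experimental(EXPERIMENTAL_CSV)
errors = {idx: predicted[idx] - experimental[idx] for idx in experimental}
```

The dictionary keyed by atomic number mirrors the requirement that one slope and one intercept be supplied per atom type in nmr_atoms.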