QDESCP

class qdescp(**kwargs)

Quantum Mechanical Descriptor Calculation and Processing.

This class generates and processes quantum mechanical descriptors using xTB and analyzes NMR data. It provides functionality for:

Running xTB calculations for descriptor generation
Processing NMR data from Gaussian calculations
Managing Boltzmann averaging of properties
Creating ROBERT-compatible descriptor databases

The class supports both single molecule calculations and batch processing of multiple structures from various input formats including SDF, PDB, XYZ and CSV files.

Attributes:

args: Configuration object holding all calculation parameters
start_time: Timestamp of calculation start
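As a rough illustration of how the class might be called, the sketch below collects keyword arguments using the parameter names documented in the Parameters section further down. The helper names `build_qdescp_kwargs` and `run_qdescp`, the import path `aqme.qdescp`, and the file name `ligands.sdf` are assumptions made for this example and are not stated on this page.

```python
def build_qdescp_kwargs(files, atoms=None, nprocs=None):
    """Collect keyword arguments named after the documented parameters."""
    kwargs = {
        "program": "xtb",                   # documented options: 'xtb', 'nmr'
        "files": list(files),               # SDF/PDB/XYZ/CSV inputs
        "qdescp_atoms": list(atoms or []),  # atoms or SMARTS patterns
        "boltz": True,                      # Boltzmann-averaged properties
        "robert": True,                     # ROBERT-ready database
    }
    if nprocs is not None:
        kwargs["nprocs"] = nprocs
    return kwargs


def run_qdescp(kwargs):
    """Hypothetical wrapper; the import path is an assumption."""
    # Imported lazily so build_qdescp_kwargs works without AQME installed.
    from aqme.qdescp import qdescp
    return qdescp(**kwargs)


# "ligands.sdf" is a placeholder file name.
kwargs = build_qdescp_kwargs(["ligands.sdf"], atoms=["P"], nprocs=4)
```

Building the arguments separately from the call keeps the configuration easy to inspect before a potentially long xTB run is started.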

assign_atomic_properties(json_data, name_initial, atom_props, smarts_targets)

Assign atomic properties based on SMARTS pattern matches.

This method:

1. Matches SMARTS patterns to atoms
2. Assigns property prefixes to matched atoms
3. Updates JSON data with atomic properties

Args:

json_data (dict): Dictionary of molecular data
name_initial (str): Base name for molecule
atom_props (list): Atomic properties to assign
smarts_targets (list): SMARTS patterns to match

Returns:

dict: Updated JSON data with atomic properties

Note:

Property prefixes are tracked to avoid duplicates
Properties are assigned only to matched atoms
Pattern matches use RDKit SMARTS matcher

cleanup(name, destination, xtb_passing, xtb_files_props)

Clean up calculation files and organize results.

This method:

1. Moves successful calculation results to destination
2. Moves failed calculations to failed/ subdirectory
3. Cleans up temporary files and directories

Args:

name (str): Base name for files
destination (Path): Output directory path
xtb_passing (bool): Whether xTB calculation succeeded
xtb_files_props (dict): Dictionary of file paths

Note:

Successful calculations: JSON and XYZ files preserved
Failed calculations: All files moved to failed/ directory
Temporary directories are removed
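The file handling described above can be sketched with the standard library. This is a minimal illustration under stated assumptions, not the method's actual implementation: the function name `organize_results`, its `work_dir` argument, and the demo file names are hypothetical.

```python
import shutil
import tempfile
from pathlib import Path


def organize_results(name, destination, passed, work_dir):
    """Move files for `name` out of `work_dir`, then delete `work_dir`."""
    destination = Path(destination)
    work_dir = Path(work_dir)
    # Failed calculations go to a failed/ subdirectory of the destination.
    target = destination if passed else destination / "failed"
    target.mkdir(parents=True, exist_ok=True)
    # On success only JSON and XYZ are preserved; on failure everything is kept.
    keep = {".json", ".xyz"} if passed else None
    moved = []
    for path in sorted(work_dir.glob(f"{name}*")):
        if path.is_file() and (keep is None or path.suffix in keep):
            shutil.move(str(path), str(target / path.name))
            moved.append(path.name)
    # Remove the temporary directory and anything left inside it.
    shutil.rmtree(work_dir, ignore_errors=True)
    return moved


# Small demo in a throwaway directory.
root = Path(tempfile.mkdtemp())
work = root / "tmp"
work.mkdir()
for ext in (".json", ".xyz", ".out"):
    (work / f"mol{ext}").write_text("placeholder")
moved = organize_results("mol", root / "out", True, work)
```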

gather_files_and_run(destination, file, atom_props, smarts_targets, bar)

Process input file(s) through xTB calculation and property collection.

This method handles the complete process for a single input file:

1. Converts input files to XYZ format if needed
2. Extracts charge and multiplicity information
3. Runs xTB calculations for each conformer
4. Collects and processes properties
5. Generates JSON output files

Args:

destination (Path): Output directory path
file (str): Input file path (XYZ/PDB/SDF format)
atom_props (list): Atomic properties to collect
smarts_targets (list): SMARTS patterns to match
bar (IncrementalBar): Progress bar instance

Note:

For XYZ files, conformers are processed directly
Other formats are converted to XYZ using OpenBabel
Charge/multiplicity are read from files or use defaults
Progress is tracked via the progress bar

get_boltz_n_save_csv(destination, qdescp_files, descp_dict, boltz_dir, smarts_targets)

Process molecular properties and generate Boltzmann-averaged data files.

This method:

1. Processes each valid input file to extract molecular properties
2. Generates Boltzmann-weighted JSON files for each molecule
3. Creates final CSV files with combined data

Args:

destination (Path): Output directory path
qdescp_files (list): List of input structure files
descp_dict (dict): Dictionary of descriptors to calculate
boltz_dir (Path): Directory for Boltzmann-weighted results
smarts_targets (list): SMARTS patterns for atom matching

get_boltz_props(json_files, name, boltz_dir, calc_type, descp_dict_indiv, smarts_targets, mol, all_prefixes_atoms)

Calculate Boltzmann-weighted properties from conformer results.

This method:

1. Reads properties from JSON files
2. Calculates Boltzmann weights based on energies
3. Averages properties across conformers
4. Adds RDKit molecular descriptors
5. Saves results to new JSON file

Args:

json_files (list): List of JSON files with conformer data
name (str): Base name for output files
boltz_dir (Path): Output directory for Boltzmann results
calc_type (str): Calculation type ('xtb' or 'nmr')
descp_dict_indiv (dict): Property dictionary for this molecule
smarts_targets (list): SMARTS patterns to match
mol: RDKit molecule object for descriptor calculation
all_prefixes_atoms (list): List of atomic property prefixes

Returns:

None: Results are saved to JSON file
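The weighting and averaging steps can be illustrated with a short, self-contained sketch. It assumes conformer energies in Hartree and a temperature of 298.15 K; both are assumptions for this example, as are the function names, and the snippet is not taken from the AQME source.

```python
import math

GAS_CONSTANT = 0.0019872041  # kcal/(mol*K)
HARTREE_TO_KCAL = 627.5095   # kcal/mol per Hartree


def boltzmann_weights(energies_hartree, temperature=298.15):
    """Return normalized Boltzmann weights for a list of conformer energies."""
    e_min = min(energies_hartree)
    # Relative energies avoid overflow and leave the weights unchanged.
    rel = [(e - e_min) * HARTREE_TO_KCAL for e in energies_hartree]
    factors = [math.exp(-de / (GAS_CONSTANT * temperature)) for de in rel]
    total = sum(factors)
    return [f / total for f in factors]


def boltzmann_average(values, weights):
    """Weighted average of one property across conformers."""
    return sum(v * w for v, w in zip(values, weights))
```

Two conformers with identical energies each receive a weight of 0.5, so the averaged property is the plain mean; as the energy gap grows, the lower-energy conformer dominates.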

get_unique_files()

Filter input files to remove duplicates based on SMILES.

This method:

1. Reads SMILES strings from SDF files
2. Identifies duplicate structures
3. Keeps only unique structures
4. Warns about duplicates

Returns:

list: Paths to unique input files

Note:

Duplicates are identified by exact SMILES match
Files without SMILES are kept
Warning is logged for duplicate structures
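The filtering rules in the note above can be sketched as follows. The function `unique_by_smiles` and its input format (pairs of file path and SMILES, with `None` when no SMILES is available) are hypothetical and chosen only to keep the example self-contained; reading SMILES from SDF files is not shown.

```python
def unique_by_smiles(files_with_smiles):
    """Keep the first file for each exact SMILES string.

    files_with_smiles: iterable of (path, smiles_or_None) pairs.
    Returns (unique_paths, duplicates), where duplicates pairs each
    dropped path with the path it duplicates.
    """
    first_seen = {}
    unique, duplicates = [], []
    for path, smiles in files_with_smiles:
        if smiles is None:
            unique.append(path)  # files without SMILES are kept
        elif smiles in first_seen:
            duplicates.append((path, first_seen[smiles]))
        else:
            first_seen[smiles] = path
            unique.append(path)
    return unique, duplicates


files = [("a.sdf", "CCO"), ("b.sdf", "CCO"), ("c.sdf", None), ("d.sdf", "CC")]
unique, duplicates = unique_by_smiles(files)
```

Because the comparison is an exact string match, two different SMILES spellings of the same molecule would not be detected as duplicates in this sketch.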

initial_csearch_run(destination, qdescp_files)

Generate conformers from SMILES in CSV input.

This method:

1. Validates CSV input file existence
2. Sets up conformer search parameters
3. Runs RDKit conformer generation
4. Processes and validates generated conformers

Args:

destination (Path): Output directory path
qdescp_files (list): List of input files (expecting single CSV)

Returns:

list: Paths to generated conformer files

Raises:

SystemExit: If CSV file not found or conformer generation fails

initial_xtb_check()

Validate and process input file selection.

This method:

1. Checks if input is provided via --input or --files
2. Validates file formats and paths
3. Converts input to standardized file list

Returns:

list: List of validated input file paths

Raises:

SystemExit: If no valid files are found or formats are invalid

morfeus_properties(name_initial, atom_props, smarts_targets, xtb_files_props, charge, mult, file)

Calculate and collect MORFEUS molecular descriptors.

This method:

1. Reads molecular geometry from XYZ file
2. Calculates MORFEUS descriptors
3. Assigns atomic properties based on SMARTS patterns
4. Saves results to JSON file

Args:

name_initial (str): Base name for input/output files
atom_props (list): Atomic properties to calculate
smarts_targets (list): SMARTS patterns for atom matching
xtb_files_props (dict): Dictionary of file paths
charge (int): Molecular charge
mult (int): Molecular multiplicity
file (str): Original input file path

Returns:

None: Results are saved to JSON file

Raises:

FileNotFoundError: If XYZ file is missing
Exception: If MORFEUS descriptor calculation fails

process_aqme_csv(name_db)

Process and update AQME CSV result files.

This method processes three types of CSV files:

full: All descriptors
denovo: De novo descriptors only
interpret: Interpretable descriptors only

For each file:

1. Reads original and generated CSVs
2. Adds missing entries
3. Sorts according to input order
4. Fills missing values

Args:

name_db (str): Database name prefix for output files

Note:

Missing entries are filled with group-wise first values
Original file order is preserved
Warns if files are not found

qdescp_nmr_workflow(boltz_dir)

Run NMR workflow for chemical shift prediction.

This method:

1. Validates input files (must be JSON format)
2. Processes conformer JSON files
3. Calculates Boltzmann-weighted NMR properties
4. Applies empirical corrections if provided

Args:

boltz_dir (Path): Directory for Boltzmann-averaged results

Raises:

SystemExit: If input files are not in JSON format

qdescp_set_up()

Initialize and validate QDESCP run settings.

This method performs initial setup and validation:

1. Sets default program to xTB if not specified
2. Validates program selection (xTB or NMR)
3. Configures parallel processing settings
4. Sets sampling parameters
5. Validates input files and directories
6. Creates required output directories

Returns:

tuple: Contains:

self: Updated QDESCP instance
destination (Path): Configured output directory
smarts_targets (list): SMARTS patterns to match
boltz_dir (Path): Directory for Boltzmann calculations

Raises:

SystemExit: If program selection or input files are invalid

qdescp_xtb_workflow(boltz_dir, destination, smarts_targets)

Run the complete xTB workflow for descriptor generation and collection.

This method handles the complete process of generating and collecting quantum mechanical descriptors using xTB, including:

Initial input validation and setup
Optional conformer generation for CSV inputs
Automatic SMARTS pattern detection
Parallel descriptor calculation
Results collection and processing

Args:

boltz_dir (Path): Directory for Boltzmann-averaged results
destination (Path): Main output directory
smarts_targets (list): List of SMARTS patterns to match

Note:

For CSV inputs, conformers are generated before descriptor calculation
When no SMARTS patterns are provided, they are auto-detected
Invalid SMARTS patterns (< 75% compatibility) are removed
Uses parallel processing for efficiency while keeping results reproducible
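The 75% rule in the note above can be pictured with the sketch below. It assumes that "compatibility" means the fraction of input molecules in which a pattern matches, which this page does not confirm. The function `filter_patterns` is hypothetical, and the toy matcher uses substring search on SMILES strings purely as a stand-in for a real SMARTS matcher.

```python
def filter_patterns(patterns, molecules, matches, threshold=0.75):
    """Keep patterns that match at least `threshold` of the molecules.

    `matches(molecule, pattern)` is any callable returning True or False.
    """
    kept = []
    for pattern in patterns:
        hits = sum(1 for mol in molecules if matches(mol, pattern))
        # Patterns below the threshold are dropped; exactly 75% is kept.
        if molecules and hits / len(molecules) >= threshold:
            kept.append(pattern)
    return kept


def toy_match(smiles, pattern):
    # Stand-in only: substring search is NOT real SMARTS matching.
    return pattern in smiles


molecules = ["CC=O", "CCP", "OC=O", "NC=O"]
kept = filter_patterns(["C=O", "P"], molecules, toy_match)
```

Here "C=O" is found in three of the four molecules and is kept, while "P" appears in only one and is removed.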

run_opt_xtb(file, xyz_file, charge, mult, name, destination)

Run xTB property calculations for a molecule.

Args:

file (str): Original input file path
xyz_file (str): XYZ format geometry file path
charge (int): Molecular charge
mult (int): Molecular multiplicity
name (str): Base name for output files
destination (Path): Output directory path

Returns:

tuple: (success status, dict of file paths)

write_csv_boltz_data(destination)

Generate CSV files containing Boltzmann-averaged descriptor data.

This method combines descriptor data from JSON files with input CSV data and generates multiple output files:

Full descriptor database
De novo descriptor subset
Interpretable descriptor subset

The method handles both AQME-Descriptors and AQME-ROBERT workflows.

Args:

destination (Path): Directory containing Boltzmann JSON files

Parameters

General

w_dir_main : str, default=os.getcwd()
Working directory

destination : str, default=None
Directory to create the JSON file(s)

program : str, default=xtb
Program required to create the new descriptors. Current options: 'xtb', 'nmr'

nprocs : int, default=None
Number of xTB jobs run in parallel, each using 1 processor (single-processor jobs keep the results reproducible). Also the number of processors used in CSEARCH

qdescp_atoms : list of str, default=[]
Type of atom or group to calculate atomic properties. This option accepts atoms (e.g., qdescp_atoms=['P']) and SMARTS patterns (e.g., qdescp_atoms=['C=O'])

robert : bool, default=True
Creates a database ready to use in an AQME-ROBERT machine learning workflow, combining the input CSV with SMILES/code_name and the calculated xTB/DBSTEP descriptors

xTB and MORFEUS descriptors

files or input (both options are valid) : list of str, default=''
Filenames of SDF/PDB/XYZ/CSV files to calculate xTB descriptors. If CSV is selected, a CSV with two columns is required (code_name and SMILES), since AQME will generate conformers from SMILES with CSEARCH before QDESCP generates descriptors.
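A two-column CSV of the kind described above could be written as in this sketch. The column names code_name and SMILES come from the description; the helper `write_input_csv`, the row contents, and the use of an in-memory buffer instead of a file on disk are choices made only for this example.

```python
import csv
import io

# Example rows; the names and SMILES are placeholders.
rows = [
    {"code_name": "mol_1", "SMILES": "CCO"},
    {"code_name": "mol_2", "SMILES": "c1ccccc1"},
]


def write_input_csv(handle, rows):
    """Write rows with the two required columns to an open text handle."""
    writer = csv.DictWriter(handle, fieldnames=["code_name", "SMILES"])
    writer.writeheader()
    writer.writerows(rows)


buffer = io.StringIO()
write_input_csv(buffer, rows)
csv_text = buffer.getvalue()
```

To produce a file instead, the same helper can be called with a handle from `open(path, "w", newline="")`.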

charge : int, default=None
Charge of the calculations used in the following input files (charges from SDF files generated in CSEARCH are read automatically).

mult : int, default=None
Multiplicity of the calculations used in the following input files (multiplicities from SDF files generated in CSEARCH are read automatically).

gfn_version : int, default="2"
GFN version used in QDESCP to calculate descriptors.

qdescp_solvent : str, default=None
Solvent used in the xTB property calculations (ALPB model)

qdescp_temp : float, default=300
Temperature required for the xTB property calculations

qdescp_acc : float, default=0.2
Accuracy required for the xTB property calculations

qdescp_opt : str, default='normal'
Convergence criteria required for the xTB property calculations

boltz : bool, default=True
Calculation of Boltzmann averaged xTB properties and addition of RDKit molecular descriptors

xtb_opt : bool, default=True
Performs an initial xTB geometry optimization before calculating descriptors

vbur_radius : float, default=3.5
Adjusts the radius in the buried volume calculations of MORFEUS

NMR simulation

files : list of str, default=''
Filenames of LOG files to retrieve NMR shifts from Gaussian calculations (*.log can be used to include all the log files in the working directory)

boltz : bool, default=True
Calculation of Boltzmann averaged NMR shifts

nmr_atoms : list of str, default=[6, 1]
List containing the atom types (as atomic numbers) to consider. For example, to retrieve NMR shifts from C and H atoms, use nmr_atoms=[6, 1]

nmr_slope : list of float, default=[-1.0537, -1.0784]
List containing the slope to apply for the raw NMR shifts calculated with Gaussian. A slope needs to be provided for each atom type in the analysis (e.g., for C and H atoms, nmr_slope=[-1.0537, -1.0784]). These values can be adjusted using the CHESHIRE repository.

nmr_intercept : list of float, default=[181.7815, 31.8723]
List containing the intercept to apply for the raw NMR shifts calculated with Gaussian. An intercept needs to be provided for each atom type in the analysis (e.g., for C and H atoms, nmr_intercept=[181.7815, 31.8723]). These values can be adjusted using the CHESHIRE repository.
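To show how a slope and intercept pair might be applied, the sketch below uses the linear-scaling convention commonly associated with CHESHIRE scaling factors, shift = (intercept - shielding) / (-slope). That this is the exact formula used internally is an assumption, and the function `scale_shielding` is hypothetical. The numeric values are the documented defaults.

```python
NMR_ATOMS = [6, 1]                   # C and H, as atomic numbers
NMR_SLOPE = [-1.0537, -1.0784]       # documented defaults
NMR_INTERCEPT = [181.7815, 31.8723]  # documented defaults


def scale_shielding(atomic_number, shielding):
    """Convert a raw isotropic shielding into a scaled chemical shift (ppm).

    Assumed convention: shift = (intercept - shielding) / (-slope).
    """
    idx = NMR_ATOMS.index(atomic_number)  # slope/intercept share this order
    return (NMR_INTERCEPT[idx] - shielding) / (-NMR_SLOPE[idx])
```

Under this convention a carbon shielding equal to the carbon intercept maps to 0 ppm, and lower shieldings map to larger shifts. The three lists must list the atom types in the same order.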

nmr_experim : str, default=None
Filename of a CSV containing the experimental NMR shifts. Two columns are needed: A) 'atom_idx' should contain the indexes of the atoms to study as seen in GaussView or other molecular visualizers (i.e., the first atom of the coordinates has index 1); B) 'experimental_ppm' should contain the experimental NMR shifts in ppm observed for the atoms.
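The expected layout of this CSV can be illustrated with a small reader. The column names atom_idx and experimental_ppm come from the description above; the shift values, the helper `read_experimental_shifts`, and the in-memory example text are placeholders invented for this sketch.

```python
import csv
import io

# Placeholder content: 1-based atom indexes and made-up shifts in ppm.
EXAMPLE_CSV = "atom_idx,experimental_ppm\n1,128.4\n2,7.26\n"


def read_experimental_shifts(handle):
    """Map each 1-based atom index to its experimental shift in ppm."""
    reader = csv.DictReader(handle)
    return {int(row["atom_idx"]): float(row["experimental_ppm"]) for row in reader}


shifts = read_experimental_shifts(io.StringIO(EXAMPLE_CSV))
```

Since the indexes start at 1, a lookup into a 0-based Python list of atoms would need `atom_idx - 1`.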