QDESCP (descriptor generation)

Overview

The QDESCP module implemented in AQME enables the automated calculation of molecular and atom-centered descriptors from either SMILES strings or three-dimensional structures. In the current implementation, descriptor generation is performed through MORFEUS, which acts as a unified interface to obtain electronic and steric descriptors.

Electronic descriptors are extracted from xTB calculations: the PTB method is used where available, and GFN2-xTB covers the descriptors that PTB does not support. Steric descriptors are computed directly from the GFN2-xTB-optimized geometries.

The overall workflow is summarized in the following figure:

[Figure: QDESCP_scheme]

Workflow Description

1. Conformational search (if starting from SMILES)

When the starting point is a CSV file containing SMILES strings, conformational sampling is first performed using the CSEARCH workflow in AQME, typically employing RDKit (default) and optionally CREST.

The generated conformers are filtered using:

  • An energy window (ΔE ≤ 5 kcal mol⁻¹)

  • RMSD-based filtering

  • Butina clustering

This process retains a structurally diverse subset of low-energy conformers (up to five conformers per molecule), which are stored as SDF files.
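As a rough illustration of the energy-based part of this filtering, the sketch below keeps conformers within a 5 kcal/mol window of the lowest-energy structure and caps the selection at five. It is not the AQME implementation: the function name `select_conformers` and the example energies are hypothetical, and the RMSD and Butina clustering steps (which give the structural diversity) are deliberately left out.

```python
def select_conformers(energies_kcal, window=5.0, max_conformers=5):
    """Return indices of conformers inside the energy window, lowest energy first.

    Hypothetical helper for illustration only: RMSD filtering and Butina
    clustering, which the workflow also applies, are not reproduced here.
    """
    if not energies_kcal:
        return []
    e_min = min(energies_kcal)
    # Rank conformer indices by energy
    ranked = sorted(range(len(energies_kcal)), key=lambda i: energies_kcal[i])
    # Keep only conformers with relative energy <= window (kcal/mol)
    kept = [i for i in ranked if energies_kcal[i] - e_min <= window]
    # Cap the number of retained conformers
    return kept[:max_conformers]


# Made-up relative energies in kcal/mol
energies = [0.0, 1.2, 7.5, 0.4, 4.9, 3.3, 2.0]
selected = select_conformers(energies)
print(selected)
```

In the actual workflow the retained subset is chosen for structural diversity as well as energy, so the conformers kept are not necessarily the five lowest in energy.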

If three-dimensional structures are provided directly (e.g., XYZ or SDF files), the conformational search step is skipped, and the structures are used directly for descriptor generation.

2. Geometry optimization with GFN2-xTB

The resulting conformers are subsequently optimized at the GFN2-xTB level. This step can be skipped by setting xtb_opt=False (or --xtb_opt False on the command line).

3. Descriptor generation

Through MORFEUS, both molecular and atom-centered descriptors are obtained from the optimized geometries. Descriptors are first computed for each conformer and subsequently Boltzmann-weighted to obtain a single descriptor set per molecule.
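The Boltzmann weighting can be sketched in a few lines. This is a minimal example under stated assumptions, not the code used by AQME: the function names are hypothetical, the temperature of 298.15 K is assumed (the text above does not state it), and the energies are taken to be relative conformer energies in kcal/mol.

```python
import math

R_KCAL = 0.0019872041  # gas constant in kcal mol^-1 K^-1


def boltzmann_weights(energies_kcal, temperature=298.15):
    """Return normalized Boltzmann populations for a list of conformer energies."""
    e_min = min(energies_kcal)
    # Shift by the minimum so the largest exponential is exp(0) = 1
    factors = [math.exp(-(e - e_min) / (R_KCAL * temperature)) for e in energies_kcal]
    total = sum(factors)
    return [f / total for f in factors]


def boltzmann_average(values, energies_kcal, temperature=298.15):
    """Return the Boltzmann-weighted average of one descriptor over conformers."""
    weights = boltzmann_weights(energies_kcal, temperature)
    return sum(w * v for w, v in zip(weights, values))


# Made-up example: one descriptor (e.g. a dipole in debye) for three conformers
conformer_energies = [0.0, 0.8, 2.1]
conformer_dipoles = [1.9, 2.4, 3.1]
print(boltzmann_average(conformer_dipoles, conformer_energies))
```

For atom-centered descriptors the same weights would be applied atom by atom, which assumes a consistent atom ordering across the conformers of a molecule.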

Applicability

The workflow is applicable to both organic and organometallic systems, provided that the underlying xTB calculations converge reliably.

Warning

As with other semiempirical approaches, the quality of the resulting descriptors depends on the robustness of the electronic structure calculations. Descriptors computed for systems involving:

  • Multireference character

  • Excited states (e.g., S₁, S₂, T₁)

  • Aggregates or noncovalent complexes

should be interpreted with caution.

Output

The workflow produces CSV files containing Boltzmann-weighted descriptors corresponding to different descriptor subsets:

  • interpret (default)

  • denovo (reduced set)

  • full (extended set including RDKit features)

Descriptor Table

Note

All citations for the descriptors and formulae can be found in our publication describing the descriptor‑generation workflow.

The final descriptor set combines 21 molecular descriptors and 18 atomic descriptors, including:

  • xTB-based electronic descriptors (PTB when available, otherwise GFN2-xTB)

  • MORFEUS steric descriptors

| Type | Full name | Descriptor | Definition / equation | Units | Method |
|---|---|---|---|---|---|
| Molecular | Highest occupied molecular orbital energy | HOMO | \(E_{\mathrm{HOMO}}\) | eV | PTB |
| Molecular | Lowest unoccupied molecular orbital energy | LUMO | \(E_{\mathrm{LUMO}}\) | eV | PTB |
| Molecular | HOMO–LUMO gap | HOMO–LUMO gap | \(E_{\text{gap}} = E_{\mathrm{LUMO}} - E_{\mathrm{HOMO}}\) | eV | PTB |
| Molecular | Ionization potential | IP | \(\mathrm{IP} = E(N-1) - E(N)\) | eV | GFN2 |
| Molecular | Electron affinity | EA | \(\mathrm{EA} = E(N) - E(N+1)\) | eV | GFN2 |
| Molecular | Molecular dipole moment magnitude | Dipole module | - | debye | PTB |
| Molecular | Solvent-accessible surface area | SASA | \(\mathrm{SASA} = \sum_i A_i\) | Ų | MORFEUS |
| Molecular | Dispersion surface area | Dispersion area | - | Ų | MORFEUS |
| Molecular | Dispersion volume | Dispersion volume | - | Å³ | MORFEUS |
| Molecular | Solvation free energy in water | G solv. in H₂O | \(\Delta G_{\text{solv}}\) | kcal/mol | GFN2 |
| Molecular | Hydrogen-bond contribution to solvation | G of H-bonds H₂O | \(\Delta G_{\text{HB}}\) | kcal/mol | GFN2 |
| Molecular | Fermi level | Fermi-level | \(E_F = \dfrac{E_{\mathrm{HOMO}} + E_{\mathrm{LUMO}}}{2}\) | eV | GFN2 |
| Molecular | Molecular polarizability | Polarizability | - | a₀³ | GFN2 |
| Molecular | Chemical hardness | Hardness | \(\eta = \mathrm{IP} - \mathrm{EA}\) | eV | GFN2 |
| Molecular | Chemical softness | Softness | \(S = \dfrac{1}{\eta}\) | eV⁻¹ | GFN2 |
| Molecular | Chemical potential | Chem. potential | \(\mu = -\dfrac{\mathrm{IP} + \mathrm{EA}}{2}\) | eV | GFN2 |
| Molecular | Electrophilicity index | Electrophilicity | \(\omega = \dfrac{(\mathrm{IP}+\mathrm{EA})^2}{8(\mathrm{IP}-\mathrm{EA})} = \dfrac{\mu^2}{2\eta}\) | eV | GFN2 |
| Molecular | Electrofugality | Electrofugality | \(\nu_{\text{electrofugality}} = \dfrac{(3\mathrm{IP}-\mathrm{EA})^2}{8(\mathrm{IP}-\mathrm{EA})} = \mathrm{IP} + \omega\) | eV | GFN2 |
| Molecular | Nucleofugality | Nucleofugality | \(\nu_{\text{nucleofugality}} = \dfrac{(\mathrm{IP}-3\mathrm{EA})^2}{8(\mathrm{IP}-\mathrm{EA})} = -\mathrm{EA} + \omega\) | eV | GFN2 |
| Molecular | Fractional occupation density | Total FOD | - | e | GFN2 |
| Molecular | Singlet–triplet energy gap | S₀–T₁ gap | \(\Delta E_{S_0\text{-}T_1} = E_{T_1} - E_{S_0}\) | kcal/mol | GFN2 |
| Atomic | Atomic hydrogen-bond contribution to solvation | H-bond H₂O | \(\Delta G_{\text{HB},i}\) | kcal/mol | GFN2 |
| Atomic | Mulliken partial charge | Partial charge | \(q_i\) | e | PTB |
| Atomic | Atomic dipole moment magnitude | Dipole moment | - | debye | PTB |
| Atomic | Atom solvent-accessible surface area | Atom SASA | \(A_i\) | Ų | MORFEUS |
| Atomic | Atomic dispersion descriptor | Atom dispersion | - | Ų | MORFEUS |
| Atomic | Percent buried volume | Buried volume | - | % | MORFEUS |
| Atomic | Pyramidalization parameter | Pyramidalization | \(P = \sin(\theta) \cdot \cos(\alpha)\) | - | MORFEUS |
| Atomic | Pyramidalization angle | Pyramidaliz. volume | \(P = \sqrt{360^\circ - \sum_i \theta_i}\) | ° | MORFEUS |
| Atomic | Fukui nucleophilic index | Fukui+ | \(f^+ = q_N - q_{N+1}\) | - | GFN2 |
| Atomic | Fukui electrophilic index | Fukui− | \(f^- = q_{N-1} - q_N\) | - | GFN2 |
| Atomic | Radical Fukui index | Fukui_rad | \(f_{\mathrm{rad}} = \dfrac{q_{N-1} - q_{N+1}}{2}\) | - | GFN2 |
| Atomic | Dual Fukui descriptor | Fukui dual | \(f^{(2)} = f^+ - f^-\) | - | GFN2 |
| Atomic | Local electrophilicity | Electrophil. | \(l_\omega = -\dfrac{\mu}{\eta} f_{\mathrm{rad}} + \dfrac{1}{2}\left(\dfrac{\mu}{\eta}\right)^2 f^{(2)}\) | - | GFN2 |
| Atomic | Normalized electrophilicity | Normaliz. electrophil. | \(\omega_i = \omega \cdot f_i^+\) | eV | GFN2 |
| Atomic | Normalized nucleophilicity | Normaliz. nucleophil. | \(N_i = -\mathrm{IP} \cdot f_i^-\) | eV | GFN2 |
| Atomic | Atomic polarizability | Atom Polarizability | - | a₀³ | GFN2 |
| Atomic | Atomic FOD population | Atom FOD | - | e | GFN2 |
| Atomic | Coordination number | Coord. numbers | - | - | GFN2 |
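
To make the relationships between the IP/EA-derived descriptors in the table concrete, the sketch below evaluates them from an ionization potential, an electron affinity, and three sets of atomic charges. It is an illustrative example only: the function names and all numerical inputs are hypothetical, and in the actual workflow these quantities are obtained from the xTB calculations through MORFEUS.

```python
def global_descriptors(ip, ea):
    """Compute molecular descriptors from IP and EA (both in eV), as defined in the table."""
    hardness = ip - ea                    # eta = IP - EA
    chem_pot = -(ip + ea) / 2             # mu = -(IP + EA) / 2
    omega = chem_pot**2 / (2 * hardness)  # omega = mu^2 / (2 eta)
    return {
        "hardness": hardness,
        "softness": 1 / hardness,         # S = 1 / eta
        "chemical_potential": chem_pot,
        "electrophilicity": omega,
        "electrofugality": ip + omega,    # equals (3 IP - EA)^2 / (8 (IP - EA))
        "nucleofugality": -ea + omega,    # equals (IP - 3 EA)^2 / (8 (IP - EA))
    }


def fukui_indices(q_n, q_n_plus_1, q_n_minus_1):
    """Compute condensed Fukui indices per atom from charges of the N, N+1 and N-1 electron systems."""
    f_plus = [qn - qa for qn, qa in zip(q_n, q_n_plus_1)]    # f+ = q_N - q_{N+1}
    f_minus = [qc - qn for qc, qn in zip(q_n_minus_1, q_n)]  # f- = q_{N-1} - q_N
    f_rad = [(qc - qa) / 2 for qc, qa in zip(q_n_minus_1, q_n_plus_1)]
    f_dual = [p - m for p, m in zip(f_plus, f_minus)]        # f(2) = f+ - f-
    return {"f_plus": f_plus, "f_minus": f_minus, "f_rad": f_rad, "f_dual": f_dual}


# Made-up inputs for a two-atom toy system
glob = global_descriptors(ip=9.0, ea=1.0)
fukui = fukui_indices(
    q_n=[0.2, -0.2],          # neutral species
    q_n_plus_1=[-0.3, -0.7],  # after adding one electron
    q_n_minus_1=[0.6, 0.4],   # after removing one electron
)

# Atom-resolved electrophilicity and nucleophilicity, following the table definitions
omega_i = [glob["electrophilicity"] * f for f in fukui["f_plus"]]
n_i = [-9.0 * f for f in fukui["f_minus"]]
print(glob)
print(fukui)
```

Because the atomic charges of each species sum to its total charge, the f+ and f− values each sum to one over the molecule, which is a convenient sanity check on the input charges.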