filter

RMSD_and_E_filter(outmols, selectedcids_initial, cenergy, args, dup_data, dup_data_idx, calc_type)

This filter selects the first compound it finds whose energy difference is lower than the energy threshold and whose RMS is higher than the RMS threshold, both measured with respect to the nearest (in energy) accepted compound.
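
The combined energy and RMS test described above can be illustrated with a minimal sketch. It is not the library's implementation: the function name, the `rms_between` callable, the two threshold arguments, and the assumption that ids and energies are parallel lists sorted by energy are all hypothetical, and the duplicate bookkeeping (`dup_data`, `dup_data_idx`, `calc_type`) is left out. The sketch also assumes that a compound farther in energy than the threshold from its nearest accepted neighbour is kept.

```python
def rms_and_energy_filter_sketch(sorted_cids, sorted_energies, rms_between,
                                 energy_threshold, rms_threshold):
    """Hypothetical sketch: drop compounds that are close in BOTH energy and RMS."""
    accepted = []           # ids kept so far
    accepted_energies = []  # energies of the kept ids, in the same order
    for cid, energy in zip(sorted_cids, sorted_energies):
        if accepted:
            # Index of the accepted compound that is nearest in energy.
            nearest = min(range(len(accepted)),
                          key=lambda i: abs(energy - accepted_energies[i]))
            close_in_energy = abs(energy - accepted_energies[nearest]) < energy_threshold
            close_in_shape = rms_between(cid, accepted[nearest]) <= rms_threshold
            if close_in_energy and close_in_shape:
                continue  # treated as a duplicate of an accepted compound
        accepted.append(cid)
        accepted_energies.append(energy)
    return accepted


# Toy data: a 1D "position" per compound stands in for real coordinates,
# so the RMS is just an absolute difference (assumption for illustration).
positions = {0: 0.0, 1: 0.05, 2: 1.0}
toy_rms = lambda a, b: abs(positions[a] - positions[b])
kept = rms_and_energy_filter_sketch([0, 1, 2], [0.0, 0.1, 0.15], toy_rms,
                                    energy_threshold=0.25, rms_threshold=0.2)
```

In real use, `rms_between` would wrap whatever structural comparison the workflow relies on; the toy callable only keeps the example self-contained.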

ewin_filter(sorted_all_cids, cenergy, dup_data, dup_data_idx, calc_type, energy_window)

Given a sorted list of compound Ids and a sorted list of their energies, it discards all compound Ids whose energy exceeds that of the lowest-energy compound by more than the energy window (energy_window; args.ewin_csearch).

Parameters

sorted_all_cids : list

[description]

cenergy : list

[description]

args : argparse.args

[description]

dup_data : pd.DataFrame

[description]

dup_data_idx : pd.DataFrame?

[description]

calc_type : str

A string that points towards the column of the dataframe that should be filled with the number of duplicates. The current choices are: ['rdkit','summ','ani','xtb']

energy_window : float

Minimum energy difference with respect to the lowest-energy compound required to discard a compound.

Returns

list

List of accepted compound Ids.
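
As an illustration of the energy-window idea only, here is a minimal sketch. It assumes the compound Ids and energies are parallel lists sorted by ascending energy, and it omits the duplicate bookkeeping done through `dup_data`, `dup_data_idx` and `calc_type`; the function name is hypothetical and not part of the module.

```python
def energy_window_sketch(sorted_cids, sorted_energies, energy_window):
    """Hypothetical sketch: keep ids within energy_window of the lowest energy."""
    if not sorted_cids:
        return []
    lowest = sorted_energies[0]  # lists are assumed sorted by ascending energy
    accepted = []
    for cid, energy in zip(sorted_cids, sorted_energies):
        if energy - lowest > energy_window:
            break  # every later entry is at least as high, so stop here
        accepted.append(cid)
    return accepted


# Compound 1 lies 7.5 above the minimum and falls outside a window of 5.0.
accepted_cids = energy_window_sketch([3, 0, 2, 1], [0.0, 1.2, 4.9, 7.5], 5.0)
```

Stopping at the first compound outside the window relies on the sorted input; with unsorted data every entry would have to be checked.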

filters(mol, log, molwt_cutoff)

Applies some basic filters (molecular weight, salts [currently off], unusual atom symbols) that only require SMILES data from a compound, and returns whether the molecule passes the filters.
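
The kind of check described above might look like the following sketch. It deliberately avoids any cheminformatics dependency: the caller passes a precomputed molecular weight and a list of atom symbols. The function name, the `allowed_symbols` argument, and the rule that a weight above `molwt_cutoff` fails are assumptions made for illustration, not the behaviour of `filters` itself.

```python
def passes_basic_filters_sketch(atom_symbols, mol_weight, molwt_cutoff, allowed_symbols):
    """Hypothetical sketch: True if the compound passes weight and symbol checks."""
    # Assumed rule: compounds heavier than the cutoff are rejected.
    if mol_weight > molwt_cutoff:
        return False
    # Assumed rule: every atom symbol must belong to the allowed set.
    return all(symbol in allowed_symbols for symbol in atom_symbols)


ALLOWED = {"C", "H", "N", "O"}          # illustrative set, not the module's
ethanol = ["C", "C", "O"] + ["H"] * 6   # atom symbols of ethanol
ok = passes_basic_filters_sketch(ethanol, 46.07, 500.0, ALLOWED)
bad_symbol = passes_basic_filters_sketch(ethanol + ["Xx"], 46.07, 500.0, ALLOWED)
too_heavy = passes_basic_filters_sketch(ethanol, 46.07, 40.0, ALLOWED)
```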

pre_E_filter(sortedcids, cenergy, dup_data, dup_data_idx, calc_type, threshold)

This filter selects the first compound it finds whose energy difference with respect to the previously admitted compounds is higher than or equal to the threshold. (Intended as a filter for rdkit.)

Parameters

sortedcids : list or pd.DataFrame?

List of compound Ids.

cenergy : list or pd.DataFrame?

list of compound energies

dup_data : pd.DataFrame

[description]

dup_data_idx : pd.DataFrame?

[description]

calc_type : str

A string that points towards the column of the dataframe that should be filled with the number of duplicates. The current choices are: ['rdkit','summ','ani','xtb']

threshold : float

Minimum energy difference (kcal/mol) to consider two compounds as different.

Returns

list

list of accepted compound Ids
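
A minimal sketch of an energy-only pre-filter of this kind follows. It assumes parallel lists of Ids and energies, returns the duplicate count instead of writing it into `dup_data`, and uses a hypothetical function name; it illustrates the idea rather than reproducing `pre_E_filter`.

```python
def pre_energy_filter_sketch(sorted_cids, sorted_energies, threshold):
    """Hypothetical sketch: keep ids whose energy differs from every kept one."""
    accepted = []
    accepted_energies = []
    n_duplicates = 0
    for cid, energy in zip(sorted_cids, sorted_energies):
        # Accept only if the energy is at least `threshold` away from
        # the energy of every compound admitted so far.
        if all(abs(energy - e) >= threshold for e in accepted_energies):
            accepted.append(cid)
            accepted_energies.append(energy)
        else:
            n_duplicates += 1
    return accepted, n_duplicates


# Compounds 1 and 3 sit within 0.1 of an already admitted compound.
unique_cids, duplicates = pre_energy_filter_sketch(
    [0, 1, 2, 3], [0.0, 0.05, 0.3, 0.32], threshold=0.1
)
```

Because only energies are compared, two structurally different compounds with nearly identical energies would be merged by this sketch, which is why a later RMS-based filter is typically applied as well.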