filter
- RMSD_and_E_filter(outmols, selectedcids_initial, cenergy, args, dup_data, dup_data_idx, calc_type)
This filter selects the first compound it finds whose energy difference is lower than the energy threshold and whose RMS is higher than the RMS threshold, both measured with respect to the nearest (in energy) accepted compound.
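As a rough illustration of the combined criterion, the sketch below walks over the candidates and treats one as a duplicate only when it is both close in energy and close in RMS to the nearest (in energy) accepted compound. It is not the library's implementation. The function name, the `energies` mapping from compound Id to energy, and the `rms` callable are all assumptions made for this example, and the duplicate bookkeeping (`dup_data`, `dup_data_idx`, `calc_type`) is omitted.

```python
def rms_and_energy_sketch(cids, energies, rms, e_threshold, rms_threshold):
    """Hypothetical sketch of a combined energy/RMS duplicate filter.

    cids          : compound Ids, assumed sorted by increasing energy
    energies      : dict mapping compound Id -> energy (assumption)
    rms           : callable rms(cid_a, cid_b) -> float (assumption)
    e_threshold   : energy difference below which two compounds may be duplicates
    rms_threshold : RMS at or below which two compounds are considered the same
    """
    accepted = []
    for cid in cids:
        if not accepted:
            # The first (lowest-energy) compound is always kept.
            accepted.append(cid)
            continue
        # Nearest accepted compound in energy.
        nearest = min(accepted, key=lambda a: abs(energies[cid] - energies[a]))
        close_in_energy = abs(energies[cid] - energies[nearest]) < e_threshold
        if close_in_energy and rms(cid, nearest) <= rms_threshold:
            # Similar energy and similar geometry: treat as a duplicate.
            continue
        accepted.append(cid)
    return accepted
```

In practice the `rms` callable would wrap whatever structural comparison is available for the compounds; here it is left abstract so the selection logic can be read on its own.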
- ewin_filter(sorted_all_cids, cenergy, dup_data, dup_data_idx, calc_type, energy_window)
Given a sorted list of compound Ids and a sorted list of their energies, it discards all compound Ids whose energy relative to the lowest one is higher than the energy window (args.ewin_csearch).
Parameters
- sorted_all_cids : list
[description]
- cenergy : list
[description]
- args : argparse.args
[description]
- dup_data : pd.DataFrame
[description]
- dup_data_idx : pd.DataFrame?
[description]
- calc_type : str
A string that points towards the column of the dataframe that should be filled with the number of duplicates. The current choices are: ['rdkit','summ','ani','xtb']
- energy_window : float
Minimum energy difference with respect to the lowest-energy compound required to discard a compound.
Returns
- list
List of accepted compound Ids.
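The energy-window selection described above can be sketched as follows. This is a minimal illustration rather than the library's code: the name `energy_window_sketch` is hypothetical, the energies are assumed to be a list parallel to the Ids (both sorted by increasing energy), and the duplicate bookkeeping (`dup_data`, `dup_data_idx`, `calc_type`) is left out.

```python
def energy_window_sketch(sorted_cids, sorted_energies, energy_window):
    """Hypothetical sketch: keep Ids whose relative energy is within the window.

    sorted_cids     : compound Ids sorted by increasing energy
    sorted_energies : energies in the same order as sorted_cids (assumption)
    energy_window   : maximum allowed energy above the lowest compound
    """
    if not sorted_cids:
        return []
    lowest = sorted_energies[0]
    # A compound is discarded only when its relative energy exceeds the window.
    return [
        cid
        for cid, energy in zip(sorted_cids, sorted_energies)
        if energy - lowest <= energy_window
    ]
```

For example, with Ids `[3, 1, 2]`, energies `[0.0, 1.5, 6.0]` and a window of `5.0`, the third compound lies 6.0 above the lowest and is the only one dropped.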
- filters(mol, log, molwt_cutoff)
Applies some basic filters (molwt, salts [currently off], weird atom symbols) that only require SMILES data from a compound, and returns whether the molecule passes the filters.
- pre_E_filter(sortedcids, cenergy, dup_data, dup_data_idx, calc_type, threshold)
This filter selects the first compound it finds whose energy difference is higher than or equal to the threshold with respect to the previously admitted compounds. (Intended as a filter for rdkit.)
Parameters
- sortedcids : list or pd.DataFrame?
List of compound Ids.
- cenergy : list or pd.DataFrame?
List of compound energies.
- dup_data : pd.DataFrame
[description]
- dup_data_idx : pd.DataFrame?
[description]
- calc_type : str
A string that points towards the column of the dataframe that should be filled with the number of duplicates. The current choices are: ['rdkit','summ','ani','xtb']
- threshold : float
Minimum energy difference to consider two compounds as different. (kcal/mol)
Returns
- list
list of accepted compound Ids
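A minimal sketch of the energy-only pre-filter described above, assuming the Ids and energies are parallel lists sorted by increasing energy. The function name and the returned duplicate count are hypothetical stand-ins for this example. The documented function records duplicates in `dup_data` under the `calc_type` column instead, and that step is not reproduced here.

```python
def pre_energy_filter_sketch(sorted_cids, sorted_energies, threshold):
    """Hypothetical sketch of an energy-only duplicate filter.

    A compound is accepted when its energy differs by at least `threshold`
    from every previously accepted compound; otherwise it is counted as a
    duplicate.
    """
    accepted = []           # accepted compound Ids
    accepted_energies = []  # energies of the accepted compounds
    n_duplicates = 0
    for cid, energy in zip(sorted_cids, sorted_energies):
        is_new = all(abs(energy - e) >= threshold for e in accepted_energies)
        if is_new:
            accepted.append(cid)
            accepted_energies.append(energy)
        else:
            n_duplicates += 1
    return accepted, n_duplicates
```

Because the first compound has nothing to be compared against, it is always accepted, so the lowest-energy compound is never discarded by this step.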