RESULTS: The Condorcet fusion method was examined. This approach combines the outputs of similarity searches from eleven association and distance similarity coefficients, and then the winner measure for each class of molecules, based on Condorcet fusion, was chosen to be the best method of searching. The recall of retrieved active molecules at top 5% and significant test are used to evaluate our proposed method. The MDL drug data report (MDDR), maximum unbiased validation (MUV) and Directory of Useful Decoys (DUD) data sets were used for experiments and were represented by 2D fingerprints.
CONCLUSIONS: Simulated virtual screening experiments with the standard two data sets show that the use of Condorcet fusion provides a very simple way of improving the ligand-based virtual screening, especially when the active molecules being sought have a lowest degree of structural heterogeneity. However, the effectiveness of the Condorcet fusion was increased slightly when structural sets of high diversity activities were being sought.
RESULTS: The cumulative voting-based aggregation algorithm (CVAA), cluster-based similarity partitioning algorithm (CSPA) and hyper-graph partitioning algorithm (HGPA) were examined. The F-measure and Quality Partition Index method (QPI) were used to evaluate the clusterings and the results were compared to the Ward's clustering method. The MDL Drug Data Report (MDDR) dataset was used for experiments and was represented by two 2D fingerprints, ALOGP and ECFP_4. The performance of voting-based consensus clustering method outperformed the Ward's method using F-measure and QPI method for both ALOGP and ECFP_4 fingerprints, while the graph-based consensus clustering methods outperformed the Ward's method only for ALOGP using QPI. The Jaccard and Euclidean distance measures were the methods of choice to generate the ensembles, which give the highest values for both criteria.
CONCLUSIONS: The results of the experiments show that consensus clustering methods can improve the effectiveness of chemical structures clusterings. The cumulative voting-based aggregation algorithm (CVAA) was the method of choice among consensus clustering methods.