Clustering is a key step in the processing of Expressed Sequence Tags (ESTs). The primary goal of clustering is to group ESTs that originate from the same transcript of a single gene into one cluster. Most recent EST clustering algorithms adopt alignment-free distance measures, which tend to yield acceptable clustering accuracy in reasonable computational time. Although these methods work satisfactorily on the majority of EST datasets, they share a common weakness: they are prone to deliver unsatisfactory clustering results when dealing with ESTs from genes of the same family. The root cause is that the distance measures they apply are not sensitive enough to separate these closely related genes.
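
To illustrate why alignment-free measures can struggle with gene families, the following is a minimal sketch of one such measure: a squared Euclidean distance between k-mer (word) count vectors. It is an illustration only, not the measure used by any particular clustering tool. The function names, the word length `k = 3`, and the toy sequences are all assumptions made for this example.

```python
from collections import Counter


def kmer_counts(seq: str, k: int) -> Counter:
    """Count every overlapping word of length k in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def kmer_distance(a: str, b: str, k: int = 3) -> int:
    """Squared Euclidean distance between the k-mer count vectors of a and b.

    No alignment is computed: only word frequencies are compared.
    """
    ca, cb = kmer_counts(a, k), kmer_counts(b, k)
    # Counter returns 0 for words absent from one sequence.
    return sum((ca[w] - cb[w]) ** 2 for w in ca.keys() | cb.keys())


# Hypothetical toy sequences (not real ESTs).
gene_a = "ACGTACGTTGCAAGCT"
family_member = "ACGTACGTAGCAAGCT"  # differs from gene_a by one substitution
unrelated = "TTTTGGGGCCCCAAAA"

d_family = kmer_distance(gene_a, family_member)
d_unrelated = kmer_distance(gene_a, unrelated)
print(d_family, d_unrelated)
```

In this toy case the family member sits much closer to `gene_a` than the unrelated sequence does, so a clustering threshold loose enough to tolerate sequencing errors within one gene may also merge ESTs from closely related genes.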