Statistical estimators as an alternative to standard deviation in weighted Euclidean distance cluster analysis

Dalatu, Paul Inuwa; Habshah Midi

Clustering is one of the primary tools of data mining. It helps researchers understand
the natural grouping of attributes in datasets. Clustering is an unsupervised
classification method whose main aim is partitioning: objects in the same cluster are
similar, and objects that belong to different clusters differ significantly with respect
to their attributes. However, the classical standardized Euclidean distance, which uses
the standard deviation to down-weight the maximum points of the ith feature in
distance-based clustering, has been criticized by many scholars on the grounds that the
method produces outliers, lacks robustness, and has a 0% breakdown point. It also has low
efficiency under the normal distribution. To remedy these problems, we suggest two
statistical estimators that have a 50% breakdown point, namely the Sn and Qn estimators,
with 58% and 82% efficiency, respectively. The proposed methods evidently outperformed
the existing methods in down-weighting the maximum points of the ith feature in
distance-based cluster analysis.
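
As a rough illustration of the idea described above, the sketch below computes naive versions of the Sn and Qn scale estimators and uses a per-feature scale in place of the standard deviation in a weighted Euclidean distance. It is not the authors' implementation. The function names (`sn_scale`, `qn_scale`, `scaled_euclidean`) are hypothetical, the consistency constants 1.1926 and 2.2219 are the usual values quoted for Sn and Qn under normality, and the code assumes plain medians with no finite-sample correction factors.

```python
from itertools import combinations
from math import comb, sqrt
from statistics import median, stdev


def sn_scale(x):
    """Naive O(n^2) Sn: 1.1926 * med_i( med_j |x_i - x_j| ).

    Assumption: ordinary medians are used instead of the low/high
    medians of the original definition; no finite-sample correction.
    """
    inner = [median([abs(xi - xj) for xj in x]) for xi in x]
    return 1.1926 * median(inner)


def qn_scale(x):
    """Naive O(n^2) Qn: 2.2219 * k-th smallest |x_i - x_j| over i < j,
    with h = floor(n/2) + 1 and k = C(h, 2). No finite-sample correction.
    """
    n = len(x)
    h = n // 2 + 1
    k = comb(h, 2)
    diffs = sorted(abs(a - b) for a, b in combinations(x, 2))
    return 2.2219 * diffs[k - 1]


def scaled_euclidean(a, b, scales):
    """Euclidean distance with each feature difference divided by its scale."""
    return sqrt(sum(((ai - bi) / s) ** 2 for ai, bi, s in zip(a, b, scales)))


if __name__ == "__main__":
    # One feature with a single extreme value (illustrative data only).
    feature = [1, 2, 3, 4, 100]
    print("standard deviation:", stdev(feature))
    print("Sn:", sn_scale(feature))
    print("Qn:", qn_scale(feature))

    # Two-feature points; each column gets its own scale estimate.
    points = [(1, 10), (2, 12), (3, 11), (4, 13), (100, 12)]
    columns = list(zip(*points))
    qn_scales = [qn_scale(col) for col in columns]
    print("Qn-scaled distance:", scaled_euclidean(points[0], points[1], qn_scales))
```

In this toy feature the single extreme value inflates the standard deviation, while Sn and Qn stay close to the spread of the bulk of the data, which is the behaviour a 50% breakdown point is meant to provide.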
