This study investigates the effectiveness and efficiency of two topological data analysis (TDA) techniques, the conventional Mapper (CM) and its variant, Ball Mapper (BM), in analyzing the behavior of six major air pollutants (NO2, PM10, PM2.5, O3, CO, and SO2) across 60 air quality monitoring stations in Malaysia. The topological graphs produced by CM and BM reveal redundant monitoring stations and geographical relationships corresponding to air pollutant behavior, and they provide better visualization than traditional hierarchical clustering. Additionally, the topological graph structures were compared using node degree distribution, topological graph indices, and Dynamic Time Warping (DTW) to evaluate the sensitivity and performance of the two techniques. Both approaches yielded valuable insights into the air quality monitoring station network; however, CM requires multiple parameters, and this complexity makes graph construction challenging. In contrast, BM requires only a single parameter, and this simplicity makes it preferable for representing air pollutant behavior. The findings suggest an alternative approach for assessing and identifying redundancies among air quality monitoring stations, which could contribute to improved air quality monitoring management and more effective regulatory policies.
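To make the single-parameter construction concrete, the following is a minimal sketch of a Ball Mapper graph as it is commonly described: landmarks are chosen greedily so that every point lies within a radius of some landmark, each landmark's ball becomes a node, and two nodes are joined when their balls share a point. It is not the study's implementation. The function name `ball_mapper`, the radius value, and the toy coordinates are assumptions made for illustration only; in the setting above, each point would be a station described by its pollutant measurements.

```python
import math
from itertools import combinations


def ball_mapper(points, eps):
    """Sketch of Ball Mapper: greedy eps-net landmarks, balls, overlap edges.

    `eps` is the single radius parameter. Returns (balls, edges), where
    `balls` maps a landmark index to the set of point indices in its ball.
    """
    landmarks = []
    for i, p in enumerate(points):
        # A point becomes a landmark only if no existing landmark covers it.
        if all(math.dist(p, points[l]) > eps for l in landmarks):
            landmarks.append(i)

    # Each ball collects every point within eps of its landmark.
    balls = {
        l: {i for i, p in enumerate(points) if math.dist(p, points[l]) <= eps}
        for l in landmarks
    }

    # Two balls are connected when they share at least one point.
    edges = {(a, b) for a, b in combinations(landmarks, 2) if balls[a] & balls[b]}
    return balls, edges


# Hypothetical toy data: four "stations" on a line (not from the study).
points = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (10.0, 0.0)]
balls, edges = ball_mapper(points, eps=1.5)
print(balls)
print(edges)
```

In this toy example, points that fall into the same ball behave similarly at the chosen radius, which is the sense in which a graph of this kind can hint at redundant stations. The greedy landmark selection depends on the order of the points, so a different ordering may give a different but comparable graph.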
Haze has been a major issue afflicting Southeast Asian countries, including Malaysia, for the past few decades. Hierarchical agglomerative cluster analysis (HACA) is commonly used to evaluate the spatial behavior between areas in which pollutants interact. Typically, HACA uses the Euclidean distance as the dissimilarity measure and groups air quality monitoring stations according to this measure, thus revealing the most polluted areas. In this study, a framework for the hybridization of the HACA technique is proposed that considers the topological similarity (Wasserstein distance) between stations to evaluate the spatial patterns of the areas affected by haze episodes. For this, persistent homology, a tool from topological data analysis (TDA), is used to extract essential topological features hidden in the dataset. The performance of the proposed method is compared with that of traditional HACA and evaluated based on its ability to categorize areas according to the exceedance level of particulate matter (PM10). Results show that including topological features yields better accuracy than the case that does not consider them. Cluster validity indices are computed to verify the results, and the proposed method outperforms the traditional method, suggesting a practical alternative approach for assessing the similarity in air pollution behaviors based on topological characterizations.
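For orientation, the traditional baseline described above (HACA with Euclidean distance) can be sketched with SciPy as follows. This shows only the baseline, not the proposed hybrid method. The station values are invented for illustration, and the Ward linkage and the choice of two clusters are assumptions, since the abstract does not specify them.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

# Hypothetical PM10 summaries per station (e.g. mean and peak), not real data.
stations = np.array([
    [40.0, 90.0],
    [42.0, 95.0],
    [41.0, 92.0],
    [120.0, 300.0],
    [125.0, 310.0],
    [118.0, 295.0],
])

# Agglomerative clustering with Euclidean distance as the dissimilarity.
# Ward linkage is an assumption; the abstract does not name the linkage.
Z = linkage(stations, method="ward", metric="euclidean")

# Cut the dendrogram into two groups (assumed number of clusters).
labels = fcluster(Z, t=2, criterion="maxclust")
print(labels)
```

A hybrid variant along the lines of the abstract would replace or complement the Euclidean dissimilarity with a Wasserstein distance between persistence diagrams; how the two are combined is not stated here, so it is left out of the sketch.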