Heterogeneous parallel architectures (HPAs) are inherently more complicated than their homogeneous counterparts. HPAs combine conventional processors with specialised processors that target particular types of task. However, this makes the mapping and scheduling of parallel applications even more complicated and difficult. Therefore, it is crucial to use a robust modelling approach that can capture all the critical characteristics of the application and facilitate an optimal mapping. In this study, we perform a concise theoretical analysis and comparison of the existing modelling approaches for parallel applications. The theoretical perspective includes both formal concepts and mathematical definitions based on existing scholarly literature. The important characteristics, success factors and challenges of these modelling approaches are compared and categorised. The results of the theoretical analysis and comparison show that the existing modelling approaches still need improvement in many aspects, such as the metrics covered and the heterogeneity of processors and networks. Moreover, the results help us to introduce a new approach, which improves the quality of mapping by taking heterogeneity into account and by covering more metrics that help to justify the results more accurately.
The Tawaf ritual performed during Hajj and Umrah is one of the most distinctive large-scale, multi-cultural events of the modern age. Pilgrims from all over the world circumambulate a stone cube structure called the Ka’aba. Disasters at these types of events are inevitable due to the erratic behaviour of pilgrims. This has prompted researchers to present several solutions to avoid such incidents. Agent-based simulations of a large number of pilgrims performing the ritual can help to prevent such disasters, which are caused either by mismanagement or by irregular event plans. However, existing models offer limited parallelisation capabilities for concurrent execution of the agent-based simulation. This limitation decreases efficiency by producing insufficient frames when simulating a large number of autonomous agents during the Tawaf ritual. Therefore, a parallel simulation model is needed that improves the performance of simulating large numbers of pilgrims performing the crucial ritual of Tawaf. To fill this gap between large-scale agent-based simulation and navigational behaviours for pilgrim movement, optimised parallel simulation software for agent-based crowd movement during the ritual of Tawaf is proposed here. The software comprises parallel behaviours for autonomous agents that utilise the inherent parallelism of Graphics Processing Units (GPUs). In order to implement the simulation software, an optimised parallel model is proposed. This model is based on an agent-based architecture in which agents have a reactive design that responds to a fixed set of stimuli. An advantage of using agents is that they introduce artificial anomalies that generate heterogeneous crowd movement, as opposed to a single uniform movement, which is unrealistic.
The purpose is to decrease the execution time of the complex behaviour computation for each agent while simulating a large crowd of pilgrims at increased frames per second (fps). The implementation utilises the CUDA (Compute Unified Device Architecture) platform for general-purpose computing on GPUs. It exploits the underlying data-parallel capability of an existing library for steering behaviours, called OpenSteer. OpenSteer provides simple behaviours that, when combined, produce more complex and realistic behaviours. The data-independent nature of these agent-based behaviours makes them very suitable candidates for parallelisation. After an in-depth review of previous studies on the simulation of the Tawaf ritual, two key behaviours associated with pilgrim movement are considered for the new model. The parallel simulation is executed on three different high-performance configurations to determine the variation in different performance metrics. The parallel implementation achieved a considerable speedup in comparison to its sequential counterpart running on a single-threaded CPU. With the use of parallel behaviours, 100,000 pilgrims were simulated at 10 fps.
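To illustrate why combined steering behaviours are data-independent, the following is a minimal sequential sketch in Python, not the CUDA/OpenSteer implementation described above. The abstract does not name the two behaviours, so the choice here is an assumption: a counter-clockwise orbit around a centre point and a separation force between close neighbours. The function names (`orbit_force`, `separation_force`, `step`), the weights and the radius are all hypothetical. Each agent reads only the previous frame's positions and writes only its own new position, so every iteration of the loop in `step` could in principle be assigned to its own GPU thread.

```python
import math


def orbit_force(pos, centre=(0.0, 0.0)):
    """Unit vector tangential to the circle around `centre` (counter-clockwise)."""
    dx, dy = pos[0] - centre[0], pos[1] - centre[1]
    r = math.hypot(dx, dy)
    if r == 0.0:
        return (0.0, 0.0)  # no defined tangent exactly at the centre
    return (-dy / r, dx / r)


def separation_force(i, positions, radius=1.0):
    """Push agent i away from every neighbour closer than `radius`."""
    px, py = positions[i]
    fx = fy = 0.0
    for j, (qx, qy) in enumerate(positions):
        if j == i:
            continue
        dx, dy = px - qx, py - qy
        d = math.hypot(dx, dy)
        if 0.0 < d < radius:
            # Repulsion grows as the neighbour gets closer (magnitude 1/d).
            fx += dx / (d * d)
            fy += dy / (d * d)
    return (fx, fy)


def step(positions, dt=0.1, w_orbit=1.0, w_sep=0.5):
    """Advance one frame: read old positions only, write a new list.

    Because no agent modifies state that another agent reads within the
    same frame, the loop body is independent per agent (data-parallel).
    """
    new_positions = []
    for i, (x, y) in enumerate(positions):
        ox, oy = orbit_force(positions[i])
        sx, sy = separation_force(i, positions)
        # Combine the two simple behaviours as a weighted sum (assumed weights).
        vx = w_orbit * ox + w_sep * sx
        vy = w_orbit * oy + w_sep * sy
        new_positions.append((x + vx * dt, y + vy * dt))
    return new_positions


if __name__ == "__main__":
    crowd = [(1.0, 0.0), (1.5, 0.0), (0.0, 2.0)]
    for _ in range(3):
        crowd = step(crowd)
    print(crowd)
```

The all-pairs neighbour search in `separation_force` is quadratic in the number of agents; it is kept here only for brevity, and a large-scale simulation would normally replace it with a spatial grid or a similar neighbour structure.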