A drastic improvement in the analysis of gene expression has led to new discoveries in bioinformatics research. Fuzzy clustering algorithms are widely used to analyse gene expression data. However, the results of these algorithms may produce confusing hypotheses about the dominant function of genes of interest. In addition, current fuzzy clustering algorithms do not thoroughly analyse genes with low membership values. We therefore present a novel computational framework, the "multi-stage filtering-Clustering Functional Annotation" (msf-CluFA), for clustering gene expression data. The framework consists of four components: fuzzy c-means clustering (msf-CluFA-0), dominant-cluster identification (msf-CluFA-1), confidence-level improvement (msf-CluFA-2), and a combination of msf-CluFA-0, msf-CluFA-1 and msf-CluFA-2 (msf-CluFA-3). By employing double filtering in msf-CluFA-1 and the Apriori algorithm in msf-CluFA-2, the framework determines dominant clusters and improves the confidence level of genes with low membership values, which allows the functions of unknown genes to be predicted.
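The first two stages can be illustrated with a minimal sketch, assuming standard fuzzy c-means (fuzzifier `m = 2`, a common default) for msf-CluFA-0. The abstract does not specify the double-filtering criteria of msf-CluFA-1, so a single hypothetical membership threshold (`min_membership`) stands in for them here. The function names `fuzzy_c_means` and `dominant_clusters` are illustrative only and are not part of msf-CluFA.

```python
import math
import random


def fuzzy_c_means(data, c, m=2.0, max_iter=300, tol=1e-6, seed=0):
    """Standard fuzzy c-means. Returns (centers, u), where u[i][j] is the
    membership of point i in cluster j and each row of u sums to 1."""
    rng = random.Random(seed)
    n, dim = len(data), len(data[0])
    # Random initial memberships, each row normalised to sum to 1.
    u = []
    for _ in range(n):
        row = [rng.random() for _ in range(c)]
        s = sum(row)
        u.append([x / s for x in row])
    centers = [[0.0] * dim for _ in range(c)]
    for _ in range(max_iter):
        # Centers are membership^m weighted means of the data.
        for j in range(c):
            w = [u[i][j] ** m for i in range(n)]
            total = sum(w)
            centers[j] = [sum(w[i] * data[i][d] for i in range(n)) / total
                          for d in range(dim)]
        # Memberships: u_ij = 1 / sum_k (d_ij / d_ik)^(2 / (m - 1)).
        max_change = 0.0
        for i in range(n):
            dists = [math.dist(data[i], centers[j]) for j in range(c)]
            zeros = [j for j, dd in enumerate(dists) if dd == 0.0]
            if zeros:
                # Point coincides with one or more centers: split membership among them.
                new_row = [1.0 / len(zeros) if j in zeros else 0.0 for j in range(c)]
            else:
                p = 2.0 / (m - 1.0)
                new_row = [1.0 / sum((dists[j] / dists[k]) ** p for k in range(c))
                           for j in range(c)]
            max_change = max(max_change, max(abs(a - b) for a, b in zip(new_row, u[i])))
            u[i] = new_row
        if max_change < tol:
            break
    return centers, u


def dominant_clusters(u, min_membership=0.7):
    """Hypothetical filter: keep each gene's highest-membership cluster only if
    that membership reaches min_membership; otherwise return None, marking a
    low-membership gene for further analysis."""
    labels = []
    for row in u:
        j = max(range(len(row)), key=row.__getitem__)
        labels.append(j if row[j] >= min_membership else None)
    return labels


# Toy two-condition expression profiles: two tight groups and one gene in between.
profiles = [[0.0, 0.1], [0.1, 0.0], [0.0, 0.0],
            [5.0, 5.1], [5.1, 5.0], [5.0, 5.0],
            [2.5, 2.55]]
centers, u = fuzzy_c_means(profiles, c=2)
labels = dominant_clusters(u)
```

The in-between gene has roughly equal membership in both clusters, so the threshold leaves it unassigned. In the framework described above, genes like this are the ones msf-CluFA-2 targets.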
Coevolution between hosts and pathogens generates strong selection pressures to maintain resistance and infectivity, respectively. Genomes of plant pathogens often encode major-effect loci that determine the ability to infect specific host genotypes. Hence, spatial heterogeneity in host genotypes, coupled with abiotic factors, could lead to locally adapted pathogen populations. However, the genetic basis of local adaptation is poorly understood. Rhynchosporium commune, the pathogen causing barley scald disease, interacts with its host at least partially in a gene-for-gene manner. We analyzed 125 R. commune isolates sampled from field populations worldwide to identify candidate genes for local adaptation. Whole-genome sequencing data showed that the pathogen is subdivided into three genetic clusters associated with distinct geographic and climatic regions. Using haplotype-based selection scans applied independently to each genetic cluster, we found strong evidence for selective sweeps throughout the genome. Loci under selection overlapped little among clusters, suggesting that ecological differences associated with each cluster led to variable selection regimes. The strongest signals of selection were found predominantly in the two clusters composed of isolates from Central Europe and Ethiopia. The strongest selective sweep regions encoded protein functions related to biotic and abiotic stress responses. Selective sweep regions were enriched in genes encoding functions in cellular localization, protein transport activity, and DNA damage responses. In contrast to the prevailing view that a small number of gene-for-gene interactions govern plant pathogen evolution, our analyses suggest that the evolutionary trajectory is largely determined by spatially heterogeneous biotic and abiotic selection pressures.
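The abstract does not name the specific haplotype-based statistics used. Extended haplotype homozygosity (EHH) is the building block of widely used scans such as iHS and XP-EHH. The following is a minimal sketch of EHH, not the authors' pipeline. It assumes each isolate contributes one haploid haplotype encoded as a string of `0`/`1` alleles, and it measures distance in SNP indices rather than base pairs. The function name `ehh` and the toy haplotypes are hypothetical.

```python
from collections import Counter
from math import comb


def ehh(haplotypes, core, end, allele):
    """Extended haplotype homozygosity: among haplotypes carrying `allele` at
    SNP index `core`, the probability that two randomly chosen carriers are
    identical over every SNP between `core` and `end` (inclusive)."""
    carriers = [h for h in haplotypes if h[core] == allele]
    n = len(carriers)
    if n < 2:
        raise ValueError("need at least two carriers of the core allele")
    lo, hi = min(core, end), max(core, end)
    # Count distinct extended haplotypes spanning [lo, hi].
    counts = Counter(h[lo:hi + 1] for h in carriers)
    return sum(comb(k, 2) for k in counts.values()) / comb(n, 2)


# Toy haploid genomes (one string per isolate, one character per SNP).
haps = ["11100", "11101", "11110", "10100", "00011", "01010"]
print([round(ehh(haps, 0, e, "1"), 3) for e in range(5)])  # [1.0, 0.5, 0.5, 0.167, 0.0]
```

Under a recent selective sweep, EHH decays slowly with distance from the selected allele because recombination has had little time to break up the swept haplotype. Scans built on EHH look for this unusually long-range homozygosity.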