Since it emerged in Japan in the 1870s, Japanese encephalitis has spread across Asia and has become the most important cause of epidemic encephalitis worldwide. Four genotypes of Japanese encephalitis virus (JEV) are presently recognized (representatives of genotypes I to III have been fully sequenced), but the origin of the virus is unknown. We determined the complete nucleotide and amino acid sequence of a genotype IV Indonesian isolate (JKT6468), which represents the oldest lineage, compared it with other fully sequenced genomes, and examined the geographical distribution of all known isolates. JKT6468 was the most divergent of the genomes compared, with nucleotide divergence ranging from 17.4 to 19.6% and amino acid divergence ranging from 4.7 to 6.5%. It also contained an unusual run of amino acids at the carboxy terminus of the core protein, unlike that seen in other JEV strains. Three signature amino acids in the envelope protein (including E327 Leu→Thr/Ser on the exposed lateral surface of the putative receptor binding domain) distinguished genotype IV strains from more recent genotypes. Analysis of all 290 JEV isolates for which sequence data are available showed that all genotypes of JEV circulate in the Indonesia-Malaysia region, whereas only more recent genotypes circulate in other areas (P < 0.0001). These results suggest that JEV originated from its ancestral virus in the Indonesia-Malaysia region and evolved there into the different genotypes, which then spread across Asia. Our data, together with recent evidence on the origins of other emerging viruses, including dengue virus and Nipah virus, imply that tropical southeast Asia may be an important zone for emerging pathogens.
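The divergence figures above are pairwise percentages between aligned genomes. The abstract does not state the exact distance measure used, so the sketch below assumes the simplest one, the uncorrected p-distance (percentage of differing positions), skipping alignment columns that contain a gap. The function names and the toy sequences are hypothetical illustrations, not data from the study; the same functions apply to aligned amino acid sequences.

```python
def percent_divergence(seq_a: str, seq_b: str) -> float:
    """Uncorrected p-distance, as a percentage, between two aligned sequences.

    Assumption: columns where either sequence has a gap ('-') are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differences = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            continue  # ignore gapped columns
        compared += 1
        if a != b:
            differences += 1
    if compared == 0:
        raise ValueError("no comparable (ungapped) positions")
    return 100.0 * differences / compared


def divergence_range(query: str, others: dict[str, str]) -> tuple[float, float]:
    """Minimum and maximum divergence of `query` against each reference sequence."""
    values = [percent_divergence(query, seq) for seq in others.values()]
    return min(values), max(values)


# Toy aligned sequences (hypothetical, not real JEV genomes).
query = "ACGTACGTAC"
references = {"ref1": "ACGTTCGTAA", "ref2": "ACGTACGTAA"}
print(percent_divergence(query, references["ref1"]))  # 2 of 10 positions differ
print(divergence_range(query, references))
```

Published analyses often apply a substitution-model correction (for example, Kimura two-parameter) rather than the raw p-distance; the uncorrected form is used here only because it is the easiest to follow.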
A computational approach for the identification and assessment of genomic sequence variability (GeneSV) is described. For a given nucleotide sequence, GeneSV collects information about the permissible nucleotide variability (changes that potentially preserve function) observed in corresponding regions of genomic sequences, and combines it with conservation/variability results from protein sequence and structure-based analyses of the protein-coding regions under evaluation. GeneSV was used to predict the effects (functional vs. non-functional) of 37 amino acid substitutions in the NS5 polymerase (RdRp) of dengue virus type 2 (DENV-2), 36 of which are not observed in any publicly available DENV-2 sequence. Thirty-two novel mutants carrying single amino acid substitutions in the RdRp were generated using a DENV-2 reverse genetics system. For 81% (26 of 32) of the predictions tested, GeneSV correctly predicted the viability of the introduced mutations. GeneSV was also correct for 4 of 5 (80%) mutants carrying double amino acid substitutions at residues close to one another in the protein structure. Although the predictive capability of the system was illustrated on an RNA virus (dengue), the general approach described here for characterizing real or theoretically possible variations in genomic and protein sequences can be applied to any organism.
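The core idea, that a substitution is more likely to be tolerated if the new residue already appears at the corresponding position in related sequences, can be sketched as below. This is a deliberately simplified, hypothetical illustration of only the sequence-variability component: it is not GeneSV's actual algorithm, it ignores the structure-based analysis the method also uses, and all function names and toy sequences are assumptions. It also shows how the reported accuracy percentages are computed (26 of 32 correct gives 81.25%, reported as 81%).

```python
from collections import defaultdict


def observed_residues(alignment: list[str]) -> dict[int, set[str]]:
    """Map each alignment column (0-based) to the set of non-gap residues seen there."""
    columns: dict[int, set[str]] = defaultdict(set)
    for seq in alignment:
        for i, residue in enumerate(seq.upper()):
            if residue != "-":
                columns[i].add(residue)
    return dict(columns)


def predict_viable(position: int, new_residue: str, observed: dict[int, set[str]]) -> bool:
    """Hypothetical rule: call a substitution viable only if the new residue
    has been observed at that column in the homologous sequences."""
    return new_residue.upper() in observed.get(position, set())


def prediction_accuracy(predicted: list[bool], actual: list[bool]) -> float:
    """Percentage of predictions that match the experimentally observed viability."""
    if not predicted or len(predicted) != len(actual):
        raise ValueError("need equal-length, non-empty lists")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return 100.0 * correct / len(predicted)


# Toy protein alignment of homologous regions (hypothetical sequences).
alignment = ["MKVL", "MRVL", "MKIL"]
obs = observed_residues(alignment)
print(predict_viable(1, "R", obs))  # R is seen at column 1
print(predict_viable(1, "E", obs))  # E is never seen at column 1
```

Because 36 of the 37 tested substitutions were absent from all DENV-2 sequences, a rule like this would only be useful if the alignment drew on more distantly related sequences; that is one reason the described approach also incorporates protein structure information.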