  1. Reeki Emrizal, Nor Azlan Nor Muhammad
    Sains Malaysiana, 2018;47:2941-2950.
    Porphyromonas gingivalis is the bacterium responsible for chronic periodontitis, a severe periodontal disease. Virulence
    factors produced by this bacterium are secreted by the Type IX Secretion System (T9SS). The specific functions for
    each protein component of the T9SS have yet to be characterized, thus limiting our understanding of the mechanisms
    associated with the translocation and modification processes of the T9SS. This study aims to identify the sequence motifs
    for each T9SS component and predict the functions associated with each discovered motif using motif comparisons. We
    extracted the sequences of 20 T9SS components from the P. gingivalis proteome that were experimentally identified to
    be important for T9SS function and used them for homology searching against fully sequenced bacterial proteomes.
    We developed a rigorous pipeline for the identification of seed sequences for each protein family of T9SS components.
    We verified that each selected seed sequence is a true member of its protein family, and hence shares conserved sequence
    motifs, using profile Hidden Markov Models. The motifs for each T9SS component were identified and compared to motifs
    in the Pfam database. The discovered motifs for 11 components with known functions matched the motifs associated
    with the reported functions. We also suggested the putative functions for four components. PorM and PorW might form
    the putative energy transduction complex. PorP and PorT might be the putative O-deacylases. The identified motifs for
    five components matched the motifs associated with functions that are either related or unrelated to the T9SS.
  2. Pritchard LI, Daniels PW, Melville LF, Kirkland PD, Johnson SJ, Lunt R, et al.
    Vet. Ital., 2004 Oct-Dec;40(4):438-45.
    PMID: 20422566
    The authors have characterised the genetic diversity of the bluetongue virus (BTV) RNA segments 3 and 10 from Indonesia, Malaysia and Australia. Analysis of RNA segment 3, which codes for the core protein VP3, showed conserved sequences in the previously defined Australasian topotype, but which further divided into four distinct clades or genotypes. Certain genotypes appeared to be geographically restricted while others were distributed widely throughout South-East Asia. Ongoing surveillance programmes in Australia have identified the movement of Indonesian genotypes into northern Australia and possible reassortment among them. Similarly, analysis of RNA segment 10, which codes for the non-structural protein NS3/3A, showed they were also conserved and grouped into five clades or genotypes, three Asian and two North American/South African.
  3. Ithnin M, Teh CK, Ratnam W
    BMC Genet, 2017 04 19;18(1):37.
    PMID: 28420332 DOI: 10.1186/s12863-017-0505-7
    BACKGROUND: The Elaeis oleifera genetic materials were assembled from its center of diversity in South and Central America. These materials are currently being preserved in Malaysia as ex situ living collections. Maintaining such collections is expensive and requires sizable land. Information on the genetic diversity of these collections can help achieve efficient conservation via maintenance of core collection. For this purpose, we have applied fourteen unlinked microsatellite markers to evaluate 532 E. oleifera palms representing 19 populations distributed across Honduras, Costa Rica, Panama and Colombia.

    RESULTS: In general, the genetic diversity decreased from Costa Rica towards the north (Honduras) and south-east (Colombia). Principal coordinate analysis (PCoA) showed a single cluster indicating low divergence among palms. The phylogenetic tree and STRUCTURE analysis revealed clusters based on country of origin, indicating considerable gene flow among populations within countries. Based on the values of the genetic diversity parameters, some genetically diverse populations could be identified. Further, a total of 34 individual palms that collectively captured maximum allelic diversity with reduced redundancy were also identified. High pairwise genetic differentiation (Fst > 0.250) among populations was evident, particularly between the Colombian populations and those from Honduras, Panama and Costa Rica. Crossing selected palms from highly differentiated populations could generate offspring that retain more genetic diversity.

    CONCLUSION: The results attained are useful for selecting palms and populations for a core collection. The selected materials can also be included in a crossing scheme to generate offspring that capture greater genetic diversity for selection gain in the future.

  4. Appasamy SD, Ramlan EI, Firdaus-Raih M
    PLoS One, 2013;8(9):e73984.
    PMID: 24040136 DOI: 10.1371/journal.pone.0073984
    The tertiary motifs in complex RNA molecules play vital roles to either stabilize the formation of RNA 3D structure or to provide important biological functionality to the molecule. In order to better understand the roles of these tertiary motifs in riboswitches, we examined 11 representative riboswitch PDB structures for potential agreement of both motif occurrences and conservations. A total of 61 unique tertiary interactions were found in the reference structures. In addition to the expected common A-minor motifs and base-triples mainly involved in linking distant regions of the riboswitch structures, three highly conserved variants of A-minor interactions called G-minors were found in the SAM-I and FMN riboswitches, where they appear to be involved in the recognition of the respective ligand's functional groups. From our structural survey as well as corresponding structure and sequence alignments, the agreement between motif occurrences and conservations is very prominent across the representative riboswitches. Our analysis provides evidence that some of these tertiary interactions are essential components to form the structure where their sequence positions are conserved despite a high degree of diversity in other parts of the respective riboswitch sequences. This is indicative of a vital role for these tertiary interactions in determining the specific biological function of the riboswitch.
  5. Kushwaha SK, Bhavesh NLS, Abdella B, Lahiri C, Marathe SA
    Sci Rep, 2020 12 03;10(1):21156.
    PMID: 33273523 DOI: 10.1038/s41598-020-77890-6
    Salmonellae display intricate evolutionary patterns comprising over 2500 serovars having diverse pathogenic profiles. The acquisition and/or exchange of various virulence factors influences the evolutionary framework. To gain insights into evolution of Salmonella in association with the CRISPR-Cas genes we performed phylogenetic surveillance across strains of 22 Salmonella serovars. The strains differed in their CRISPR1-leader and cas operon features assorting into two main clades, CRISPR1-STY/cas-STY and CRISPR1-STM/cas-STM, comprising majorly typhoidal and non-typhoidal Salmonella serovars respectively. Serovars of these two clades displayed better relatedness, concerning CRISPR1-leader and cas operon, across genera than between themselves. This signifies the acquisition of CRISPR1/Cas region could be through a horizontal gene transfer event owing to the presence of mobile genetic elements flanking CRISPR1 array. Comparison of CRISPR and cas phenograms with that of multilocus sequence typing (MLST) suggests differential evolution of CRISPR/Cas system. As opposed to broad-host-range, the host-specific serovars harbor fewer spacers. Mapping of protospacer sources suggested a partial correlation of spacer content with habitat diversity of the serovars. Some serovars like serovar Enteritidis and Typhimurium that inhabit similar environment/infect similar hosts hardly shared their protospacer sources.
  6. Mat-Sharani S, Firdaus-Raih M
    BMC Bioinformatics, 2019 Feb 04;19(Suppl 13):551.
    PMID: 30717662 DOI: 10.1186/s12859-018-2550-2
    BACKGROUND: Small open reading frames (smORF/sORFs) that encode short protein sequences are often overlooked during the standard gene prediction process thus leading to many sORFs being left undiscovered and/or misannotated. For many genomes, a second round of sORF targeted gene prediction can complement the existing annotation. In this study, we specifically targeted the identification of ORFs encoding for 80 amino acid residues or less from 31 fungal genomes. We then compared the predicted sORFs and analysed those that are highly conserved among the genomes.

    RESULTS: A first set of sORFs was identified from existing annotations that fitted the maximum of 80 residues criterion. A second set was predicted using parameters that specifically searched for ORF candidates of 80 codons or less in the exonic, intronic and intergenic sequences of the subject genomes. A total of 1986 conserved sORFs were predicted and characterized.

    CONCLUSIONS: It is evident that numerous open reading frames that could potentially encode for polypeptides consisting of 80 amino acid residues or less are overlooked during standard gene prediction and annotation. From our results, additional targeted reannotation of genomes is clearly able to complement standard genome annotation to identify sORFs. Due to the lack of, and limitations with experimental validation, we propose that a simple conservation analysis can provide an acceptable means of ensuring that the predicted sORFs are sufficiently clear of gene prediction artefacts.
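
    The targeted second pass described in this abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: it simply scans all six reading frames of a DNA string for ATG-to-stop ORFs of at most 80 codons. The function names, the `min_codons` floor, and the choice to report minus-strand coordinates relative to the reverse complement are all assumptions made for illustration.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def reverse_complement(seq):
    """Reverse-complement a DNA string (non-ACGT symbols are left unchanged)."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_sorfs(seq, max_codons=80, min_codons=10):
    """Return (strand, start, end, orf) for ATG..stop ORFs of limited length.

    Codon counts exclude the stop codon. Coordinates are 0-based, end-exclusive,
    and for the '-' strand they refer to the reverse-complemented sequence.
    """
    results = []
    forward = seq.upper()
    for strand, s in (("+", forward), ("-", reverse_complement(forward))):
        for frame in range(3):
            i = frame
            while i + 3 <= len(s):
                if s[i:i + 3] != "ATG":
                    i += 3
                    continue
                # Extend codon by codon until the first in-frame stop codon.
                j = i
                while j + 3 <= len(s) and s[j:j + 3] not in STOP_CODONS:
                    j += 3
                if j + 3 > len(s):
                    break  # ran off the end without a stop: not a complete ORF
                n_codons = (j - i) // 3
                if min_codons <= n_codons <= max_codons:
                    results.append((strand, i, j + 3, s[i:j + 3]))
                i = j + 3  # resume scanning after this ORF's stop codon
    return results
```

    A conservation filter such as the one the abstract proposes would then be applied to these candidates across genomes; that step is outside this sketch.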

  7. Loong SK, Tan KK, Zulkifle NI, AbuBakar S
    Data Brief, 2019 Aug;25:104159.
    PMID: 31312701 DOI: 10.1016/j.dib.2019.104159
    Paraburkholderia fungorum is an opportunistic bacterium infrequently associated with human infections. Here, we report the draft genome sequence of P. fungorum strain BF370, recovered from the synovial tissue of a patient in Malaysia. The P. fungorum genome contains an 8,950,957 bp chromosome with a G+C content of 61.8%. Colicin and heavy metal resistance genes were also present in the genome. Conserved sequence indels unique to P. fungorum were observed in the genome. The draft genome was deposited at the European Nucleotide Archive under the sample accession number ERS1776561 and study accession number PRJEB17921.
  8. Chung HH, Lim LWK, Liao Y, Lam TT, Chong YL
    Trop Life Sci Res, 2020 Apr;31(1):107-121.
    PMID: 32963714 DOI: 10.21315/tlsr2020.31.1.7
    Trigonopoma pauciperforatum, the redstripe rasbora, is a cyprinid commonly found in marshes and swampy areas with slightly acidic tannin-stained water in the tropics. In this study, the complete mitogenome sequence of T. pauciperforatum was first amplified in two parts using two pairs of overlapping primers and then sequenced. The size of the mitogenome is 16,707 bp, encompassing 22 transfer RNA genes, 13 protein-coding genes, two ribosomal RNA genes and a putative control region. Identical gene organisation was detected between this species and other family members. The heavy strand accommodates 28 genes while the light strand houses the remaining nine genes. Most protein-coding genes utilise ATG as start codon except for the COI gene, which uses GTG instead. The terminal associated sequence (TAS), central conserved sequence block (CSB-F, CSB-D and CSB-E) as well as variable sequence block (CSB-1, CSB-2 and CSB-3) are conserved in the control region. The maximum likelihood phylogenetic tree revealed the divergence of T. pauciperforatum from the basal region of the major clade, where its evolutionary relationships with Boraras maculatus, Rasbora cephalotaenia and R. daniconius are poorly resolved as suggested by the low bootstrap values. This work contributes towards the genetic resource enrichment for peat swamp conservation and comprehensive in-depth comparisons across other phylogenetic studies of the Rasbora-related genera.
  9. Khan AM, Hu Y, Miotto O, Thevasagayam NM, Sukumaran R, Abd Raman HS, et al.
    BMC Med Genomics, 2017 12 21;10(Suppl 4):78.
    PMID: 29322922 DOI: 10.1186/s12920-017-0301-2
    BACKGROUND: Viral vaccine target discovery requires understanding the diversity of both the virus and the human immune system. The readily available and rapidly growing pool of viral sequence data in the public domain enable the identification and characterization of immune targets relevant to adaptive immunity. A systematic bioinformatics approach is necessary to facilitate the analysis of such large datasets for selection of potential candidate vaccine targets.

    RESULTS: This work describes a computational methodology to achieve this analysis, with data of dengue, West Nile, hepatitis A, HIV-1, and influenza A viruses as examples. Our methodology has been implemented as an analytical pipeline that brings significant advancement to the field of reverse vaccinology, enabling systematic screening of known sequence data in nature for identification of vaccine targets. This includes key steps (i) comprehensive and extensive collection of sequence data of viral proteomes (the virome), (ii) data cleaning, (iii) large-scale sequence alignments, (iv) peptide entropy analysis, (v) intra- and inter-species variation analysis of conserved sequences, including human homology analysis, and (vi) functional and immunological relevance analysis.

    CONCLUSION: These steps are combined into the pipeline ensuring that a more refined process, as compared to a simple evolutionary conservation analysis, will facilitate a better selection of vaccine targets and their prioritization for subsequent experimental validation.

  10. Zhang KJ, Liu L, Rong X, Zhang GH, Liu H, Liu YH
    Mitochondrial DNA A DNA Mapp Seq Anal, 2016 11;27(6):4314-4315.
    PMID: 26462416
    We sequenced and annotated the complete mitochondrial genome (mitogenome) of Bactrocera diaphora (Diptera: Tephritidae), which is an economically important pest in the southwest area of China, India, Sri Lanka, Vietnam and Malaysia. This mitogenome is 15 890 bp in length with an A + T content of 74.103%, and contains 37 typical animal mitochondrial genes that are arranged in the same order as that of the inferred ancestral insects. All protein-coding genes (PCGs) start with a typical ATN codon, except cox1, which begins with TCG. Ten PCGs stop with termination codon TAA or TAG, whereas cox1, nad1 and nad5 have a single T-- as the incomplete stop codon. All of the transfer RNA genes present the typical clover leaf secondary structure except trnS1 (AGN), which has a looping D-arm. The A + T-rich region is located between rrnS and trnI with a length of 946 bp, and contains a 20 bp poly-T stretch and a 22 bp poly-A stretch. Except for the control region, the longest intergenic spacer is located between trnR and trnN; it is 94 bp long with an excessively high A + T content (95.74%) and a microsatellite-like region (TA)13.
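
    The composition figures quoted in this abstract (A + T content, a (TA)13 microsatellite-like run) are straightforward to recompute from a sequence. The sketch below shows one way to do so; the function names are hypothetical, and ambiguous bases are simply ignored, which is an assumption rather than the authors' stated procedure.

```python
import re


def at_content(seq):
    """Percentage of A/T among unambiguous bases (A, C, G, T) in seq."""
    bases = [b for b in seq.upper() if b in "ACGT"]
    if not bases:
        raise ValueError("sequence contains no unambiguous bases")
    return 100.0 * sum(b in "AT" for b in bases) / len(bases)


def longest_tandem_repeat(seq, unit="TA"):
    """Largest n such that unit repeated n times occurs contiguously in seq."""
    pattern = f"(?:{re.escape(unit.upper())})+"
    runs = [m.end() - m.start() for m in re.finditer(pattern, seq.upper())]
    return max(runs, default=0) // len(unit)
```

    As a consistency check on the arithmetic only, 90 A/T positions out of 94 would give 95.74%, which matches the value reported for the trnR-trnN spacer.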
  11. Wong CM, Tam HK, Ng WM, Boo SY, González M
    Plasmid, 2013 Mar;69(2):186-93.
    PMID: 23266397 DOI: 10.1016/j.plasmid.2012.12.002
    A cryptic plasmid, pMWHK1, recovered from the Antarctic bacterium Pedobacter cryoconitis BG5, was sequenced and characterised. The plasmid is a circular 6206 bp molecule with eight putative open reading frames designated as orf1, orf2, orf3, orf4, orf5, orf6, orf7 and orf8. All the putative open reading frames of pMWHK1 are found to be actively transcribed. Proteins encoded by orf2 and orf4 are predicted to be responsible for the mobilization and replication of the plasmid respectively. orf4 shares 55% and 61% identities with the theta-type Rep proteins from two strains of Riemerella anatipestifer. This suggests that pMWHK1 could be a member of the theta-type replicating plasmids. The origin of replication is located within the AT-rich region upstream of orf4. orf5 and orf6 encode bacterial toxin-antitoxin proteins predicted to maintain plasmid stability. orf3 encodes an entry exclusion protein that is hypothetically involved in reducing the frequency of DNA transfer through conjugation. orf1, orf7 and orf8 encode proteins with unknown functions. Plasmid pMWHK1 is stably maintained in P. cryoconitis BG5 at 20°C.
  12. Mohamed Yusoff AA, Mohd Khair SZN, Wan Abdullah WS, Abd Radzak SM, Abdullah JM
    J Cancer Res Ther, 2020 12 22;16(6):1517-1521.
    PMID: 33342822 DOI: 10.4103/jcrt.JCRT_1132_16
    Background and Objective: Meningiomas are among the most common intracranial tumors of the central nervous system. It is widely accepted that the initiation and progression of meningiomas involve the accumulation of nuclear genetic alterations, but little is known about the implication of mitochondrial genomic alterations during development of these tumors. The human mitochondrial DNA (mtDNA) contains a short hypervariable, noncoding displacement loop control region known as the D-loop. Alterations in the mtDNA D-loop have been reported to occur in most types of human cancers. The purpose of this study was to assess the mtDNA D-loop mutations in Malaysian meningioma patients.

    Materials and Methods: Genomic DNA was extracted from 21 fresh-frozen tumor tissues and blood samples of the same meningioma patients. The entire mtDNA D-loop region (positions 16024-576) was amplified by polymerase chain reaction using designed primers, and the amplification products were then purified before direct DNA sequencing.

    Results: Overall, 10 (47.6%) patients were detected to harbor a total of 27 somatic mtDNA D-loop mutations. Most of these mtDNA mutations were identified in the hypervariable segment II (40.7%), with 33.3% being located mainly in the conserved sequence block II of the D310 sequence. Furthermore, 58 different germline variations were observed at 21 nucleotide positions.

    Conclusion: Our results suggest that mtDNA alterations in the D-loop region may be an important and early event in developing meningioma. Further studies are needed, including validation in a larger patient cohort, to verify the clinicopathological outcomes of mtDNA mutation biomarkers in meningiomas.

  13. Masomian M, Rahman RN, Salleh AB, Basri M
    PLoS One, 2016;11(3):e0149851.
    PMID: 26934700 DOI: 10.1371/journal.pone.0149851
    Thermostable and organic solvent-tolerant enzymes have significant potential in a wide range of synthetic reactions in industry due to their inherent stability at high temperatures and their ability to endure harsh organic solvents. In this study, a novel gene encoding a true lipase was isolated by construction of a genomic DNA library of thermophilic Aneurinibacillus thermoaerophilus strain HZ in an Escherichia coli plasmid vector. Sequence analysis revealed that HZ lipase had 62% identity to a putative lipase from Bacillus pseudomycoides. The closest characterized lipases to HZ lipase are thermostable Bacillus and Geobacillus lipases belonging to subfamily I.5, with ≤ 57% identity. The amino acid sequence analysis of HZ lipase determined a conserved pentapeptide containing the active serine, GHSMG, and a Ca(2+)-binding motif, GCYGSD, in the enzyme. Protein structure modeling showed that HZ lipase consisted of an α/β hydrolase fold and a lid domain. Protein sequence alignment, conserved regions analysis, clustal distance matrix and amino acid composition illustrated differences between HZ lipase and other thermostable lipases. Phylogenetic analysis revealed that this lipase represented a new subfamily of family I of bacterial true lipases, classified as family I.9. The HZ lipase was expressed under promoter Plac using IPTG and was characterized. The recombinant enzyme showed optimal activity at 65 °C and retained ≥ 97% activity after incubation at 50 °C for 1 h. The HZ lipase was stable in various polar and non-polar organic solvents.
  14. Saleh MA, Solayman M, Paul S, Saha M, Khalil MI, Gan SH
    Biomed Res Int, 2016;2016:9142190.
    PMID: 27294143 DOI: 10.1155/2016/9142190
    Despite the reported association of adiponectin receptor 1 (ADIPOR1) gene mutations with vulnerability to several human metabolic diseases, there is a lack of computational analysis on the functional and structural impacts of single nucleotide polymorphisms (SNPs) of the human ADIPOR1 at protein level. Therefore, sequence- and structure-based computational tools were employed in this study to functionally and structurally characterize the coding nsSNPs of the ADIPOR1 gene listed in the dbSNP database. Our in silico analysis by SIFT, nsSNPAnalyzer, PolyPhen-2, Fathmm, I-Mutant 2.0, SNPs&GO, PhD-SNP, PANTHER, and SNPeffect tools identified the nsSNPs with distorting functional impacts, namely, rs765425383 (A348G), rs752071352 (H341Y), rs759555652 (R324L), rs200326086 (L224F), and rs766267373 (L143P) from 74 nsSNPs of the ADIPOR1 gene. Finally, the aforementioned five deleterious nsSNPs were introduced using the Swiss-PDB Viewer package within the X-ray crystal structure of ADIPOR1 protein, and changes in free energy for these mutations were computed. Although increased free energy was observed for all the mutants, the nsSNP H341Y caused the highest energy increase amongst all. RMSD and TM scores predicted that mutants were structurally similar to wild type protein. Our analyses suggested that the aforementioned variants, especially H341Y, could directly or indirectly destabilize the amino acid interactions and hydrogen bonding networks of ADIPOR1.
  15. Chong LC, Khan AM
    BMC Genomics, 2019 Dec 24;20(Suppl 9):921.
    PMID: 31874646 DOI: 10.1186/s12864-019-6311-z
    BACKGROUND: The sequence diversity of dengue virus (DENV) is one of the challenges in developing an effective vaccine against the virus. Highly conserved, serotype-specific (HCSS), immune-relevant DENV sequences are attractive candidates for vaccine design, and represent an alternative to the approach of selecting pan-DENV conserved sequences. The former aims to limit the number of possible cross-reactive epitope variants in the population, while the latter aims to limit the cross-reactivity between the serotypes to favour a serotype-specific response. Herein, we performed a large-scale systematic study to map and characterise HCSS sequences in the DENV proteome.

    METHODS: All reported DENV protein sequence data for each serotype was retrieved from the NCBI Entrez Protein (nr) Database (txid: 12637). The downloaded sequences were then separated according to the individual serotype proteins by use of BLASTp search, and subsequently deduplicated and co-aligned across the serotypes. Shannon's entropy and mutual information (MI) analyses, by use of AVANA, were performed to measure the diversity within and between the serotype proteins to identify HCSS nonamers. The sequences were evaluated for the presence of promiscuous T-cell epitopes by use of NetCTLpan 1.1 and NetMHCIIpan 3.2 server for human leukocyte antigen (HLA) class I and class II supertypes, respectively. The predicted epitopes were matched to reported epitopes in the Immune Epitope Database.

    RESULTS: A total of 2321 nonamers met the HCSS selection criteria of entropy  0.8. Concatenating these resulted in a total of 337 HCSS sequences. DENV4 had the highest number of HCSS nonamers; NS5, NS3 and E proteins had among the highest, with none in the C and only one in prM. The HCSS sequences were immune-relevant; 87 HCSS sequences were both reported T-cell epitopes/ligands in human and predicted epitopes, supporting the accuracy of the predictions. A number of the HCSS clustered as immunological hotspots and exhibited putative promiscuity beyond a single HLA supertype. The HCSS sequences represented, on average, ~ 40% of the proteome length for each serotype, more than double that of pan-DENV sequences (conserved across the four serotypes), and thus offer a larger choice of sequences for vaccine target selection. HCSS sequences of a given serotype showed significant amino acid difference to all the variants of the other serotypes, supporting the notion of serotype-specificity.

    CONCLUSION: This work provides a catalogue of HCSS sequences in the DENV proteome, as candidates for vaccine target selection. The methodology described herein provides a framework for similar application to other pathogens.
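
    The entropy step mentioned in the methods of this abstract can be sketched as follows. This is a plain Shannon entropy over the nonamers observed at each overlapping position of an alignment; it is not AVANA, it applies no sample-size correction, and it treats gap characters as ordinary symbols. The function name and the input format (a list of equal-length aligned strings) are assumptions.

```python
from collections import Counter
from math import log2


def nonamer_entropy(aligned, k=9):
    """Shannon entropy (bits) of the k-mers seen at each alignment position.

    aligned: list of equal-length sequences from one multiple alignment.
    Returns one value per overlapping window (positions 1-9, 2-10, ...).
    """
    if not aligned:
        raise ValueError("no sequences given")
    length = len(aligned[0])
    if any(len(s) != length for s in aligned):
        raise ValueError("sequences must be aligned to the same length")
    entropies = []
    for start in range(length - k + 1):
        counts = Counter(s[start:start + k] for s in aligned)
        total = sum(counts.values())
        # H = sum over distinct k-mers of p * log2(1/p), with p = count/total
        entropies.append(sum((c / total) * log2(total / c) for c in counts.values()))
    return entropies
```

    A fully conserved window scores 0 bits, and two equally frequent nonamers score 1 bit; windows would then be filtered against whatever entropy cut-off the study applied.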

  16. SahBandar IN, Takahashi K, Motomura K, Djoerban Z, Firmansyah I, Kitamura K, et al.
    AIDS Res Hum Retroviruses, 2011 Jan;27(1):97-102.
    PMID: 20958201 DOI: 10.1089/aid.2010.0163
    Cocirculation of subtype B and CRF01_AE in Southeast Asia has led to the establishment of new recombinant forms. In our previous study, we found five samples suspected of being recombinants between subtype B and CRF01_AE, and here, we analyzed near full-length sequences of two samples and compared them to known CRFs_01B, subtype B, and CRF01_AE. Five overlapped segments were amplified with nested PCR from PBMC DNA, sequenced, and analyzed for genome mosaicism. The two Indonesian samples, 07IDJKT189 and 07IDJKT194, showed genome-mosaic patterns similar to CRF33_01B references from Malaysia, with one short segment in the 3' end of the p31 integrase-coding region, which was rather more similar to subtype B than CRF01_AE, consisting of unclassified sequences. These results suggest gene-specific continuous diversification and spread of the CRF33_01B genomes in Southeast Asia.
  17. Ung CY, Teoh TC
    J Biosci, 2014 Jun;39(3):493-504.
    PMID: 24845512
    DARPP-32 (dopamine and adenosine 3', 5'-monophosphate-regulated phosphoprotein of 32 kDa), which belongs to the PPP1R1 gene family, is known to act as an important integrator in dopamine-mediated neurotransmission via the inhibition of protein phosphatase-1 (PP1). Besides its neuronal roles, this protein also behaves as a key player in pathological and pharmacological aspects. Use of bioinformatics and phylogenetics approaches to further characterize the molecular features of DARPP-32 can guide future works. Predicted phosphorylation sites on DARPP-32 show conservation across vertebrates. Phylogenetics analysis indicates evolutionary strata of phosphorylation site acquisition at the C-terminus, suggesting functional expansion of DARPP-32, where more diverse signalling cues may be involved in regulating DARPP-32 in inhibiting PP1 activity. Moreover, both phylogenetics and synteny analyses suggest de novo origination of the PPP1R1 gene family via chromosomal rearrangement and exonization.
  18. Adamu A, Shamsir MS, Wahab RA, Parvizpour S, Huyop F
    J Biomol Struct Dyn, 2017 Nov;35(15):3285-3296.
    PMID: 27800712 DOI: 10.1080/07391102.2016.1254115
    Dehalogenases are of high interest due to their potential applications in bioremediation and in synthesis of various industrial products. DehL is an L-2-haloacid dehalogenase (EC that catalyses the cleavage of halide ion from L-2-halocarboxylic acid to produce D-2-hydroxycarboxylic acid. Although DehL utilises the same substrates as the other L-2-haloacid dehalogenases, its deduced amino acid sequence is substantially different (<25%) from those of the other L-2-haloacid dehalogenases. To date, the 3D structure of DehL is not available. This limits the detailed understanding of the enzyme's reaction mechanism. The present work predicted the first homology-based model of DehL and defined its active site. The monomeric unit of DehL constitutes an α/β structure that is organised into two distinct structural domains: main and subdomains. Despite the sequence disparity between DehL and other L-2-haloacid dehalogenases, its structural model shares a similar fold with the experimentally solved L-DEX and DehlB structures. The findings of the present work will play a crucial role in elucidating the molecular details of the DehL functional mechanism.
  19. Abd Raman HS, Tan S, August JT, Khan AM
    PeerJ, 2020;7:e7954.
    PMID: 32518710 DOI: 10.7717/peerj.7954
    Background: Influenza A (H5N1) virus is a global concern with potential as a pandemic threat. High sequence variability of influenza A viruses is a major challenge for effective vaccine design. A continuing goal towards this is a greater understanding of influenza A (H5N1) proteome sequence diversity in the context of the immune system (antigenic diversity), the dynamics of mutation, and effective strategies to overcome the diversity for vaccine design.

    Methods: Herein, we report a comprehensive study of the dynamics of H5N1 mutations by analysis of the aligned overlapping nonamer positions (1-9, 2-10, etc.) of more than 13,000 protein sequences of avian and human influenza A (H5N1) viruses, reported over at least 50 years. Entropy calculations were performed on 9,408 overlapping nonamer positions of the proteome to study the diversity in the context of the immune system. The nonamers represent the predominant length of the binding cores for peptides recognized by the cellular immune system. To further dissect the sequence diversity, each overlapping nonamer position was quantitatively analyzed for four patterns of sequence diversity motifs: index, major, minor and unique.

    Results: Almost all of the aligned overlapping nonamer positions of each viral proteome exhibited variants (major, minor, and unique) to the predominant index sequence. Each variant motif displayed a characteristic pattern of incidence change in relation to increased total variants. The major variant exhibited a restrictive pyramidal incidence pattern, with peak incidence at 50% total variants. Beyond this peak incidence, the minor variants became the predominant motif for the majority of the positions. Unique variants, each sequence observed only once, were present at nearly all of the nonamer positions. The diversity motifs (index and variants) demonstrated complex inter-relationships, with motif switching being a common phenomenon. Additionally, 25 highly conserved sequences were identified to be shared across viruses of both hosts, with half conserved to several other influenza A subtypes.

    Discussion: The presence of distinct sequences (nonatypes) at nearly all nonamer positions represents a large repertoire of reported viral variants in the proteome, which influence the variability dynamics of the viral population. This work elucidated and provided important insights on the components that make up the viral diversity, delineating inherent patterns in the organization of sequence changes that function in the viral fitness-selection. Additionally, it provides a catalogue of all the mutational changes involved in the dynamics of H5N1 viral diversity for both avian and human host populations. This work provides data relevant for the design of prophylactics and therapeutics that overcome the diversity of the virus, and can aid in the surveillance of existing and future strains of influenza viruses.
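
    The four diversity motifs named in this abstract can be illustrated with a small classifier for the nonamers observed at a single aligned position. The abstract only defines the index (the predominant sequence) and unique variants (seen once); the rules used here for "major" (the most frequent variant seen more than once) and "minor" (any other variant seen more than once) are assumptions, as are the function name and the arbitrary tie-breaking.

```python
from collections import Counter


def classify_motifs(nonamers):
    """Label each distinct nonamer at one position as index/major/minor/unique.

    nonamers: the nonamer observed at this position in every sequence.
    Ties in frequency are broken by first appearance (Counter ordering).
    """
    if not nonamers:
        raise ValueError("no nonamers given")
    ranked = Counter(nonamers).most_common()
    labels = {ranked[0][0]: "index"}  # predominant sequence
    major_assigned = False
    for seq, count in ranked[1:]:
        if count == 1:
            labels[seq] = "unique"   # variant observed only once
        elif not major_assigned:
            labels[seq] = "major"    # assumed: most frequent repeated variant
            major_assigned = True
        else:
            labels[seq] = "minor"    # assumed: remaining repeated variants
    return labels
```

    For example, five copies of one nonamer, three of a second, two of a third and one of a fourth would be labelled index, major, minor and unique respectively under these assumed rules.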

  20. Yusuf M, Konc J, Sy Bing C, Trykowska Konc J, Ahmad Khairudin NB, Janezic D, et al.
    J Chem Inf Model, 2013 Sep 23;53(9):2423-36.
    PMID: 23980878 DOI: 10.1021/ci400421e
    ProBiS is a new method to identify the binding site of protein through local structural alignment against the nonredundant Protein Data Bank (PDB), which may result in unique findings compared to the energy-based, geometry-based, and sequence-based predictors. In this work, binding sites of Hemagglutinin (HA), which is an important target for drugs and vaccines in influenza treatment, have been revisited by ProBiS. For the first time, the identification of conserved binding sites by local structural alignment across all subtypes and strains of HA available in PDB is presented. ProBiS finds three distinctive conserved sites on HA's structure (named Site 1, Site 2, and Site 3). Compared to other predictors, ProBiS is the only one that accurately defines the receptor binding site (Site 1). Apart from that, Site 2, which is located slightly above the TBHQ binding site, is proposed as a potential novel conserved target for membrane fusion inhibitor. Lastly, Site 3, located around Helix A at the stem domain and recently targeted by cross-reactive antibodies, is predicted to be conserved in the latest H7N9 China 2013 strain as well. The further exploration of these three sites provides valuable insight in optimizing the influenza drug and vaccine development.
