Displaying all 18 publications

  1. Mohd Khairuddin KA, Ahmad K, Mohd Ibrahim H, Yan Y
    J Voice, 2021 Jul;35(4):636-645.
    PMID: 31864891 DOI: 10.1016/j.jvoice.2019.12.005
    Despite its clear advantages, laryngeal high-speed videoendoscopy (LHSV) has not yet been accepted as a routine imaging tool for evaluating vocal fold vibration, because methods to effectively analyze the huge number of images in an LHSV recording have been unavailable. Recently, a promising LHSV-based analysis method was introduced, and its ability to characterize vocal fold vibratory behaviors has been substantially demonstrated. However, some practical aspects of its clinical application still require attention. Most fundamentally, the criteria for the measurement input, i.e., the segment of interest (SOI), have not been fully defined. In particular, the length of the SOI and the location along the sample where it should be selected require confirmation, and analysis using any option of a well-delineated glottal area demands verification. Without clear criteria for the SOI, it is difficult to demonstrate the relevance of this analysis method in clinical voice assessment. The aim of the present study was therefore to establish the criteria for the SOI by investigating the length of the SOI, the location along the sample where it should be selected, and the use of any option of a well-delineated glottal area for analysis. The participants were 36 young normophonic females. Images of the vibrating vocal folds were recorded with LHSV and then analyzed using the method. The LHSV-based measures from the analyses were compared according to the specified procedures of each investigation. Results indicated that 2000 frames should be used as the SOI length, and that the SOI could be selected at any location along the sample as long as well-delineated glottal areas were observed. With these findings, a more conclusive measurement protocol is available to ensure reliable LHSV-based measures. The findings further support this analysis method for clinical application, which in turn promotes LHSV as a reliable laryngeal imaging tool in clinical settings.
    Matched MeSH terms: Voice Quality*
  2. Peter S, Abdul Rahman ZA, Pillai S
    Int J Oral Maxillofac Surg, 2019 Oct;48(10):1317-1322.
    PMID: 31014926 DOI: 10.1016/j.ijom.2019.03.896
    The aim of this study was to document differences in hypernasality during speaking and singing among children with cleft palate and to compare nasality score ratings of trained and untrained listeners. Twenty subjects with cleft palate aged between 7 and 12 years participated in this study. Audio recordings were made of the children reading a passage and singing a common local song, both in the Malay language. The degree of hypernasality was judged through perceptual assessment. Three trained listeners (a speech therapist, a classical singer, and a linguistic expert - all academicians) and two untrained listeners (a cleft volunteer worker and a national high school teacher) assessed the recordings using a visual analogue scale (VAS). Inter-rater and intra-rater reliability for hypernasality in both speaking and singing were verified using the intra-class correlation coefficient (ICC). A significant reduction in hypernasality was observed during singing as compared to speaking, indicating that hypernasality reduces when a child with cleft palate sings. The outcome of this study suggests that children with cleft palate would benefit from singing exercises to ultimately reduce hypernasality. However, future research is needed to objectively measure nasality in singing compared to speaking.
    Matched MeSH terms: Voice Quality
  3. Ibrahim HM, Reilly S, Kilpatrick N
    Cleft Palate Craniofac J, 2012 Sep;49(5):e61-3.
    PMID: 21787239 DOI: 10.1597/11-001
    To establish normative nasalance scores for a set of newly developed stimuli in Malay.
    Matched MeSH terms: Voice Quality
  4. Za'im NAN, Al-Dhief FT, Azman M, Alsemawi MRM, Abdul Latiff NMA, Mat Baki M
    J Otolaryngol Head Neck Surg, 2023 Sep 20;52(1):62.
    PMID: 37730624 DOI: 10.1186/s40463-023-00661-6
    BACKGROUND: A multidimensional voice quality assessment is recommended for all patients with dysphonia, which requires a patient visit to the otolaryngology clinic. The aim of this study was to determine the accuracy of an online artificial intelligence classifier, the Online Sequential Extreme Learning Machine (OSELM), in detecting voice pathology. In this study, a Malaysian Voice Pathology Database (MVPD), which is the first Malaysian voice database, was created and tested.

    METHODS: The study included 382 participants (252 normal voices and 130 dysphonic voices) in the proposed database MVPD. Complete data were obtained for both groups, including voice samples, laryngostroboscopy videos, and acoustic analysis. The diagnoses of patients with dysphonia were obtained. Each voice sample was anonymized using a code that was specific to each individual and stored in the MVPD. These voice samples were used to train and test the proposed OSELM algorithm. The performance of OSELM was evaluated and compared with other classifiers in terms of the accuracy, sensitivity, and specificity of detecting and differentiating dysphonic voices.

    RESULTS: The accuracy, sensitivity, and specificity of OSELM in detecting normal and dysphonic voices were 90%, 98%, and 73%, respectively. The classifier differentiated between structural and non-structural vocal fold pathology with accuracy, sensitivity, and specificity of 84%, 89%, and 88%, respectively, while it differentiated between malignant and benign lesions with an accuracy, sensitivity, and specificity of 92%, 100%, and 58%, respectively. Compared to other classifiers, OSELM showed superior accuracy and sensitivity in detecting dysphonic voices, differentiating structural versus non-structural vocal fold pathology, and between malignant and benign voice pathology.

    CONCLUSION: The OSELM algorithm exhibited the highest accuracy and sensitivity compared to other classifiers in detecting voice pathology, classifying between malignant and benign lesions, and differentiating between structural and non-structural vocal pathology. Hence, it is a promising artificial intelligence that supports an online application to be used as a screening tool to encourage people to seek medical consultation early for a definitive diagnosis of voice pathology.

    Matched MeSH terms: Voice Quality
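The accuracy, sensitivity, and specificity figures reported in entry 4 are standard confusion-matrix summaries. The following is a minimal sketch of how such figures are computed, assuming "dysphonic" is treated as the positive class. The function name and the counts in the usage example are hypothetical and are not taken from the study.

```python
def screening_metrics(tp, fn, tn, fp):
    """Return accuracy, sensitivity, and specificity from confusion-matrix counts.

    tp: positive (e.g. dysphonic) cases correctly flagged
    fn: positive cases missed
    tn: negative (e.g. normal) cases correctly passed
    fp: negative cases wrongly flagged
    """
    total = tp + fn + tn + fp
    if total == 0 or (tp + fn) == 0 or (tn + fp) == 0:
        raise ValueError("need at least one positive and one negative case")
    return {
        "accuracy": (tp + tn) / total,      # all correct decisions
        "sensitivity": tp / (tp + fn),      # share of positives detected
        "specificity": tn / (tn + fp),      # share of negatives passed
    }


# Made-up counts for illustration only (not data from the paper).
example = screening_metrics(tp=90, fn=10, tn=40, fp=10)
```

A classifier can pair high sensitivity with low specificity, as in the malignant-versus-benign result above, because the two measures are computed over different groups of cases.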
  5. Rahman M, Saniasiaya J, Abu Bakar MZ
    J Laryngol Otol, 2023 Jul;137(7):789-793.
    PMID: 36444560 DOI: 10.1017/S0022215122002493
    OBJECTIVE: Teachers and singers have been extensively studied and are shown to have a greater tendency to voice disorders. This study aimed to investigate the correlation between subjective and objective voice analysis pre- and post-shift among teleoperators in a tertiary hospital.

    METHODS: This was a prospective cohort study. Each patient underwent pre- and post-shift voice analysis.

    RESULTS: Among 42 teleoperators, 28 patients (66.7 per cent) completed all the tests. Female predominance (62 per cent) was noted, with a mean age of 40 years. Voice changes during working were reported by 48.1 per cent. Pre- and post-shift maximum phonation time (p < 0.018) and Voice Handicap Index-10 (p < 0.011) showed significant results with no correlation noted between subjective and objective assessment.

    CONCLUSION: Maximum phonation time and Voice Handicap Index-10 are good voice assessment tools. The quality of evidence is inadequate to recommend 'gold standard' voice assessment until a better-quality study has been completed.

    Matched MeSH terms: Voice Quality
  6. Mat Baki M, Wood G, Alston M, Ratcliffe P, Sandhu G, Rubin JS, et al.
    Clin Otolaryngol, 2015 Feb;40(1):22-8.
    PMID: 25263076 DOI: 10.1111/coa.12313
    OBJECTIVE: To evaluate the agreement between OperaVOX and MDVP.

    DESIGN: Cross sectional reliability study.

    SETTING: University teaching hospital.

    METHODS: Fifty healthy volunteers and 50 voice disorder patients had supervised recordings in a quiet room using OperaVOX by the iPod's internal microphone with sampling rate of 45 kHz. A five-second recording of the vowel /a/ was used to measure fundamental frequency (F0), jitter, shimmer and noise-to-harmonic ratio (NHR). All healthy volunteers and 21 patients had a second recording. The recorded voices were also analysed using the MDVP. The inter- and intrasoftware reliability was analysed using intraclass correlation (ICC) test and Bland-Altman (BA) method. Mann-Whitney test was used to compare the acoustic parameters between healthy volunteers and patients.

    RESULTS: Nine of 50 patients had severe aperiodic voice. The ICC was high with a confidence interval of >0.75 for the inter- and intrasoftware reliability except for the NHR. For the intersoftware BA analysis, excluding the severe aperiodic voice data sets, the bias (95% LOA) of F0, jitter, shimmer and NHR was 0.81 (11.32, -9.71); -0.13 (1.26, -1.52); -0.52 (1.68, -2.72); and 0.08 (0.27, -0.10). For the intrasoftware reliability, it was -1.48 (18.43, -21.39); 0.05 (1.31, -1.21); -0.01 (2.87, -2.89); and 0.005 (0.20, -0.18), respectively. Normative data from the healthy volunteers were obtained. There was a significant difference in all acoustic parameters between volunteers and patients measured by the Opera-VOX (P 

    Matched MeSH terms: Voice Quality/physiology*
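Entry 6 reports Bland-Altman bias with 95% limits of agreement (LOA). As a rough sketch of the conventional calculation, the bias is the mean of the paired differences and the limits are the bias plus or minus 1.96 sample standard deviations of those differences. The function name and the sample values are hypothetical. The paper's exact procedure is not given in the abstract, so this is an assumption about the standard method.

```python
from statistics import mean, stdev


def bland_altman(method_a, method_b):
    """Return (bias, lower_loa, upper_loa) for paired measurements.

    bias is the mean of (a - b); the 95% limits of agreement are
    bias -/+ 1.96 * sample standard deviation of the differences.
    """
    if len(method_a) != len(method_b):
        raise ValueError("measurements must be paired")
    if len(method_a) < 2:
        raise ValueError("need at least two pairs")
    diffs = [a - b for a, b in zip(method_a, method_b)]
    bias = mean(diffs)
    sd = stdev(diffs)
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


# Made-up F0 readings (Hz) from two hypothetical tools, for illustration only.
bias, lower, upper = bland_altman([210.0, 198.5, 225.0, 190.0],
                                  [209.0, 199.5, 223.0, 191.0])
```

Narrow limits centred near zero suggest that the two tools can be used interchangeably for that parameter. Wide limits suggest they cannot, even when the correlation between them is high.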
  7. Ting HN, Zourmand A, Chia SY, Yong BF, Abdul Hamid B
    J Voice, 2012 Sep;26(5):664.e1-6.
    PMID: 22285457 DOI: 10.1016/j.jvoice.2011.08.008
    The formant frequencies of Malaysian Malay children have not been well studied. This article investigates the first four formant frequencies of sustained vowels in 360 Malay children aged between 7 and 12 years using acoustical analysis. Generally, Malay female children had higher formant frequencies than those of their male counterparts. However, no significant differences in all four formant frequencies were observed between the Malay male and female children in most of the vowels and age groups. Significant differences in all formant frequencies were found across the Malay vowels in both Malay male and female children for all age groups except for F4 in female children aged 12 years. Generally, the Malaysian Malay children showed a nonsystematic decrement in formant frequencies with age. Low levels of significant differences in formant frequencies were observed across the age groups in most of the vowels for F1, F3, and F4 in Malay male children and F1 and F4 in Malay female children.
    Matched MeSH terms: Voice Quality*
  8. Zourmand A, Ting HN, Mirhassani SM
    J Voice, 2013 Mar;27(2):201-9.
    PMID: 23473455 DOI: 10.1016/j.jvoice.2012.12.006
    Speech is one of the prevalent communication mediums for humans. Identifying the gender of a child speaker based on his/her speech is crucial in telecommunication and speech therapy. This article investigates the use of fundamental and formant frequencies from sustained vowel phonation to distinguish the gender of Malay children aged between 7 and 12 years. The Euclidean minimum distance and multilayer perceptron were used to classify the gender of 360 Malay children based on different combinations of fundamental and formant frequencies (F0, F1, F2, and F3). The Euclidean minimum distance with normalized frequency data achieved a classification accuracy of 79.44%, which was higher than that of the nonnormalized frequency data. Age-dependent modeling was used to improve the accuracy of gender classification. The Euclidean distance method obtained 84.17% based on the optimal classification accuracy for all age groups. The accuracy was further increased to 99.81% using multilayer perceptron based on mel-frequency cepstral coefficients.
    Matched MeSH terms: Voice Quality*
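The Euclidean minimum distance classifier mentioned in entry 8 can be sketched as a nearest-centroid rule: average the feature vectors of each class, then assign a new sample to the class whose centroid is closest. The sketch below is a minimal illustration under that assumption. The function names, the two-feature vectors, and the numbers are hypothetical, and the normalization and age-dependent modeling described in the abstract are omitted.

```python
import math


def fit_centroids(samples, labels):
    """Compute one mean feature vector (centroid) per class label."""
    sums, counts = {}, {}
    for vec, label in zip(samples, labels):
        acc = sums.setdefault(label, [0.0] * len(vec))
        for i, value in enumerate(vec):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc]
            for label, acc in sums.items()}


def classify(vec, centroids):
    """Return the label whose centroid has the smallest Euclidean distance."""
    return min(centroids, key=lambda label: math.dist(vec, centroids[label]))


# Made-up (F0, F1) pairs in Hz, for illustration only.
train = [[250, 900], [260, 950], [230, 800], [240, 850]]
labels = ["female", "female", "male", "male"]
centroids = fit_centroids(train, labels)
predicted = classify([252, 910], centroids)
```

Because raw frequencies such as F0 and F3 sit on very different scales, rescaling each feature before computing distances keeps any single feature from dominating. This is one plausible reason the normalized data performed better in the abstract.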
  9. Ong YQ, Lee J, Chu SY, Chai SC, Gan KB, Ibrahim NM, et al.
    Int J Lang Commun Disord, 2024;59(5):1701-1714.
    PMID: 38451114 DOI: 10.1111/1460-6984.13025
    BACKGROUND: Parkinson's disease (PD) has an impact on speech production, manifesting in various ways including alterations in voice quality, challenges in articulating sounds and a decrease in speech rate. Numerous investigations have been conducted to ascertain the oral-diadochokinesis (O-DDK) rate in individuals with PD. However, the existing literature lacks exploration of such O-DDK rates in Malaysia and does not provide consistent evidence regarding the advantage of real-word repetition.

    AIMS: To explore the effect of gender, stimuli type and PD status and their interactions on the O-DDK rates among Malaysian-Malay speakers.

    METHODS & PROCEDURES: O-DDK performance of 62 participants (29 individuals with PD and 33 healthy elderly) using a non-word ('pataka'), a Malay real-word ('patahkan') and an English real-word ('buttercake') was audio recorded. The number of syllables produced in 8 s was counted. A hierarchical linear modelling was performed to investigate the effects of stimuli type (non-word, Malay real-word, English real-word), PD status (yes, no), gender (male, female) and their interactions on the O-DDK rate. The model accounted for participants' age as well as the nesting of repeated measurements within participants, thereby providing unbiased estimates of the effects.

    OUTCOMES & RESULTS: The stimuli effect was significant (p < 0.0001). Malay real-word showed the lowest O-DDK rate (5.03 ± 0.11 syllables/s), followed by English real-word (5.25 ± 0.11 syllables/s) and non-word (5.42 ± 0.11 syllables/s). Individuals with PD showed a significantly lower O-DDK rate compared to healthy elderly (4.73 ± 0.15 syllables/s vs. 5.74 ± 0.14 syllables/s, adjusted p < 0.001). A subsequent analysis indicated that the O-DDK rate declined in a quadratic pattern. However, neither gender nor age effects were observed. Additionally, no significant two-way interactions were found between stimuli type, PD status and gender (all p > 0.05). Therefore, the choice of stimuli type has no or only limited effect considering the use of O-DDK tests in clinical practice for diagnostic purposes.

    CONCLUSIONS & IMPLICATIONS: The observed slowness in O-DDK among individuals with PD can be attributed to the impact of the movement disorder, specifically bradykinesia, on the physiological aspects of speech production. Speech-language pathologists can gain insights into the impact of PD on speech production and tailor appropriate intervention strategies to address the specific needs of individuals with PD according to disease stages.

    WHAT THIS PAPER ADDS: What is already known on this subject: The observed slowness in O-DDK rates among individuals with PD may stem from the movement disorder's effects on the physiological aspects of speech production, particularly bradykinesia. However, there is a lack of consistent evidence regarding the influence of real-word repetition and how O-DDK rates vary across different PD stages. What this study adds to existing knowledge: The O-DDK rates decline in a quadratic pattern as the PD progresses. The research provides insights into the advantage of real-word repetition in assessing O-DDK rates, with Malay real-word showing the lowest O-DDK rate, followed by English real-word and non-word. What are the potential or actual clinical implications of this work? Speech-language pathologists can better understand the evolving nature of speech motor impairments as PD progresses. This insight enables them to design targeted intervention strategies that are sensitive to the specific needs and challenges associated with each PD stage. This finding can guide clinicians in selecting appropriate assessment tools for evaluating speech motor function in PD patients.

    Matched MeSH terms: Voice Quality
  10. Phoon HS, Abdullah AC, Maclagan M
    Int J Speech Lang Pathol, 2012 Dec;14(6):487-98.
    PMID: 23039125 DOI: 10.3109/17549507.2012.719549
    This study investigates the effect of dialect on phonological analyses in Chinese-influenced Malaysian English (ChME) speaking children. A total of 264 typically-developing ChME speaking children aged 3-7 years participated in this cross-sectional study. A single word naming task consisting of 195 words was used to elicit speech from the children. The samples obtained were transcribed phonetically and analysed descriptively and statistically. Phonological analyses were completed for speech sound accuracy, age of consonant acquisition, percentage of phonological process occurrence, and age of suppression for phonological processes. All these measurements differed based on whether or not ChME dialectal features were considered correct, with children gaining higher scores when ChME dialect features were considered correct. The findings of the present study provide guidelines for Malaysian speech-language pathologists and stress the need to appropriately consider ChME dialectal features in the phonological analysis of ChME speaking children. They also highlight the issues in accurate differential diagnosis of speech impairment for speech-language pathologists working with children from any linguistically diverse background.
    Matched MeSH terms: Voice Quality*
  11. Ahmad K, Yan Y, Bless D
    J Voice, 2012 Nov;26(6):751-9.
    PMID: 22633334 DOI: 10.1016/j.jvoice.2011.12.002
    A high proportion of the geriatric population suffers from presbylaryngis and presbyphonia; however, our knowledge of vibratory patterns in this population is almost nonexistent. In this study, we investigate the vocal fold vibratory patterns of healthy elderly females to determine which features or combination of them could best describe the geriatric voices.
    Matched MeSH terms: Voice Quality*
  12. Ong FM, Husna Nik Hassan NF, Azman M, Sani A, Mat Baki M
    J Voice, 2019 Jul;33(4):581.e17-581.e23.
    PMID: 29793874 DOI: 10.1016/j.jvoice.2018.01.015
    OBJECTIVES: This study aimed to determine the validity and reliability of Bahasa Malaysia version of Voice Handicap Index-10 (mVHI-10).

    MATERIALS AND METHODS: This cross-sectional study was carried out in the Otorhinolaryngology, Head and Neck Surgery Department of Universiti Kebangsaan Malaysia Medical Centre (UKMMC) from June 2015 to May 2016. The mVHI-10 was produced following a rigorous forward and backward translation. One hundred participants, including 50 healthy volunteers (17 male, 33 female) and 50 patients with voice disorders (26 male, 24 female), were recruited to complete the mVHI-10 before flexible laryngoscopic examinations and acoustic analysis. The mVHI-10 was repeated in 2 weeks via telephone interview or clinic visit. Its reliability and validity were assessed using interclass correlation.

    RESULTS: The test-retest reliability for total mVHI-10 and each item score was high, with the Cronbach alpha of >0.90. The total mVHI-10 score and domain scores were significantly higher (P 

    Matched MeSH terms: Voice Quality*
  13. Johari SF, Azman M, Mohamed AS, Baki MM
    J Laryngol Otol, 2020 Dec;134(12):1085-1093.
    PMID: 33308327 DOI: 10.1017/S0022215120002558
    OBJECTIVE: To evaluate voice intensity as the primary outcome measurement when treating unilateral vocal fold paralysis patients.

    METHODS: This prospective observational study comprised 34 newly diagnosed unilateral vocal fold paralysis patients undergoing surgical interventions: injection laryngoplasty or medialisation thyroplasty. Voice assessments, including maximum vocal intensity and other acoustic parameters, were performed at baseline and at one and three months post-intervention. Maximum vocal intensity was also repeated within two weeks before any surgical interventions were performed. The results were compared between different time points and between the two intervention groups.

    RESULTS: Maximum vocal intensity showed high internal consistency. Statistically significant improvements were seen in maximum vocal intensity, Voice Handicap Index-10 and other acoustic analyses at one and three months post-intervention. A significant moderate negative correlation was demonstrated between maximum vocal intensity and Voice Handicap Index-10, shimmer and jitter. There were no significant differences in voice outcomes between injection laryngoplasty and medialisation thyroplasty patients at any time point.

    CONCLUSION: Maximum vocal intensity can be applied as a treatment outcome measure in unilateral vocal fold paralysis patients; it can demonstrate the effectiveness of treatment and moderately correlates with self-reported outcome measures.

    Matched MeSH terms: Voice Quality/physiology*
  14. Mustafa MB, Ainon RN
    J Acoust Soc Am, 2013 Oct;134(4):3057-66.
    PMID: 24116440 DOI: 10.1121/1.4818741
    The ability of a speech synthesis system to synthesize emotional speech enhances the user's experience when using this kind of system and its related applications. However, the development of an emotional speech synthesis system is a daunting task in view of the complexity of human emotional speech. The more recent state-of-the-art speech synthesis systems, such as the one based on hidden Markov models, can synthesize emotional speech with acceptable naturalness with the use of a good emotional speech acoustic model. However, building an emotional speech acoustic model requires adequate resources including segment-phonetic labels of emotional speech, which is a problem for many under-resourced languages, including Malay. This research shows how it is possible to build an emotional speech acoustic model for Malay with minimal resources. To achieve this objective, two forms of initialization methods were considered: iterative training using the deterministic annealing expectation maximization algorithm and the isolated unit training. The seed model for the automatic segmentation is a neutral speech acoustic model, which was transformed to target emotion using two transformation techniques: model adaptation and context-dependent boundary refinement. Two forms of evaluation have been performed: an objective evaluation measuring the prosody error and a listening evaluation to measure the naturalness of the synthesized emotional speech.
    Matched MeSH terms: Voice Quality*
  15. Ali Z, Alsulaiman M, Muhammad G, Elamvazuthi I, Al-Nasheri A, Mesallam TA, et al.
    J Voice, 2017 May;31(3):386.e1-386.e8.
    PMID: 27745756 DOI: 10.1016/j.jvoice.2016.09.009
    A large population around the world has voice complications. Various approaches for subjective and objective evaluations have been suggested in the literature. The subjective approach strongly depends on the experience and area of expertise of a clinician, and human error cannot be neglected. On the other hand, the objective or automatic approach is noninvasive. Automatic developed systems can provide complementary information that may be helpful for a clinician in the early screening of a voice disorder. At the same time, automatic systems can be deployed in remote areas where a general practitioner can use them and may refer the patient to a specialist to avoid complications that may be life threatening. Many automatic systems for disorder detection have been developed by applying different types of conventional speech features such as the linear prediction coefficients, linear prediction cepstral coefficients, and Mel-frequency cepstral coefficients (MFCCs). This study aims to ascertain whether conventional speech features detect voice pathology reliably, and whether they can be correlated with voice quality. To investigate this, an automatic detection system based on MFCC was developed, and three different voice disorder databases were used in this study. The experimental results suggest that the accuracy of the MFCC-based system varies from database to database. The detection rate for the intra-database ranges from 72% to 95%, and that for the inter-database is from 47% to 82%. The results conclude that conventional speech features are not correlated with voice, and hence are not reliable in pathology detection.
    Matched MeSH terms: Voice Quality*
  16. Moy FM, Hoe VC, Hairi NN, Chu AH, Bulgiba A, Koh D
    PLoS One, 2015;10(11):e0141963.
    PMID: 26540291 DOI: 10.1371/journal.pone.0141963
    OBJECTIVES: To establish the prevalence of voice disorder using the Malay-Voice Handicap Index 10 (Malay-VHI-10) and to study the determinants, quality of life, depression, anxiety and stress associated with voice disorder among secondary school teachers in Peninsular Malaysia.

    METHODS: This study was divided into two phases. Phase I tested the reliability of the Malay-VHI-10 while Phase II was a cross-sectional study with two-stage sampling. In Phase II, a self-administered questionnaire was used to collect socio-demographic and teaching characteristics, depression, anxiety and stress scale (Malay version of DASS-21); and health-related quality of life (Malay version of SF12-v2). Complex sample analysis was conducted using multivariate Poisson regression with robust variance.

    RESULTS: In Phase I, the Spearman correlation coefficient and Cronbach alpha for total VHI-10 score was 0.72 (p < 0.001) and 0.77 respectively; showing good correlation and internal consistency. The ICCs ranged from 0.65 to 0.78 showing fair to good reliability and demonstrating the subscales to be reliable and stable. A total of 6039 teachers participated in Phase II. They were primarily Malays, females, married, had completed tertiary education and aged between 30 to 50 years. A total of 10.4% (95% CI 7.1, 14.9) of the teachers had voice disorder (VHI-10 score > 11). Compared to Malays, a greater proportion of ethnic Chinese teachers reported voice disorder while ethnic Indian teachers were less likely to report this problem. There was a higher prevalence ratio (PR) of voice disorder among single or divorced/widowed teachers. Teachers with voice disorder were more likely to report higher rates of absenteeism (PR: 1.70, 95% CI 1.33, 2.19), lower quality of life with lower SF12-v2 physical (0.98, 95% CI 0.96, 0.99) and mental (0.97, 95% CI 0.96, 0.98) component summary scales; and higher anxiety levels (1.04, 95% CI 1.02, 1.06).

    CONCLUSIONS: The Malay-VHI-10 is valid and reliable. Voice disorder was associated with increased absenteeism, marginally associated with reduced health-related quality of life as well as increased anxiety among teachers.

    Matched MeSH terms: Voice Quality/physiology
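Entry 16 reports Cronbach's alpha as its measure of internal consistency for the questionnaire. A minimal sketch of the usual formula follows: alpha = k/(k-1) * (1 - sum of item variances / variance of total scores), where k is the number of items. The function name and the response matrix are hypothetical, and the study's own software and settings are not stated in the abstract.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for a list of respondents, each a list of item scores.

    Uses sample variances throughout; raises ZeroDivisionError if every
    respondent has the same total score.
    """
    if len(rows) < 2:
        raise ValueError("need at least two respondents")
    k = len(rows[0])
    if k < 2:
        raise ValueError("need at least two items")
    item_vars = [variance([row[i] for row in rows]) for i in range(k)]
    total_var = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Made-up responses (4 respondents x 3 items), for illustration only.
alpha = cronbach_alpha([[0, 1, 1], [2, 2, 3], [3, 4, 4], [1, 1, 2]])
```

Alpha approaches 1 when the items rise and fall together across respondents, which is what "internal consistency" refers to in the results above.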
  17. Ooi CC, Wong AM
    Int J Speech Lang Pathol, 2012 Dec;14(6):499-508.
    PMID: 23039126 DOI: 10.3109/17549507.2012.712159
    One reason why specific language impairment (SLI) is grossly under-identified in Malaysia is the absence of locally developed norm-referenced language assessment tools for its multilingual and multicultural population. Spontaneous language samples provide quantitative information for language assessment, and useful descriptive information on child language development in complex language and cultural environments. This research consisted of two studies and investigated the use of measures obtained from English conversational samples among bilingual Chinese-English Malaysian preschoolers. The research found that the language sample measures were sensitive to developmental changes in this population and could identify SLI. The first study examined the relationship between age and mean length of utterance (MLU(w)), lexical diversity (D), and the index of productive syntax (IPSyn) among 52 typically-developing (TD) children aged between 3;4-6;9. Analyses showed a significant linear relationship between age and D (r = .450), the IPSyn (r = .441), and MLU(w) (r = .318). The second study compared the same measures obtained from 10 children with SLI, aged between 3;8-5;11, and their age-matched controls. The children with SLI had significantly shorter MLU(w) and lower IPSyn scores than the TD children. These findings suggest that utterance length and syntax production can be potential clinical markers of SLI in Chinese-English Malaysian children.
    Matched MeSH terms: Voice Quality*
  18. Ali Z, Elamvazuthi I, Alsulaiman M, Muhammad G
    J Voice, 2016 Nov;30(6):757.e7-757.e19.
    PMID: 26522263 DOI: 10.1016/j.jvoice.2015.08.010
    BACKGROUND AND OBJECTIVE: Automatic voice pathology detection using sustained vowels has been widely explored. Because of the stationary nature of the speech waveform, pathology detection with a sustained vowel is a comparatively easier task than that using running speech. Some disorder detection systems with running speech have also been developed, although most of them are based on voice activity detection (VAD), which is itself a challenging task. Pathology detection with running speech needs more investigation, and systems with good accuracy (ACC) are required. Furthermore, pathology classification systems with running speech have not received any attention from the research community. In this article, automatic pathology detection and classification systems are developed using text-dependent running speech without adding a VAD module.

    METHOD: A set of three psychophysics conditions of hearing (critical band spectral estimation, equal loudness hearing curve, and the intensity loudness power law of hearing) is used to estimate the auditory spectrum. The auditory spectrum and all-pole models of the auditory spectrums are computed and analyzed and used in a Gaussian mixture model for an automatic decision.

    RESULTS: In the experiments using the Massachusetts Eye & Ear Infirmary database, an ACC of 99.56% is obtained for pathology detection, and an ACC of 93.33% is obtained for the pathology classification system. The results of the proposed systems outperform the existing running-speech-based systems.

    DISCUSSION: The developed system can effectively be used in voice pathology detection and classification systems, and the proposed features can visually differentiate between normal and pathological samples.

    Matched MeSH terms: Voice Quality*