Displaying all 3 publications

  1. Hossain ME, Jassim WA, Zilany MS
    PLoS One, 2016;11(3):e0150415.
    PMID: 26967160 DOI: 10.1371/journal.pone.0150415
    Sensorineural hearing loss occurs due to damage to the inner and outer hair cells of the peripheral auditory system. Hearing loss can cause decreases in audibility, dynamic range, and frequency and temporal resolution of the auditory system, and all of these effects are known to affect speech intelligibility. In this study, a new reference-free speech intelligibility metric is proposed using 2-D neurograms constructed from the output of a computational model of the auditory periphery. The responses of auditory-nerve fibers with a wide range of characteristic frequencies were simulated to construct neurograms. The features of the neurograms were extracted using third-order statistics referred to as the bispectrum. The phase coupling of the neurogram bispectrum provides a unique insight into the presence (or deficit) of supra-threshold nonlinearities beyond audibility for listeners with normal hearing (or hearing loss). The speech intelligibility scores predicted by the proposed method were compared to the behavioral scores for listeners with normal hearing and hearing loss, both in quiet and under noisy background conditions. The results were also compared to the performance of some existing methods. The predicted results showed a good fit with a small error, suggesting that the subjective scores can be estimated reliably using the proposed neural-response-based metric. The proposed metric also had a wide dynamic range, and the predicted scores were well-separated as a function of hearing loss. The proposed metric successfully captures the effects of hearing loss and supra-threshold nonlinearities on speech intelligibility. This metric could be applied to evaluate the performance of various speech-processing algorithms designed for hearing aids and cochlear implants.
  2. Zilany MS, Bruce IC, Carney LH
    J Acoust Soc Am, 2014 Jan;135(1):283-6.
    PMID: 24437768 DOI: 10.1121/1.4837815
    A phenomenological model of the auditory periphery in cats was previously developed by Zilany and colleagues [J. Acoust. Soc. Am. 126, 2390-2412 (2009)] to examine the detailed transformation of acoustic signals into the auditory-nerve representation. In this paper, a few issues arising from the responses of the previous version have been addressed. The parameters of the synapse model have been readjusted to better simulate reported physiological discharge rates at saturation for higher characteristic frequencies [Liberman, J. Acoust. Soc. Am. 63, 442-455 (1978)]. This modification also corrects the responses of higher-characteristic frequency (CF) model fibers to low-frequency tones that were erroneously much higher than the responses of low-CF model fibers in the previous version. In addition, an analytical method has been implemented to compute the mean discharge rate and variance from the model's synapse output that takes into account the effects of absolute refractoriness.
  3. Islam MA, Jassim WA, Cheok NS, Zilany MS
    PLoS One, 2016;11(7):e0158520.
    PMID: 27392046 DOI: 10.1371/journal.pone.0158520
    Speaker identification under noisy conditions is a challenging problem in speech processing. Motivated by the fact that neural responses are robust against noise, this paper proposes a new speaker identification system using 2-D neurograms constructed from the responses of a physiologically-based computational model of the auditory periphery. The responses of auditory-nerve fibers to speech signals were simulated over a wide range of characteristic frequencies to construct neurograms. The neurogram coefficients were trained using the well-known Gaussian mixture model-universal background model classification technique to generate an identity model for each speaker. In this study, three text-independent speaker databases and one text-dependent speaker database were employed to test the identification performance of the proposed method. The robustness of the proposed method was also investigated using speech signals distorted by three types of noise (white Gaussian, pink, and street noise) at different signal-to-noise ratios. The identification results of the proposed neural-response-based method were compared to the performance of traditional speaker identification methods using features such as Mel-frequency cepstral coefficients, Gammatone frequency cepstral coefficients, and frequency domain linear prediction. Although the classification accuracy achieved by the proposed method was comparable to that of the traditional techniques in quiet, the new feature was found to provide lower classification error rates in noisy environments.
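
Entry 2 above mentions an analytical method for computing the mean discharge rate and variance from a synapse output while accounting for absolute refractoriness. As a rough illustration of that general idea only, the sketch below uses the textbook dead-time-modified Poisson process, which is an assumption here and not necessarily the formula used in the paper. The function name `dead_time_rate_stats` and its parameters are hypothetical.

```python
def dead_time_rate_stats(drive_rate, t_abs):
    """Long-run spike-count statistics for a Poisson process with a fixed dead time.

    Assumption (not taken from the abstract): after each spike the fiber is
    silent for t_abs seconds, then fires as a Poisson process with rate
    drive_rate (spikes/s). Intervals are then t_abs plus an exponential
    waiting time, and renewal theory gives the asymptotic rates below.

    Returns (mean_rate, variance_rate), both per second of observation.
    """
    if drive_rate < 0 or t_abs < 0:
        raise ValueError("drive_rate and t_abs must be non-negative")
    denom = 1.0 + drive_rate * t_abs
    mean_rate = drive_rate / denom           # 1 / mean interval
    variance_rate = drive_rate / denom ** 3  # interval variance / mean interval cubed
    return mean_rate, variance_rate


if __name__ == "__main__":
    # Hypothetical values: 1000 spikes/s drive, 1 ms absolute refractory period.
    mean_rate, variance_rate = dead_time_rate_stats(1000.0, 0.001)
    print(f"mean rate: {mean_rate:.1f} spikes/s")
    print(f"count variance per second: {variance_rate:.1f}")
```

With `t_abs` set to zero the result reduces to an ordinary Poisson process, where the mean and variance rates are equal. A non-zero dead time lowers both, and lowers the variance more strongly, so the spike train becomes more regular as the drive increases.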