MyMedR

Displaying all 14 publications

Characterisation of structure-borne sound source using reception plate method

Putra A, Saari NF, Bakri H, Ramlan R, Dan RM

ScientificWorldJournal, 2013;2013:742853.
PMID: 24324380 DOI: 10.1155/2013/742853

A laboratory-based experimental procedure for the reception plate method of structure-borne sound source characterisation is reported in this paper. The method assumes that the input power from the source installed on the plate is equal to the power dissipated by the plate. In this experiment, rectangular plates having high and low mobility relative to that of the source were used as the reception plates, and a small electric fan motor served as the structure-borne source. The data representing the source characteristics, namely, the free velocity and the source mobility, were obtained and compared with those from direct measurement. Assumptions and constraints in employing this method are discussed.
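
The power balance behind the reception plate method can be illustrated with a short sketch. The abstract does not give an expression, so the form used here, P = omega * eta * m * <v^2> (angular frequency, plate loss factor, plate mass, spatially averaged mean-square velocity), is the commonly used plate-dissipation relation and is an assumption about the paper's details. The function name and the numeric values are hypothetical.

```python
import math


def plate_dissipated_power(freq_hz, loss_factor, plate_mass_kg, mean_sq_velocity):
    """Power dissipated by a reception plate in watts.

    Assumed relation: P = omega * eta * m * <v^2>, where <v^2> is the
    spatially averaged mean-square plate velocity in (m/s)^2.
    """
    omega = 2.0 * math.pi * freq_hz
    return omega * loss_factor * plate_mass_kg * mean_sq_velocity


# Hypothetical values for illustration only; none are taken from the paper.
power_w = plate_dissipated_power(
    freq_hz=100.0,
    loss_factor=0.01,
    plate_mass_kg=10.0,
    mean_sq_velocity=1e-6,
)
print(f"Estimated source input power: {power_w:.3e} W")
```

Under the method's stated assumption, this dissipated power is taken as the input power from the source.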

Matched MeSH terms: Sound Spectrography/methods*

Whistle description of Irrawaddy dolphins (Orcaella brevirostris) in Bay of Brunei, Sarawak, Malaysia

Muhamad HM, Xu X, Zhang X, Jaaman SA, Muda AM

J Acoust Soc Am, 2018 May;143(5):2708.
PMID: 29857727 DOI: 10.1121/1.5036926

Studies of Irrawaddy dolphins' acoustics assist in understanding the behaviour of the species and thereby support its conservation. Whistle signals emitted by Irrawaddy dolphins within the Bay of Brunei in Malaysian waters were characterized. A total of 199 whistles were analysed from seven sightings between January and April 2016. Six types of whistle contours, named constant, upsweep, downsweep, concave, convex, and sine, were detected when the dolphins engaged in traveling, foraging, and socializing activities. The whistle durations ranged between 0.06 and 3.86 s. The minimum frequency recorded was 443 Hz [Mean = 6000 Hz, standard deviation (SD) = 2320 Hz] and the maximum frequency recorded was 16 071 Hz (Mean = 7139 Hz, SD = 2522 Hz). The mean frequency range (F.R.) for the whistles was 1148 Hz (Minimum F.R. = 0 Hz, Maximum F.R. = 4446 Hz; SD = 876 Hz). Whistles in the Bay of Brunei were compared with those of populations recorded in the waters of Matang and Kalimantan. The comparisons showed differences in whistle duration, minimum frequency, start frequency, and number of inflection points. Variation in whistle occurrence and frequency may be associated with surface behaviour, ambient noise, and recording limitations. This will be an important element when planning a monitoring program.

Matched MeSH terms: Sound Spectrography/methods

Whistles emitted by Indo-Pacific humpback dolphins (Sousa chinensis) in Zhanjiang waters, China

Dong L, Caruso F, Lin M, Liu M, Gong Z, Dong J, et al.

J Acoust Soc Am, 2019 Jun;145(6):3289.
PMID: 31255103 DOI: 10.1121/1.5110304

Whistles emitted by Indo-Pacific humpback dolphins in Zhanjiang waters, China, were collected by using autonomous acoustic recorders. A total of 529 whistles with clear contours and signal-to-noise ratio higher than 10 dB were extracted for analysis. The fundamental frequencies and durations of analyzed whistles were in ranges of 1785-21 675 Hz and 30-1973 ms, respectively. Six tonal types were identified: constant, downsweep, upsweep, concave, convex, and sine whistles. Constant type was the most dominant tonal type, accounting for 32.51% of all whistles, followed by sine type, accounting for 19.66% of all whistles. This paper examined 17 whistle parameters, which showed significant differences among the six tonal types. Whistles without inflections, gaps, and stairs accounted for 62.6%, 80.6%, and 68.6% of all whistles, respectively. Significant intraspecific differences in all duration and frequency parameters of dolphin whistles were found between this study and the study in Malaysia. Except for start frequency, maximum frequency and the number of harmonics, all whistle parameters showed significant differences between this study and the study conducted in Sanniang Bay, China. The intraspecific differences in vocalizations for this species may be related to macro-geographic and/or environmental variations among waters, suggesting a potential geographic isolation among populations of Indo-Pacific humpback dolphins.

Matched MeSH terms: Sound Spectrography/methods

Vocalisations of the bigeye Pempheris adspersa: characteristics, source level and active space

Radford CA, Ghazali S, Jeffs AG, Montgomery JC

J Exp Biol, 2015 Mar;218(Pt 6):940-8.
PMID: 25617461 DOI: 10.1242/jeb.115295

Fish sounds are an important biological component of the underwater soundscape. Understanding species-specific sounds and their associated behaviour is critical for determining how animals use the biological component of the soundscape. Using both field and laboratory experiments, we describe the sound production of a nocturnal planktivore, Pempheris adspersa (New Zealand bigeye), and provide calculations for the potential effective distance of the sound for intraspecific communication. Bigeye vocalisations recorded in the field were confirmed as such by tank recordings. They can be described as popping sounds, with individual pops of short duration (7.9±0.3 ms) and a peak frequency of 405±12 Hz. Sound production varied during a 24 h period, with peak vocalisation activity occurring during the night, when the fish are most active. The source level of the bigeye vocalisation was 115.8±0.2 dB re. 1 µPa at 1 m, which is relatively quiet compared with other soniferous fish. Effective calling range, or active space, depended on both season and lunar phase, with a maximum calling distance of 31.6 m and a minimum of 0.6 m. The bigeyes' nocturnal behaviour, characteristics of their vocalisation, source level and the spatial scale of its active space reported in the current study demonstrate the potential for fish vocalisations to function effectively as contact calls for maintaining school cohesion in darkness.
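
The relation between source level and active space can be sketched as follows. The abstract does not say which propagation model was used, so simple spherical spreading (transmission loss of 20 log10 r) is assumed here purely for illustration. The detection level of 85.8 dB is a hypothetical value, not a figure from the paper, and the function names are also hypothetical.

```python
import math


def transmission_loss_db(range_m, spreading_coeff=20.0):
    """Transmission loss relative to 1 m, assuming geometric spreading only."""
    return spreading_coeff * math.log10(range_m)


def active_space_radius_m(source_level_db, detection_level_db, spreading_coeff=20.0):
    """Range at which the received level falls to the detection level.

    Solves source_level - spreading_coeff * log10(r) = detection_level for r.
    """
    return 10.0 ** ((source_level_db - detection_level_db) / spreading_coeff)


# Source level from the abstract; the detection level is a hypothetical value.
radius_m = active_space_radius_m(source_level_db=115.8, detection_level_db=85.8)
print(f"Active space radius: {radius_m:.1f} m")
```

A higher detection level, for example under louder ambient noise, shrinks the radius, which is consistent with the abstract's report that active space varied with season and lunar phase.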

Matched MeSH terms: Sound Spectrography

I feel you: the design and evaluation of a domotic affect-sensitive spoken conversational agent

Lutfi SL, Fernández-Martínez F, Lorenzo-Trueba J, Barra-Chicote R, Montero JM

Sensors (Basel), 2013;13(8):10519-38.
PMID: 23945740 DOI: 10.3390/s130810519

We describe the work on infusion of emotion into a limited-task autonomous spoken conversational agent situated in the domestic environment, using a need-inspired task-independent emotion model (NEMO). In order to demonstrate the generation of affect through the use of the model, we describe the work of integrating it with a natural-language mixed-initiative HiFi-control spoken conversational agent (SCA). NEMO and the host system communicate externally, removing the need for the Dialog Manager to be modified, as is done in most existing dialog systems, in order to be adaptive. The first part of the paper concerns the integration between NEMO and the host agent. The second part summarizes the work on automatic affect prediction, namely, frustration and contentment, from dialog features, a non-conventional source, in an attempt to move towards a more user-centric approach. The final part reports the evaluation results obtained from a user study, in which both versions of the agent (non-adaptive and emotionally-adaptive) were compared. The results provide substantial evidence of the benefits of adding emotion to a spoken conversational agent, especially in mitigating users' frustrations and, ultimately, improving their satisfaction.

Matched MeSH terms: Sound Spectrography/methods*

Optimization of MFCC parameters using Particle Swarm Optimization for diagnosis of infant hypothyroidism using Multi-Layer Perceptron

Zabidi A, Lee YK, Mansor W, Yassin IM, Sahak R

Annu Int Conf IEEE Eng Med Biol Soc, 2010;2010:1417-20.
PMID: 21096346 DOI: 10.1109/IEMBS.2010.5626712

This paper presents a new application of the Particle Swarm Optimization (PSO) algorithm to optimize Mel Frequency Cepstrum Coefficients (MFCC) parameters, in order to extract an optimal feature set for diagnosis of hypothyroidism in infants using a Multi-Layer Perceptron (MLP) neural network. MFCC features are influenced by the number of filter banks (f_b) and the number of coefficients (n_c) used. These parameters are critical in the representation of the features, as they affect the resolution and dimensionality of the features. In this paper, the PSO algorithm was used to optimize the values of f_b and n_c. The MFCC features based on the PSO optimization were extracted from healthy and unhealthy infant cry signals and used to train the MLP in the classification of hypothyroid infant cries. The results indicate that the PSO algorithm could determine the optimum combination of f_b and n_c that produces the best classification accuracy of the MLP.

Matched MeSH terms: Sound Spectrography/methods*

Knowledge based system with embedded intelligent heart sound analyser for diagnosing cardiovascular disorders

Javed F, Venkatachalam PA, Hani AF

J Med Eng Technol, 2007 Sep-Oct;31(5):341-50.
PMID: 17701779 DOI: 10.1080/03091900600887876

Cardiovascular disease (CVD) is the leading cause of death worldwide, and due to the lack of early detection techniques, the incidence of CVD is increasing day by day. In order to address this limitation, a knowledge based system with embedded intelligent heart sound analyser (KBHSA) has been developed to diagnose cardiovascular disorders at early stages. The system analyses digitized heart sounds that are recorded from an electronic stethoscope using advanced digital signal processing and artificial intelligence techniques. KBHSA takes into account data including the patient's personal and past medical history, clinical examination, auscultation findings, chest x-ray and echocardiogram, and provides a list of diseases that it has diagnosed. The system can assist the general physician in making more accurate and reliable diagnosis under emergency conditions where expert cardiologists and advanced equipment are not readily available. To test the validity of the system, abnormal heart sound samples and medical data from 40 patients were recorded and analysed. The diagnoses made by the system were counter checked by four senior cardiologists in Malaysia. The results show that the findings of KBHSA coincide with those of cardiologists.

Matched MeSH terms: Sound Spectrography/methods*

Noise characteristics of grass-trimming machine engines and their effect on operators

Mallick Z, Badruddin IA, Khaleed Hussain MT, Salman Ahmed NJ, Kanesan J

Noise Health, 2009 Apr-Jun;11(43):98-102.
PMID: 19414929 DOI: 10.4103/1463-1741.50694

Over the last few years, the interaction of humans with noisy power-driven agricultural tools and its possible adverse after-effects have been recognized. The grass-trimmer engine is the primary source of noise, and the motorized cutter, spinning at high speed, is the secondary source of noise to which operators are exposed. In the present study, an investigation was carried out to determine the effect of noise from two types of grass-trimming machine engines (SUM 328 SE and BG 328) on operators in a real working environment. It was found that BG-328 and SUM-328 SE produced high levels of noise, of the order of 100 and 105 dB(A), respectively, to which operators are exposed while working. It was also observed that the situation worsens when a number of operators work simultaneously, resulting in still higher levels of noise. Operators should be separated by 15 meters from each other in order to avoid the combined level of noise exposure while working with these machines. It was found that the sound pressure levels (SPL) of the grass-trimmer machine engines (BG-328 and SUM-328 SE) were higher than the limits of noise recommended by ISO, NIOSH, and OSHA for an 8-hour workday. Such a high level of noise exposure may cause physiological and psychological problems for the operators in the long run.
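
The remark that simultaneous operation yields still higher levels can be illustrated with the standard rule for summing incoherent sound sources on an energy basis. This is a minimal sketch that assumes the sources are uncorrelated and that each level is the one received at the same position. The abstract does not describe how combined levels were measured or computed, and the function name is hypothetical.

```python
import math


def combined_level_db(levels_db):
    """Combine sound levels from uncorrelated sources by energy summation."""
    total_energy = sum(10.0 ** (level / 10.0) for level in levels_db)
    return 10.0 * math.log10(total_energy)


# Levels of the order reported in the abstract for the two engine types.
both_engines = combined_level_db([100.0, 105.0])
print(f"Combined level: {both_engines:.2f} dB(A)")
```

Two equal sources add about 3 dB, so even a second, quieter machine nearby raises an operator's exposure above that of a single machine.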

Matched MeSH terms: Sound Spectrography

Vocal acoustics in the endangered proboscis monkey (Nasalis larvatus)

Röper KM, Scheumann M, Wiechert AB, Nathan S, Goossens B, Owren MJ, et al.

Am J Primatol, 2014 Feb;76(2):192-201.
PMID: 24123122 DOI: 10.1002/ajp.22221

The endangered proboscis monkey (Nasalis larvatus) is a sexually highly dimorphic Old World primate endemic to the island of Borneo. Previous studies focused mainly on its ecology and behavior, but knowledge of its vocalizations is limited. The present study provides quantified information on vocal rate and on the vocal acoustics of the prominent calls of this species. We audio-recorded vocal behavior of 10 groups over two 4-month periods at the Lower Kinabatangan Wildlife Sanctuary in Sabah, Borneo. We observed monkeys and recorded calls in evening and morning sessions at sleeping trees along riverbanks. We found no differences in the vocal rate between evening and morning observation sessions. Based on multiparametric analysis, we identified acoustic features of the four common call-types "shrieks," "honks," "roars," and "brays." "Chorus" events were also noted in which multiple callers produced a mix of vocalizations. The four call-types were distinguishable based on a combination of fundamental frequency variation, call duration, and degree of voicing. Three of the call-types can be considered as "loud calls" and are therefore deemed promising candidates for non-invasive, vocalization-based monitoring of proboscis monkeys for conservation purposes.

Matched MeSH terms: Sound Spectrography

Mouth-clicks used by blind expert human echolocators - signal description and model based signal synthesis

Thaler L, Reich GM, Zhang X, Wang D, Smith GE, Tao Z, et al.

PLoS Comput Biol, 2017 Aug;13(8):e1005670.
PMID: 28859082 DOI: 10.1371/journal.pcbi.1005670

Echolocation is the ability to use sound-echoes to infer spatial information about the environment. Some blind people have developed extraordinary proficiency in echolocation using mouth-clicks. The first step of human biosonar is the transmission (mouth click) and subsequent reception of the resultant sound through the ear. Existing head-related transfer function (HRTF) databases provide descriptions of reception of the resultant sound. For the current report, we collected a large database of click emissions with three blind people expertly trained in echolocation, which allowed us to perform unprecedented analyses. Specifically, the current report provides the first ever description of the spatial distribution (i.e. beam pattern) of human expert echolocation transmissions, as well as spectro-temporal descriptions at a level of detail not available before. Our data show that transmission levels are fairly constant within a 60° cone emanating from the mouth, but levels drop gradually at further angles, more than for speech. In terms of spectro-temporal features, our data show that emissions are consistently very brief (~3 ms duration) with peak frequencies of 2-4 kHz, but with energy also at 10 kHz. This differs from previous reports of durations of 3-15 ms and peak frequencies of 2-8 kHz, which were based on less detailed measurements. Based on our measurements we propose to model transmissions as a sum of monotones modulated by a decaying exponential, with angular attenuation by a modified cardioid. We provide model parameters for each echolocator. These results are a step towards developing computational models of human biosonar. For example, in bats, spatial and spectro-temporal features of emissions have been used to derive and test model based hypotheses about behaviour. The data we present here suggest similar research opportunities within the context of human echolocation. Relatedly, the data are a basis to develop synthetic models of human echolocation that could be virtual (i.e. simulated) or real (i.e. loudspeaker, microphones), and which will help in understanding the link between physical principles and human behaviour.
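
The temporal part of the proposed click model, a sum of monotones modulated by a decaying exponential, can be sketched as below. The abstract does not list the fitted parameters, so the tone frequencies, amplitudes, decay rate and sample rate here are hypothetical values chosen only to fall within the reported ranges (about 3 ms duration, peak at 2-4 kHz, energy at 10 kHz). The modified cardioid for angular attenuation is not attempted, since its form is not given in the abstract.

```python
import math


def synth_click(freqs_hz, amplitudes, decay_per_s, duration_s=0.003, sample_rate_hz=96000):
    """Return samples of a decaying-exponential envelope times a sum of sine tones."""
    n_samples = round(duration_s * sample_rate_hz)
    samples = []
    for i in range(n_samples):
        t = i / sample_rate_hz
        envelope = math.exp(-decay_per_s * t)
        tone_sum = sum(
            amp * math.sin(2.0 * math.pi * freq * t)
            for freq, amp in zip(freqs_hz, amplitudes)
        )
        samples.append(envelope * tone_sum)
    return samples


# Hypothetical parameters, not the per-echolocator values provided in the paper.
click = synth_click(freqs_hz=[3000.0, 10000.0], amplitudes=[1.0, 0.3], decay_per_s=1500.0)
print(len(click), "samples")
```

The resulting list can be written to a sound file or plotted to inspect how the decay rate shapes the click's duration.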

Matched MeSH terms: Sound Spectrography

Updated parameters and expanded simulation options for a model of the auditory periphery

Zilany MS, Bruce IC, Carney LH

J Acoust Soc Am, 2014 Jan;135(1):283-6.
PMID: 24437768 DOI: 10.1121/1.4837815

A phenomenological model of the auditory periphery in cats was previously developed by Zilany and colleagues [J. Acoust. Soc. Am. 126, 2390-2412 (2009)] to examine the detailed transformation of acoustic signals into the auditory-nerve representation. In this paper, a few issues arising from the responses of the previous version have been addressed. The parameters of the synapse model have been readjusted to better simulate reported physiological discharge rates at saturation for higher characteristic frequencies [Liberman, J. Acoust. Soc. Am. 63, 442-455 (1978)]. This modification also corrects the responses of higher-characteristic frequency (CF) model fibers to low-frequency tones that were erroneously much higher than the responses of low-CF model fibers in the previous version. In addition, an analytical method has been implemented to compute the mean discharge rate and variance from the model's synapse output that takes into account the effects of absolute refractoriness.
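
The effect of absolute refractoriness on discharge statistics can be illustrated with the textbook result for a Poisson process with a fixed dead time. This is a sketch under that assumption only: the abstract does not give the analytical expressions, so the formulas and names below should not be read as the model's actual implementation. For a driving rate s and dead time t_abs, the mean rate is s / (1 + s * t_abs), and the asymptotic count variance per unit time is s / (1 + s * t_abs)^3.

```python
def refractory_rate_stats(synapse_rate_hz, abs_refractory_s):
    """Mean rate and asymptotic count variance per second for a dead-time Poisson process."""
    scale = 1.0 + synapse_rate_hz * abs_refractory_s
    mean_rate = synapse_rate_hz / scale
    rate_variance = synapse_rate_hz / scale ** 3
    return mean_rate, rate_variance


# Hypothetical inputs: a 100 spikes/s driving rate and a 1 ms dead time.
mean_rate, rate_variance = refractory_rate_stats(synapse_rate_hz=100.0, abs_refractory_s=0.001)
print(f"mean = {mean_rate:.2f} spikes/s, variance = {rate_variance:.2f}")
```

The variance falls below the mean because the dead time makes spike timing more regular than a pure Poisson process.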

Matched MeSH terms: Sound Spectrography

Emotional speech acoustic model for Malay: iterative versus isolated unit training

Mustafa MB, Ainon RN

J Acoust Soc Am, 2013 Oct;134(4):3057-66.
PMID: 24116440 DOI: 10.1121/1.4818741

The ability of a speech synthesis system to synthesize emotional speech enhances the user's experience when using this kind of system and its related applications. However, the development of an emotional speech synthesis system is a daunting task in view of the complexity of human emotional speech. The more recent state-of-the-art speech synthesis systems, such as the one based on hidden Markov models, can synthesize emotional speech with acceptable naturalness with the use of a good emotional speech acoustic model. However, building an emotional speech acoustic model requires adequate resources including segment-phonetic labels of emotional speech, which is a problem for many under-resourced languages, including Malay. This research shows how it is possible to build an emotional speech acoustic model for Malay with minimal resources. To achieve this objective, two forms of initialization methods were considered: iterative training using the deterministic annealing expectation maximization algorithm and the isolated unit training. The seed model for the automatic segmentation is a neutral speech acoustic model, which was transformed to target emotion using two transformation techniques: model adaptation and context-dependent boundary refinement. Two forms of evaluation have been performed: an objective evaluation measuring the prosody error and a listening evaluation to measure the naturalness of the synthesized emotional speech.

Matched MeSH terms: Sound Spectrography

The heterospecific calling song can improve conspecific signal detection in a bushcricket species

Abdelatti ZAS, Hartbauer M

Hear Res, 2017 Nov;355:70-80.
PMID: 28974384 DOI: 10.1016/j.heares.2017.09.011

In forest clearings of the Malaysian rainforest, chirping and trilling Mecopoda species often live in sympatry. We investigated whether a phenomenon known as stochastic resonance (SR) improved the ability of individuals to detect a low-frequency signal component typical of chirps when members of the heterospecific trilling species were simultaneously active. This phenomenon may explain the fact that the chirping species upholds entrainment to the conspecific song in the presence of the trill. Therefore, we evaluated the response probability of an ascending auditory neuron (TN-1) in individuals of the chirping Mecopoda species to triple-pulsed 2, 8 and 20 kHz signals that were broadcast 1 dB below the hearing threshold while increasing the intensity of either white noise or a typical triller song. Our results demonstrate the existence of SR over a rather broad range of signal-to-noise ratios (SNRs) of input signals when periodic 2 kHz and 20 kHz signals were presented at the same time as white noise. Using the chirp-specific 2 kHz signal as a stimulus, the maximum TN-1 response probability frequently exceeded the 50% threshold if the trill was broadcast simultaneously. Playback of an 8 kHz signal, a common frequency band component of the trill, yielded a similar result. Nevertheless, using the trill as a masker, the signal-related TN-1 spiking probability was rather variable. The variability on an individual level resulted from correlations between the phase relationship of the signal and syllables of the trill. For the first time, these results demonstrate the existence of SR in acoustically-communicating insects and suggest that the calling song of heterospecifics may facilitate the detection of a subthreshold signal component in certain situations. The results of the simulation of sound propagation in a computer model suggest a wide range of sender-receiver distances in which the triller can help to improve the detection of subthreshold signals in the chirping species.

Matched MeSH terms: Sound Spectrography

Automatic Voice Pathology Detection With Running Speech by Using Estimation of Auditory Spectrum and Cepstral Coefficients Based on the All-Pole Model

Ali Z, Elamvazuthi I, Alsulaiman M, Muhammad G

J Voice, 2016 Nov;30(6):757.e7-757.e19.
PMID: 26522263 DOI: 10.1016/j.jvoice.2015.08.010

BACKGROUND AND OBJECTIVE: Automatic voice pathology detection using sustained vowels has been widely explored. Because of the stationary nature of the speech waveform, pathology detection with a sustained vowel is a comparatively easier task than that using running speech. Some disorder detection systems with running speech have also been developed, although most of them are based on voice activity detection (VAD), which is itself a challenging task. Pathology detection with running speech needs more investigation, and systems with good accuracy (ACC) are required. Furthermore, pathology classification systems with running speech have not received any attention from the research community. In this article, automatic pathology detection and classification systems are developed using text-dependent running speech without adding a VAD module.
METHOD: A set of three psychophysics conditions of hearing (critical band spectral estimation, equal loudness hearing curve, and the intensity loudness power law of hearing) is used to estimate the auditory spectrum. The auditory spectrum and all-pole models of the auditory spectrums are computed and analyzed and used in a Gaussian mixture model for an automatic decision.
RESULTS: In the experiments using the Massachusetts Eye & Ear Infirmary database, an ACC of 99.56% is obtained for pathology detection, and an ACC of 93.33% is obtained for the pathology classification system. The results of the proposed systems outperform the existing running-speech-based systems.
DISCUSSION: The developed system can effectively be used in voice pathology detection and classification systems, and the proposed features can visually differentiate between normal and pathological samples.

Matched MeSH terms: Sound Spectrography
