Affiliations 

  • 1 Human-Centric Technology Interaction SIG, Faculty of Business, Multimedia University, Jalan Ayer Keroh Lama, 75450 Melaka, Malaysia
  • 2 Faculty of Information Science and Technology, Multimedia University, Jalan Ayer Keroh Lama, 75450 Melaka, Malaysia
  • 3 Language Academy, Faculty of Social Sciences and Humanities, Universiti Teknologi Malaysia, Jalan Iman, 81310 Skudai, Johor, Malaysia
Educ Inf Technol (Dordr), 2023;28(2):1455-1489.
PMID: 35967831 DOI: 10.1007/s10639-022-11255-6

Abstract

Modern text-to-speech voices can convey social cues ideal for narrating multimedia learning materials. Amazon Alexa is unique among modern text-to-speech vocalizers in that she can infuse enthusiasm cues into her synthetic voice. In this first study examining the effects of modern text-to-speech voice enthusiasm in a multimedia learning environment, a between-subjects online experiment was conducted in which learners from a large Asian university (n = 244) listened to one of four Alexa voices: (1) neutral, (2) low-enthusiastic, (3) medium-enthusiastic, or (4) high-enthusiastic, narrating a multimedia lesson on distributed denial-of-service attacks. While Alexa's enthusiastic voices did not enhance persona ratings compared to Alexa's neutral voice, learners inferred more enthusiasm from Alexa's medium- and high-enthusiastic voices than from Alexa's neutral voice. Regarding cognitive load, Alexa's low- and high-enthusiastic voices decreased intrinsic and extraneous cognitive load ratings compared to Alexa's neutral voice. While Alexa's enthusiastic voices did not affect affective-motivational ratings differently from Alexa's neutral voice, learners reported a significant increase in positive emotions over their baseline after listening to Alexa's medium-enthusiastic voice. Finally, Alexa's enthusiastic voices did not enhance learning performance on immediate retention and transfer tests compared to Alexa's neutral voice. This study demonstrates that enthusiasm in a modern text-to-speech voice can positively affect learners' emotions and cognitive load during multimedia learning. Theoretical and practical implications are discussed through the lens of the Cognitive Affective Model of E-learning, the Integrated-Cognitive Affective Model of Learning with Multimedia, and Cognitive Load Theory. We further outline this study's limitations and offer recommendations for extending and widening research on text-to-speech voice emotions.

* Title and MeSH Headings from MEDLINE®/PubMed®, a database of the U.S. National Library of Medicine.