

Cloud-based speech-to-text services are convenient, but they often have file size limits, lack transparency for reproducible research, and can pose privacy risks under regulations like GDPR. To address these limitations, this project introduces a production-ready, local transcription workflow using OpenAI's Whisper models. This self-contained system ensures complete data sovereignty and is designed for scalability, supporting batch operations on high-performance computing (HPC) clusters with GPU acceleration. The workflow includes advanced quality control, such as algorithms to detect and remove AI-generated repetitions, context-aware name masking for privacy, speaker diarisation, and a flexible audio enhancement pipeline. Implemented as a single Python script, this system offers a robust, reproducible, and secure alternative for academic and enterprise transcription.
If you use this workflow in your research, please cite:
Bernabeu, P. (2025). Secure and scalable speech transcription for local and HPC (Version 1.0.0) [Computer software]. Zenodo. https://doi.org/10.5281/zenodo.17624830
The recommended BibTeX entry is:
@misc{secure_local_HPC_speech_transcription,
author = {Bernabeu, Pablo},
title = {Secure and scalable speech transcription for local and {HPC}},
year = {2025},
publisher = {Zenodo},
version = {1.0.0},
doi = {10.5281/zenodo.17624830},
url = {https://doi.org/10.5281/zenodo.17624830}
}Created using NotebookLM, with all the benefits and blind spots of human editing.
S, R. M., V, A. R., Nelgikar, A., K, S., & Shahapur, S. (2025). Mindshare: An AI-Powered Video Processing and Sharing Platform for Learning, Collaboration, and Creative Expression. 2025 2nd International Conference on Intelligent Algorithms for Computational Intelligence Systems (IACIS), 1–7. https://doi.org/10.1109/IACIS65746.2025.11211476
Koenecke, A., Choi, A. S. G., Mei, K. X., Schellmann, H., & Sloane, M. (2024). Careless Whisper: Speech-to-Text Hallucination Harms. The 2024 ACM Conference on Fairness, Accountability, and Transparency, 1672–1681. https://doi.org/10.1145/3630106.3658996
Landesvatter, C., Behnert, J., & Bauer, P. C. (2025). Comparing Speech-to-Text Algorithms for Transcribing Voice Data from Surveys. Public Opinion Quarterly, 89(4), 1154–1166. https://doi.org/10.1093/poq/nfaf056
McGuire, M., & Larson-Hall, J. (2025). Assessing Whisper automatic speech recognition and WER scoring for elicited imitation: Steps toward automation. Research Methods in Applied Linguistics, 4(1), 100197. https://doi.org/10.1016/j.rmal.2025.100197
Meehan, K., McDermott, F., & Petropoulos, N. (2024). Evaluating Automatic Transcription Models Utilising Cloud Platforms. 2024 5th International Conference on Data Analytics for Business and Industry (ICDABI), 91–96. https://doi.org/10.1109/icdabi63787.2024.10800465
Roushan, R., Mishra, H., Yadav, L., Koppula, S., Tiwari, N., & Nataraj, K. S. (2024). Optimizing Speech Recognition for Medical Transcription: Fine-Tuning Whisper and Developing a Web Application. 2024 IEEE Conference on Engineering Informatics (ICEI), 1–6. https://doi.org/10.1109/icei64305.2024.10912421
Russell, S. O., Gessinger, I., Krason, A., Vigliocco, G., & Harte, N. (2024). What automatic speech recognition can and cannot do for conversational speech transcription. Research Methods in Applied Linguistics, 3(3), 100163. https://doi.org/10.1016/j.rmal.2024.100163
Yadav, A., Shrotriya, A., & Bairwa, A. K. (2025). Fine-Tuning OpenAI Whisper and DistilWhisper: An In-Depth Analysis. Smart Cyber Physical Systems, 589–603. https://doi.org/10.1007/978-981-96-2182-8_44
Abgrall, G., Chelly Dagdia, Z., Monnet, X., Zeitouni, K., & Arora, A. (2025). From vibe coding to vibe caring: what clinicians can learn. The Lancet, 406(10512), 1561–1562. https://doi.org/10.1016/s0140-6736(25)01807-0
Affolter, J., Martin, B., Epure, E. V., Meseguer-Brocal, G., & Kaplan, F. (2026). Scalable Music Cover Retrieval Using Lyrics-Aligned Audio Embeddings. Advances in Information Retrieval, 49–66. https://doi.org/10.1007/978-3-032-21289-4_4
Alshwiah, A. A. (2026). The role of artificial intelligence (AI) in assisting students with writing Zoom meeting minutes: a qualitative study. Cogent Education, 13(1). https://doi.org/10.1080/2331186x.2026.2652076
Alumäe, T., & Koenecke, A. (2025). Striving for open-source and equitable speech-to-speech translation. Nature, 637(8046), 551–553. https://doi.org/10.1038/d41586-024-04095-6
Anderson, C. B., & Fisher, D. H. (2025). Artificial Intelligence for Academic Libraries. https://doi.org/10.4324/9781003473602
Ariya, P., Khanchai, S., Intawong, K., & Puritat, K. (2025). Enhancing textile heritage engagement through generative AI-based virtual assistants in virtual reality museums. Computers &Amp; Education: X Reality, 7, 100112. https://doi.org/10.1016/j.cexr.2025.100112
Arruda, F. S., & Castaldelli-Maia, J. M. (2025). The future of addiction psychiatry. International Review of Psychiatry, 38(1-3), 80–93. https://doi.org/10.1080/09540261.2025.2566211
Arshad, H., Abdullah, T., Rehman, M., Hussain, A., Kanwal, F., & Parveen, M. (2025). Optimizer-Aware Fine-Tuning of Whisper Small with Low-Rank Adaption: An Empirical Study of Adam and AdamW. Information, 16(11), 928. https://doi.org/10.3390/info16110928
Arsuka, I. P. G. S., Fakhrurroja, H., & Pramesti, D. (2025). Flutter-Based Mobile App Featuring AI-Driven Recommendation System and Speech Recognition for Agriculture. 2025 IEEE International Conference on Networking, Intelligent Systems, and IoT (ICONS-IoT), 108–114. https://doi.org/10.1109/icons-iot65216.2025.11210976
Ashwin, M., Jha, S., Prasad, G., & Kumar, S. (2025). Fake it till you make it? AI hallucinations and ethical dilemmas in Anesthesia research and practice. Journal of Anaesthesiology Clinical Pharmacology, 41(3), 381–383. https://doi.org/10.4103/joacp.joacp_56_25
Atwany, H., Waheed, A., Singh, R., Choudhury, M., & Raj, B. (2025). Lost in Transcription, Found in Distribution Shift: Demystifying Hallucination in Speech Foundation Models. Findings of the Association for Computational Linguistics: ACL 2025, 23181–23203. https://doi.org/10.18653/v1/2025.findings-acl.1190
Barański, M., Jasiński, J., Bartolewska, J., Kacprzak, S., Witkowski, M., & Kowalczyk, K. (2025). Investigation of Whisper ASR Hallucinations Induced by Non-Speech Audio. ICASSP 2025 - 2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 1–5. https://doi.org/10.1109/icassp49660.2025.10890105
Basak, S., & Gowda, M. (2025). Wireless-Tap:
Biecek, P., & Samek, W. (2025). Model Science: Getting Serious About Verification, Explanation and Control of AI Systems. Ecai 2025. https://doi.org/10.3233/faia250784
Bogdanova-Beglarian, N. V., Blinova, O. V., Khokhlova, M. V., Sherstinova, T. Y., & Popova, T. I. (2024). Multiword Units in Russian Everyday Speech: Empirical Classification and Corpus-Based Studies. Speech and Computer, 187–200. https://doi.org/10.1007/978-3-031-77961-9_14
Borkar, J., & Smith, D. A. (2024). Mind the Gap:Analyzing Lacunae with Transformer-Based Transcription. Document Analysis and Recognition – ICDAR 2024 Workshops, 57–70. https://doi.org/10.1007/978-3-031-70642-4_4
Brickman, J., Gupta, M., & Oltmanns, J. R. (2025). Large Language Models for Psychological Assessment: A Comprehensive Overview. Advances in Methods and Practices in Psychological Science, 8(3). https://doi.org/10.1177/25152459251343582
Bylund, A., Revenäs, Å., Södersved Källestedt, M.-L., Lindblom, M., Söderlund, A., & Elvén, M. (2025). Physiotherapy students’ clinical skills training needs within the musculoskeletal area: an explorative study of stakeholders’ perceptions. Physiotherapy Theory and Practice, 42(4), 581–594. https://doi.org/10.1080/09593985.2025.2589908
C, M. D., Noor R, S., K, S., A, G., M, J., & S, P. P. (2025). InstaScript: Convert Voice Notes to Summarized Smart Notes using NLP. 2025 International Conference on Sustainable Communication Networks and Application (ICSCN), 1765–1770. https://doi.org/10.1109/icscn67106.2025.11308439
Calandruccio, L., Hariri, M., Buss, E., & Chaudhary, V. (2025). Masked-speech Recognition Using Human and Synthetic Cloned Speech. Trends in Hearing, 29. https://doi.org/10.1177/23312165251403080
Castelli, M., Sousa, M., Vojtech, I., Single, M., Amstutz, D., Maradan-Gachet, M. E., Magalhães, A. D., Debove, I., Rusz, J., Martinez-Martin, P., Sznitman, R., Krack, P., & Nef, T. (2025). Detecting neuropsychiatric fluctuations in Parkinson’s Disease using patients’ own words: the potential of large language models. Npj Parkinson’s Disease, 11(1). https://doi.org/10.1038/s41531-025-00939-8
Chan, S., Li, J., Yao, B., Mahmood, A., Huang, C.-M., Jimison, H., Mynatt, E. D., & Wang, D. (2025). “Mango Mango, How to Let The Lettuce Dry Without A Spinner?”: Exploring User Perceptions of Using An LLM-Based Conversational Assistant Toward Cooking Partner. Proceedings of the ACM on Human-Computer Interaction, 9(7), 1–35. https://doi.org/10.1145/3757442
Chandra Madhav, Y. R., Umar, M., Kiran, O. S., & Babu, P. A. (2025). Personality Detection Using Xlm-Roberta and Whisper. 2025 IEEE Pune Section International Conference (PuneCon), 1–6. https://doi.org/10.1109/punecon67554.2025.11379051
Charuau, D., & Harte, N. (2027). A multimodal perspective on adaptive communication: Extending the hyper- and hypo-articulation theory. Computer Speech &Amp; Language, 101, 101990. https://doi.org/10.1016/j.csl.2026.101990
Chua, Y. H. V., Dauwels, J., Rajalingam, P., Teo, C. L., & Styles, S. J. (2025). Testing individual and group markers of collaboration in a team-based learning classroom. Learning and Instruction, 100, 102215. https://doi.org/10.1016/j.learninstruc.2025.102215
Chung, Y., Hong, J., Lee, J., & Kim, E. (2026). Diagnosis-aware multitask fine-tuning of Whisper for dysarthric speech recognition. Speech Communication, 180, 103393. https://doi.org/10.1016/j.specom.2026.103393
Colonel, J. T., Hagler, C., Wismer, G., Curtis, L., Becker, J., Wisnivesky, J., Federman, A., & Pandey, G. (2025). CAtCh: Cognitive Assessment through Cookie Thief. 2025 IEEE International Conference on Digital Health (ICDH), 154–162. https://doi.org/10.1109/icdh67620.2025.00029
Correa-Carmona, Y., Böttger, D., Korsch, D., Holzmann, K. L., Alonso-Alonso, P., Pinos, A., Yon, F., Keller, A., Steffan-Dewenter, I., Bodesheim, P., Peters, M. K., & Brehm, G. (2026). LEPY: A Python pipeline for automated trait extraction from standardised Lepidoptera images. Ecological Informatics, 95, 103680. https://doi.org/10.1016/j.ecoinf.2026.103680
Czyżewski, A., Cygert, S., Marciniuk, K., Szczodrak, M., Harasimiuk, A., Odya, P., Galanina, M., Szczuko, P., Kostek, B., Graff, B., Szplit, D., Budzisz, M., & Narkiewicz, K. (2025). A Comprehensive Polish Medical Speech Dataset for Enhancing Automatic Medical Dictation. Scientific Data, 12(1). https://doi.org/10.1038/s41597-025-05776-1
D, S., S, S. S., S, B. P., D, N. K., & S, V. (2025). A Review of Automatic Speech Recognition Approaches for Bridging Communication Gaps in Clinical Handover. 2025 IEEE 2nd International Conference for Women in Computing (InCoWoCo), 1–7. https://doi.org/10.1109/incowoco68239.2025.11407097
Darmawan, F. A., Mauludin, M. B., & Aditya, C. S. K. (2026). Hybrid Video Transcription Summarization with a BERT-Based Clustering and BART. Jurnal RESTI (Rekayasa Sistem Dan Teknologi Informasi), 9(6), 1439–1450. https://doi.org/10.29207/resti.v9i6.7066
de Margerie-Mellon, C., Duron, L., Fournier, L., Ouakil, L., Soulat, G., & Teixeira, P. G. (2026). Reporting efficiency in diagnostic imaging: Can plug-and-play general-purpose large language models outperform conventional speech recognition? European Radiology. https://doi.org/10.1007/s00330-026-12524-5
Denecke, K., Lopez-Campos, G., Rivera-Romero, O., & Gabarron, E. (2025). The Unexpected Harms of Artificial Intelligence in Healthcare: Reflections on Four Real-World Cases. Healthcare of the Future 2025. https://doi.org/10.3233/shti250219
Dymbe, S., Siniscalchi, S. M., Svendsen, T., & Salvi, G. (2025). Using Cross-Attention for Conversational ASR over the Telephone. Text, Speech, and Dialogue, 394–405. https://doi.org/10.1007/978-3-032-02548-7_33
Earle, R., Leedie, F., Littlewood, R., Nalatu, S., Yusif, S., & Walker, J. L. (2026). Mixed Methods Evaluation of “Cooking Monsters”: An Empowerment-Focussed Aboriginal and Torres Strait Islander Adolescent Nutrition Programme. Journal of Primary Care &Amp; Community Health, 17. https://doi.org/10.1177/21501319261419935
Earnhardt, A. V., Riordan, P. S., Kahlon, C. H., Nwajei, F. I., & McCants, K. M. (2026). Artificial Intelligence in Mental Health. Nursing Clinics of North America, 61(1), 101–111. https://doi.org/10.1016/j.cnur.2025.09.011
Efstathiadis, G., Yadav, V., & Abbas, A. (2025). LLM-based speaker diarization correction: A generalizable approach. Speech Communication, 170, 103224. https://doi.org/10.1016/j.specom.2025.103224
El Hajal, K., Hermann, E., Kulkarni, A., & Magimai.-Doss, M. (2025). Unsupervised Rhythm and Voice Conversion of Dysarthric to Healthy Speech for ASR. 2025 IEEE International Conference on Acoustics, Speech, and Signal Processing Workshops (ICASSPW), 1–5. https://doi.org/10.1109/icasspw65056.2025.11011033
Englander, H., Rolland, B., & Jauffret-Roustide, M. (2025). “Un attracteur de soins”: Bringing People Into Care. Interdisciplinary Health Care Professionals’ Attitudes Toward Opioid Agonist Therapy in France: a Qualitative Study with Implications for the United States. Substance Use &Amp; Addiction Journal, 46(3), 643–653. https://doi.org/10.1177/29767342251324325
Evans, T., & Hogg, J. (2025). NHS England guidance on AI scribes. British Dental Journal, 239(3), 154. https://doi.org/10.1038/s41415-025-9061-0
Faila Nadhifatul Aryza, & Hidayat, S. (2026). Development of Hybrid Feature Based Models for Dysarthric Speech Recognition. International Journal of Online and Biomedical Engineering (iJOE), 22(04). https://doi.org/10.3991/ijoe.v22i04.59849
Flynn, R., & Ragni, A. (2026). Beyond the Utterance: An Empirical Study of Very Long Context Speech Recognition. IEEE Transactions on Audio, Speech and Language Processing, 34, 910–920. https://doi.org/10.1109/taslpro.2026.3658246
Ganet, A. (2025). Compilation d’un corpus oral et numérique en anglais de la mode et de la muséologie : difficultés de transcription et stratégies de correction dans le corpus VidEx. ASp, 88, 253–280. https://doi.org/10.4000/15c2i
Gani, U. A. S., Madankar, A., Shinde, A., Sonwane, A., & Durbade, S. (2025). AI-Associate: A Lightweight Architecture for Conversational Agents. 2025 4th OPJU International Technology Conference (OTCON) on Smart Computing for Innovation and Advancement in Industry 5.0, 1–8. https://doi.org/10.1109/otcon65728.2025.11070569
Ghosh, R., Asaadi, S., Karn, S. K., Ullaskrishnan, U., & Farri, O. (2026). Healthcare in the Age of LLMs. Artificial Intelligence and Large Language Models, 121–153. https://doi.org/10.1201/9781003492252-5
Gijón Flores, Á., Bolaños Peño, C., Llumiguano Solano, H., Fernández-Bermejo Ruiz, J., Jesús Villanueva Molina, F., & Rincón Calle, F. (2026). From Voice to Shell: A SLM-Based Assistant for IoT Maintenance Tasks on the Edge. IEEE Internet of Things Journal, 13(2), 3000–3012. https://doi.org/10.1109/jiot.2025.3632638
Grubert, J., Schmalstieg, D., & Dickhaut, K. (2025). Towards Supporting Literary Studies Using Virtual Reality and Generative Artificial Intelligence. 2025 IEEE Conference on Virtual Reality and 3D User Interfaces Abstracts and Workshops (VRW), 147–149. https://doi.org/10.1109/vrw66409.2025.00037
Han, Y., Lam, J. C. k., Li, V. O. k., & Cheung, L. Y. L. (2025). An LLM-based Temporal-spatial Data Generation and Fusion Approach for Early Detection of Late Onset Alzheimer’s Disease (LOAD) Stagings Especially in Chinese and English-speaking Populations. Findings of the Association for Computational Linguistics: EMNLP 2025, 14977–14990. https://doi.org/10.18653/v1/2025.findings-emnlp.809
Harvey, E., Kizilcec, R. F., & Koenecke, A. (2025). A Framework for Auditing Chatbots for Dialect-Based Quality-of-Service Harms. Proceedings of the 2025 ACM Conference on Fairness, Accountability, and Transparency, 2025–2039. https://doi.org/10.1145/3715275.3732137
Hassanat, A. B., Tarawneh, A. S., Al-khlifeh, E., Almahadin, A. O., Alghamdi, M., Almohammadi, K., Alrashidi, M., Alhasanat, L. A., & Al-Nawafleh, A. H. (2026). Machine Learning-Based Detection of Parkinson’s Disease From Arabic Speech: A Cross-Linguistic Validation Study. Journal of Central Nervous System Disease, 18. https://doi.org/10.1177/11795735261448278
Hidayat, A. (2025). Multi-Label Classification of Indonesian Voice Phishing Conversations: A Comparative Study of XLM-RoBERTa and ELECTRA. Journal of Applied Data Sciences, 6(3), 2177–2191. https://doi.org/10.47738/jads.v6i3.858
Hlongwane, K., Mthethwa, S., & Manqele, L. (2025). Enhancing Diversity in Inclusive Learning Classroom Using OpenAI Whisper Model. 2025 IST-Africa Conference (IST-Africa), 1–9. https://doi.org/10.23919/ist-africa67297.2025.11060520
Ho, P. H., Bălan, D. A., Heylen, D. K. J., & Truong, K. P. (2025). Enhancing Transcripts of Open-Source Automatic Speech Recognition Models Through Fine-Tuning with Laughter and Speech-Laugh. Interspeech 2025, 4513–4517. https://doi.org/10.21437/interspeech.2025-2193
Jha, A., & Gloor, P. A. (2025). Do What You Say—Computing Personal Values Associated with Professions Based on the Words They Use. Algorithms, 18(2), 72. https://doi.org/10.3390/a18020072
John, T., & Möller, A. L. (2024). On-Demand Internationalization for Learning Management System Moodle. HCI International 2024 Posters, 95–102. https://doi.org/10.1007/978-3-031-61953-3_11
Jothieswari, J., & Suguna, S. (2025). Transformer-Based Whisper Classifier for Enhanced Dysarthric Speech Recognition Using UA Speech Dataset. 2025 IEEE International Conference on Contemporary Computing and Communications (InC4), 1–6. https://doi.org/10.1109/inc465408.2025.11256452
K, M. R., M, R. S., K, A., & Indrason, N. (2026). Speech Emotion Recognition Using Deep Learning by Hybridizing CNN-BiLSTM and Attention Mechanism. 2026 IEEE International Conference on Interdisciplinary Approaches in Technology and Management for Social Innovation (IATMSI), 1–6. https://doi.org/10.1109/iatmsi68868.2026.11466131
Kamran, H., Dorrani, A., & Kamran, H. (2026). The Algorithmic Reconfiguration of Qualitative Inquiry: Navigating AI-Driven Efficiency and Interpretive Richness. AI, Society and Digital Transformation, 250–261. https://doi.org/10.1007/978-3-032-13116-4_20
King, E., Yu, H., Vartak, S., Jacob, J., Lee, S., & Julien, C. (2025). Teaching Things To Think: Bootstrapping Local Reasoning for Smart(er) Devices. 2025 IEEE International Conference on Pervasive Computing and Communications (PerCom), 78–88. https://doi.org/10.1109/percom64205.2025.00027
Kumar, S., Tejas, B. N., Hithaish, Budale, R. R., & Kamath, P. R. (2026). An Audio-Visual Speech Separation and Personalized Keyphrase Detection in Noisy Environments. Proceedings of the Third Congress on Control, Robotics, and Mechatronics, 297–308. https://doi.org/10.1007/978-981-96-8126-6_21
Larson, M. (2026). Oral History Encounters AI: An Exploration of Core Principles and Best Practices, Context and Consent. The Oral History Review, 53(1), 93–110. https://doi.org/10.1080/00940798.2026.2625663
LeFevre, G., Hosier, J., Zhou, Y., & Gurbani, V. K. (2025). LLM Selection: Improving ASR Transcript Quality via Zero-Shot Prompting. SoutheastCon 2025, 1440–1445. https://doi.org/10.1109/southeastcon56624.2025.10971586
Li, J., Li, Q., Gong, R., Wang, L., & Wu, S. (2025). Our Collective Voices: The Social and Technical Values of a Grassroots Chinese Stuttered Speech Dataset. Proceedings of the 2025 ACM Conference on Fairness, Accountability, and Transparency, 2768–2783. https://doi.org/10.1145/3715275.3732179
Linke, J., Winkler, J., & Schuppler, B. (2025). Context is all you need? Low-resource conversational ASR profits from context, coming from the same or from the other speaker. Interspeech 2025, 3199–3203. https://doi.org/10.21437/interspeech.2025-1824
Litterer, B. R., Jurgens, D., & Card, D. (2025). Mapping the Podcast Ecosystem with the Structured Podcast Research Corpus. Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 25132–25154. https://doi.org/10.18653/v1/2025.acl-long.1222
Looi, J. C., Allison, S., Bastiampillai, T., Reutens, S., & Looi, R. C. (2025). Illusions of intelligence, connection and reality: Perils of large-language AI models for people with severe mental illness. Australasian Psychiatry, 34(1), 5–7. https://doi.org/10.1177/10398562251380544
Luchkina, E., Simon, L. R., & Waxman, S. R. (2025). Catching up with iCatcher: Comparing analyses of infant eye tracking based on trained human coders and iCatcher+ automated gaze coding software. Behavior Research Methods, 57(6). https://doi.org/10.3758/s13428-025-02683-6
Ma, Q., Zhang, W., Chen, J., Jia, J., Wang, R., & Yan, D. (2026). DUAP: Disentanglement-Based Universal Adversarial Perturbations for Robust Multilingual Speech Privacy Protection. IEEE Transactions on Information Forensics and Security, 21, 3703–3718. https://doi.org/10.1109/tifs.2026.3671687
Maheshwari, G., Ivanov, D., Johannet, T., & El Haddad, K. (2025). ASR Benchmarking: Need for a More Representative Conversational Dataset. ICASSP 2025 - 2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 1–5. https://doi.org/10.1109/icassp49660.2025.10888906
Małysiak-Mrozek, B., Wojaczek, P., Tokarz, K., Sunderam, V., Mrozek, D., & Lamirel, J.-C. (2025). Automatic Help Summoning Through Speech Analysis on Mobile Devices. Computational Science – ICCS 2025 Workshops, 18–32. https://doi.org/10.1007/978-3-031-97573-8_2
Mansoor, H., Abdullah, U., Adil, S., Jamil, A., Hameed, A. A., & Soleimani, F. (2025). Mitigating Hallucinations in Speech Recognition Systems for Noisy Data. 2025 IEEE 4th International Conference on Computing and Machine Intelligence (ICMI), 1–5. https://doi.org/10.1109/icmi65310.2025.11141067
Massenon, R., Gambo, I., Khan, J. A., Agbonkhese, C., & Alwadain, A. (2025). ”My AI is Lying to Me”: User-reported LLM hallucinations in AI mobile apps reviews. Scientific Reports, 15(1). https://doi.org/10.1038/s41598-025-15416-8
Mawalim, C. O., Okada, S., & Unoki, M. (2024). Are Recent Deep Learning-Based Speech Enhancement Methods Ready to Confront Real-World Noisy Environments? Interspeech 2024, 1735–1739. https://doi.org/10.21437/interspeech.2024-129
McGonigle, E., VanDam, M., Wilkinson, C., & Johnson, K. T. (2024). Benchmarking Automatic Speech Recognition Technology for Natural Language Samples of Children With and Without Developmental Delays. 2024 46th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), 1–5. https://doi.org/10.1109/embc53108.2024.10782773
Minamoto, A., Zhang, Z., & Wang, Z. (2025). Solving Language and Culture Barriers on Medical Care: An Inclusive Medical Interpretation Agent-Medylan. Adjunct Proceedings of the 27th International Conference on Mobile Human-Computer Interaction, 1–6. https://doi.org/10.1145/3737821.3748523
Mukmin, A., Adnan, Jalil, A., Pallawabonang, M., Prakasa, E., & Yuyun. (2024). Leveraging the Pre-Trained Whisper Model and Levenshtein Distance for Audio Buginesse Transcription. 2024 beyond Technology Summit on Informatics International Conference (BTS-I2C), 589–594. https://doi.org/10.1109/bts-i2c63534.2024.10942151
N, K., Andanur, P., Udupa E P, P., Airani, S. S., & Dattathreya. (2025). AI-Powered Voice Assistants for Accessible E-Commerce: A Review of ASR, RAG, LLMs and TTS. 2025 5th International Conference on Evolutionary Computing and Mobile Sustainable Networks (ICECMSN), 707–712. https://doi.org/10.1109/icecmsn68058.2025.11383381
Nguyen, T., & Tran, H.-D. (2025). AsyncSwitch: Asynchronous Text-Speech Adaptation for Code-Switched ASR. 2025 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), 1–7. https://doi.org/10.1109/asru65441.2025.11434656
Nygren, T., Samuelsson, M., Hansson, P.-O., Efimova, E., & Bachelder, S. (2025). AI Versus Human Feedback in Mixed Reality Simulations: Comparing LLM and Expert Mentoring in Preservice Teacher Education on Controversial Issues. International Journal of Artificial Intelligence in Education, 35(5), 2856–2888. https://doi.org/10.1007/s40593-025-00484-8
O’Connor Russell, S., & Harte, N. (2025). Visual Cues Support Robust Turn-taking Prediction in Noise. Interspeech 2025, 1073–1077. https://doi.org/10.21437/interspeech.2025-668
O’Kane, R., Stonehouse-Smith, D., Ota, L. C. U., Patel, R., Johnson, N., Slipper, C., Seehra, J., Papageorgiou, S. N., & Cobourne, M. T. (2025). Transcription Accuracy of Automatic Speech Recognition for Orthodontic Clinical Records. Journal of Dental Research. https://doi.org/10.1177/00220345251382452
Olagoke, A., Jacobson, L. T., Babajide, O., & Qi, Z. (2026). Building Safe AI Chatbots for Rural Mothers Seeking Breastfeeding Support: Understanding Hallucinations and How to Mitigate Them. Social Sciences, 15(2), 119. https://doi.org/10.3390/socsci15020119
Pan, E., Choi, A. S. G., Ter Hoeve, M., Seto, S., & Koenecke, A. (2025). Analyzing Dialectical Biases in LLMs for Knowledge and Reasoning Benchmarks. Findings of the Association for Computational Linguistics: EMNLP 2025, 20882–20893. https://doi.org/10.18653/v1/2025.findings-emnlp.1139
Pan, L., Jiang, C., Hou, G., & Gao, Y. (2025). Teochew-Wild: The First In-the-wild Teochew Dataset with Orthographic Annotations. 2025 IEEE International Conference on Multimedia and Expo (ICME), 1–6. https://doi.org/10.1109/icme59968.2025.11209855
Pande, A., & Mishra, D. (2024). Assessment of Pepper Robot’s Speech Recognition System through the Lens of Machine Learning. Biomimetics, 9(7), 391. https://doi.org/10.3390/biomimetics9070391
Park, H. I., Solon, M., & Lee, K. (2025). Understanding proficiency assessment practices in SLA research. Studies in Second Language Acquisition, 47(5), 1205–1229. https://doi.org/10.1017/s0272263125101058
Pengsiri, P., Malang, C., & Pongpatcharatorntep, D. (2026). Text Analytics for Identifying the Competitiveness of Northern Thai SMEs’ Products. 2026 Joint International Conference on Digital Arts, Media and Technology with ECTI Northern Section Conference on Electrical, Electronics, Computer and Telecommunication Engineering (ECTI DAMT &Amp;Amp; NCON), 466–471. https://doi.org/10.1109/ectidamtncon67592.2026.11460023
Pisal, T., Hajare, R., Verma, P., Gote, P. M., Jiet, M., & Chaudhri, S. N. (2025). AI-Driven Transcription and Analysis for Medical Conversations. 2025 International Conference on Machine Learning and Autonomous Systems (ICMLAS), 1228–1234. https://doi.org/10.1109/icmlas64557.2025.10969001
Prajwal, T. S., Desu, L., Kumari, N., & Kumar, R. (2025). Content Moderation and File Anonymization using Agentic AI. 2025 5th International Conference on Electrical, Computer and Energy Technologies (ICECET), 1–6. https://doi.org/10.1109/icecet63943.2025.11472200
Prasad, C., Swamy Rao, V. G., J, G., & Naidu, R. C. A. (2026). An ASR Transformer-Based Model for Kannada Speech-to-Text Transcription. Journal of Artificial Intelligence and Technology. https://doi.org/10.37965/jait.2026.0935
Prem, D., Raimond, K., & Jeyabose, A. (2025). Investigating the Impact of Feature Extraction Techniques for Classification and Synthesis of Dysarthric Speech. Circuits, Systems, and Signal Processing. https://doi.org/10.1007/s00034-025-03366-5
R, V., D, H., & Anand, L. D. V. (2024). Leveraging OpenAI Whisper Model to Improve Speech Recognition for Dysarthric Individuals. 2024 Asia Pacific Conference on Innovation in Technology (APCIT), 1–5. https://doi.org/10.1109/apcit62007.2024.10673628
Rangappa, P., Zuluaga-Gomez, J., Madikeri, S., Carofilis, A., Prakash, J., Burdisso, S., Kumar, S., Villatoro-Tello, E., Nigmatulina, I., Motlicek, P., Pandia, K., & Ganapathiraju, A. (2025). Speech Data Selection for Efficient ASR Fine-Tuning using Domain Classifier and Pseudo-Label Filtering. ICASSP 2025 - 2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 1–5. https://doi.org/10.1109/icassp49660.2025.10888138
Ranjith, A., G, J. Lal., & B, P. (2025). Robust Dysarthria Detection and Severity Classification Framework Using Multifeature Speech Representations and Deep Learning Models. 2025 IEEE 22nd India Council International Conference (INDICON), 1–6. https://doi.org/10.1109/indicon68490.2025.11392854
Rao, N., O’Riordan, S., & Coulis, J. (2025). AI and labor: Captioning library audiovisual content with Whisper. IFLA Journal, 51(3), 803–813. https://doi.org/10.1177/03400352241310534
Rastall, D. P. W., & Rehman, M. (2025). Why Clinical Trials Will Fail to Ensure Safe AI. Journal of Medical Systems, 49(1). https://doi.org/10.1007/s10916-025-02231-x
Reutens, S., Dandolo, C., Looi, R. C. H., Karystianis, G. C., & Looi, J. C. L. (2024). The uses and misuses of artificial intelligence in psychiatry: Promises and challenges. Australasian Psychiatry, 33(1), 9–11. https://doi.org/10.1177/10398562241280348
Rixen, J., & Renz, M. (2025). Efficient Speech Separation with Differencing. Ecai 2025. https://doi.org/10.3233/faia251307
Rodger, D., & O’Connor, S. (2025). Using artificial intelligence in health research. Evidence Based Nursing, 28(4), 203–205. https://doi.org/10.1136/ebnurs-2025-104287
Rohanian, M., Hüppi, R., Nooralahzadeh, F., Dannecker, N., Pauli, Y., Surbeck, W., Sommer, I., Hinzen, W., Langer, N., Krauthammer, M., & Homan, P. (2026). Uncertainty modeling in multimodal speech analysis across the psychosis spectrum. Npj Digital Medicine, 9(1). https://doi.org/10.1038/s41746-025-02309-3
Rosas-Smith, J., Bartelds, M., Huang, R., García-Perera, L. P., Livescu, K., Jurafsky, D., & Field, A. (2025). Constructing Datasets From Public Police Body Camera Footage. ICASSP 2025 - 2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 1–5. https://doi.org/10.1109/icassp49660.2025.10888621
Russell, S. O., & Harte, N. (2025). Visual Cues Enhance Predictive Turn-Taking for Two-Party Human Interaction. Findings of the Association for Computational Linguistics: ACL 2025, 209–221. https://doi.org/10.18653/v1/2025.findings-acl.12
Şahin, M. F., Topkaç, E. C., Doğan, Ç., Şeramet, S., Tuncer, F. B., Köroğlu, M. S., Orbeği, O., & Yazici, C. M. (2025). AI-assisted transcription of YouTube videos on penile enlargement: analysis of their text quality and readability. Sexual Medicine, 13(2). https://doi.org/10.1093/sexmed/qfaf023
Saranya, D. S., Anirudh, D., Reddy, K. P., Kumar, N. S. S. S. G., & Jabbar, M. A. (2025). TalkGenie: Intent and Sentiment-Aware Multilingual Chatbot. 2025 International Conference on Intelligent Communication Networks and Computational Techniques (ICICNCT), 1–6. https://doi.org/10.1109/icicnct66124.2025.11232647
Schock, B., Kuraner, S. E., Westphal, M., & Bonanno, G. A. (2025). How “algospeak” is changing online discourse on suicide and self-harm: A pilot study. Traumatology. https://doi.org/10.1037/trm0000581
Shailja, S., Williams, T., & Pandey, A. Self-efficacy of high school students after an AI-focused pre-college program: A two year impact study (Fundamental). 2025 ASEE Annual Conference &Amp; Exposition Proceedings. https://doi.org/10.18260/1-2--57638
Shamsian, A., Navon, A., Glazer, N., Hetz, G., & Keshet, J. (2024). Keyword-Guided Adaptation of Automatic Speech Recognition. Interspeech 2024, 732–736. https://doi.org/10.21437/interspeech.2024-391
Shao, A. (2025). New sources of inaccuracy? A conceptual framework for studying AI hallucinations. Harvard Kennedy School Misinformation Review. https://doi.org/10.37016/mr-2020-182
Sharon, T. (2025). Technosolutionism and the empathetic medical chatbot. AI &Amp; SOCIETY, 41(1), 289–306. https://doi.org/10.1007/s00146-025-02441-4
Sharp, K., Ouellette, R. R., Singh, R. S. R., DeVito, E. E., Kamdar, N., de la Noval, A., Murthy, D., & Kong, G. (2025). Generative artificial intelligence and machine learning methods to screen social media content. PeerJ Computer Science, 11, e2710. https://doi.org/10.7717/peerj-cs.2710
Sherstinova, T. Y., & Petrova, I. (2024). ESC Corpus of Spoken Russian: Everyday Student Conversations Captured Through Continuous Speech Recording in Natural Communicative Environments. Speech and Computer, 151–162. https://doi.org/10.1007/978-3-031-77961-9_11
Shubash, J., Athmika, S., Deekshitha, S., Meghana, K. N., & Varsha, S. (2025). CareLoop – AI-Driven Approach for Advancing Efficiency, Accuracy, and Accessibility in Oral Healthcare Systems. 2025 Third International Conference on Emerging Applications of Material Science and Technology (ICEAMST), 1305–1309. https://doi.org/10.1109/iceamst67459.2025.11336036
Sloane, M., Schellmann, H., Mei, K. X., Choi, A. S. G., & Koenecke, A. (2026). The case for stakeholder-driven AI auditing in automatic speech recognition. Nature Machine Intelligence, 8(4), 493–494. https://doi.org/10.1038/s42256-026-01207-x
Sohrabi, R. (2026). When AI fails to listen: convergent failure patterns across speech impairments and low-resource speech in ASR. AI &Amp; SOCIETY. https://doi.org/10.1007/s00146-026-03073-y
Sridhar, C., & Wu, S. (2025). J-j-j-just Stutter: Benchmarking Whisper’s Performance Disparities on Different Stuttering Patterns. Interspeech 2025, 3753–3757. https://doi.org/10.21437/interspeech.2025-2700
Srivastava, M., Ferro, M., Pirrelli, V., & Coro, G. (2026). Enhancing token boundary detection in disfluent speech. Intelligent Systems with Applications, 29, 200614. https://doi.org/10.1016/j.iswa.2025.200614
Steinert, P., Wagenpfeil, S., Frommholz, I., & Hemmje, M. L. (2026). A Reference Model for the Analysis and Indexing of Metaverse Recordings for Information Retrieval. Big Data and Cognitive Computing, 10(3), 85. https://doi.org/10.3390/bdcc10030085
Stoykova, R., Porter, K., & Beka, T. (2024). The AI Act in a law enforcement context: The case of automatic speech recognition for transcribing investigative interviews. Forensic Science International: Synergy, 9, 100563. https://doi.org/10.1016/j.fsisyn.2024.100563
Susaiyah, A., & Sidorova, N. (2025). Zero-Shot Approaches for the Extraction of Event Logs from Medical Notes. Research Challenges in Information Science, 435–451. https://doi.org/10.1007/978-3-031-92474-3_26
Tan, T., Chen, X., Le, X., Fan, W., Xia, X., Huang, C., & Lu, J. (2025). CBA-Whisper: Curriculum Learning-Based AdaLoRA Fine-Tuning on Whisper for Low-Resource Dysarthric Speech Recognition. Interspeech 2025, 3309–3313. https://doi.org/10.21437/interspeech.2025-1705
Taneja, D., & Bhatt, S. (2025). Enhancing Hindi Speech Recognition Using Regularization Techniques. 2025 7th International Conference on Artificial Intelligence and Speech Technology (AIST), 1–6. https://doi.org/10.1109/aist68591.2025.11441569
Taubitz, F.-S., Sehn, L. H. A., & Alpers, G. W. (2025). Language Model, bitte zum Diktat: Transkription, Sprecher*innenzuordnung und datenschutzkonforme Verarbeitung von Therapiegesprächen mit KI. Verhaltenstherapie, 1–18. https://doi.org/10.1159/000549546
Temel, M. H., Erden, Y., & Bağcıer, F. (2024). Assessing Patients Perception: Analyzing the Quality, Reliability, Comprehensibility, and the Mentioned Medical Concepts of Traumatic Brain Injury Videos on YouTube. World Neurosurgery, 185, e907–e914. https://doi.org/10.1016/j.wneu.2024.02.150
Tolle, H., Castro, M. d. M., Wachinger, J., Putri, A. Z., Kempf, D., Denkinger, C. M., & McMahon, S. A. (2024). From voice to ink (Vink): development and assessment of an automated, free-of-charge transcription tool. BMC Research Notes, 17(1). https://doi.org/10.1186/s13104-024-06749-0
Trabelsi, A., Werey, L., Warichet, S., & Helbert, E. (2024). Is Noise Reduction Improving Open-Source ASR Transcription Engines Quality? Proceedings of the 16th International Conference on Agents and Artificial Intelligence, 1221–1228. https://doi.org/10.5220/0012457100003636
Ubur, S. D., Adewale, S., Chandrashekar, N. D., Akli, E. K., & Gracanin, D. (2025). Interpretive Caption: Real-Time Vocal Emotion Cues for DHH Users. Proceedings of the 27th International ACM SIGACCESS Conference on Computers and Accessibility, 1–5. https://doi.org/10.1145/3663547.3759697
Viola, T. W., de Carvalho, M. T., Padoin, R. C. P. K., Kampff, A. J. C., & Padoin, A. V. (2025). Evaluation using artificial intelligence shows post pandemic differences in oral reading fluency between Brazilian public and private school students. Scientific Reports, 15(1). https://doi.org/10.1038/s41598-025-15644-y
Waheed, A., Atwany, H., Singh, R., & Raj, B. (2025). On the Robust Approximation of ASR Metrics. Findings of the Association for Computational Linguistics: ACL 2025, 23119–23146. https://doi.org/10.18653/v1/2025.findings-acl.1187
Wang, Y., Alhmoud, A., Alsahly, S., Alqurishi, M., & Ravanelli, M. (2025). Calm-Whisper: Reduce Whisper Hallucination On Non-Speech By Calming Crazy Heads Down. Interspeech 2025, 3414–3418. https://doi.org/10.21437/interspeech.2025-201
Weilinghoff, A. (2025). Transcribing Diverse Voices: Using Whisper for ICE corpora. Interspeech 2025, 3359–3363. https://doi.org/10.21437/interspeech.2025-1980
Wepner, S., Eckert, L., Kubin, G., & Schuppler, B. (2025). What the Filler? Both ASR Systems and Humans Struggle More With Other Kinds of Disfluencies Than With Filler Particles. Interspeech 2025, 2325–2329. https://doi.org/10.21437/interspeech.2025-757
Wolfartsberger, J., & Niedermayr, D. (2025). Breaking the Code Barrier: Analyzing the Effectiveness of LLMs in Natural Language Interfaces for Collaborative Robotics. 2025 6th International Conference on Artificial Intelligence, Robotics and Control (AIRC), 274–280. https://doi.org/10.1109/airc64931.2025.11077511
Woo, B. F. Y., Song, J., Middleton, E., Fijačko, N., & Cato, K. (2025). Teaching Critical Thinking in the Age of AI: Safeguarding Clinical Reasoning in Healthcare Documentation. International Nursing Review, 72(3). https://doi.org/10.1111/inr.70102
Wu, C., Pan, Y., Wu, H., & Ning, L. (2025). Integrating Speech Recognition into Intelligent Information Systems: From Statistical Models to Deep Learning. Informatics, 12(4), 107. https://doi.org/10.3390/informatics12040107
Wu, S., Wenzel, K., Li, J., Li, Q., Pradhan, A., Kushalnagar, R., Lea, C., Koenecke, A., Vogler, C., Hasegawa-Johnson, M., Su, N. M., & Bernstein Ratner, N. (2025). Speech AI for All: Promoting Accessibility, Fairness, Inclusivity, and Equity. Proceedings of the Extended Abstracts of the CHI Conference on Human Factors in Computing Systems, 1–6. https://doi.org/10.1145/3706599.3706746
Wurm, B. (2026). Managing Digital Trace Data Research Projects. Digital Trace Data Research in Information Systems, 3–21. https://doi.org/10.1007/978-3-032-05497-5_1
Yasu Bharathi, V. R., K, P., Madasamy, K. S., Goldwin, K. R., Elangovan, A., & Kumar, M. V. (2025). Edge Deployed Speech to Text Models for Real Time Classroom Feedback in Interactive Linguistics Pronunciation Labs. 2025 2nd International Conference on Artificial Intelligence and Knowledge Discovery in Concurrent Engineering (ICECONF), 1–7. https://doi.org/10.1109/iceconf65644.2025.11379572
Ye, Z., Lian, J., Gupta, A., Zhou, X., Li, H., Patel, K., Park, H. J., Zhou, D., Guo, C., Li, S., Wang, S., Zhou, I., Cho, C. J., Ezzes, Z., Vonk, J. M. J., Morin, B. T., Bogley, R., Wauters, L., Miller, Z. A., et al. (2025). LCS-CTC: Leveraging Soft Alignments to Enhance Phonetic Transcription Robustness. 2025 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), 1–8. https://doi.org/10.1109/asru65441.2025.11434691
Yue, Z., Loweimi, E., Cvetkovic, Z., Barker, J., & Christensen, H. (2026). Raw acoustic-articulatory multimodal dysarthric speech recognition. Computer Speech &Amp; Language, 95, 101839. https://doi.org/10.1016/j.csl.2025.101839
Zarco-Alpuente, A., Álvarez, O., Castillo, I., & Samper-García, P. (2026). Digital disconnection in adolescent sport: A reflexive thematic analysis of athletes’ experiences. Psychology of Sport and Exercise, 85, 103098. https://doi.org/10.1016/j.psychsport.2026.103098
Zhang, L., Wu, S., & Wang, Z. (2025). LoRA-INT8 Whisper: A Low-Cost Cantonese Speech Recognition Framework for Edge Devices. Sensors, 25(17), 5404. https://doi.org/10.3390/s25175404
Zhang, S., Parcollet, T., Van Dalen, R., & Bhattacharya, S. (2025). Benchmarking Rotary Position Embeddings for Automatic Speech Recognition. 2025 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), 1–7. https://doi.org/10.1109/asru65441.2025.11434788
Zhang, S., Xu, J., & Alvero, A. (2025). Generative AI Meets Open-Ended Survey Responses: Research Participant Use of AI and Homogenization. Sociological Methods &Amp; Research, 54(3), 1197–1242. https://doi.org/10.1177/00491241251327130
Zhang, Y., De Valck, T., & Scharenborg, O. (2026). Speech recognition performance disparities between Dutch diverse speaker groups. Phonetica. https://doi.org/10.1515/phon-2025-0061
Zhang, Y., Yu, S., & Lai, J. (2025). Efficient Deployment of Large Speech Recognition Models on GPU. 2025 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), 1–4. https://doi.org/10.1109/asru65441.2025.11434664
Zhao, R., Choi, A. S. G., Koenecke, A., & Rameau, A. (2024). Quantification of Automatic Speech Recognition System Performance on d/Deaf and Hard of Hearing Speech. The Laryngoscope. https://doi.org/10.1002/lary.31713
Zhao, X., & Van Hamme, H. (2025). Refining Transcripts with TV Subtitles by Prompt-Based Weakly Supervised Training of ASR. 2025 33rd European Signal Processing Conference (EUSIPCO), 346–350. https://doi.org/10.23919/eusipco63237.2025.11226307
Zhiyenbayev, A., Abdrakhmanov, R., Varol, H. A., & Yazici, A. (2025). Multi-Modal Vision and Language Models for Real-Time Emergency Response. 2025 IEEE 37th International Conference on Tools with Artificial Intelligence (ICTAI), 1199–1206. https://doi.org/10.1109/ictai66417.2025.00175
Ziegler, M., Lothian, S., O’Neill, B., Anderson, R., & Ota, Y. (2025). AI language models could both help and harm equity in marine policymaking. Npj Ocean Sustainability, 4(1). https://doi.org/10.1038/s44183-025-00132-7
Zusag, M., Wagner, L., & Thallinger, B. (2024). CrisperWhisper: Accurate Timestamps on Verbatim Speech Transcriptions. Interspeech 2024, 1265–1269. https://doi.org/10.21437/interspeech.2024-731
Barnes, C. A., & Najafi, B. (2025). The Precision Aging® Network: Creating a Roadmap for Healthy Brain Aging. Gerontology, 1–7. https://doi.org/10.1159/000548488
Barrett, T. S., Wynn, C. J., Eijk, L., Tetzloff, K. A., & Borrie, S. A. (2025). Autoscribe: An automated tool for creating transcribed TextGrids from audio-recorded conversations. Behavior Research Methods, 57(12). https://doi.org/10.3758/s13428-025-02850-9
Cao, Y. (2025). Performance Evaluation of Whisper-Series Speech Transcription Models on Raspberry Pi. Proceedings of the Tenth ACM/IEEE Symposium on Edge Computing, 1–7. https://doi.org/10.1145/3769102.3774244
Kaplan, D. M., Alvarez, S. J. A., Palitsky, R., Choi, H., Clifford, G. D., Crozier, M., Dunlop, B. W., Grant, G. H., Greenleaf, M. N., Johnson, L. M., Maples-Keller, J., Levin-Aspenson, H. F., Mascaro, J. S., McDowall, A., Pozzo, N. S., Raison, C. L., Zarrabi, A. J., Rothbaum, B. O., & Lam, W. A. (2025). Fabla: A voice-based ecological assessment method for securely collecting spoken responses to researcher questions. Behavior Research Methods, 57(9). https://doi.org/10.3758/s13428-025-02777-1
Min, S., Yeum, T.-S., Shin, D., Rhee, S. J., Lee, H., Lee, H.-S., Park, S., Lee, J., & Ahn, Y. M. (2025). Automated Speech Analysis for Screening and Monitoring Bipolar Depression: Machine Learning Model Development and Interpretation Study. JMIR Medical Informatics, 13, e79093. https://doi.org/10.2196/79093
Naffah, A., Pfeifer, V. A., & Mehl, M. R. (2025). Spoken Language Analysis in Aging Research: The Validity of AI-Generated Speech to Text Using OpenAI’s Whisper. Gerontology, 71(5), 417–424. https://doi.org/10.1159/000545244