This repository contains a production-ready, local transcription workflow built on OpenAI's Whisper models, intended as an alternative to cloud-based automatic speech recognition (ASR) services. Platforms such as ChatGPT are convenient, but they impose file size restrictions, offer little transparency for reproducible research, and raise privacy concerns under regulations such as the GDPR. This workflow runs entirely on the user's own machine or cluster, so audio and transcripts stay on infrastructure the user controls. It supports batch operations on high-performance computing (HPC) clusters with GPU acceleration, so throughput is limited by the available hardware rather than by a service's upload caps. Quality control features include algorithms that detect and remove spurious, model-generated repetitions, as well as context-aware name masking for privacy. The workflow also offers speaker diarisation via pyannote.audio and a flexible audio enhancement pipeline. It is implemented as a single Python script to keep it maintainable, reproducible, and secure for academic and enterprise transcription needs.
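To make the idea of repetition removal concrete, here is a minimal sketch of one simple approach: collapsing runs of consecutive, near-identical transcript segments. It is not the code used in this repository. The names `collapse_repetitions` and `_normalise`, the `max_repeats` parameter, and the sample segments are all hypothetical, chosen only for illustration.

```python
import re


def _normalise(text: str) -> str:
    """Lowercase and strip punctuation so near-identical segments compare equal."""
    return re.sub(r"[^\w\s]", "", text.lower()).strip()


def collapse_repetitions(segments: list[str], max_repeats: int = 2) -> list[str]:
    """Keep at most `max_repeats` consecutive copies of the same segment.

    Hypothetical helper: assumes the transcript is already split into
    segments (for example, one string per recognised utterance).
    """
    cleaned: list[str] = []
    run_key = None   # normalised text of the current run
    run_length = 0   # how many consecutive segments share that text
    for segment in segments:
        key = _normalise(segment)
        if key and key == run_key:
            run_length += 1
        else:
            run_key, run_length = key, 1
        if run_length <= max_repeats:
            cleaned.append(segment)
    return cleaned


# Illustrative input, not real output from the workflow.
segments = ["Thank you.", "Thank you.", "thank you", "Thank you.", "Let's begin."]
print(collapse_repetitions(segments, max_repeats=1))
```

Comparing normalised text rather than raw strings lets the sketch catch repeats that differ only in case or punctuation. A real pipeline would likely also need to handle repeated phrases inside a single segment, which this sketch does not attempt.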
If you use this workflow in your research, please cite:
Bernabeu, P. (2025). Secure and scalable speech transcription for local and HPC (Version 1.0.0) [Computer software]. Zenodo. https://doi.org/10.5281/zenodo.17624830
The recommended BibTeX entry is:
@misc{secure_local_HPC_speech_transcription,
  author = {Bernabeu, Pablo},
  title = {Secure and scalable speech transcription for local and {HPC}},
  year = {2025},
  publisher = {Zenodo},
  version = {1.0.0},
  doi = {10.5281/zenodo.17624830},
  url = {https://doi.org/10.5281/zenodo.17624830}
}