WebVTT caption transcription app

This open-source, R-based web application converts video captions (subtitles) from the Web Video Text Tracks (WebVTT) format into plain text. To use it, upload a WebVTT file with the extension .vtt or .txt (examples available here and here).

WEBVTT
Kind: captions
Language: en-GB
6fbb1fca-421a-4ee9-942b-cee0dc425f8a
00:00:00.760 --> 00:00:08.640
Lorem ipsum dolor sit amet, consectetur
adipiscing elit. Duis sed tempus nisi. Sed
635ad194-26db-4fb7-90d6-26bfbbaefdf3
00:00:08.640 --> 00:00:14.400
lorem diam, fermentum id hendrerit eu,
efficitur at nulla. Aenean vitae pulvinar
11fc831a-59a3-40da-9c5c-5a8556a40a45
00:00:14.400 --> 00:00:15.540
magna. Pellentesque metus nisi, elementum
0826c1e2-0c4c-4517-b1a8-0dc745d603e3
00:00:15.540 --> 00:00:28.760
nisi, mattis fermentum nulla. Donec
quis varius metus, suscipit vulputate neque.

Download VTT file

When the caption file is uploaded to the web application, metadata such as timestamps are automatically removed, and the text is formatted into a paragraph. The result is displayed on the website, and can be downloaded as .docx and .txt documents. Overall, this application serves to improve the accessibility of video captions.
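The clean-up step described above can be illustrated with a minimal sketch in base R. This is not the code from the application itself: the function name vtt_to_paragraph is hypothetical, and the patterns assume the layout of the example file shown earlier (a WEBVTT header, Kind and Language lines, UUID-style cue identifiers, and timing lines containing "-->").

```r
# Hypothetical helper (not the app's actual code): strip WebVTT metadata
# from a character vector of lines and merge the caption text into one paragraph.
vtt_to_paragraph <- function(lines) {
  # Header lines: "WEBVTT" plus "Kind:" / "Language:" (assumed from the example file)
  is_header <- grepl("^WEBVTT", lines) | grepl("^(Kind|Language):", lines)

  # Cue timing lines, e.g. "00:00:00.760 --> 00:00:08.640" (hours are optional)
  stamp <- "([0-9]{2,}:)?[0-9]{2}:[0-9]{2}\\.[0-9]{3}"
  is_timing <- grepl(paste0("^", stamp, "[[:space:]]+-->[[:space:]]+", stamp), lines)

  # Cue identifiers in UUID form, e.g. "6fbb1fca-421a-4ee9-942b-cee0dc425f8a"
  is_id <- grepl("^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$", lines)

  # Blank lines that separate cues
  is_blank <- grepl("^[[:space:]]*$", lines)

  # Keep only caption text, join it, and collapse repeated whitespace
  text <- lines[!(is_header | is_timing | is_id | is_blank)]
  trimws(gsub("[[:space:]]+", " ", paste(text, collapse = " ")))
}

# Usage with the first lines of the example file
example <- c(
  "WEBVTT",
  "Kind: captions",
  "Language: en-GB",
  "6fbb1fca-421a-4ee9-942b-cee0dc425f8a",
  "00:00:00.760 --> 00:00:08.640",
  "Lorem ipsum dolor sit amet, consectetur",
  "adipiscing elit. Duis sed tempus nisi. Sed",
  "635ad194-26db-4fb7-90d6-26bfbbaefdf3",
  "00:00:08.640 --> 00:00:14.400",
  "lorem diam, fermentum id hendrerit eu,"
)
paragraph <- vtt_to_paragraph(example)
print(paragraph)
```

For an uploaded file, the lines could be read with readLines() before being passed to the helper. Caption files that use other identifier styles or additional header fields would need further patterns.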

The data is only available to the user, and is deleted when the website is closed.

🌐 Web application

The application can be used below, or it can be opened separately here.

Programming details

The source code is available on GitHub, where the app can be extended via pull requests. Questions and suggestions can be submitted as issues or emailed to . The licence is Creative Commons Attribution 4.0 International.

The core of the application is in the index.Rmd script, which uses ‘regular expressions’ to process the VTT file.

In turn, index.Rmd draws on the script below to enable the download of .docx documents. Finally, that script draws on this Word template.