AWS Transcribe (Speech-To-Text) and Automation: Everything You Need to Know

Today’s viewers demand an immersive and inclusive streaming experience, and accessibility and engagement have become key to streaming success. There was a time when captions were optional in OTT videos. Today, captions and subtitles are no longer just nice-to-haves; they’re essential for accessibility and a truly inclusive viewing experience.

At LinqTV, we are committed to making your videos accessible and enjoyable for everyone. That’s why we leverage the power of AWS Transcribe to deliver real-time speech-to-text capabilities.

In this blog, we’ll explore how AWS Transcribe works behind the scenes, transforming spoken words into on-screen text. We’ll discuss how you can automate the process and how it enhances the viewing experience for all audiences. So, let’s get started!

AWS Transcribe is a service that converts spoken words into text. This tool offers a convenient and effective method for transcribing audio content into written form. It supports a variety of audio formats, making it suitable for various applications like transcribing customer service calls, interviews, and meetings.

To use Transcribe for tasks like meeting transcription, you need an AWS account. You can access Transcribe through the AWS Management Console, the AWS Command Line Interface (CLI), or the AWS SDKs. Before you begin transcribing, make sure your AWS Identity and Access Management (IAM) user or role has permission to use the Transcribe service.
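As a rough illustration of the SDK route, here is a minimal Python sketch that submits a batch transcription job with boto3. The bucket, file, and job names are placeholders, and the helper names `build_job_request` and `submit_job` are our own, not part of the SDK. The actual call assumes boto3 is installed and that your credentials can use Transcribe and read the S3 object.

```python
def build_job_request(job_name, media_uri, media_format="mp4", language_code="en-US"):
    """Assemble keyword arguments for a StartTranscriptionJob call."""
    return {
        "TranscriptionJobName": job_name,
        "LanguageCode": language_code,
        "MediaFormat": media_format,
        "Media": {"MediaFileUri": media_uri},
    }


def submit_job(request, region_name="us-east-1"):
    """Send the request to AWS. Needs boto3 and valid AWS credentials."""
    import boto3  # imported here so the request builder works without the SDK

    client = boto3.client("transcribe", region_name=region_name)
    return client.start_transcription_job(**request)


# Placeholder values, for illustration only.
request = build_job_request("demo-episode-01", "s3://example-bucket/episode-01.mp4")
print(request)
# submit_job(request)  # uncomment once credentials and the S3 object exist
```

Keeping the request builder separate from the network call makes it easy to inspect or log the parameters before anything is sent to AWS.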

Speech-to-text software listens to audio and produces an editable, verbatim transcript on a designated device. It does this through voice recognition: linguistic algorithms within a computer program pick out spoken words from the auditory signal and convert them into text made up of Unicode characters.

The conversion process from speech to text involves a sophisticated machine-learning model that comprises several stages. Let’s delve deeper into how this process unfolds:

Let’s take you through some of the top features of the AWS Transcribe service.

You need just a single service API to manage both on-demand and live-streaming content. Supported formats for on-demand videos include FLAC, MP3, MP4, Ogg, WebM, AMR, and WAV. For live streaming, the API accepts audio over the HTTP/2 and WebSocket protocols.
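Before submitting an on-demand file, it can help to check its extension against the formats listed above. The following Python sketch does that; the function name `media_format_for` is hypothetical, and the lowercase identifiers are an assumption about how these formats are named in a job request.

```python
import os

# Formats named in the paragraph above, as lowercase identifiers (assumed spelling).
SUPPORTED_FORMATS = {"flac", "mp3", "mp4", "ogg", "webm", "amr", "wav"}


def media_format_for(path):
    """Return the media format implied by a file's extension, or raise ValueError."""
    extension = os.path.splitext(path)[1].lower().lstrip(".")
    if extension not in SUPPORTED_FORMATS:
        raise ValueError(f"Unsupported or missing media extension: {path!r}")
    return extension


print(media_format_for("s3://example-bucket/clips/interview.wav"))
```

A file extension is only a hint about the real container and codec, so treat this as a quick pre-flight check rather than a guarantee that the job will succeed.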

The list of audio codecs supported has been sorted from best quality to worst quality below:

Enhance the accuracy of AWS Transcribe by incorporating domain-specific terminology, such as names, acronyms, and slang, using custom vocabularies and custom language models (CLM). For batch processing (VoD), custom vocabulary and CLM can be used to achieve the highest accuracy levels.
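To show how these options attach to a batch job, here is a small Python sketch that adds a custom vocabulary or a custom language model to a job request. It assumes the vocabulary and model already exist in your account under the placeholder names used here. The helper `with_custom_models` is our own invention, while the `Settings` and `ModelSettings` keys follow the boto3 `start_transcription_job` parameters as we understand them.

```python
def with_custom_models(request, vocabulary_name=None, language_model_name=None):
    """Return a copy of a job request with a custom vocabulary and/or CLM attached."""
    updated = dict(request)
    if vocabulary_name:
        settings = dict(updated.get("Settings", {}))
        settings["VocabularyName"] = vocabulary_name
        updated["Settings"] = settings
    if language_model_name:
        model_settings = dict(updated.get("ModelSettings", {}))
        model_settings["LanguageModelName"] = language_model_name
        updated["ModelSettings"] = model_settings
    return updated


# Placeholder job request; every name below is illustrative.
base_request = {
    "TranscriptionJobName": "demo-episode-01",
    "LanguageCode": "en-US",
    "MediaFormat": "mp4",
    "Media": {"MediaFileUri": "s3://example-bucket/episode-01.mp4"},
}

vocab_request = with_custom_models(base_request, vocabulary_name="streaming-terms")
clm_request = with_custom_models(base_request, language_model_name="ott-dialogue-model")
print(vocab_request["Settings"])
print(clm_request["ModelSettings"])
```

The example builds one request per option. Whether both can be combined in a single job is worth confirming in the service documentation before relying on it.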

Improve the live subtitling experience in video broadcasts and in-game chat by controlling the stabilization level of partial transcription results. This provides the flexibility to display partial sentence results instead of waiting for the entire sentence to be subtitled.
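One way to picture this is a player that shows stabilized words in full and treats the rest as provisional. The Python sketch below works on a plain dictionary shaped like one partial result from the streaming API, with a `Stable` flag on each item. The sample data is invented, the helper names are ours, and the option names in `STREAM_OPTIONS` are assumptions based on the StartStreamTranscription request parameters.

```python
# Request options that turn stabilization on (parameter names are assumptions).
STREAM_OPTIONS = {
    "EnablePartialResultsStabilization": True,
    "PartialResultsStability": "high",  # assumed values: "high", "medium", "low"
}


def join_items(items):
    """Join word items with spaces; punctuation attaches to the previous word."""
    text = ""
    for item in items:
        if item.get("Type") == "punctuation" or not text:
            text += item["Content"]
        else:
            text += " " + item["Content"]
    return text


def split_stable_text(result):
    """Split one partial result into (stable_text, tentative_text)."""
    items = result["Alternatives"][0]["Items"]
    stable = [item for item in items if item.get("Stable")]
    tentative = [item for item in items if not item.get("Stable")]
    return join_items(stable), join_items(tentative)


# Invented partial result, for illustration only.
partial_result = {
    "IsPartial": True,
    "Alternatives": [{
        "Items": [
            {"Content": "captions", "Type": "pronunciation", "Stable": True},
            {"Content": "are", "Type": "pronunciation", "Stable": True},
            {"Content": "no", "Type": "pronunciation", "Stable": False},
            {"Content": "longer", "Type": "pronunciation", "Stable": False},
        ]
    }],
}

stable_text, tentative_text = split_stable_text(partial_result)
print("stable:", stable_text)
print("tentative:", tentative_text)
```

A subtitle renderer could draw the stable text immediately and hold back or dim the tentative words until a later result confirms them.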

Output batch transcription works in WebVTT