How to create a customized speech to text model

Posted on Thursday, February 8th, 2018 by Ronja Slierendrecht

A four-step guide for non-techies

So, you and your team have decided to start working with speech to text software. Great!

To start off right, it’s important to define your exact needs and project objectives. There are different approaches to choose from, and the type of content, your goals, the type of data, time and budget are all factors to consider before starting the project.

At Zoom Media we believe that customization is the key to successful implementation of speech to text software. The ‘one size fits all’ approach does not fit most organizations that deal with industry-specific language or a certain type of content. However, there is a standard routine that works for almost all organizations implementing speech to text software. We’ve written down the four steps that most organizations take. It’s good to share some know-how of this process with all your team members, not just the developers, because working together is also key to a successful implementation.

1 Choose a basic model

Choose a basic model that fits your tech environment and your company’s standards. There are a lot of open source models out there, and basic datasets are also available for most languages. Ask your future customers what factors are important to them and choose a language model that fits their needs.

2 Create a corpus

Create a ‘corpus’ to train your model. A corpus is a plain text document that uses terminology from a certain industry or context. Your service will build a vocabulary for your custom model by extracting the terms from your corpora that don’t exist in the basic dataset. You can add as many corpora to your custom model as you like. For example, if you add a corpus from the banking sector, you can also add a medical corpus without causing any problems.
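For the developers on the team, the idea of extracting terms that don’t exist in the basic dataset can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how any particular speech to text service does it: the function names, the tiny `base_vocabulary` set and the sample banking text are all made up for this example.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9]+(?:['-][a-z0-9]+)*", text.lower())


def out_of_vocabulary_terms(corpus_text, base_vocabulary):
    """Count the corpus words that are missing from the base vocabulary."""
    counts = Counter(tokenize(corpus_text))
    return {word: n for word, n in counts.items() if word not in base_vocabulary}


# Hypothetical base vocabulary; a real basic model knows many thousands of words.
base_vocabulary = {"the", "customer", "asked", "about", "a", "and", "an"}

# Hypothetical snippet from a banking corpus.
banking_corpus = "The customer asked about a SEPA transfer and an IBAN. The IBAN was invalid."

new_terms = out_of_vocabulary_terms(banking_corpus, base_vocabulary)
print(new_terms)
```

Words that show up often in the corpus but not in the basic dataset, such as ‘iban’ here, are good candidates for the custom vocabulary.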

3 Add relevant words

Once you’ve added the corpora, you can add more words individually. Think of important names, new trending topics or other words that can change over time. We repeat this step continuously and keep adding important new words, so our models are always up to date.

4 Start training!

Let’s put the model to the test and start training. The training routine is very important for the quality of your model and its results. We check the output, correct all mistakes manually and train the system with the corrected output. This is part of what we call ‘The Human Feedback Loop’.
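To see whether each round of corrections actually improves the model, it helps to put a number on the difference between the machine output and the manually corrected text. One common metric for this is the word error rate (WER): the number of word substitutions, insertions and deletions needed to turn the output into the corrected transcript, divided by the number of words in the corrected transcript. The sketch below is a minimal illustration; the article does not say which metric Zoom Media uses, and the function name and example sentences are assumptions.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must not be empty")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # a reference word is missing from the output
                curr[j - 1] + 1,     # the output contains an extra word
                prev[j - 1] + cost,  # words match, or one is substituted
            ))
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical example: the manually corrected text versus the machine output.
corrected = "please transfer the money to my iban"
machine = "please transfer the money to my eye ban"

wer = word_error_rate(corrected, machine)
print(f"WER: {wer:.2f}")
```

A lower score after retraining with the corrected output suggests the feedback loop is working. Comparing scores on the same set of test recordings keeps the comparison fair.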

Are you about to start working with speech to text and in need of some professional support? Kick off your project with the right help and ensure you make the right investment, train the system properly and add value to your data.