Artificial Intelligence: it’s the holy grail that will solve all our problems. Machines help people tackle all kinds of obstacles that were deemed unsolvable in the past. Simply hook your systems up to the exciting solutions out there that are relevant to your business and you’re good to go. Sounds good, but as always, when something sounds too good to be true, it probably is.
To get one thing straight: Artificial Intelligence isn’t anything new. It’s basically what we could have called every technological innovation we’ve seen since the first computer. The basis of their functionality has always been the same: making calculations that humans have difficulty with. Having those machines learn by themselves is an interesting ‘new’ development, however.
Plug and play?
‘So why isn’t it possible to simply hook A.I. onto my systems and/or processes?’ people often ask me. In some cases you can: there are generic models that are up to the task of helping certain businesses with some of their processes. In most cases, however, generic models aren’t enough to meet the client’s needs, and it’s necessary to train models for specific end users. Let’s take the example of speech recognition. Speech recognition, or speech to text, offers a great way to index large amounts of audio content and make it searchable. Broadcasters, for example, can benefit greatly from this solution. Let’s say you want your viewers to be able to look for specific fragments within a TV show, but they’re not sure where or when they occurred. Once the broadcaster’s entire archive has been transcribed, viewers can easily find the specific moment they were looking for, since they no longer have to rely on results based on simple tagging.
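To make the idea of a searchable archive concrete, here is a minimal sketch in Python. It assumes the speech to text engine returns segments with a start time; the sample segments and the function names (`tokenize`, `build_index`, `search`) are made up for illustration and don't come from any particular product.

```python
import re
from collections import defaultdict

# Hypothetical transcript segments: (start time in seconds, transcribed text).
segments = [
    (12.0, "Welcome back to the evening news"),
    (95.5, "The mayor opened the new bridge today"),
    (301.2, "Tomorrow's weather looks sunny near the bridge"),
]

def tokenize(text):
    """Lowercase the text and split it into words."""
    return re.findall(r"[a-z']+", text.lower())

def build_index(segments):
    """Map every word to the sorted start times of the segments containing it."""
    index = defaultdict(list)
    for start, text in segments:
        for word in set(tokenize(text)):
            index[word].append(start)
    for starts in index.values():
        starts.sort()
    return index

def search(index, query):
    """Return start times of segments that contain every word in the query."""
    words = tokenize(query)
    if not words:
        return []
    hits = set(index.get(words[0], []))
    for word in words[1:]:
        hits &= set(index.get(word, []))
    return sorted(hits)

index = build_index(segments)
print(search(index, "bridge"))      # [95.5, 301.2]
print(search(index, "new bridge"))  # [95.5]
```

A viewer searching for ‘new bridge’ lands at 95.5 seconds into the show, something tags on the episode as a whole could never tell them.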
So why train models?
But what if, for instance, you want to transcribe conversations between bankers and clients? Can your generic speech to text model deliver what’s needed then? The answer is no, for two reasons. First of all, we’re not talking about ‘normal language’ in this specific case. Bankers use all kinds of industry-specific terminology that a generic speech to text model can have a lot of trouble transcribing. What’s needed is an industry-specific speech to text engine that is customized to the needs of this particular end user. It works by feeding the specific terminology into a generic model and offering the result as a separate engine, a finance engine for example. The second reason might not be what you would expect: hardware. Clients sometimes underestimate the need for good hardware when it comes to speech to text. Simply recording your audio with cheap microphones you bought at some convenience store won’t cut it. Generic models work for broadcasters because their language isn’t ‘complicated’ or industry-specific most of the time, and because they use the best hardware out there. So training the models and using the right hardware are both critical for a good outcome in most cases.
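To give a feel for why a list of industry terms makes a difference, here is a deliberately simplified sketch. It is not how a real engine is customized: adapting an engine happens inside its vocabulary and language model, whereas this stand-in only corrects the finished transcript by fuzzy-matching words against a term list with Python’s standard `difflib` module. The term list, the `cutoff` value and the function name `correct_transcript` are assumptions chosen for illustration.

```python
import difflib

# Hypothetical list of finance terms the generic model tends to get wrong.
FINANCE_TERMS = ["amortization", "collateral", "escrow", "annuity"]

def correct_transcript(text, terms, cutoff=0.8):
    """Replace words that closely resemble a known domain term with that term."""
    corrected = []
    for word in text.split():
        # get_close_matches returns the best matches scoring at least `cutoff`.
        match = difflib.get_close_matches(word.lower(), terms, n=1, cutoff=cutoff)
        corrected.append(match[0] if match else word)
    return " ".join(corrected)

generic_output = "the amortisation schedule needs colateral"
print(correct_transcript(generic_output, FINANCE_TERMS))
# the amortization schedule needs collateral
```

Even this crude approach shows the principle: the more the system knows about the end user’s vocabulary, the fewer domain terms it gets wrong.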
We need humans after all
After you’ve taken all the crucial steps outlined above, you will need to keep your engines up to date. The best way to do that is by placing a human in the loop, or what we call the ‘Human Feedback Loop’. Having humans manually correct the output until the transcript is 100 percent correct, and then feeding that transcript back into your engines so they can learn from their own mistakes, is the quickest and most effective way to obtain high accuracy. But the humans you need most are the clients who are in need of the technical solutions you can provide. They have to be sure which solution they actually need. Speech to text alone might not be what they were looking for. Perhaps transcripts on their own don’t help them at all and they need a greater understanding of their data: a system that makes sense of all the transcribed output for them. Instead of just a cognitive service, they are in need of cognitive analytics. But enough for now, let’s discuss that one later on.
Do you have ideas you want to share when it comes to Artificial Intelligence or speech to text? Drop me an e-mail and we’ll get in touch!