Customers often ask me to help them implement speech recognition solutions in their business. Given the opportunity to automate lots of boring work, I'm obviously more than happy to help. In many cases, however, I need to hit the brakes and raise some questions: for example, when I find out that a manager plans to replace a well-oiled 'human machine' with speech recognition, or to add speech recognition to an already complex workflow without thinking through the implications. In the first case the manager loses a lot of domain knowledge; in the second, a complex feature is implemented in a complex ecosystem. That's asking for trouble.
Over the past two years, speech recognition has really taken flight. Neural networks, the abundance of (training) data and high-performance computing have paved the way for high-accuracy speech recognition algorithms, making the investment far more profitable. As a key building block for many applications, the accuracy of speech recognition is of the utmost importance. The use cases are numerous, but they sometimes create more, or different, data than a business or workflow can digest. Looking beyond the implementation of speech recognition and taking a holistic view is something many businesses still need to work on. Let's take it a step further…
If we automate part of a workflow using speech recognition, we might consider automating the human follow-up as well. This is where cognitive analytics comes knocking: Artificial Intelligence that helps you unlock greater value from your speech recognition output. I want to address two examples of cognitive analytics I recently helped to implement, which resulted in huge efficiency gains, cost reductions and, more importantly, new insights.
Topic detection: speech recognition is all about uncovering contextual value. But how do you connect the dots? The way we speak is very different from the way we think, write and, eventually, read and understand. Even if there were a method to transcribe all your speech perfectly, that doesn't necessarily mean you've found the holy grail. Understanding the textual output from audio is time-consuming, as it requires knowledge of a world of definitions and relations. It requires an ontology, something humans are good at building but machines aren't. We can, however, train models that grasp some of that intelligence and teach it to machines. It begins by detecting topics, clustering information and defining relations. Using automatic topic detection on speech recognition output, we helped a political institution automatically segment political discussions within the Dutch parliament. Users of the software can easily skip large parts of irrelevant discussions and connect relevant segments of discussions to one another, gaining more insight.
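To make the idea of segmenting a transcript by topic concrete, here is a minimal, illustrative sketch in the spirit of TextTiling-style segmentation: it compares the word overlap between adjacent windows of transcript sentences and splits wherever the similarity drops. This is not the algorithm we built for the parliament project; the function name, window size and threshold are all assumptions for illustration.

```python
from collections import Counter

def segment_transcript(sentences, window=2, threshold=0.1):
    """Split a transcript into topical segments (illustrative sketch).

    Compares adjacent windows of sentences using cosine similarity over
    raw word counts; a sharp drop in similarity suggests a topic boundary.
    """
    def bag(sents):
        # Bag-of-words counts for a window of sentences.
        return Counter(" ".join(sents).lower().split())

    def cosine(a, b):
        # Cosine similarity between two word-count vectors.
        num = sum(a[w] * b[w] for w in set(a) & set(b))
        den = (sum(v * v for v in a.values()) ** 0.5) * \
              (sum(v * v for v in b.values()) ** 0.5)
        return num / den if den else 0.0

    # Mark a boundary wherever the two windows barely share vocabulary.
    boundaries = [i for i in range(window, len(sentences) - window)
                  if cosine(bag(sentences[i - window:i]),
                            bag(sentences[i:i + window])) < threshold]

    # Cut the sentence list at each boundary.
    segments, start = [], 0
    for b in boundaries:
        segments.append(sentences[start:b])
        start = b
    segments.append(sentences[start:])
    return segments
```

A real system would use stemming, stopword removal and smoothed similarity curves rather than a fixed threshold, but the core intuition, that vocabulary shifts signal topic shifts, is the same.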
Automatic summarization: taking this a step further, you can imagine a machine summarizing audio or video for you. Wouldn't that be convenient? But here's the tricky part: how do you teach a machine which content is relevant and which to omit? Summarization can be done extractively, by selecting relevant phrases and dropping irrelevant ones. A much more difficult method is to decompose a text and rewrite it using alternative sentences, words and concepts. Luckily we have lots of data from human summarization processes to learn from, and an ever-growing vocabulary and ontology. For a Dutch radio station, we've built exactly such an algorithm. I must admit it doesn't work flawlessly yet, especially when long 'emotional' discussions are held. But the results for neutral audio and video are very promising. Besides, experience shows that humans, too, summarize 'emotional' content in a very subjective way. In the coming months, we will ask beta testers to assess the results and compare the machine's summaries to human ones. Do you know more, or do you want to know more, about cognitive analytics? Please contact me at email@example.com.
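For readers curious what the simpler, extractive flavour of summarization looks like in practice, here is a minimal frequency-based sketch: score each sentence by how often its words occur across the whole text, then keep the top-scoring sentences in their original order. This is a textbook baseline, not the radio station's algorithm; the stopword list and scoring function are assumptions chosen for illustration.

```python
import re
from collections import Counter

# Tiny illustrative stopword list; a real system would use a fuller one.
STOPWORDS = {"the", "a", "an", "is", "are", "of", "to", "and", "in", "it", "that"}

def summarize(text, n_sentences=2):
    """Frequency-based extractive summarization (illustrative baseline).

    Each sentence is scored by the average corpus frequency of its
    non-stopword tokens; the top n sentences are returned in the order
    they appeared in the original text.
    """
    # Naive sentence split on end punctuation followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text.strip())
                 if s.strip()]

    # Word frequencies over the whole text, ignoring stopwords.
    words = [w for w in re.findall(r"[a-z']+", text.lower())
             if w not in STOPWORDS]
    freq = Counter(words)

    def score(sentence):
        toks = [w for w in re.findall(r"[a-z']+", sentence.lower())
                if w not in STOPWORDS]
        return sum(freq[w] for w in toks) / max(len(toks), 1)

    ranked = sorted(sentences, key=score, reverse=True)[:n_sentences]
    # Preserve the original reading order of the selected sentences.
    return [s for s in sentences if s in ranked]
```

On transcribed speech this baseline breaks down quickly, filler words, repetition and emotional digressions all inflate frequencies, which is part of why the 'emotional' discussions mentioned above are so hard to summarize well.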