"Hallo Magenta", the voice assistant from Telekom, is the first European smart speaker. Making calls, operating MagentaTV and SmartHome devices with your voice, and all this with exceptional data security. Here Deutsche Telekom employee Dr. Andrea Schnall describes how she experienced the development work.
You are still quite new at Telekom. You studied electrical engineering at TU Darmstadt and wrote your doctoral thesis on machine learning in the audio field. What did you do there?
Dr. Andrea Schnall: Specifically, it was about examining emphasis in languages. I trained an algorithm to recognize which words are emphasized in an English and a Japanese data set. This can help to improve automatic speech recognition because people stress words differently if they have been misunderstood before and repeat themselves. For example, they raise their pitch, speak louder, or stretch the word.
Then, in 2017, Deutsche Telekom was the next step for you in the field of speech ...
Schnall: That's right, I was very pleased when the then still small Smart Speaker team brought me on board.
How did you come to work for Deutsche Telekom?
Schnall: When I was a student, machine learning became my hobby. There were no dedicated degree programs like there are today, and job advertisements were more in the area of data analysis. So I was all the more pleased when I saw a Telekom job ad looking for exactly that. I then found a team of nice colleagues with contagious enthusiasm. At that time the team had just 35 members ... today it has over 200. We still had to find our role in the project and define which machine learning was actually needed to build a device that would execute relevant voice commands correctly. Everyone did whatever needed doing. We started very early to work on the basic data set and define examples of intents and entities. This was also my first task.
This is hard to imagine for many people. How would you simply explain this?
Schnall: An intent is an intention, such as "switching TV channels". Entities are the more detailed pieces of information the user provides: not just switching, but switching to a specific channel. The algorithm must recognize this. This is the field of natural language understanding, as opposed to predefined voice commands. The speaker converts the speech into text, then triggers the desired action. We specify the intents, for example weather or TV program. When we train the artificial intelligence, we use a data set of examples with solutions, that is, each example is assigned to its intent. The model then learns these relationships. At first we wrote sample solutions ourselves; later we added recordings from beta testers. The more varied, the better. After all, the model has to learn that "Can you predict the weather", "How will the weather be today", and "Do I need an umbrella" all stand for the same intent. The algorithms learn to generalize and to recognize similarities and dependencies in order to assign what is said to the intents. They also learn to assign probabilities, i.e. how well what is said might fit each intent. But of course there are many more applications for machine learning in such a project.
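To make the idea of intents, entities, and probabilities more concrete, here is a minimal Python sketch. It is not Telekom's implementation: the training examples beyond the three weather questions quoted above, the intent names, the channel list, and all function names are assumptions chosen for illustration, and a simple Naive Bayes word-count model stands in for whatever models the team actually used.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical labelled examples: each utterance is assigned to an intent.
TRAINING_DATA = [
    ("can you predict the weather", "weather"),
    ("how will the weather be today", "weather"),
    ("do i need an umbrella", "weather"),
    ("will it rain tomorrow", "weather"),
    ("switch to channel five", "switch_tv_channel"),
    ("change the tv channel to the news", "switch_tv_channel"),
    ("put on the sports channel", "switch_tv_channel"),
    ("switch the tv to the movie channel", "switch_tv_channel"),
]

# Hypothetical entity values for the "channel" entity.
KNOWN_CHANNELS = {"news", "sports", "movie"}


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def train(examples):
    """Count how often each word occurs per intent."""
    word_counts = defaultdict(Counter)
    intent_counts = Counter()
    vocabulary = set()
    for text, intent in examples:
        tokens = tokenize(text)
        word_counts[intent].update(tokens)
        intent_counts[intent] += 1
        vocabulary.update(tokens)
    return word_counts, intent_counts, vocabulary


def predict(model, text):
    """Return a probability for every intent (Naive Bayes, add-one smoothing)."""
    word_counts, intent_counts, vocabulary = model
    total_examples = sum(intent_counts.values())
    log_scores = {}
    for intent, count in intent_counts.items():
        score = math.log(count / total_examples)
        denominator = sum(word_counts[intent].values()) + len(vocabulary)
        for token in tokenize(text):
            if token in vocabulary:  # words never seen in training are ignored
                score += math.log((word_counts[intent][token] + 1) / denominator)
        log_scores[intent] = score
    # Turn log scores into probabilities that sum to 1.
    highest = max(log_scores.values())
    exp_scores = {i: math.exp(s - highest) for i, s in log_scores.items()}
    norm = sum(exp_scores.values())
    return {i: v / norm for i, v in exp_scores.items()}


def extract_entities(text):
    """Look up a channel name in the utterance (a deliberately naive entity step)."""
    for token in tokenize(text):
        if token in KNOWN_CHANNELS:
            return {"channel": token}
    return {}


model = train(TRAINING_DATA)
for utterance in ["Is an umbrella needed today", "Switch to the news channel"]:
    print(utterance, predict(model, utterance), extract_entities(utterance))
```

The first test utterance never appears in the training data, yet it shares words with the weather examples, so the weather intent receives the higher probability. This is a small-scale version of the generalization described above.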
What was it like for you when Deutsche Telekom launched the smart speaker in 2019?
Schnall: Simply incredibly exciting. Finally the time had come. We were present at the IFA in Berlin. I think it's just great that a European company like Deutsche Telekom is standing up to the big players in East and West with its own device. I was thrilled from the very beginning. And it is only the beginning. Voice control will become part of our everyday lives, even more complex commands will be possible, and everything will become more intuitive. In all industries. Take automotive, for example. Or medicine, where it's important to keep your hands free and work hygienically. For us at Deutsche Telekom, it should also be clear that things are just getting started. Of course, you always have to weigh things up: What do we buy, what do we do ourselves? But when it comes to machine learning, we can achieve a lot with just a few people. There is more to it than many people think. We should be even bolder with our German or European solutions.
What do you particularly appreciate about your work?
Schnall: I love discussing solutions to technical problems, and I have good discussion partners in the team. And we have a lot of fun and many good moments. At some point we built an Easter egg into the speaker ...
I beg your pardon?
Schnall: "Easter eggs" are gags that developers hide in their software. We had one too, before the speaker was launched. On a certain command, it started to glow and play music.
Dr. Schnall, thank you for the interview.