
Apr 26, 2018 | 5 min read


Voice assistants: decoding a new trend

Julien Breitfeld

Data Architecture Director

Voice assistants have proved to be a big success in recent years, and their adoption by more and more people reflects a widespread interest in the new services they offer. It’s only natural to stop and ask ourselves: what’s behind this latest trend?

At the start of the year, highly-respected tech magazine Wired ran the controversial headline:

“Facebook’s Virtual Assistant M is dead. So are chatbots”

Just two years after the much-trumpeted launch of its chatbot platform, Facebook closed its “M” program, implicitly acknowledging the failure of its attempt to move into the “post-app” era.

For the past two years, the spotlight has been firmly focused on another kind of human-computer interaction (HCI): voice recognition and expression, driven by intensive efforts from the other four “GAFAM” companies, armed with their APIs and development teams dedicated to machine learning.

And yet NLP (natural language processing) first hit the mainstream six years ago, when Apple included Siri in the iPhone 4S. Even that was old hat for Apple, given that Siri was a version of the Knowledge Navigator imagined by John Sculley in 1987.

However, it wasn’t until Amazon launched its Echo product in 2015 that voice interaction finally reached a broad audience. What’s behind the explosion of one, and the (relative) death of the other?


Context, context, context

Beyond fashion and analysts’ thirst for self-fulfilling prophecies, Amazon’s genius was to create usage through design: Amazon Echo, and Alexa, the pseudo-AI that supports it, have been developed with the main aim of turning consumers’ homes into stores, complete with sales assistants. Armed with its Prime consumer base, the Seattle giant has pushed its product on the strength of two pillars of current experiential standards: the demand for immediacy and assumed laziness. Amazon has also decided to go against the grain on mobility: the object is fixed and has a clearly defined territory, which it controls at the user’s command. With Echo, Amazon has reinvented the nineteenth-century concept of domestic servants, letting you shop, control connected devices and access simple services, all with your voice.

Despite these interactions, Alexa is no match for Siri or Cortana. And some text chatbots are far more sophisticated. But such comparisons fail to account for context, which is an integral part of the design.

Firstly, a voice assistant goes unnoticed. The first-generation Amazon Echo had no fewer than seven microphones to pick up voice commands. The service is also outshone by the usage: there’s no need to get your iPhone out of your pocket, unlock it and speak. You just have to be in the same room as the device and speak: Amazon Echo is always listening, and only springs into action when you call its name. There’s no keyboard here: voice has ousted text chatbots because the interaction needs no accessories such as a keyboard, mouse or screen. Humans no longer need such tools.
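The "always listening, but only acting when called by name" behavior can be illustrated with a minimal sketch. It assumes speech has already been transcribed to text; the wake word value and the `extract_command` helper are hypothetical names chosen for illustration, not Amazon's implementation, which detects the wake word on the audio signal itself.

```python
from typing import Optional

# Assumption: the device's name doubles as the wake word.
WAKE_WORD = "alexa"


def extract_command(utterance: str, wake_word: str = WAKE_WORD) -> Optional[str]:
    """Return the words spoken after the wake word, or None to stay idle."""
    # Normalize case and strip basic punctuation from each word.
    words = [w.strip(",.!?") for w in utterance.lower().split()]
    if wake_word not in words:
        return None  # The device hears the sentence but does nothing.
    start = words.index(wake_word) + 1
    command = " ".join(words[start:]).strip()
    return command or None  # The wake word alone carries no command.


for heard in ["What a nice day", "Alexa, play some jazz"]:
    print(repr(heard), "->", extract_command(heard))
```

Only the second sentence triggers an action; the first is ignored because the name was never spoken.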

Furthermore, a voice assistant is first and foremost a speaker. By reintroducing a fixed object that can broadcast sound into a living space, Amazon has reinvented the age-old medium of radio, particularly for the younger generation. Indeed, according to Pew Research Center, 90% of young adults in the USA have a smartphone, and more than 70% of them listen to a music streaming service, but ownership of physical radios is in free fall. According to the latest study by Edison Research, The Infinite Dial, the global rate of radio ownership has fallen by 25 percent in 10 years, with half of those aged 25-34 saying that they do not have one. By positioning its device in the internet radio market, Amazon has brought DAB (digital audio broadcasting) to life, free from the limitations of a moribund standard. That’s why Radio France reported that 400,000 listeners tuned into its stations using Google Home in January 2018.

Lastly, a voice assistant is personal. This is where context is most significant, and it suggests that text chatbots aren’t completely dead: you don’t ask your assistant questions in front of an audience, any more than you would talk to your telephone in the street, for fear of looking like a freak. But voice fosters intimacy with your assistant: it’s the ideal vehicle for our interactions.


APIs, artificial intelligence and platforms

The development of machine learning processes has drastically increased computers’ speech recognition rate, as well as their recognition of the meaning of phrases. Google was a pioneer in this field, using the questions users asked its search engine to build a database of ontologies. Semantic analysis has also moved on from the basic, word-by-word method (recognizing one word after another, then assembling them into phrases). Today, the AI underpinning voice recognition uses algorithms that recognize the meaning of phrases in real time. One example is SyntaxNet, a Google project that’s now open source.

And because we live in a hyperconnected world, the intelligence of these voice assistants is to be found in server farms, with interactions occurring between a sensor/microphone and APIs that can be called from any object. Talk to the machine, and you shall be understood.

The strategy adopted by the GAFAM tech firms is to impose their SDKs on any connected object, from fridges to cars; their motivations vary according to their business models, but they all dream of one thing: for their platform to be home to YOUR assistant, an assistant present in all areas of your life, delivering their own services or services from third party developers, and capable of learning from each and every one of your interactions.


“I’m feeling lucky”

According to the dictionary, intelligence is the art of making choices. Search engines, which show SOME results, are over. Voice assistants are about simplifying human-computer interactions: what we want now is THE result. Google’s “I’m feeling lucky” button gives users ONE SINGLE answer, in context, depending on who’s asking the question, but also based on environmental parameters including the time of day, geographical location and previous interactions. The voice assistant brings that “lucky” answer to life. What’s more, voice assistants are on the up because “luck” is now a matter of precision, because the interaction is natural, and because they know you. For younger generations they’re (imaginary?) friends, for older ones they’re something straight out of science fiction, and for those in between they’re a tool that offers either terrifying or exciting possibilities.
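The idea of returning ONE answer shaped by context can be sketched as a simple scoring function. Everything here is an illustrative assumption: the `Context` and `Candidate` types, the weights and the sample data are hypothetical and say nothing about how Google or any assistant actually ranks results.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Context:
    hour: int                      # time of day (0-23)
    city: str                      # geographical location
    past_choices: List[str] = field(default_factory=list)  # previous interactions


@dataclass
class Candidate:
    name: str
    city: str
    open_hours: range              # hours during which the place is open


def score(candidate: Candidate, ctx: Context) -> int:
    """Hypothetical weights: location and opening time count most, habit adds a bonus."""
    points = 0
    if candidate.city == ctx.city:
        points += 2
    if ctx.hour in candidate.open_hours:
        points += 2
    if candidate.name in ctx.past_choices:
        points += 1
    return points


def lucky_answer(candidates: List[Candidate], ctx: Context) -> Candidate:
    """Return the single best-scoring candidate instead of a list of results."""
    return max(candidates, key=lambda c: score(c, ctx))


places = [
    Candidate("Pizza Roma", "Paris", range(11, 23)),
    Candidate("Sushi Go", "Lyon", range(11, 23)),
    Candidate("Cafe Nuit", "Paris", range(20, 24)),
]
evening_in_paris = Context(hour=21, city="Paris", past_choices=["Cafe Nuit"])
print(lucky_answer(places, evening_in_paris).name)
```

The same question asked at noon, or from another city, would produce a different single answer, which is the point of the "lucky" model.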

Voice assistants are more than just a trend, but they will not replace other interfaces; instead we’ll gradually see them integrate into an overall HCI that comprises sounds, text, gestures and feedback loops that are increasingly transparent, because they’re human. However, it may well be that voice will be at the center of the new world currently being created, because we live in a world of sound. Waiting for that “little” voice 😉



