Computers have eyes, ears, mouths, and an AI brain

Published: July 2, 2018

We have persevered with our on-and-off relationships with Siri, Cortana, and Alexa.

We are getting used to the fact that "interaction" no longer means keyboards and GUIs. We are also coming to terms with component disaggregation: the idea that we can interact with one intelligent system through multiple devices. Sensory interfaces have pervaded devices; should you really want to, you can talk to your fridge and check your empty icebox from work. Tech companies know that owning the user interface (UI) puts them at the front of the queue for handling users' requests end to end: the UI is where they collect the most data and revenue, and where they distinguish their brands. This explains why 2017 saw Apple, Amazon, and Google all launch new smart speakers and smart screens. It's only a matter of time before Facebook and others launch similar devices.

In 2018, smart voice-enabled devices will go mainstream, with natural language processing (NLP) mature enough to execute complex tasks. For brands that have invested heavily in digital projects and animated avatars, this presents a problem. If Siri's voice is unmistakably Apple, Alexa is Amazon, and Google Assistant is, well, Google, how do you brand a voice interaction? In the short term, most brands won't. Giving your AI a personality, and a consistent brand persona across multilingual speech output, is not a simple task.

Creating a custom voice that reinforces brand values and engenders trust is more nebulous and complex. Currently, Google Assistant lets you choose between two male voices with US accents for your persona. Chances are, neither will create the emotional connection that your highly paid brand ambassador does.

In addition to capitalizing on audio inputs, many smart devices also intelligently process visual inputs. Image recognition can spot objects and gestures and analyze sentiment and context. Mining Instagram and Facebook to promote items to customers based on their preferences is an obvious use case.

Visual inputs also enable biometric features, which early-adopter banks are now using in their onboarding processes. Monzo, the UK digital challenger bank, has no branches: customers register by scanning an ID document, such as a passport or driver's license, and then recording a biometric selfie video, which is used to verify their identity. Someone may wrongfully get possession of your passport, but replicating your facial movements is far harder.
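Monzo's exact matching pipeline isn't public, but a common approach behind this kind of onboarding is to reduce the ID photo and each selfie frame to a numeric face embedding and accept the customer only if every frame is close enough to the ID. The sketch below illustrates that idea with toy vectors standing in for real embeddings; the function names, threshold, and vector sizes are all illustrative assumptions, not Monzo's implementation.

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine of the angle between two embedding vectors (1.0 = identical direction)."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def verify_identity(id_embedding, selfie_embeddings, threshold=0.8):
    """Accept only if every selfie frame matches the ID photo embedding."""
    scores = [cosine_similarity(id_embedding, frame) for frame in selfie_embeddings]
    return min(scores) >= threshold

# Toy 128-dimensional vectors standing in for real face embeddings.
rng = np.random.default_rng(0)
id_photo = rng.normal(size=128)
# Same person: each video frame is the ID embedding plus small noise.
same_person = [id_photo + rng.normal(scale=0.05, size=128) for _ in range(3)]
# Impostor: unrelated embeddings.
impostor = [rng.normal(size=128) for _ in range(3)]

print(verify_identity(id_photo, same_person))  # True
print(verify_identity(id_photo, impostor))     # False
```

Requiring every frame of the video to match, rather than just one, is what makes a moving selfie harder to spoof than a single still photo.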

Example 1

Smile to Pay

Alibaba's affiliate Ant has developed a facial recognition payment system in China called Smile to Pay that doesn't rely on smartphones. Ant has applied the technology in KFC restaurants, where customers can pay by scanning their faces at a self-service kiosk. Biometric technology has been more widely adopted in China, where Alipay's facial recognition system lets cameras execute payments or unlock parcel boxes. According to Ant, Smile to Pay uses a 3D camera and detection algorithms to ensure account safety by blocking spoofing attempts that use other people's photos or video recordings.



Image source: Alizila
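Ant hasn't published how its anti-spoofing works, but the mention of a 3D camera suggests one simple line of defense: a printed photo or a phone screen held up to the kiosk is essentially flat, while a real face has centimetres of depth relief. The sketch below shows that idea as a naive depth-variance check on synthetic depth maps; the threshold and data are invented for illustration, and production systems combine many stronger signals.

```python
import numpy as np

def passes_liveness_check(depth_map_mm, min_depth_std=5.0):
    """Reject nearly flat surfaces: a real face should show several
    millimetres of depth variation (nose, brow, chin)."""
    return float(np.std(depth_map_mm)) >= min_depth_std

rng = np.random.default_rng(1)

# Spoof attempt: a flat photo held ~500 mm from the camera (sensor noise only).
photo = np.full((64, 64), 500.0) + rng.normal(scale=0.5, size=(64, 64))

# Real face: a smooth ~4 cm depth bump in front of the background plane.
x = np.linspace(-1, 1, 64)
face = 500.0 - 40.0 * np.exp(-(x[:, None] ** 2 + x[None, :] ** 2))

print(passes_liveness_check(photo))  # False: flat surface
print(passes_liveness_check(face))   # True: real depth relief
```

A video replay fails the same test, which is why a 2D camera alone is much easier to fool than a depth-sensing one.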

Example 2

Seeing AI

Microsoft's Seeing AI is an app that uses AI to recognize people, objects, and scenes. Users can take a picture of a person and the system will describe who they are and how they are feeling. The app doesn't just perform basic image recognition but also tells the user how to position the camera to get the target in shot. For the visually impaired, this app can narrate the world that surrounds them, reading documents, recognizing people, and conveying people's feelings. When using the app, we found that it's best to use headphones to keep descriptions private and avoid upsetting colleagues.

Microsoft's Seeing AI. Image source: Microsoft

Example 3

A Helpful Pair of Eyes

The bot that rocks the cradle: Visual analysis has clear applications in health and well-being. A good example is Nanit, a smart visual baby monitor that watches a child sleep, uses machine learning to learn the baby's behaviors and track sleep patterns and issues, and then provides parents and medics with analysis.
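Nanit doesn't disclose its models, but the core of any sleep tracker is turning a per-minute activity signal from the camera into labeled sleep and wake periods, then summarizing them for parents. The sketch below uses a deliberately simple motion threshold on simulated data; the function names, threshold, and motion values are all assumptions for illustration, not Nanit's algorithm.

```python
import numpy as np

def longest_run(mask):
    """Length of the longest consecutive run of True values."""
    best = run = 0
    for value in mask:
        run = run + 1 if value else 0
        best = max(best, run)
    return best

def sleep_summary(motion_per_minute, awake_threshold=0.5):
    """Label each minute asleep/awake by motion level and summarize the night."""
    asleep = np.asarray(motion_per_minute) < awake_threshold
    return {
        "minutes_asleep": int(asleep.sum()),
        "minutes_awake": int((~asleep).sum()),
        "longest_sleep_stretch": longest_run(asleep),
    }

# Simulated night: two calm hours, a 10-minute waking, then another calm hour.
motion = [0.1] * 120 + [0.9] * 10 + [0.2] * 60
print(sleep_summary(motion))
# {'minutes_asleep': 180, 'minutes_awake': 10, 'longest_sleep_stretch': 120}
```

Even this crude segmentation yields the numbers parents actually want, such as total sleep and the longest unbroken stretch; a real product would learn the threshold per child rather than hard-coding it.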

Alexa, does my bot look big in this? The Amazon Echo Look connects to Alexa, helping users to evaluate their clothing choices. Using voice technology, users can easily trigger full-length photos and short videos that compile into a personal look book. The system then interacts with the person, suggesting the best look, and it allows the user to upload additional pictures and videos. Of course, Amazon can also happily sell the clothing to that consumer.

References

(2017). Alipay Launches 'Smile to Pay' for Commercial Use in China. [online] Available at:

(2018). Amazon Echo Look. [online] Available at:

Cearley, D., Burke, B., Searle, S. and Walker, M. (2017). Top 10 Strategic Technology Trends for 2018. [pdf] Gartner Inc. Available at:

Fjord (2018). Fjord Trends 2018. [online] Available at:

Predictions 2018: A Year of Reckoning. (2018). [pdf] Forrester Research Inc. Available at:

Guest, S. (2017). SAPVoice: Face Recognition Technology Set To Transform Retail. [online] Forbes. Available at:

LDV Capital (2018). LDV Capital Insights. [online] Available at:

Nanit (2018). Nanit Baby Sleep Monitor. [online] Available at:

Reuters (2017). Just smile: In KFC China store, diners have new way to pay. [online] Available at:

The Future 100: 2018. (2017). [pdf] The Innovation Group, J. Walter Thompson Intelligence. Available at:

Wu, L. (2017). Big Burger Is Watching You. [online] Available at:
