Computers Have Eyes, Mouths, Ears and an AI Brain

Virtusa xLabs

We have persevered with our on-off relationships with Siri, Cortana and Alexa.

We are getting used to the fact that “interaction” no longer means keyboards and GUIs. We are also coming to terms with component disaggregation: the idea that we can interact with a single intelligent system through multiple devices. Sensory interfaces have pervaded devices; should you really wish to, you can talk to your fridge and look at your empty ice box from work. Tech companies know that owning the user interface (UI) puts them at the front of the queue to manage your requests end-to-end, and that the UI is the point at which they collect the most data and revenue, and can disintermediate brands. This explains why 2017 saw Apple, Amazon and Google all launch new smart speakers and smart screens. It’s only a matter of time before Facebook and others launch similar devices.

In 2018, smart voice-enabled devices will become mainstream, with mature Natural Language Processing (NLP) able to execute complex tasks. For brands that have invested significant amounts in digital projects and animated avatars, this presents a problem. If Siri has a voice that is unmistakably Apple, Alexa is Amazon and Google Assistant is, well, Google, how do you brand a voice interaction? In the short term, most brands won’t. Applying a personality to your AI approach, as well as a brand persona for your multilingual speech outputs, is not a simple task.
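At its core, executing a spoken command reduces to intent recognition and slot filling. The sketch below illustrates that pipeline shape with hand-written regular expressions; the intent names and patterns are invented for illustration, where a real assistant would use trained NLP models.

```python
import re

# Illustrative intent catalogue: each intent maps to a pattern whose named
# groups become the "slots" (parameters) of the recognized command.
INTENT_PATTERNS = {
    "set_timer": re.compile(r"set a timer for (?P<minutes>\d+) minutes?"),
    "play_music": re.compile(r"play (?P<artist>.+)"),
}

def parse_utterance(text):
    """Return (intent, slots) for a recognized command, else (None, {})."""
    text = text.lower().strip()
    for intent, pattern in INTENT_PATTERNS.items():
        match = pattern.fullmatch(text)
        if match:
            return intent, match.groupdict()
    return None, {}
```

For example, `parse_utterance("Set a timer for 10 minutes")` yields the intent `set_timer` with the slot `minutes="10"`, which downstream code can then act on.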

Creating a custom voice that reinforces brand values and engenders trust is more nebulous and complex. Currently, Google Assistant lets you choose between two male voices with US accents for your persona. Chances are, neither of these voices will create the emotional connection that your highly paid brand ambassador does. In addition to capitalizing on audio inputs, many smart devices are also intelligently processing visual inputs. Image recognition can spot objects and gestures, and analyze sentiment and context. Mining Instagram and Facebook to promote items to customers based on their preferences is an obvious use case.

Visual inputs also enable biometric features, which early-adopter banks are now using in their onboarding processes. Monzo, the UK digital challenger bank, has no branches; it asks customers to register by scanning an ID document, such as a passport or driving licence, and then taking a biometric selfie video, which is used to verify their identity. Someone may wrongfully obtain your passport, but replicating your facial movements is far harder.
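The idea behind the selfie video is challenge-response liveness checking: the app asks for a random sequence of gestures that a stolen photo, or a pre-recorded video, is unlikely to reproduce. A minimal sketch, in which gesture detection (in reality a computer-vision model) is mocked as a list of observed labels, and the gesture names are invented:

```python
import random

# Invented gesture vocabulary for illustration only.
CHALLENGES = ["turn_left", "turn_right", "blink", "smile"]

def issue_challenge(count=3, rng=None):
    """Pick a random sequence of gestures for the user to perform."""
    rng = rng or random.Random()
    return rng.sample(CHALLENGES, count)

def verify_liveness(challenge, observed_gestures):
    """Pass only if the requested gestures appear in the video, in order."""
    remaining = iter(observed_gestures)
    # "gesture in remaining" consumes the iterator, so this checks that
    # the challenge is an ordered subsequence of what was observed.
    return all(gesture in remaining for gesture in challenge)
```

Because the challenge is random per session, even a replayed recording of a previous successful attempt will fail the next time around.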


Example 1

Smile to Pay

Alibaba’s affiliate Ant has developed a facial recognition payment system in China, called “Smile to Pay”, which doesn’t rely on smartphones. It has applied the technology in KFC restaurants, where customers can pay by scanning their faces at a self-service kiosk. Biometric technology has been more widely adopted in China, and Alipay’s facial recognition system lets cameras execute payments or unlock parcel boxes. According to Ant, Smile to Pay uses a 3D camera and dedicated detection algorithms to keep accounts safe, blocking spoofing attempts that use other people’s photos or video recordings.


Image source: Alizila
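One signal a 3D camera adds over an ordinary one: a printed photo or a phone screen is essentially flat, while a live face has real depth relief (the nose sits closer to the camera than the cheeks). The sketch below illustrates that single check; Ant’s actual algorithms are not public, and the depth-map format (millimetres per pixel, row-major lists) and threshold are invented.

```python
def looks_three_dimensional(depth_map, min_relief_mm=10.0):
    """Reject near-planar depth maps as likely photo/screen spoofs.

    Fits a plane through three corners of the depth map, so a flat
    target held at an angle still reads as flat; residual relief above
    the threshold suggests a real face.
    """
    h, w = len(depth_map), len(depth_map[0])
    z00, z01, z10 = depth_map[0][0], depth_map[0][w - 1], depth_map[h - 1][0]
    dz_dx = (z01 - z00) / (w - 1)  # depth gradient along a row
    dz_dy = (z10 - z00) / (h - 1)  # depth gradient down a column
    residuals = [
        depth_map[y][x] - (z00 + dz_dx * x + dz_dy * y)
        for y in range(h)
        for x in range(w)
    ]
    return max(residuals) - min(residuals) > min_relief_mm
```

Production systems combine several such cues (texture, micro-motion, infrared response), but each is a cheap test of the same question: is this a surface or a face?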


Example 2

Seeing AI

Microsoft’s “Seeing AI” is an app that uses AI to recognize people, objects, and scenes. Users can take a picture of a person and the system will describe who they are and how they appear to be feeling. The app doesn’t just perform basic image recognition; it also tells the user how to position the camera to get the target in shot. For the visually impaired, the app can narrate the world around them: reading documents, recognizing people and conveying their feelings. When using the app, we found it’s best to use headphones, both to keep descriptions private and to avoid disturbing colleagues.

Microsoft’s Seeing AI. Image source: Microsoft
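The final step of an app like this is turning raw vision-model output into a sentence that can be read aloud. A minimal sketch of that composition step; the input format (plain label and emotion strings) is an invented stand-in for a real image-recognition service’s response, not Seeing AI’s actual pipeline.

```python
def describe_scene(object_labels, face_emotions):
    """Compose a short spoken narration from detected objects and faces."""
    parts = []
    if object_labels:
        # De-duplicate and sort labels so repeated detections read cleanly.
        parts.append("I can see " + ", ".join(sorted(set(object_labels))))
    for emotion in face_emotions:
        parts.append("a person who appears " + emotion)
    if not parts:
        return "Nothing recognized."
    return "; ".join(parts) + "."
```

The resulting string would then be handed to a text-to-speech engine for narration.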


Example 3

A Helpful Pair of Eyes

The bot that rocks the cradle: visual analysis has clear applications in the health and well-being space. A good example is Nanit, a smart visual baby monitor that watches a child sleep and uses machine learning to learn the baby’s behaviours, track sleep patterns and issues, and provide parents and medics with analysis.

Alexa, does my bot look big in this? The Amazon Echo Look connects to Alexa, helping users evaluate their clothing choices. Using voice commands, users can easily trigger full-length photos and short videos that compile into a personal look book. The system then suggests the best look and lets the user upload additional pictures and videos. Of course, Amazon can also happily sell the clothing to that consumer.
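A toy sketch of the kind of analysis a visual sleep monitor might run: reducing timestamped motion events (minutes since bedtime, as a camera pipeline might emit them) to a parent-facing summary. The event format and the 30-minute stillness gap are invented for illustration; Nanit’s actual models are not public.

```python
def summarize_sleep(motion_minutes, night_length=600, still_gap=30):
    """Summarize a night from detected motion events.

    Each burst of motion separated by at least still_gap quiet minutes
    counts as one waking episode.
    """
    wakings = 0
    awake_minutes = 0
    last = None
    for minute in sorted(motion_minutes):
        if last is None or minute - last >= still_gap:
            wakings += 1  # new episode: long stillness preceded this motion
        awake_minutes += 1
        last = minute
    asleep = night_length - awake_minutes
    return {
        "wakings": wakings,
        "percent_asleep": round(100 * asleep / night_length, 1),
    }
```

For instance, motion at minutes 5–7, 200–201 and 450 of a ten-hour night summarizes to three wakings with 99% of the night asleep.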


References

Alipay Launches ‘Smile to Pay’ for Commercial Use in China (2017). [online]

Amazon Echo Look (2018). [online]

Cearley, D., Burke, B., Searle, S. and Walker, M. (2017). Top 10 Strategic Technology Trends for 2018. [pdf] Gartner Inc.

Fjord (2018). Fjord Trends 2018. [online]

Predictions 2018: A Year of Reckoning (2018). [pdf] Forrester Research Inc.

Guest, S. (2017). SAPVoice: Face Recognition Technology Set To Transform Retail. [online] Forbes.

LDV Capital (2018). LDV Capital Insights. [online]

Nanit (2018). Nanit Baby Sleep Monitor. [online]

Reuters (2017). Just smile: In KFC China store, diners have new way to pay. [online]

The Future 100: 2018 (2017). [pdf] The Innovation Group, J. Walter Thompson Intelligence.

Wu, L. (2017). Big Burger is watching you. [online]