Bias is a problem for AI interoperability. Every voice AI interprets human language slightly differently. Learn why we use a grammar based on "who else?" questions as the protocol for our AI namespace.
Currently, there are 1000+ voice AI providers worldwide. But different AI ecosystems usually can't be easily connected.
AI research is happening across languages and use cases at different speeds. Smaller languages are generally at a disadvantage (e.g. Hebrew).
Today's voice AIs are only narrowly intelligent and understand only the use cases they are trained for. Human-like intelligence is still a dream.
Our idea is to make voice commands comparable by providing a simplified language to store the intent of a language-based request. The inspiration for this idea comes from Noam Chomsky's theory that parts of language are hard-wired in human understanding.
Universal grammar is the theory that certain parts of language appear to be hard-wired in human thinking. Since its introduction in the 1960s it has been considered a cornerstone theory of modern linguistics.
Max Planck researchers demonstrated in 2013 that 'Huh?' carries characteristics of universal grammar. They later won the Ig Nobel ("alternative" Nobel) Prize.
We demonstrate that "who else?" grammar appears to exhibit similar properties. Maybe this is new evidence for universal grammar. It is definitely language everyone knows how to use.
Since it is language apparently available to every kind of user, and applicable to every kind of question, we use "who else?" questions as the design for the intent catalog. Initially, we provide it for the 200 most common request types in human language. Voice AI providers can use this catalog as a shared protocol to store the content of voice commands.
Voice assistants can look up other voice assistants and forward user requests using this standardized language as the protocol.
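To make this concrete, the sketch below shows what a single entry in such an intent catalog could look like. The field names, the "whoelse" namespace key, and the example values are our own illustrative assumptions, not a finished specification.

# Minimal sketch of a hypothetical "who else?" intent catalog entry.
# All names and values below are illustrative assumptions, not a final spec.
from dataclasses import dataclass, field

@dataclass
class IntentEntry:
    """One of the ~200 common request types, addressed as a 'who else?' question."""
    namespace: str                              # e.g. "whoelse.ride_share"
    question: str                               # canonical "who else?" phrasing
    slots: dict = field(default_factory=dict)   # normalized parameters of the request

# A voice command such as "find me a ride-share to the airport" could be
# stored in the simplified, vendor-agnostic format like this:
ride_share = IntentEntry(
    namespace="whoelse.ride_share",
    question="Ride-share, who else?",
    slots={"destination": "airport", "language": "en"},
)

Any NLP provider that recognizes the same request type would store it under the same namespace, which is what makes commands comparable across vendors.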
With our work we support the development of publicly available and vendor-agnostic NLP API standards. This makes AI integration much easier and cheaper.
The vision is a network of connected voice AI technologies. If successful, one day every AI may be able to use "who else?" to look up other voice AIs.
Executive Summary
On average, however, people recall only 3.7 brands during 80% of their time. Users actually prefer to speak with voice AIs the way they would speak to other humans, and they do not want to recall a different name for every feature. This is why our long-term idea is to establish "who else?" questions as universal wake words that all voice AIs understand and every user knows. It is much nicer to remember a single phrase instead of many brands: "Apartment, who else?", "Delivery, who else?", "Date, who else?". This is, however, not essential for our business model: voice AI users can speak however they want. whoelse.ai initially only provides a protocol to store voice commands in a simplified format.
Vision
This project proposes the installation of a publicly available address system for voice AI. Similar to how the Domain Name System (DNS) and the Hypertext Transfer Protocol (HTTP) were needed for the original “Text Internet” to succeed, we suggest the development of a shared namespace system for “Voice Internet” Natural Language Processing (NLP) AI technologies (e.g. smart speakers, voice assistants, chatbots).
AI business models (e.g. Amazon Alexa, Google Search, Apple Siri) depend on access to consumer data (e.g. preferences, search queries, voice commands). It is particularly difficult for OEMs (e.g. car companies, home appliances, FMCG electronics) to adapt: as experts in combining technology suppliers, European OEMs do not have enough AI know-how and data access of their own to reach parity with US and Chinese companies.
Natural language processing (NLP) and voice assistant technologies (smart speakers, chatbots, voice assistants) are expected to account for two thirds of all Internet search requests by 2025. Every first customer contact will be a bot. Voice-based e-commerce will become a $55bn p.a. market during the same period (Gartner).
The potential of voice interfaces stems from the usability and availability of the medium: everybody speaks. Language is a tool that users of all ages already know. But integrating voice technologies is a huge challenge for European OEMs such as car manufacturers, FMCG electronics companies, and telecommunication service providers.
They have two options: either they use Alexa and become the microphone of the Amazon business model, or they choose a white-label NLP (e.g. Nuance, Cerence, Watson) and build a custom voice assistant.
NLP is a young domain, and currently more than 1,000 voice AI companies, research projects, and industry solutions compete worldwide across different languages and use cases.
Examples for voice AI use cases & applications:
- Smart speaker
- Voice assistants
- Biometric user identification
- Customer hotlines
- Chatbots
- Voice-based COVID-19 diagnostics
- Industry-specific (e.g. banking, insurances) voice assistants
- Digital receptionists
- Toys
- Text classification (e.g. legal contracts, medical files)
- Text generation (e.g. marketing, advertising)
- Search engines
It is a situation of AI bias: voice AIs are usually only good in the domain they were designed and trained for. NLP R&D breakthroughs happen every week, and it is difficult to predict which voice AI will be best suited for a product to be released in 2-4 years.
Furthermore, the localization of voice interfaces is a problem. To sell a German car with voice assistant features, or a home appliance with an integrated smart speaker, to customers in e.g. China, the USA, and France, OEMs must integrate multiple NLP technologies, because each market has a different technology leader.
Voice AI development is taking place at different speeds across different languages. The market for Swedish NLP is, for instance, much smaller than for technologies with Mandarin capabilities. The Government of Israel recently announced that it will sponsor the development of Hebrew voice AI capabilities at Amazon and Google.
Enter whoelse.ai - the first universal language for all AIs. To make combining different NLP technologies easier, we provide voice AIs with a simplified language to store and exchange voice-based user requests (intents) in a standardized format.
This way, a voice interface can contain multiple voice AI technologies, and user requests can be answered by the voice assistant best suited to respond:
Example intent catalog implementation:
Smart Speaker for Co-Working Spaces
├── NLP 1: IBM Watson (WeWork AI)
│   ├── Air Conditioning
│   ├── Room Booking
│   ├── Catering
│   └── Register Guest
│
└── whoelse.ai
    ├── NLP 2: Cisco Mindmeld (PwC AI)
    │   ├── Tax Filings
    │   ├── HR Management
    │   └── Digital Lawyer
    ├── NLP 3: Nuance Mix (Lufthansa AI)
    │   ├── Ticket Booking
    │   ├── Hotel Reservation
    │   └── Rental Cars
    ├── NLP 4: Deepgram (no white label)
    │   ├── Transcription Service
    │   ├── Task Automation
    │   └── Meeting Translator
    └── NLP 5: Alexa (Amazon)
        ├── ..
        ├── ..
        └── ..
User journey:
Voice AI 1: Welcome to WeWork - how can I help you?
User Input: I want to file my taxes!
Voice AI 1: I cannot help you with that personally, but I will find the best AI available!
[Voice AI 1 searches the whoelse.ai intent catalog]
Voice AI 2: Welcome to PwC. Please tell me your tax code first (..)
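The lookup-and-forward step in this journey can be sketched in a few lines. The catalog contents, the namespace keys, and the route() helper below are purely illustrative assumptions about how an implementation might look, not part of a published API.

# Hypothetical sketch of the whoelse.ai lookup step in the journey above.
# Registry contents and function names are illustrative assumptions only.
CATALOG = {
    "whoelse.tax_filing":     "NLP 2: Cisco Mindmeld (PwC AI)",
    "whoelse.ticket_booking": "NLP 3: Nuance Mix (Lufthansa AI)",
    "whoelse.transcription":  "NLP 4: Deepgram",
}

def route(namespace: str) -> str:
    """Look up which registered voice AI should receive the forwarded request."""
    handler = CATALOG.get(namespace)
    if handler is None:
        return "Sorry, no registered voice AI can handle this request yet."
    return f"Forwarding the request to {handler}."

# "I want to file my taxes!" is first mapped by Voice AI 1 to the shared
# 'who else?' intent, then forwarded to the best-suited provider:
print(route("whoelse.tax_filing"))   # -> Forwarding the request to NLP 2: Cisco Mindmeld (PwC AI)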
We detailed this concept during the organisation of a DIN industry standard initiative for NLP API interoperability. As consortium initiators, we worked together with industry partners and validated the demand for such an AI exchange.
The Domain Name System was once needed for the Text Internet to succeed. This project proposes to develop the technologies needed for the first address system of a new kind of Voice Internet.
But standardization itself is not a business model. In the current environment, de-facto monopolists like “GAFA” usually control, through their market dominance, the adoption of technology specifications and SDKs in the industry.
The same is now happening in the field of voice AI interoperability. In September 2019, Amazon announced a new NLP standards initiative. The Alexa consortium decided that wake words (Alexa, Einstein, ..) will control which voice assistant responds to a user request.
This selection logic will, in our opinion, not work, because it is unlikely that different OEMs will be able to agree on the ownership of arbitrary language. Example: “Voice AI, find me a ride-share” - who should decide whether this command is directed to e.g. BMW or VW? Will it be the user, the AI, or the interface provider?
The Amazon consortium's naming logic favours the best-known brands and is thus designed to position Alexa in the best way possible. Research also shows that consumers do not want to remember multiple brands for voice assistants and prefer natural language over synthetic input dialogs. Naming voice AIs will remain an ongoing topic of concern in the industry.