What Voice Experience (VX) Best Practices Can Teach Us About Better Visual UX Practices
User interface (UI) and user experience (UX) design have historically been visual practices. Experts in these fields work toward graphical user interfaces (GUIs) that make visible content easy to understand and navigate. As we mentioned in our introductory blog post about voice, voice interface (VI) and voice experience (VX) design are closer to copywriting: experts work to define the language and conversational elements that will most quickly bring a user the desired information or outcome. As we noted in our previous blog post:
“Given that the interface is made entirely of [spoken] language (as opposed to visual graphics), user experience for voice is entirely about thinking of all the ways a human may ask a question or state a need, and then how that person will understand and act upon an automated response. For interaction designers accustomed to working with visual interfaces, switching over to design voice interactions can be daunting because a very different set of rules dictate how information is shared. As visual designers, we have the luxury of all the nuance and contextual clues visual elements can provide. Most people know what a menu looks like, what a button does, where they ‘are’ in terms of location whether physical or virtual, etc. We take these things for granted in an experience that has visual guides, but that all disappears when there is only vocal interaction.”
We wanted to explore this difference further by examining best practices for voice experiences and considering how they might shape or change the way we think about visual UX. Although voice experience is still a relatively new field, the voice community has begun to establish best practices. Those practices are being revised and improved rapidly, but a solid foundation is taking shape. Here are a few that relate directly to visual UX:
Avoid ambiguity. Be specific.
As obvious as this sounds, and as much as everyone says they want to avoid ambiguity, there remain endless examples of menus, buttons, and paragraph headers across the wide landscape of digital experiences that confuse users with ambiguous language. You know you’ve seen them. Chances are you may even be responsible for a few.
We can get away with this usability sin because graphical interfaces provide a lot of context to help users understand what would otherwise seem ambiguous. In a menu, we can take in all of the options at the same time and use deductive reasoning to work out what content may be categorized under a vague term or phrase. Sighted people may take that power for granted. If you’ve ever used a screen reader or an automated phone system, you know the memory game you have to play to remember all the options. If those options are unclear, non-specific, or imprecise, you are left stuck in indecision or forced to guess. Trial and error is not a fun process for users.
For voice interfaces, it is imperative to describe options in language that is as specific and precise as possible. Because users must keep track of their options in memory, they need to be able to distinguish those options clearly and quickly before they forget them.
What would be the impact of placing such an emphasis on clarity in our web design and development work?
The corollary here to visual interfaces is pretty clear, but having the willpower to actually implement it in practice is much less certain. The inherent conflict of this usability heuristic pits “creative” options against “practical” options. For example, the menu title “Blog” is very clear and practical, but in the rush to differentiate a brand, you’ll find lots of menu titles for “Ideas” and other euphemistic titles. Is this really helpful to the user, or is this just branding? In a visual context, we can get away with this type of ambiguous language, but in a voice interface, you probably just want to say, “Blog.”
Avoid cognitive overload. Be precise.
Along the same lines, the options presented in a voice interface should be very limited, even at the risk of not presenting every option a user may have. Amazon’s recommendation for Alexa Skills is to offer the top three options, followed by an option to hear more.
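The “top three, then more” pattern can be sketched as a small paging function. This is a minimal illustration, not Amazon’s implementation: the function names, the `page_size` parameter, and the prompt wording are hypothetical, and the default of three simply mirrors the recommendation above.

```python
def spoken_list(items):
    """Join items the way a person would say them aloud."""
    if len(items) == 1:
        return items[0]
    if len(items) == 2:
        return f"{items[0]} or {items[1]}"
    return ", ".join(items[:-1]) + ", or " + items[-1]


def build_prompt(options, page=0, page_size=3):
    """Offer at most `page_size` options, plus a 'more' option if any remain.

    Hypothetical helper: `page` counts how many times the user has said "more".
    """
    start = page * page_size
    chunk = options[start:start + page_size]
    if not chunk:
        return "Those are all of the options."
    prompt = f"You can say {spoken_list(chunk)}."
    # Only mention "more" when there are options beyond this chunk.
    if start + page_size < len(options):
        prompt += " Or say 'more' to hear other options."
    return prompt


menu = ["Blog", "Services", "Careers", "Contact", "About"]
print(build_prompt(menu))          # first three options, plus "more"
print(build_prompt(menu, page=1))  # remaining two options, no "more"
```

Keeping each prompt to a short, fixed-size chunk is what limits how much the listener has to hold in memory at once.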
Additionally, automated voice responses should not be verbose, because users must hold all of the options and information in their minds before making a choice. Offering fewer options, conveyed in the most minimal and precise language possible, leads to the best experience and the greatest likelihood of success.
What would be the impact of placing such an emphasis on brevity and precision in our web design and development work?
Again, the correlation is clear: our websites should be more succinct and precise if we want users to be more effective, successful, and satisfied using them. This applies as much to design as it does to information architecture and copywriting. Using as little information and as few words as possible to convey an idea leads to more efficient and effective interactions with our content.
And yet, real-world examples show that we are not very committed to following these best practices when designing and writing for the web. Content is often overly verbose, as are navigational elements like page titles, headers, and menu options. Spending more time and effort testing our sites with real users would probably help us refine our language and navigational choices. Extensive user testing is a best practice in voice applications, too.
Handle errors gracefully. Reroute users on the wrong path.
Voice experiences don’t have a “back” button. The voice interaction equivalent of a 404 page (i.e. the error message a user receives when a page is not found on a website) is for the app to:
- Apologize for not understanding the request or not being able to find the answer to the request
- Guess what the user said or meant, and ask if that guess is correct
- Offer some options for what the app has guessed was the user’s intent
- Offer some alternative options the app guesses are related to the user’s intent
- Ask the user to repeat the request (this is the worst case, and should generally be avoided)
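The fallback order in the list above can be sketched in a few lines. This sketch assumes a hypothetical intent catalog and uses Python’s standard `difflib.get_close_matches` as a stand-in for real speech and intent recognition, which would be far more sophisticated; the catalog, the reply wording, and the 0.6 similarity cutoff are all illustrative choices.

```python
import difflib

# Hypothetical intent catalog: each intent maps to related intents to suggest.
INTENTS = {
    "store hours": ["store locations", "holiday hours"],
    "store locations": ["store hours", "parking"],
    "order status": ["returns", "shipping times"],
    "returns": ["order status", "refunds"],
}


def recover(utterance, intents=INTENTS):
    """Build a reply for an utterance that did not match any intent exactly."""
    apology = "Sorry, I didn't quite get that."
    # Guess what the user meant by string similarity (a stand-in for real NLU).
    guesses = difflib.get_close_matches(utterance, list(intents), n=3, cutoff=0.6)
    if len(guesses) == 1:
        # One plausible guess: confirm it and offer related options.
        best = guesses[0]
        related = " or ".join(intents[best])
        return f"{apology} Did you mean {best}? I can also help with {related}."
    if guesses:
        # Several plausible guesses: offer them as options.
        return f"{apology} Did you mean {', or '.join(guesses)}?"
    # Worst case: nothing plausible, so ask the user to repeat.
    return f"{apology} Could you say that again?"


print(recover("store hour"))
print(recover("weather"))
```

Asking the user to repeat is deliberately the last branch, reached only when no guess clears the cutoff.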
Because users in a voice interaction don’t have the visual interface that website users have, the app has to be far more accommodating in how it handles errors. Voice apps make intelligent guesses at the user’s intent, offer options based on those guesses or on seemingly related information, and confirm whether they have understood the request. They are far more helpful to voice users than what we tend to offer website users because they have to be: without such graceful handling of errors and dead ends, voice interactions would be far too frustrating.
This raises an interesting question with respect to websites:
Are we really doing enough to handle errors and dead ends gracefully?
404 pages have been the standard tool for handling errors and dead ends on websites for decades. They are a convenient way for a website to respond to requests for pages that don’t exist. We designed this practice for ease of website management, but much has changed since we established this norm. Most importantly, there has been a major shift in focus in how we design and build websites. Rightly so, our attention has moved from what makes a website simplest to manage to what makes it simplest and most pleasant to use for site visitors.
It seems foolish that this was not always the priority, but 404 pages are a good example of the approaches we took to building websites before human-oriented design and user experience became central to effective web design. The fact that bare, unhelpful 404 pages still exist seems like a glaring oversight, one that the arrival of voice experiences makes all the more apparent.
We should be asking ourselves how 404 pages can be more useful. When we ask this question, some immediate answers jump right out. Perhaps websites could be smarter in using session data to know what pages a user has already visited to better predict what they may be seeking. If in the course of that session a user hits a 404 page, perhaps that 404 message could include links for all of the pages that user has visited during the session, as well as pages that are related topically to the pages already visited. That may not solve the problem every time, but it is more helpful than simply telling the user the page they are looking for isn’t there without providing any alternative directions or options.
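The session-based 404 idea could be sketched as follows. This assumes the site already records the paths a visitor has viewed during the current session and maintains some map of topically related pages; both the `session_history` list and the `RELATED` map are hypothetical stand-ins for whatever analytics and content tagging a real site uses.

```python
# Hypothetical map of topically related pages, e.g. derived from content tags.
RELATED = {
    "/blog/voice-ux": ["/blog/accessibility", "/services"],
    "/services": ["/contact"],
}


def suggest_for_404(session_history, related=RELATED, limit=None):
    """Suggest links for a 404 page from the visitor's own session.

    Pages visited this session come first (most recent first, no duplicates),
    followed by pages topically related to them.
    """
    visited = []
    for path in reversed(session_history):
        if path not in visited:
            visited.append(path)

    suggestions = list(visited)
    for path in visited:
        for candidate in related.get(path, []):
            if candidate not in suggestions:
                suggestions.append(candidate)

    return suggestions if limit is None else suggestions[:limit]


history = ["/", "/services", "/blog/voice-ux", "/services"]
print(suggest_for_404(history))
```

Listing the visitor’s own recent pages first is one design choice; a site could just as reasonably rank related pages by traffic or recency.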
The same can be said of searches within a site. Returning zero search results, even when nothing matches the search criteria, does not help the user move forward. Providing any kind of related results, even if they aren’t exactly what the user was seeking, could point the user in a helpful direction.
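A related-results fallback can be as simple as relaxing the match criteria when a strict search finds nothing. In this deliberately naive sketch, which is not how any particular search engine works, a strict match requires every query word to appear in a page title; if no title qualifies, the function returns titles sharing at least one word, ranked by how many words they share. The function name and sample titles are hypothetical.

```python
def search(query, titles, limit=5):
    """Return (results, is_fallback) for a naive title search.

    Strict results contain every query word; if there are none, fall back to
    titles that share at least one word, ranked by the number of shared words.
    """
    terms = set(query.lower().split())
    scored = []
    for title in titles:
        overlap = len(terms & set(title.lower().split()))
        if overlap:
            scored.append((overlap, title))

    strict = [title for overlap, title in scored if overlap == len(terms)]
    if strict:
        return strict[:limit], False

    # sorted() is stable (even with reverse=True), so ties keep site order.
    ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
    return [title for _, title in ranked[:limit]], True


titles = ["Voice UX best practices", "Accessibility audits",
          "Designing 404 pages", "UX research services"]
print(search("voice search tips", titles))
```

The `is_fallback` flag lets the results page say honestly that these are related suggestions rather than exact matches.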
Further, errors tend to follow patterns. If you’ve ever done usability research with real users, you know that the problems they encounter frequently follow the same patterns (thankfully, since that is how we identify the biggest usability problems!). If sites were better equipped to track common errors along with their circumstances (e.g., pages visited before the error, pages visited after the error, inbound links to error states, search results leading to error states), they would be better able to “learn” to make smarter suggestions when displaying error pages such as 404 pages.
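One simple way a site might start to “learn” from error patterns is to count where visitors go after hitting a missing page and suggest the most common destinations. The sketch below keeps counts in memory and tracks only the page visited after the error; the class and method names are hypothetical, and a production system would need to persist this data and would likely also weigh the other signals listed above, such as inbound links and search results.

```python
from collections import Counter, defaultdict


class ErrorPatternLog:
    """Hypothetical in-memory log of where visitors go after a 404."""

    def __init__(self):
        # missing path -> Counter of the pages visitors reached next
        self.next_pages = defaultdict(Counter)

    def record(self, missing_path, next_path):
        """Note that a visitor who hit missing_path went on to next_path."""
        self.next_pages[missing_path][next_path] += 1

    def suggestions(self, missing_path, n=3):
        """Most common next pages for missing_path, most frequent first."""
        # .get() avoids creating an empty entry for paths never seen.
        counts = self.next_pages.get(missing_path, Counter())
        return [path for path, _ in counts.most_common(n)]


log = ErrorPatternLog()
log.record("/blog/old-post", "/blog/new-post")
log.record("/blog/old-post", "/blog")
log.record("/blog/old-post", "/blog/new-post")
print(log.suggestions("/blog/old-post"))
```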
The technology to better handle errors has emerged, and our websites need to modernize in order to meet the usability standards we expect, and that users of voice interactions will rightly begin to demand.
This is just the beginning.
The best practices we’ve considered here are just the tip of the iceberg for both voice experiences and website experiences. There is much more to optimizing the user experience on both platforms. That said, these examples illustrate how voice apps and smart speakers are pushing the boundaries of what we think of as “good” UX. Just as accessibility standards are currently pushing web design and development to be better and more inclusive, voice interactions will likely produce the next wave of standards that raise the bar for all of the computer-based interfaces we encounter.