At Voxware, we’ve deployed voice solutions to hundreds of warehouses. Over the years, we’ve noticed two keys to workforce acceptance of voice. The first is that management proactively involves the workforce in introducing the new system, rather than simply imposing a new way of doing things.
The second key to success involves the voice recognition software itself. For workers to succeed with voice picking, the system must recognize them every time they speak, all day long, regardless of background noise. This sounds simple, but in practice few voice recognizers perform at this level. Here are some key “must-haves” to ensure your workers will love the voice system:
Recognize Everyone. Today’s warehouse workforce is a diverse group, so a voice recognizer must be “trained” to each user’s way of speaking. This is called a “speaker dependent” recognizer, and it is the only way to get consistently accurate recognition of people with differing ethnic backgrounds, languages, and accents. Speaker-independent solutions (in which no training is needed) create worker frustration due to frequent misrecognitions. When workers are forced to repeat themselves, the very basis for voice picking’s ROI becomes questionable.
Recognize Everywhere. A warehouse is a challenging environment for voice recognition because of the many forms of constant and intermittent background noise. One cannot rely strictly on “noise-cancelling” microphones; the voice recognition software must be able to filter out background noise by correctly distinguishing it from the worker’s voice.
Recognize Everything. A key indicator of recognizer strength is whether a worker can use it for every task, including logging on, selecting the application, and changing headset volume. If, for instance, a solution doesn’t support voice logon, the customer cannot operate the system on some of today’s popular “voice only” units, and it is a sign that the voice recognition software is weak.
Recognize The Context. Workers need to speak with supervisors and colleagues. This can confuse weak recognizers, which assume that the worker is speaking to the voice system but is saying something wrong. So workers have to resort to pulling microphones away from their faces (risking damage to the headset) in order to speak to a real person. The best voice recognizers have “out of vocabulary rejection” – meaning that they can detect when the worker is speaking to the voice application, and automatically ignore side conversations.
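To make the idea of “out of vocabulary rejection” concrete, here is a minimal sketch of one way such a filter could work. It is an illustration only, not how any particular recognizer is built: the `Hypothesis` class, the confidence score, the vocabulary, and the threshold value are all hypothetical, and production recognizers typically use more sophisticated acoustic models for this.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Hypothesis:
    """A recognizer's best guess for one utterance (hypothetical structure)."""
    text: str
    confidence: float  # assumed to range from 0.0 (no match) to 1.0 (certain)


# Hypothetical phrases the voice application is listening for at this step.
ACTIVE_VOCABULARY = {"ready", "yes", "no", "repeat"}

# Assumed cut-off; a real system would tune this per site and per user.
REJECTION_THRESHOLD = 0.6


def accept_command(hyp: Hypothesis) -> Optional[str]:
    """Return the command if the worker was addressing the system, else None.

    Returning None means the utterance is ignored silently, so a side
    conversation never triggers an error prompt.
    """
    phrase = hyp.text.strip().lower()
    if phrase not in ACTIVE_VOCABULARY:
        return None  # words outside the vocabulary: treat as side conversation
    if hyp.confidence < REJECTION_THRESHOLD:
        return None  # weak match: more likely chatter or noise than a command
    return phrase
```

With this sketch, `accept_command(Hypothesis("Ready", 0.9))` returns `"ready"`, while a remark to a colleague such as `Hypothesis("did you see the game", 0.8)` is ignored without interrupting the worker.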
Recognize Different Skill Levels. Too often, voice solutions fail to give expert workers the ability to maximize their performance because they are “dumbed down” to accommodate less experienced workers. But if expert workers can combine multiple phrases into a single utterance, they can achieve even higher productivity. This is only possible with a “continuous recognizer” that accurately handles longer phrases. It enables an organization to voice-enable workers according to their own level of expertise, and gradually move them up to a higher standard. “Discrete recognizers” can only support very short words like “ready,” “yes,” and “no.” That means more interactions per task, which frustrates experienced workers who already know what to say next.
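As a rough illustration of what combining phrases means in practice, the sketch below takes the text of one continuous utterance and splits it into the individual steps a beginner would speak one at a time. The command words (“check”, “pick”), the digit grammar, and the function names are assumptions made up for this example; they are not taken from any specific voice product.

```python
from typing import List, Tuple

# Hypothetical spoken digits and command words for the example grammar.
DIGITS = {
    "zero": 0, "one": 1, "two": 2, "three": 3, "four": 4,
    "five": 5, "six": 6, "seven": 7, "eight": 8, "nine": 9,
}
COMMANDS = {"check", "pick"}


def _finish(command: str, digits: List[int]) -> Tuple[str, int]:
    """Turn a command and its spoken digits into one (command, number) step."""
    if not digits:
        raise ValueError(f"command {command!r} is missing its number")
    return command, int("".join(str(d) for d in digits))


def parse_utterance(utterance: str) -> List[Tuple[str, int]]:
    """Split one continuous utterance into (command, number) steps."""
    steps: List[Tuple[str, int]] = []
    command = None
    digits: List[int] = []
    for word in utterance.lower().split():
        if word in COMMANDS:
            if command is not None:
                steps.append(_finish(command, digits))  # close previous step
            command, digits = word, []
        elif word in DIGITS and command is not None:
            digits.append(DIGITS[word])
        else:
            raise ValueError(f"unexpected word: {word!r}")
    if command is not None:
        steps.append(_finish(command, digits))
    return steps
```

An expert who says “check three seven pick five” in one breath completes two steps, `("check", 37)` and `("pick", 5)`, in a single interaction. A newer worker can still say “check three seven”, wait for the next prompt, and then say “pick five”; the same parser handles both.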
Sound Human. Workers have to use the voice system for eight hours a day. The best solutions offer recorded human prompts so that workers hear a real voice. Many also offer text-to-speech (TTS) voices, which admittedly have gotten better in recent years but can still sound like a computer. A solution that supports both human and TTS voices gives companies a choice.