Meaning Error Rate: ASR Domain-Specific Metric Framework

Speech recognition has become a popular task over the last decade. Automatic speech recognition (ASR) systems are used in many fields: virtual assistants, call-center automation, on-device speech interfaces, and so on. Each application defines its own measure of quality, so an improvement in one domain can degrade recognition quality in another. For ASR services open to the public, it is essential to provide reasonable quality for all customers across their scenarios. Current state-of-the-art metrics are poorly suited to this purpose because they do not adapt to domain specifics. In our work, we build a speech recognition quality evaluation framework that unifies feedback coming from different types of customers into a single metric. To this end, we collect feedback from customers, train a dedicated metric for each customer based on their feedback, and finally aggregate these metrics into a single criterion of quality. The resulting metrics have two significant properties: they allow recognition quality to be compared across domains, and their results are easy to interpret.
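To make the collect-train-aggregate pipeline concrete, below is a minimal Python sketch of the idea. All names (train_domain_metric, aggregate), the feedback format, and the toy word-accuracy baseline are illustrative assumptions, not the framework's actual implementation.

```python
# Hypothetical sketch of the pipeline: per-customer metrics trained from
# feedback, then aggregated into a single quality criterion.
from typing import Callable

# A metric maps a (reference, hypothesis) pair to a quality score in [0, 1].
Metric = Callable[[str, str], float]


def train_domain_metric(feedback: list[tuple[str, str, float]]) -> Metric:
    """Fit a toy per-domain metric from customer feedback.

    Each item is (reference, hypothesis, human_score). As a stand-in for
    the learned metric, we calibrate raw word accuracy with a bias term
    so that it tracks the human scores on average.
    """

    def word_accuracy(ref: str, hyp: str) -> float:
        ref_words, hyp_words = ref.split(), hyp.split()
        matches = sum(r == h for r, h in zip(ref_words, hyp_words))
        return matches / max(len(ref_words), 1)

    # Average gap between human judgment and raw word accuracy.
    bias = sum(s - word_accuracy(r, h) for r, h, s in feedback) / len(feedback)
    return lambda ref, hyp: min(1.0, max(0.0, word_accuracy(ref, hyp) + bias))


def aggregate(scores: dict[str, float], weights: dict[str, float]) -> float:
    """Combine per-domain scores into one criterion (weighted mean here)."""
    total = sum(weights[d] for d in scores)
    return sum(scores[d] * weights[d] for d in scores) / total


# Usage: one metric per customer domain, then a single overall score.
assistant_metric = train_domain_metric(
    [("turn on the light", "turn on the light", 1.0),
     ("call mom", "call tom", 0.5)]
)
per_domain = {"assistant": assistant_metric("play music", "play music"),
              "call_center": 0.7}  # placeholder score for a second domain
overall = aggregate(per_domain, {"assistant": 1.0, "call_center": 2.0})
```

The weighted mean is only one possible aggregation choice; the key design point is that each domain's score is already calibrated to its own customers' notion of quality, so the combined number remains interpretable.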