Dan D Kim

Let's share stories

Using Microsoft's Speech to Text REST API? Don't use it on Chrome

2021-01-25 Dan D. Kim hackathon

I have been mentoring at a couple of hackathons recently.

Super fun. Wrote about the experience here.

There was a very gritty problem that I helped a team with. The solution wasn’t very obvious, but I feel like it should be.

I’m writing about it here in the hope of making it more obvious.

Context

A team was using Microsoft’s Speech to text REST API.

Without getting too deep into the details, the team wanted to record audio using the browser’s MediaRecorder, and send the audio file directly to the API. The following illustration depicts the plan.

[Illustration: Plan]

As shown in the Microsoft documentation, the API supports either audio/wav; codecs=pcm or audio/ogg; codecs=opus.
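For a rough picture of the request the team was putting together, here is a minimal sketch. The helper name buildRecognitionRequest is hypothetical, the region, key, and language values are placeholders, and the endpoint and header names reflect my understanding of the short-audio REST API, so check them against Microsoft's current documentation before relying on them.

```javascript
// Sketch only: buildRecognitionRequest is a hypothetical helper, and the
// endpoint shape and header names are assumptions to verify against the docs.
function buildRecognitionRequest({ region, subscriptionKey, language, contentType, audio }) {
  const url =
    `https://${region}.stt.speech.microsoft.com` +
    '/speech/recognition/conversation/cognitiveservices/v1' +
    `?language=${encodeURIComponent(language)}`;

  return {
    url,
    options: {
      method: 'POST',
      headers: {
        'Ocp-Apim-Subscription-Key': subscriptionKey,
        // Must describe the bytes that are actually in `audio`.
        'Content-Type': contentType,
        Accept: 'application/json',
      },
      // For example, the Blob assembled from MediaRecorder chunks.
      body: audio,
    },
  };
}
```

In a browser, the result could be passed straight to fetch(url, options). The important detail is that the Content-Type header only labels the audio; it does not change how the browser encoded it.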

The Problem

The team tried both, but neither worked.

// The two alternatives the team tried, one at a time
const oggMimeType = { type: 'audio/ogg; codecs=opus' }
const wavMimeType = { type: 'audio/wav; codecs=pcm' }

The API kept returning 400 Invalid Request in both cases.

The Reason

Turns out, Chrome’s MediaRecorder doesn’t support either format.

To check whether a format and codec are supported, run the following in your DevTools console.

MediaRecorder.isTypeSupported('audio/ogg; codecs=opus')

You can check other codec(s) too.

MediaRecorder.isTypeSupported('audio/wav; codecs=pcm')
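The two checks above can also be folded into a small helper that picks the first format the current browser can record. This is just a sketch: pickRecordableType and API_ACCEPTED_TYPES are names I made up for illustration, and the isSupported parameter exists only so the function can be exercised outside a browser.

```javascript
// The two formats the post says the API accepts.
const API_ACCEPTED_TYPES = ['audio/ogg; codecs=opus', 'audio/wav; codecs=pcm'];

// Returns the first candidate the recorder can produce, or null if none.
// `isSupported` defaults to the browser's MediaRecorder.isTypeSupported and
// can be replaced with a stub when running outside a browser.
function pickRecordableType(
  candidates = API_ACCEPTED_TYPES,
  isSupported = (type) => MediaRecorder.isTypeSupported(type)
) {
  const match = candidates.find((type) => isSupported(type));
  return match === undefined ? null : match;
}
```

A null result means the browser cannot record in any of the listed formats, which is the situation the team hit on Chrome.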

The Solution

Use a browser that supports the format & codec.

Turns out Firefox supports audio/ogg; codecs=opus.

MediaRecorder.isTypeSupported('audio/ogg; codecs=opus'); // false on Chrome, true on Firefox

Go download it if you need a workaround. https://www.mozilla.org/en-CA/firefox/new/
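If you can't control which browser people use, it may help to fail early instead of waiting for the API to return a 400. The sketch below compares the format a recorder reports (for example via its mimeType property) against the two formats mentioned earlier. The function names are hypothetical, and the comparison is deliberately simple: it only looks at the container and the codecs parameter, ignoring spacing and letter case.

```javascript
// Split a MIME type such as "audio/ogg;codecs=opus" into comparable parts.
function parseMimeType(mimeType) {
  const [container, ...params] = mimeType
    .split(';')
    .map((part) => part.trim().toLowerCase());
  const codecsParam = params.find((param) => param.startsWith('codecs='));
  const codecs = codecsParam
    ? codecsParam.slice('codecs='.length).replace(/"/g, '')
    : '';
  return { container, codecs };
}

// True when the recorded format matches one of the accepted formats.
function isAcceptedByApi(
  recordedMimeType,
  accepted = ['audio/ogg; codecs=opus', 'audio/wav; codecs=pcm']
) {
  const recorded = parseMimeType(recordedMimeType);
  return accepted.some((type) => {
    const { container, codecs } = parseMimeType(type);
    return container === recorded.container && codecs === recorded.codecs;
  });
}
```

Calling isAcceptedByApi(recorder.mimeType) before uploading lets the page show a "please try Firefox" message rather than sending audio the API will reject.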

References

The golden blog post that saved the night. Thank you Nick!