JavaScript Speech Recognition

Who is this guy?
@macdonst
macdonst on GitHub
simonmacdonald.com
works at Adobe
Apache Cordova core contributor
nutty about speech recognition

The future won't be like Star Trek.
Scott Adams, creator of Dilbert

Why do I care about speech rec?

Here's a conversation between two Cape
Bretoners
P1: jeet?
P2: naw, jew?
P1: naw, t'rly t'eet bye.

And here's the translation
P1: jeet?
P1: Did you eat?
P2: naw, jew?
P2: No, did you?
P1: naw, t'rly t'eet bye.
P1: No, it's too early to eat, buddy.

Regular Alphabet
26 letters
Cape Breton
Alphabet
12 letters!

Speech recognition is the
process of translating the
spoken word into text.

The process of speech rec
includes...

Record and digitize the audio
data

Perform end pointing
(trimming)
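
One way to picture end pointing is as trimming the quiet audio from both ends of a recording. The sketch below is a simplified energy-threshold illustration, not how any particular recognizer does it. The name `trimSilence`, the frame size, and the threshold are all assumptions made up for this example.

```javascript
// Hypothetical helper: trim leading and trailing silence from PCM samples.
// samples: array of numbers in the range [-1, 1].
// frameSize and threshold are illustrative defaults, not tuned values.
function trimSilence(samples, frameSize = 4, threshold = 0.01) {
  // Mean energy (average squared amplitude) of the frame starting at `start`.
  const frameEnergy = (start) => {
    const end = Math.min(start + frameSize, samples.length);
    let sum = 0;
    for (let i = start; i < end; i++) {
      sum += samples[i] * samples[i];
    }
    return sum / (end - start);
  };

  let first = -1; // start index of the first frame that contains sound
  let last = -1;  // start index of the last frame that contains sound
  for (let start = 0; start < samples.length; start += frameSize) {
    if (frameEnergy(start) > threshold) {
      if (first === -1) first = start;
      last = start;
    }
  }

  // Nothing rose above the threshold: treat the whole clip as silence.
  if (first === -1) return [];

  return samples.slice(first, Math.min(last + frameSize, samples.length));
}
```

Real end pointing also has to cope with background noise and short pauses inside an utterance, which a fixed threshold like this will not handle well.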

What is a phoneme?
A phoneme is a perceptually
distinct unit of sound in a
specified language that
distinguishes one word from
another.

The English language has 44
distinct sounds
Source: English language phoneme chart

By comparison, the Rotokas
speakers in Papua New Guinea
have 11 phonemes.
But the !Xóõ speakers who
mostly live in Botswana have
112 phonemes.

Apply the phonemes to the
recognition model. This is a
massive lexicon which takes
into account all of the different
ways words can be
pronounced.

Analyze the results against the
grammar

Return a confidence weighted
result
[
{
"confidence":0.97335243225098,
"transcript":"hello"
},
{
"confidence":0.19940405040800,
"transcript":"helllow"
},
{
"confidence":0.19910827091000,
"transcript":"howlow"
}
]
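
Given a list like the one above, an app usually wants the alternative with the highest confidence, and may want to reject everything if even the best guess is weak. Here is a minimal sketch of that idea. The helper name `pickBestTranscript` and the 0.5 cut-off are assumptions for illustration, not part of any speech API.

```javascript
// Hypothetical helper: return the transcript with the highest confidence,
// or null when the list is empty or the best guess is below minConfidence.
function pickBestTranscript(alternatives, minConfidence = 0.5) {
  let best = null;
  for (const alternative of alternatives) {
    if (best === null || alternative.confidence > best.confidence) {
      best = alternative;
    }
  }
  if (best === null || best.confidence < minConfidence) {
    return null;
  }
  return best.transcript;
}

// The result list from the slide above.
const alternatives = [
  { confidence: 0.97335243225098, transcript: "hello" },
  { confidence: 0.19940405040800, transcript: "helllow" },
  { confidence: 0.19910827091000, transcript: "howlow" }
];

console.log(pickBestTranscript(alternatives)); // prints "hello"
```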

We want it to be like this

but more often than not...

Why is that?
When two people talk,
comprehension rates are better
than 97%

A really good English-language
speech recognition system is
right 92% of the time

Where does that extra 5% in
error rate come from?
Vocabulary size and confusability
Speaker dependence vs independence
Isolated or continuous speech
Initiated vs spontaneous speech
Adverse conditions

Mobile Speech Recognition
OS             Application  SDK
Android        Google Now   Java API
iOS            Siri         Many 3rd party Obj-C SDKs
Windows Phone  Cortana      C# API

So how do we
add speech rec
to our app?

You may look at the W3C
Speech API Specification

but only Chrome on the
desktop has implemented that
spec

The spec looks like this:
interface SpeechRecognition : EventTarget {
    // recognition parameters
    attribute SpeechGrammarList grammars;
    attribute DOMString lang;
    attribute boolean continuous;
    attribute boolean interimResults;
    attribute unsigned long maxAlternatives;
    attribute DOMString serviceURI;
    // methods to drive the speech interaction
    void start();
    void stop();
    void abort();
};
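
The recognition parameters are plain attributes, so they can be set before calling start(). The sketch below shows one way to do that. `configureRecognition` is a hypothetical helper, and the values (including "en-US") are only illustrative. It accepts any object, so it can be tried outside a browser too.

```javascript
// Hypothetical helper: apply recognition parameters to a
// SpeechRecognition-like object. The defaults here are illustrative.
function configureRecognition(recognition, options = {}) {
  const settings = {
    lang: "en-US",          // assumed language for this example
    continuous: false,      // stop after a single utterance
    interimResults: false,  // only deliver final results
    maxAlternatives: 1,     // one transcript per result
    ...options
  };
  recognition.lang = settings.lang;
  recognition.continuous = settings.continuous;
  recognition.interimResults = settings.interimResults;
  recognition.maxAlternatives = settings.maxAlternatives;
  return recognition;
}

// In an environment that implements the spec, this might look like:
//   var recognition = configureRecognition(new SpeechRecognition(), { maxAlternatives: 3 });
//   recognition.start();
```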

With additional event methods
to control behaviour:
attribute EventHandler onaudiostart;
attribute EventHandler onsoundstart;
attribute EventHandler onspeechstart;
attribute EventHandler onspeechend;
attribute EventHandler onsoundend;
attribute EventHandler onaudioend;
attribute EventHandler onresult;
attribute EventHandler onnomatch;
attribute EventHandler onerror;
attribute EventHandler onstart;
attribute EventHandler onend;
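
A quick way to get a feel for these events is to log each one as it fires. This is a debugging sketch only. `traceRecognitionEvents` is a made-up helper name, and it overwrites any handlers already set on the object, so it is not something to leave in a real app.

```javascript
// Event names taken from the handler attributes listed above.
const RECOGNITION_EVENTS = [
  "audiostart", "soundstart", "speechstart", "speechend",
  "soundend", "audioend", "result", "nomatch", "error", "start", "end"
];

// Hypothetical helper: attach a logging handler for every event.
// Note: this replaces existing on* handlers on the object.
function traceRecognitionEvents(recognition, log = console.log) {
  for (const name of RECOGNITION_EVENTS) {
    recognition["on" + name] = function (event) {
      log(name, event);
    };
  }
  return recognition;
}

// In an environment that implements the spec, this might look like:
//   var recognition = traceRecognitionEvents(new SpeechRecognition());
//   recognition.start();
```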

Let's recognize some speech
var recognition = new SpeechRecognition();
recognition.onresult = function(event) {
    if (event.results.length > 0) {
        var test1 = document.getElementById("test1");
        test1.innerHTML = event.results[0][0].transcript;
    }
};
recognition.start();

...if taking dictation gets you
going

But I want to do
something more
exciting with the
result

Let's do something a little less
trivial
recognition.onresult = function(event) {
    var result = event.results[0][0].transcript;
    var music = document.getElementById("music");
    switch (result) {
        case "jazz":
            music.src = "jazz.mp3";
            music.play();
            break;
        case "rock":
            music.src = "rock.mp3";
            music.play();
            break;
        case "stop":
        default:
            music.pause();
    }
};
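
The switch above only matches when the transcript is exactly "jazz" or "rock". Recognizers often return extra words, capitals, or punctuation ("Play some jazz."), so it can help to normalize the transcript first. Here is a rough sketch of that. `matchCommand` is a hypothetical helper, and the letter filter assumes English commands.

```javascript
// The commands used in the slide above.
const COMMANDS = ["jazz", "rock", "stop"];

// Hypothetical helper: return the first known command word found in the
// transcript, or null if none of them appear.
function matchCommand(transcript, commands = COMMANDS) {
  const words = transcript
    .toLowerCase()
    .replace(/[^a-z\s]/g, " ") // drop punctuation and digits (English only)
    .split(/\s+/)
    .filter(Boolean);
  for (const word of words) {
    if (commands.includes(word)) {
      return word;
    }
  }
  return null;
}

console.log(matchCommand("Play some Jazz.")); // prints "jazz"
```

The caller can then switch on the returned command and decide what to do when it gets null, instead of treating every unknown phrase as "stop".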

Let's ask the web a question

Works pretty
good...
...but ugly!

Let's style our
button with some
CSS

<a class="speechinput">
    <img src="images/mic.png">
</a>
#speechinput input {
    cursor: pointer;
    margin: auto;
    margin: 15px;
    color: transparent;
    background-color: transparent;
    border: 5px;
    width: 15px;
    -webkit-transform: scale(3.0, 3.0);
}

And we'll add some color using
Pure CSS Speech Bubbles
by Nicolas Gallagher

But wait, why am
I using my eyes
like a sucker?

We'll output the answer using
SpeechSynthesis

The SpeechSynthesis spec
looks like this:
interface SpeechSynthesis {
    readonly attribute boolean pending;
    readonly attribute boolean speaking;
    readonly attribute boolean paused;
    void speak(SpeechSynthesisUtterance utterance);
    void cancel();
    void pause();
    void resume();
    SpeechSynthesisVoiceList getVoices();
};
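
Putting the interface to work is mostly a matter of handing an utterance to speak(). The sketch below also cancels anything still speaking or queued first, which is a design choice for this example rather than something the spec requires. `speakAnswer` is a hypothetical helper, and the synthesizer and utterance constructor are passed in so the function can be tried with stand-ins.

```javascript
// Hypothetical helper: speak `text`, cancelling anything that is still
// speaking or waiting in the queue.
// synth: a SpeechSynthesis-like object (speaking, pending, cancel, speak).
// Utterance: a SpeechSynthesisUtterance-like constructor.
function speakAnswer(text, synth, Utterance) {
  if (synth.speaking || synth.pending) {
    synth.cancel();
  }
  const utterance = new Utterance();
  utterance.text = text;
  synth.speak(utterance);
  return utterance;
}

// In an environment that implements the spec, this might look like:
//   speakAnswer("Hello", window.speechSynthesis, window.SpeechSynthesisUtterance);
```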

The SpeechSynthesisUtterance
spec looks like this:
interface SpeechSynthesisUtterance : EventTarget {
    attribute DOMString text;
    attribute DOMString lang;
    attribute DOMString voiceURI;
    attribute float volume;
    attribute float rate;
    attribute float pitch;
};
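
The volume, rate, and pitch attributes are floats with limited ranges. As I understand the Web Speech API spec, volume runs from 0 to 1, rate from 0.1 to 10, and pitch from 0 to 2, each defaulting to 1. The sketch below clamps values into those ranges before setting them. `buildUtterance` and `clamp` are hypothetical helpers, and "en-US" is just an assumed default for the example.

```javascript
// Keep a number inside [min, max].
function clamp(value, min, max) {
  return Math.min(max, Math.max(min, value));
}

// Hypothetical helper: create an utterance with voice parameters clamped
// to the ranges described above.
// Utterance: a SpeechSynthesisUtterance-like constructor.
function buildUtterance(Utterance, text, options = {}) {
  const pick = (value, fallback) => (value === undefined ? fallback : value);
  const utterance = new Utterance();
  utterance.text = text;
  utterance.lang = pick(options.lang, "en-US"); // assumed default language
  utterance.volume = clamp(pick(options.volume, 1), 0, 1);
  utterance.rate = clamp(pick(options.rate, 1), 0.1, 10);
  utterance.pitch = clamp(pick(options.pitch, 1), 0, 2);
  return utterance;
}

// In an environment that implements the spec, this might look like:
//   var u = buildUtterance(window.SpeechSynthesisUtterance, "Hello", { rate: 1.5 });
//   window.speechSynthesis.speak(u);
```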

With additional event methods
to control behaviour:
attribute EventHandler onstart;
attribute EventHandler onend;
attribute EventHandler onerror;
attribute EventHandler onpause;
attribute EventHandler onresume;
attribute EventHandler onmark;
attribute EventHandler onboundary;

Plugin repos
SpeechRecognitionPlugin - https://github.com/macdonst/SpeechRecognitionPlugin
SpeechSynthesisPlugin - https://github.com/macdonst/SpeechSynthesisPlugin

Availability
OS             Recognition  Synthesis
Android        ✓            ✓
iOS*           Soonish      Native to iOS 7.0+
Windows Phone  ×            ×

* Working with Julio César (@jcesarmobile) to get iOS done

Getting started
phonegap create speech com.example.speech speech
cd speech
phonegap platform add android
phonegap plugin add https://github.com/macdonst/SpeechRecognitionPlugin
phonegap plugin add https://github.com/macdonst/SpeechSynthesisPlugin
phonegap run android
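
Once the app is running, remember that Cordova plugin APIs are generally only safe to touch after the deviceready event has fired on the document. A minimal sketch of waiting for it is below. `whenDeviceReady` is a hypothetical helper, and the document is passed in so the function can be exercised with a stand-in object.

```javascript
// Hypothetical helper: run `callback` once, after Cordova reports that
// the device (and its plugins) are ready.
// doc: the page's document, or any object with add/removeEventListener.
function whenDeviceReady(doc, callback) {
  function onReady() {
    doc.removeEventListener("deviceready", onReady, false);
    callback();
  }
  doc.addEventListener("deviceready", onReady, false);
}

// In a Cordova/PhoneGap app, this might look like:
//   whenDeviceReady(document, function () {
//       var recognition = new SpeechRecognition();
//       recognition.start();
//   });
```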

For more information on hybrid
applications
Check out Nick Van
Weerdenburg and Andrey
Feldman's presentation,
Creating a Comprehensive
Social Media App Using Ionic
and PhoneGap, at 3:45pm today
in 801A.

Speech recognition and speech
synthesis are not well
supported in the emulator
and sometimes developing on
the device can be a bit of a
pain.

That's why I coded
speechshim.js
https://github.com/macdonst/SpeechShim

Chrome + speechshim.js
=
W3C Web Speech API on your
desktop
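
The general idea behind a shim like this can be sketched as aliasing Chrome's prefixed constructors to the unprefixed names from the spec. To be clear, this is not the speechshim.js source, just an illustration of the technique. `installSpeechAliases` is a made-up name, and it assumes the prefixed names webkitSpeechRecognition, webkitSpeechGrammarList, and webkitSpeechRecognitionEvent.

```javascript
// Hypothetical helper: expose prefixed speech constructors under their
// unprefixed spec names. Returns the list of names that were added.
// globalObject: usually `window`; any plain object works for testing.
function installSpeechAliases(globalObject) {
  const names = ["SpeechRecognition", "SpeechGrammarList", "SpeechRecognitionEvent"];
  const installed = [];
  for (const name of names) {
    const prefixed = "webkit" + name;
    // Only fill the gap; never overwrite a native unprefixed implementation.
    if (!(name in globalObject) && prefixed in globalObject) {
      globalObject[name] = globalObject[prefixed];
      installed.push(name);
    }
  }
  return installed;
}

// In Chrome on the desktop, this might look like:
//   installSpeechAliases(window);
//   var recognition = new SpeechRecognition();
```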

Types of Speech Recognition
Applications
Voice Web Search
Speech Command Interface
Continuous Recognition of Open Dialog
Domain Specific Grammars Filling Multiple Input Fields
Speech UI present when no visible UI need be present
Voice Activity Detection
Speech Translation
Multimodal Interaction
Speech Driving Directions
