JavaScript Speech Recognition
Who is this guy?
@macdonst
macdonst on GitHub
simonmacdonald.com
works at Adobe
Apache Cordova core contributor
nutty about speech recognition
The future won't be like Star Trek.
Scott Adams, creator of Dilbert
Why do I care about speech rec?
+
= Cape Bretoner
Here's a conversation between two Cape
Bretoners
P1: jeet?
P2: naw, jew?
P1: naw, t'rly t'eet bye.
And here's the translation
P1: jeet?
P1: Did you eat?
P2: naw, jew?
P2: No, did you?
P1: naw, t'rly t'eet bye.
P1: No, it's too early to eat buddy.
Regular Alphabet
26 letters
Cape Breton
Alphabet
12 letters!
Alright,
enough
about me
What is speech
recognition?
Speech recognition is the
process of translating the
spoken word into text.
The process of speech rec
includes...
Record and digitize the audio
data
Perform end pointing
(trimming)
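As a rough illustration of end pointing, the sketch below trims leading and trailing silence from an array of audio samples using a simple amplitude threshold. The name `trimSilence`, the threshold value, and the sample format (floats between -1 and 1) are assumptions made for this example; real recognizers rely on more robust voice activity detection.

```javascript
// Hypothetical helper: drop low-amplitude samples from both ends of a recording.
// `samples` is assumed to be an array of floats in the range -1..1.
function trimSilence(samples, threshold = 0.02) {
  let start = 0;
  let end = samples.length;
  // Skip quiet samples at the beginning.
  while (start < end && Math.abs(samples[start]) < threshold) start++;
  // Skip quiet samples at the end.
  while (end > start && Math.abs(samples[end - 1]) < threshold) end--;
  return samples.slice(start, end);
}
```

Quiet samples in the middle of the recording are kept, since only the end points are trimmed.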
Split data into phonemes
What is a phoneme?
A phoneme is a perceptually distinct
unit of sound in a specified
language that distinguishes one
word from another.
The English language has 44
distinct sounds
Source: English language phoneme chart
By comparison, the Rotokas
speakers in Papua New Guinea
have 11 phonemes.
But the !Xóõ speakers who
mostly live in Botswana have
112 phonemes.
Apply the phonemes to the
recognition model. This is a
massive lexicon which takes
into account all of the different
ways words can be
pronounced.
Analyze the results against the
grammar
Return a confidence-weighted
result
[
    {
        "confidence": 0.97335243225098,
        "transcript": "hello"
    },
    {
        "confidence": 0.19940405040800,
        "transcript": "helllow"
    },
    {
        "confidence": 0.19910827091000,
        "transcript": "howlow"
    }
]
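One way to consume a result list like this is to take the alternative with the highest confidence and ignore it if the confidence is too low. This is only a sketch: the function name `bestTranscript` and the 0.5 cut-off are assumptions for illustration, not part of any speech API.

```javascript
// Hypothetical helper: return the most confident transcript,
// or null when nothing clears the minimum confidence.
function bestTranscript(alternatives, minConfidence = 0.5) {
  if (!alternatives || alternatives.length === 0) return null;
  const best = alternatives.reduce((a, b) => (b.confidence > a.confidence ? b : a));
  return best.confidence >= minConfidence ? best.transcript : null;
}

const alternatives = [
  { confidence: 0.97335243225098, transcript: "hello" },
  { confidence: 0.199404050408, transcript: "helllow" },
  { confidence: 0.19910827091, transcript: "howlow" }
];
const chosen = bestTranscript(alternatives);
```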
Basically...
We want it to be like this
[video, 0:02]
but more often than not...
[video, 0:25]
Why is that?
When two people talk
comprehension rates are better
than 97%
A really good English-language
speech recognition system is
right 92% of the time
Where does that extra 5% in
error rate come from?
Vocabulary size and confusability
Speaker dependence vs independence
Isolated or continuous speech
Initiated vs spontaneous speech
Adverse conditions
Mobile Speech Recognition
OS             Application  SDK
Android        Google Now   Java API
iOS            Siri         Many 3rd party Obj-C SDKs
Windows Phone  Cortana      C# API
So how do we
add speech rec
to our app?
You may look at the W3C
Speech API Specification
but only Chrome on the
desktop has implemented that
spec
But that's okay!
The spec looks like this:
interface SpeechRecognition : EventTarget {
    // recognition parameters
    attribute SpeechGrammarList grammars;
    attribute DOMString lang;
    attribute boolean continuous;
    attribute boolean interimResults;
    attribute unsigned long maxAlternatives;
    attribute DOMString serviceURI;

    // methods to drive the speech interaction
    void start();
    void stop();
    void abort();
};
With additional event methods
to control behaviour:
attribute EventHandler onaudiostart;
attribute EventHandler onsoundstart;
attribute EventHandler onspeechstart;
attribute EventHandler onspeechend;
attribute EventHandler onsoundend;
attribute EventHandler onaudioend;
attribute EventHandler onresult;
attribute EventHandler onnomatch;
attribute EventHandler onerror;
attribute EventHandler onstart;
attribute EventHandler onend;
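A few of these handlers are usually enough to follow the life cycle of a recognition session. The sketch below wires up `onstart`, `onnomatch`, `onerror` and `onend` on whatever recognition object it is given. The helper name `attachLifecycleHandlers` and the `log` callback are made up for this example; `event.error` is the error code carried by the error event in the Web Speech API.

```javascript
// Hypothetical helper: report the main life-cycle events of a recognition object.
// `log` is any function that accepts a string, for example console.log.
function attachLifecycleHandlers(recognition, log) {
  recognition.onstart = function () { log("listening"); };
  recognition.onnomatch = function () { log("no match"); };
  recognition.onerror = function (event) { log("error: " + event.error); };
  recognition.onend = function () { log("done"); };
  return recognition;
}
```

Passing the logger in as a parameter keeps the helper usable both in the browser and in a plain test harness.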
Let's recognize some speech
var recognition = new SpeechRecognition();
// Show the top transcript of the first result on the page
recognition.onresult = function(event) {
    if (event.results.length > 0) {
        var test1 = document.getElementById("test1");
        test1.innerHTML = event.results[0][0].transcript;
    }
};
recognition.start();
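Desktop Chrome exposes the constructor under the prefixed name `webkitSpeechRecognition`, so it can help to look the constructor up before using it. The following is a minimal sketch; the helper name `getRecognitionConstructor` is an assumption for this example.

```javascript
// Hypothetical helper: find a SpeechRecognition constructor on a global object,
// preferring the unprefixed name and falling back to Chrome's prefixed one.
function getRecognitionConstructor(root) {
  return root.SpeechRecognition || root.webkitSpeechRecognition || null;
}

// Only runs where a browser-style global exists.
if (typeof window !== "undefined") {
  var Recognition = getRecognitionConstructor(window);
  if (Recognition) {
    var recognizer = new Recognition();
    recognizer.lang = "en-US";
  }
}
```

When the helper returns `null`, the page can hide the speech button instead of failing at `new SpeechRecognition()`.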
So that's pretty
cool...
...if taking dictation gets you
going
But I want to do
something more
exciting with the
result
Let's do something a little less
trivial
recognition.onresult = function(event) {
    var result = event.results[0][0].transcript;
    var music = document.getElementById("music");
    // Pick a track based on the spoken word; anything else stops playback
    switch (result) {
        case "jazz":
            music.src = "jazz.mp3";
            music.play();
            break;
        case "rock":
            music.src = "rock.mp3";
            music.play();
            break;
        case "stop":
        default:
            music.pause();
    }
};
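Transcripts do not always come back exactly as "jazz" or "rock"; they may carry extra whitespace or capital letters. A possible variation, sketched below, normalizes the transcript and looks it up in a table. The names `COMMANDS` and `resolveCommand` are assumptions for this example, and the file names are the ones from the slide.

```javascript
// Hypothetical command table: spoken word -> audio file.
const COMMANDS = { jazz: "jazz.mp3", rock: "rock.mp3" };

// Turn a raw transcript into an action; unknown words (including "stop") pause.
function resolveCommand(transcript) {
  const word = String(transcript).trim().toLowerCase();
  if (Object.prototype.hasOwnProperty.call(COMMANDS, word)) {
    return { action: "play", src: COMMANDS[word] };
  }
  return { action: "pause" };
}
```

In the `onresult` handler the returned object would decide whether to set `music.src` and call `music.play()` or to call `music.pause()`.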
Which seems
much cooler to
me
Let's ask the web a question
Works pretty
good...
...but ugly!
Let's style our
button with some
CSS
<a class="speechinput">
    <img src="images/mic.png">
</a>
#speechinput input {
    cursor: pointer;
    margin: auto;
    margin: 15px;
    color: transparent;
    background-color: transparent;
    border: 5px;
    width: 15px;
    -webkit-transform: scale(3.0, 3.0);
}
And we'll add some color using
Speech Bubbles
Pure-CSS-Speech-Bubbles by Nicolas Gallagher
Then pull it all
together!
But wait, why am
I using my eyes
like a sucker?
We'll output the answer using
SpeechSynthesis
The SpeechSynthesis spec
looks like this:
interface SpeechSynthesis {
    readonly attribute boolean pending;
    readonly attribute boolean speaking;
    readonly attribute boolean paused;

    void speak(SpeechSynthesisUtterance utterance);
    void cancel();
    void pause();
    void resume();
    SpeechSynthesisVoiceList getVoices();
};
The SpeechSynthesisUtterance
spec looks like this:
interface SpeechSynthesisUtterance : EventTarget {
    attribute DOMString text;
    attribute DOMString lang;
    attribute DOMString voiceURI;
    attribute float volume;
    attribute float rate;
    attribute float pitch;
};
With additional event methods
to control behaviour:
attribute EventHandler onstart;
attribute EventHandler onend;
attribute EventHandler onerror;
attribute EventHandler onpause;
attribute EventHandler onresume;
attribute EventHandler onmark;
attribute EventHandler onboundary;
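Putting the two interfaces together, speaking an answer comes down to building an utterance and handing it to the synthesizer. The sketch below takes both as parameters so it can be tried outside a browser; the helper name `speakAnswer` and the default language are assumptions for this example.

```javascript
// Hypothetical helper: build an utterance for `text` and queue it on `synth`.
// In a browser, `synth` would be window.speechSynthesis and
// `Utterance` would be window.SpeechSynthesisUtterance.
function speakAnswer(synth, Utterance, text, lang = "en-US") {
  const utterance = new Utterance(text);
  utterance.lang = lang;
  utterance.rate = 1.0;
  synth.speak(utterance);
  return utterance;
}

// Only runs where the browser provides speech synthesis.
if (typeof window !== "undefined" && window.speechSynthesis) {
  speakAnswer(window.speechSynthesis, window.SpeechSynthesisUtterance, "Hello from the web");
}
```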
Plugin repos
SpeechRecognitionPlugin - https://github.com/macdonst/SpeechRecognitionPlugin
SpeechSynthesisPlugin - https://github.com/macdonst/SpeechSynthesisPlugin
Availability
OS             Recognition  Synthesis
Android        ✓            ✓
iOS*           Soonish      Native to iOS 7.0+
Windows Phone  ×            ×
* Working with Julio César (@jcesarmobile) to get iOS done
Getting started
phonegap create speech com.example.speech speech
cd speech
phonegap platform add android
phonegap plugin add https://github.com/macdonst/SpeechRecognitionPlugin
phonegap plugin add https://github.com/macdonst/SpeechSynthesisPlugin
phonegap run android
For more information on hybrid
applications
Check out Nick Van
Weerdenburg and Andrey
Feldman's presentation on
Creating a Comprehensive
Social Media App Using Ionic
and Phone Gap at 3:45pm today
in 801A.
But wait, one
more thing...
Speech recognition and speech
synthesis are not well
supported in the emulator
and sometimes developing on
the device can be a bit of a
pain.
That's why I coded
speechshim.js
https://github.com/macdonst/SpeechShim
Chrome + speechshim.js
=
W3C Web Speech API on your
desktop
Types of Speech Recognition
Applications
Voice Web Search
Speech Command Interface
Continuous Recognition of Open Dialog
Domain Specific Grammars Filling Multiple Input Fields
Speech UI present when no visible UI need be present
Voice Activity Detection
Speech Translation
Multimodal Interaction
Speech Driving Directions
THE END
