Please turn off your webcam
If you are joining from a mobile phone,
be sure to click on
Join via Device Audio
We are waiting for other participants to join
We will begin at 4:30 PM IST
Mihir Thakkar
Founder and Instructor
hello@codeheroku.com
Reinforcement Learning with
OpenAI Gym
SESSION OBJECTIVES
• Quick Recap
• Bellman’s Equations
• Value Iteration in OpenAI Gym
www.codeheroku.com Introduction to Machine Learning – Reinforcement Learning
RL Problem
Q Function
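The slide's own formula did not survive extraction. As a reminder only, the standard Bellman optimality form of the Q function is sketched below. The notation is an assumption, not taken from the slides: P(s' | s, a) is the transition probability, R(s, a, s') the reward, and γ the discount factor.

```latex
Q^*(s, a) = \sum_{s'} P(s' \mid s, a) \left[ R(s, a, s') + \gamma \max_{a'} Q^*(s', a') \right]
\qquad
V^*(s) = \max_{a} Q^*(s, a)
```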
Quiz
Given the following Reward Table, estimate the value of Q(A3, East)
Quiz
Given the following Reward Table, estimate the value of Q(B3, North)
Given a Value Function
Extract Policy
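A minimal sketch of policy extraction, assuming the transition table uses the (probability, next_state, reward, done) layout that Gym's FrozenLake exposes as `P`. The 3-state chain, the value function `V`, and the name `extract_policy` are hypothetical and only for illustration. For each state it picks the action with the best one-step lookahead value.

```python
import numpy as np

def extract_policy(P, V, n_states, n_actions, gamma=0.9):
    """Greedy policy: for each state, the action with the highest one-step lookahead."""
    policy = np.zeros(n_states, dtype=int)
    for s in range(n_states):
        q = np.zeros(n_actions)
        for a in range(n_actions):
            for prob, s_next, reward, done in P[s][a]:
                # Terminal transitions contribute no future value.
                q[a] += prob * (reward + gamma * V[s_next] * (not done))
        policy[s] = int(np.argmax(q))
    return policy

# Hypothetical chain: states 0 and 1 are ordinary, state 2 is the goal.
# Actions: 0 = left, 1 = right. Entering state 2 pays reward 1.
P = {
    0: {0: [(1.0, 0, 0.0, False)], 1: [(1.0, 1, 0.0, False)]},
    1: {0: [(1.0, 0, 0.0, False)], 1: [(1.0, 2, 1.0, True)]},
    2: {0: [(1.0, 2, 0.0, True)], 1: [(1.0, 2, 0.0, True)]},
}
V = np.array([0.9, 1.0, 0.0])  # assumed value function for this chain
policy = extract_policy(P, V, n_states=3, n_actions=2)
print(policy)
```

In this toy chain the greedy policy moves right from both non-terminal states.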
Value Iteration
OpenAI Gym
https://drive.google.com/file/d/16xMyG7bKrtT_6SId1kqLpR2vL1Km_us8/view?usp=sharing
https://github.com/codeheroku/Introduction-to-Machine-Learning/tree/master/Reinforcement%20Learning/RL2%20Value%20Iteration
Value Iteration Algorithm
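A hedged sketch of value iteration, not the session's own notebook code (see the repository link for that). It assumes a transition table in the (probability, next_state, reward, done) layout used by Gym's FrozenLake. The tiny 3-state chain and the names `value_iteration`, `tol`, and `max_iters` are hypothetical. Each sweep applies the Bellman optimality backup to every state until the values stop changing.

```python
import numpy as np

def value_iteration(P, n_states, n_actions, gamma=0.9, tol=1e-8, max_iters=10_000):
    """Repeat the Bellman optimality backup until the largest change is below tol."""
    V = np.zeros(n_states)
    for _ in range(max_iters):
        Q = np.zeros((n_states, n_actions))
        for s in range(n_states):
            for a in range(n_actions):
                for prob, s_next, reward, done in P[s][a]:
                    # Terminal transitions contribute no future value.
                    Q[s, a] += prob * (reward + gamma * V[s_next] * (not done))
        V_new = Q.max(axis=1)
        converged = np.max(np.abs(V_new - V)) < tol
        V = V_new
        if converged:
            break
    return V

# Hypothetical chain: actions 0 = left, 1 = right; entering state 2 pays reward 1.
P = {
    0: {0: [(1.0, 0, 0.0, False)], 1: [(1.0, 1, 0.0, False)]},
    1: {0: [(1.0, 0, 0.0, False)], 1: [(1.0, 2, 1.0, True)]},
    2: {0: [(1.0, 2, 0.0, True)], 1: [(1.0, 2, 0.0, True)]},
}
V = value_iteration(P, n_states=3, n_actions=2)
print(V)
```

With a real Gym environment the same function could be fed the environment's transition table. How that table is accessed depends on the Gym version, so check it against the installed release.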
Reinforcement Learning
Challenges
• Access to the Environment
• Delayed Reward (Temporal Credit Assignment)
• High Cost Actions
• The distribution of data changes with the actions you choose to take
• Efficient state representations?
• Good reward functions?
Thanks
Reinforcement Learning with OpenAI Gym - Value Iteration Frozen Lake - Code Heroku
Markov Decision Process (MDP)
Multi Arm Bandit
• Unknown Reward Distribution
• Deterministic Actions
• Objective: Find the sequence of actions
which will maximize total reward
Exploration Vs Exploitation
To approximate the values of actions, an agent must start by choosing
actions that may be non-optimal.
Once the agent has approximated the values, it can greedily pick the
highest-value action.
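One common way to balance the two is epsilon-greedy selection. This is a sketch of that technique, and the slides do not confirm it is the one used in the session. The names `epsilon_greedy`, `q_values`, and `epsilon` are hypothetical: with probability epsilon the agent explores a random action, otherwise it exploits the current best estimate.

```python
import random

def epsilon_greedy(q_values, epsilon, rng):
    """Pick a random action with probability epsilon, else the highest-valued one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))  # explore
    return max(range(len(q_values)), key=lambda a: q_values[a])  # exploit

rng = random.Random(0)
estimates = [0.1, 0.9, 0.3]  # made-up value estimates for three actions
greedy_choice = epsilon_greedy(estimates, epsilon=0.0, rng=rng)
explore_choice = epsilon_greedy(estimates, epsilon=1.0, rng=rng)
print(greedy_choice, explore_choice)
```

Setting epsilon to 0 gives pure exploitation, and setting it to 1 gives pure exploration. In practice epsilon is often decayed over time.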
Iterative Averaging
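Assuming this slide refers to the incremental (running) mean commonly used to estimate an action's value from observed rewards, a minimal sketch follows. The helper name `update_mean` and the reward list are hypothetical. The update avoids storing all past rewards: new_mean = old_mean + (reward - old_mean) / count.

```python
def update_mean(mean, count, reward):
    """Return the updated (mean, count) after observing one more reward."""
    count += 1
    mean += (reward - mean) / count
    return mean, count

mean, count = 0.0, 0
for r in [1.0, 0.0, 1.0, 1.0]:  # made-up rewards for a single action
    mean, count = update_mean(mean, count, r)
print(mean, count)
```

The result matches the ordinary average of the four rewards, computed one observation at a time.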

Editor's Notes

  • #3: In general, we saw that RL deals with making decisions under uncertainty, which is core to understanding intelligence and simulating it. RL also deals with sequences of actions.
  • #4: We often see a huge gap between the theoretical approach taught in universities and practical implementations. Throughout this course, as you may have noticed, we are trying to bridge that gap.
  • #5: Y = F(X). What happens when we do not know the consequences of our immediate actions? Contrast with supervised ML: delayed rewards / sparse signal. RL deals with uncertainty in environments / actions / observations.
  • #15: Good Rewards – Conversational agent, Treatment pathway for patients