Information Classification: General
December 8-10, 2020 | Virtual Event
Architectural Exploration for AI / ML accelerators
Simon Davidmann, Duncan Graham
Imperas Software
info@imperas.com
#RISCVSUMMIT
Machine Intelligence compute requirement
is growing fast
300,000x increase… https://openai.com/blog/ai-and-compute/
35 Years of microprocessor trend data
Even though transistor counts keep rising, they no longer deliver a matching performance gain, so the trend is to move to parallel processing / more cores
Computation needed for AI / ML
Summary:
• e.g. 1 billion MACs for AlexNet – image recognition… training
• x86 is not getting faster
• So the trend is to move to specialized processing and run in parallel
=>
• So you need the fastest cores (often with custom extensions / acceleration)
• And the parallelism needs to be the right kind…
• And designers need to know that their algorithms run “well” on the hardware configuration they select
Processor hardware options for software acceleration
• Dedicated external accelerator hardware
  • Fast for the limited set of known use cases
  • but inflexible if software needs change
• Processor extension
  • Closely coupled gives efficiency with flexibility
  • but future improvements limited by the end of Moore’s Law
• Processor custom extension
  • Performance advantages with optimized instructions
  • and lightweight inter-processor communications for scale
[Figure: three hardware options. Scalar processors with external accelerator (CPU, Accelerator); scalar processors with vector extensions (CPU, Vector Extensions); vector processors with instruction extensions plus micro-arch comms (CPU, Vector Extensions, Custom Instructions, Comms Extensions)]
AI SoC Architecture Exploration
AI & Machine Learning Accelerators
• Datacenter: training & inference
• Edge: inference (mostly)
• Compute arrays with processor elements (PE) configured for
- Scalar
- Vector
- Spatial
- Communications
- PE <-> PE & PE <-> NoC
[Figure: configurations of Processing Elements (PE): a CPU with an accelerator, and an array of PEs built from CPUs. Features of PEs: scalar processors with vector extensions (CPU, Vector Extensions); vector processors with instruction extensions (CPU, Vector Extensions, DL Extensions); vector processors with instruction extensions plus micro-arch comms (CPU, Vector Extensions, DL Extensions, Comms Extensions)]
Imperas works with the leaders for
RISC-V Vector Extensions
• Andes certifies the Imperas models and simulator as a reference for the new Andes RISC-V vector core, with lead customers and partners
• Imperas code morphing simulation technology, virtual platforms and tools are used by lead customers for early software development and high-level architectural exploration
"Andes has announced the new RISC-V family 27-series
cores, which in addition to new and advanced features,
include the new Vector extensions that are an ideal solution
for our customers working on leading edge design for AI and
ML. Andes is pleased to certify the Imperas model and
simulator as a reference for the new Vector processor
NX27V, and is already actively used by our mutual
customers."
Charlie Hong-Men Su, CTO and Executive Vice President at
Andes Technology Corp
NX27V VPU Overview
• VPU: Vector Processing Unit
• RVV spec: ongoing 0.8
• Data formats:
  • SEW supported: int8, int16, int32, fp16, fp32
  • Extension formats: bfloat16 and int4
• Support LMUL 1, 2, 4, 8
• VPU main configurations:
  • SIMD width and VLEN (bits): 128, 256, and 512
  • Functional units chainable, with dedicated IQ, most fully pipelined
  • Wide system bus for data accesses
• Vector Registers as operands for ACE instructions
  • Usage example: custom vector load/store from a dedicated memory port
• Verification: leverage/enhance Google UVM, working with Imperas
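How VLEN, SEW and LMUL combine can be illustrated with the RISC-V Vector relation VLMAX = VLEN × LMUL / SEW, the number of elements held by one vector register group. The following is a minimal sketch over the configurations listed above; the function name `vlmax` and the table layout are illustrative assumptions, not part of any Andes or Imperas API.

```python
def vlmax(vlen_bits: int, sew_bits: int, lmul: int) -> int:
    """Elements per vector register group: VLEN * LMUL / SEW."""
    if vlen_bits % sew_bits:
        raise ValueError("VLEN must be a multiple of SEW")
    return vlen_bits * lmul // sew_bits


# Values taken from the slide; element widths are in bits.
VLENS = (128, 256, 512)
LMULS = (1, 2, 4, 8)
SEWS = {"int8": 8, "int16": 16, "int32": 32, "fp16": 16, "fp32": 32}

if __name__ == "__main__":
    for vlen in VLENS:
        for name, sew in SEWS.items():
            row = ", ".join(f"LMUL={m}: {vlmax(vlen, sew, m)}" for m in LMULS)
            print(f"VLEN={vlen} {name}: {row}")
```

For example, a 512-bit VLEN with int8 elements and LMUL 8 gives 512 elements per register group, while a 128-bit VLEN with fp32 and LMUL 1 gives 4.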
Example US Customer
• Customer project
• Full AI / ML engine
• 150+ CPU cores
• Over half with RISC-V Vector extension engine
• Imperas Reference Models and Virtual Platform provide the environment for software stack development
• Simulation runs of the software stack on the virtual platform take ~2 hrs @ 500 MIPS
• Cross compiled software running on simulated CPUs
• Allows hardware platform configuration, re-configuration, architectural changes
• Explore performance options
• Runs real software (production binaries) – can see how it interacts with HW configuration
• Running in Imperas more than a year before RTL commit
• Customer has SW and is looking to design HW to make it work the way they want…
• Also a by-product: kick-start SoC process by feeding models into HW DV at start
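To give a sense of scale for the run length quoted above, the sketch below converts a wall-clock time and a simulation rate into an instruction count. It treats "~2 hrs" and "500 MIPS" as exact aggregate figures, which is an assumption; the 5 MIPS comparison rate is purely hypothetical.

```python
def simulated_instructions(hours: float, mips: float) -> float:
    """Instructions executed in `hours` of wall-clock time at `mips` million instructions/s."""
    return hours * 3600 * mips * 1e6


def wall_clock_hours(instructions: float, mips: float) -> float:
    """Wall-clock hours needed to execute `instructions` at `mips` million instructions/s."""
    return instructions / (mips * 1e6) / 3600


# Figures from the slide, treated as exact for this estimate.
total = simulated_instructions(hours=2, mips=500)
print(f"instructions per run: {total:.2e}")

# Hypothetical slower simulator, for comparison only.
print(f"hours at 5 MIPS: {wall_clock_hours(total, mips=5):.0f}")
```

Under these assumptions a single run executes on the order of 3.6 trillion instructions, which is why simulation speed determines whether production binaries can be used for exploration at all.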
Example
Japanese partner
• Overview
  • Platform: ARM Cortex-A57 x 1 + RISC-V RV64GCV x 17
  • Application 1: AlexNet image recognition deep neural network
ImageNet with AlexNet deep neural network
• AlexNet (University of Toronto, 2012)
• https://towardsdatascience.com/the-w3h-of-alexnet-vggnet-resnet-and-inception-7baaaecccc96
• Hyperparameters
• Number of parameters: 58 M (float32)
• Computation cost: 1,000 M multiply-adds
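The two figures above translate directly into memory and compute budgets. The sketch below is a back-of-the-envelope calculation using only the slide's numbers; counting one multiply-add as two arithmetic operations is a common convention assumed here, not something the slide states.

```python
PARAMS = 58_000_000          # number of parameters (from the slide)
BYTES_PER_FLOAT32 = 4        # float32 storage size
MACS = 1_000_000_000         # multiply-adds (from the slide)

# Weight storage if every parameter is kept as float32.
weight_bytes = PARAMS * BYTES_PER_FLOAT32
weight_mb = weight_bytes / 1e6        # decimal megabytes
weight_mib = weight_bytes / 2**20     # binary mebibytes

# Assumed convention: one multiply-add = one multiply + one add.
arithmetic_ops = 2 * MACS

print(f"weights: {weight_mb:.0f} MB ({weight_mib:.1f} MiB)")
print(f"arithmetic operations: {arithmetic_ops:.1e}")
```

The weights alone come to 232 MB, so memory capacity and bandwidth are part of the architectural trade-off alongside raw multiply-add throughput.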
Parallelization across multiple cores
[Figure: bar chart of the number of multiply-adds per AlexNet layer. x-axis: conv1 to conv5 (convolution) and fc6 to fc8 (fully connected); y-axis: number of multiply-adds, 0 to 450,000,000; legend: mult-add, local response normalization]
Convolution layers carry a lot of the computation, so these layers were parallelized across 16 CPU cores
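One way to spread a layer over 16 cores is to give each core a contiguous block of output channels. The sketch below is an assumption about how such a split could look, not the partner's actual scheme. `partition`, `channel_outputs` and `parallel_layer` are hypothetical names, and Python threads are used only to show the work division, not to demonstrate a speedup.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Sequence


def partition(n_items: int, n_workers: int) -> List[range]:
    """Split range(n_items) into n_workers contiguous chunks whose sizes differ by at most 1."""
    base, extra = divmod(n_items, n_workers)
    chunks, start = [], 0
    for w in range(n_workers):
        size = base + (1 if w < extra else 0)
        chunks.append(range(start, start + size))
        start += size
    return chunks


def channel_outputs(weights: Sequence[Sequence[float]], x: Sequence[float],
                    channels: range) -> List[float]:
    """One multiply-add reduction per output channel in `channels`."""
    return [sum(w * v for w, v in zip(weights[c], x)) for c in channels]


def parallel_layer(weights, x, n_workers: int = 16) -> List[float]:
    """Compute all output channels, one chunk of channels per worker."""
    chunks = partition(len(weights), n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        parts = list(pool.map(lambda ch: channel_outputs(weights, x, ch), chunks))
    # Chunks are contiguous and ordered, so concatenation restores channel order.
    return [y for part in parts for y in part]
```

Contiguous chunks keep each worker's weights adjacent in memory, and letting chunk sizes differ by at most one keeps the load balanced when the channel count is not a multiple of 16.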
Simulate a Virtual Platform model
[Figure: virtual platform block diagram. Components shown: ARM Cortex-A57 [0]; RISC-V RV64GC [0], [1] … [16]; three RAM blocks; two bus bridges; an ARM bus, a RISC-V bus and a shared bus; UART0 (for ARM[0]) and UART1 … UART17 (for RV64[0] … RV64[16])]
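The core-to-UART mapping in the diagram is regular enough to be generated rather than written out by hand. The sketch below describes the platform as plain data, which is enough to vary the core count during exploration. It is not the Imperas or OVP platform API; the `Core` class, the `build_platform` function and the assignment of each core to the ARM or RISC-V bus are assumptions read off the diagram labels.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Core:
    name: str   # label used in the diagram, e.g. "ARM[0]" or "RV64[3]"
    isa: str    # processor type shown in the diagram
    bus: str    # assumed local bus for this core
    uart: str   # console UART dedicated to this core


def build_platform(n_riscv: int = 17) -> List[Core]:
    """One ARM core on UART0, then RV64[i] on UART(i+1), as in the diagram."""
    cores = [Core("ARM[0]", "Cortex-A57", "ARM bus", "UART0")]
    for i in range(n_riscv):
        cores.append(Core(f"RV64[{i}]", "RV64GC", "RISC-V bus", f"UART{i + 1}"))
    return cores


if __name__ == "__main__":
    for core in build_platform():
        print(f"{core.name:9s} {core.isa:11s} {core.bus:11s} {core.uart}")
```

Giving each core its own UART is what produces the separate consoles during simulation, one per processor.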
Executing simulation – different consoles
Single Multi-Processor Debug
Debugging both ARM and RISC-V cores with one debugger at the same time.
[Screenshot: debugger showing the aarch64 register set and the RV64 register set]
Example
Japanese partner
• Overview
  • Platform: ARM Cortex-A57 x 1 + RISC-V RV64GCV x 17
  • Application 1: AlexNet image recognition deep neural network
• Key points
  • “Imperas simulator can simulate heterogeneous virtual platform”
  • “Imperas also provides dedicated debugger which can debug hetero-system (ex. ARM and RISC-V) using one debugger at same time”
  • “Very fast. This example runs (at most) 10 times slower than native x64 execution on host PC”
How is Processor Performance Optimized?
• Move to multicore and to different multicore configurations
• Tune accelerators, configuration options (e.g. vector engine sizes)
• Optimize the pipeline
• Improve memory usage/latency
• Custom instructions for application/domain optimization (feature of RISC-V)
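The last bullet relies on the opcode space that the RISC-V base ISA sets aside for custom extensions. As a minimal sketch, the code below packs an R-type instruction word into the custom-0 major opcode (0b0001011). The `encode_r_type` helper is a hypothetical illustration, not an Imperas tool or API, and the multiply-accumulate reading of the example is likewise hypothetical.

```python
CUSTOM_0 = 0b0001011  # major opcode reserved for custom extensions in the RISC-V base ISA


def encode_r_type(funct7: int, rs2: int, rs1: int, funct3: int, rd: int,
                  opcode: int = CUSTOM_0) -> int:
    """Pack R-type fields: funct7[31:25] rs2[24:20] rs1[19:15] funct3[14:12] rd[11:7] opcode[6:0]."""
    fields = (("funct7", funct7, 7), ("rs2", rs2, 5), ("rs1", rs1, 5),
              ("funct3", funct3, 3), ("rd", rd, 5), ("opcode", opcode, 7))
    word = 0
    for name, value, width in fields:
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name}={value} does not fit in {width} bits")
        word = (word << width) | value
    return word


if __name__ == "__main__":
    # Hypothetical custom instruction: rd = x10 (a0), rs1 = x11 (a1), rs2 = x12 (a2).
    word = encode_r_type(funct7=0, rs2=12, rs1=11, funct3=0, rd=10)
    print(f"0x{word:08X}")
```

Once an encoding like this is fixed, the same word has to be recognized by the simulator model and by the toolchain, which is why the flow adds the instruction to both the application and the model.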
Flow to add new custom instructions
Characterize C Application
• Instruction Accurate Simulation
• Trace / Debug
• Timing Simulation
• Function Timing / Profiling
Develop New Custom Instructions
• Design Instructions
• Add to Application
• Add to Model
• Add Timing
Characterize New Instructions in Application
• Instruction Accurate Simulation
• Trace / Debug
• Timing Simulation
• Function Timing / Profiling
Optimize & Document model
• Instruction Coverage
• Line Coverage
• Instruction Performance
• Generate PDF model doc
Release & Deploy
• Check RISC-V Compliance
• Use as reference for RTL Design Verification
• Use in Imperas/OVP Platforms, EPKs
• Heterogeneous / Homogeneous
• Multi-core, Many-core
• Imperas Multi-Processor Debug, VAP tools
• Port OS, RTOS (Linux, FreeRTOS…)
• Use in many simulation envs (inc. SystemC)
• Deliver to end users
Demo walkthrough
Imperas Tools / Environment
[Figure: block diagram of the Imperas tools / environment. Blocks shown: testbench; virtual platform with OVP CPUs, memory and a peripheral on a bus; application software & operating system; SlipStreamer API; JIT simulator engine; multiprocessor / multicore debugger; Eclipse IDE; Verification, Analysis & Profiling (VAP) tools]
Tool capabilities listed in the figure:
• Trace
• Profile
• Code coverage
• Memory monitor
• Protocol checker
• Assertion checkers
• OS task tracing
• OS scheduler analysis
• Fault injection
• Function tracing
• Variable tracing
• …
Imperas works with Mellanox on
RISC-V Processor Verification
• Imperas' leading RISC-V CPU reference model for hardware design verification was selected by Mellanox/NVIDIA
• Verification tools and golden reference model provide support for RISC-V custom instruction extensions and full processor design verification
Summary
• Current AI / ML applications need new / custom hardware configurations to meet the required performance goals
• Fast simulation allows software to run on virtual platforms many months (maybe a year) before RTL commit
• Imperas allows analysis of performance on different hardware configuration choices
  • including running heterogeneous platforms with a full OS
  • provides detailed analysis, profiling, performance and debug tooling
• The Imperas Reference Model includes all the current RISC-V specification features and enables you to develop custom instructions
  • It is a golden reference for many users validating their silicon
• Imperas provides solutions to enable architectural exploration for AI and ML accelerators
More Information: info@imperas.com
• Stop by the virtual Imperas booth at the December 2020 RISC-V Summit
• www.Imperas.com
• www.OVPworld.org
• www.GitHub.com/riscv-ovpsim




