Vulkan introduction

Vulkan API
IEG Intelligent Innovation Business Department, JON (家汉)

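Goals
• Explain how the API designs of OpenGL and Vulkan differ
• Walk through the overall flow and usage of the Vulkan API, to speed up self-study and reading the spec
• Point out common misconceptions about Vulkan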

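Why?
• More work moves from the driver to the developer, which improves behavioral consistency and leaves more room for optimization.
• A Loader & Layers mechanism lets drivers from different vendors coexist and lets validation and debug functionality be loaded dynamically.
• Multiple GPU devices can be driven at the same time, with the developer assigning different tasks to each.
• High performance:
  • The driver does only very limited error checking; load the validation layer to check against the spec.
  • Reuse the same pipeline structure (state) instead of repeating a series of operations to rebuild the pipeline.
  • Command buffers are pre-recorded, reusable, and support multithreading.
  • You manage the memory that resources are bound to yourself, which improves cache efficiency and reduces fragmentation and memory usage.
  • ID (OpenGL) vs. pointer (Vulkan)
• Compared with OpenGL 4.5, Vulkan offers no more advanced rendering techniques; it only moves resource management from the driver to the application.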

SPIR-V (Standard Portable Intermediate Representation): an intermediate language for parallel compute and graphics

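Host vs. Device
• Host: the CPU, which uses the Vulkan API to schedule resources for 3D rendering and compute tasks
• Device: usually the GPU, the device that performs 3D rendering and compute
• Device Local Memory: memory on the GPU device, also called VRAM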

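Development Environments
• Developers download the Vulkan SDK: https://vulkan.lunarg.com/
  • SPIR-V generation tools
  • Validation Layer
  • vulkan.h
• End users install VulkanRT, although the driver usually installs it for them.
• iOS and macOS do not support Vulkan natively, but MoltenVK wraps the native Metal framework to provide graphics and compute functionality.
• Pick a windowing framework that supports Vulkan, for example SDL 2.0.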

Loader & Layers
• On Android, layers are installed inside the APK.

Flow
• Create Instance
• Query Physical Devices
• Query Queue Families
• Enable Layers
• Create Logical Devices & Queues
• Graphics Pipeline
• Present
• Command Buffers
• Render Loop
• Framebuffer
• Swap Chain
• SPIR-V
• Descriptor Set
• Render Pass

Create Instance
• Enable layers & extensions

Installing the Validation Layer
• Checks usage against the spec; these errors are not reported by the driver (validity error vs. runtime error).
• https://vulkan.lunarg.com/doc/view/1.0.30.0/windows/layers.html

Debug Report
• Lets the validation layers report messages

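Selecting a Physical Device
• Enumerate all physical devices and query their queue families, features, extensions, clocks, and memory sizes to decide which device to use.
• Make sure the extensions required by the surface are supported.
• Make sure the device's swapchain supports the surface.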

Enumerating Queue Families
• A queue is the interface for executing command buffers on the device
• Find the queue family that supports a specific use
  • GRAPHICS_BIT
  • COMPUTE_BIT
  • TRANSFER_BIT
  • Present support
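The queue family search can be sketched without the Vulkan headers. This is a minimal illustration, not the author's code: `QueueFamily` and `findQueueFamily` are hypothetical stand-ins, the flag constants mirror the values of `VK_QUEUE_GRAPHICS_BIT`, `VK_QUEUE_COMPUTE_BIT`, and `VK_QUEUE_TRANSFER_BIT`, and `supportsPresent` stands for the result a real application would obtain from `vkGetPhysicalDeviceSurfaceSupportKHR`.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Values mirror the VkQueueFlagBits constants of the same name.
constexpr uint32_t kGraphicsBit = 0x1;
constexpr uint32_t kComputeBit  = 0x2;
constexpr uint32_t kTransferBit = 0x4;

// Hypothetical stand-in for VkQueueFamilyProperties plus the per-surface
// present-support query.
struct QueueFamily {
    uint32_t flags;
    bool supportsPresent;
};

// Returns the index of the first family that has every required flag and,
// if requested, can present to the surface. Returns -1 when none matches.
int findQueueFamily(const std::vector<QueueFamily>& families,
                    uint32_t requiredFlags, bool needPresent) {
    for (std::size_t i = 0; i < families.size(); ++i) {
        const bool hasFlags = (families[i].flags & requiredFlags) == requiredFlags;
        const bool presentOk = !needPresent || families[i].supportsPresent;
        if (hasFlags && presentOk) return static_cast<int>(i);
    }
    return -1;
}
```

Graphics and present may resolve to the same index or to different ones, so the caller should run the search once per purpose and compare the results.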

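Create Logical Device
• Opens the device
• Other Vulkan objects are created through this device
• Create queues
  • GRAPHICS_BIT
  • COMPUTE_BIT
  • TRANSFER_BIT
  • Present support
• Your logic must handle both cases: these capabilities may live in the same queue family or in different ones.
• Manage queues for different purposes through separate handles (pointers)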

Swapchain (1/2)
• Vulkan has no default framebuffer. A swapchain is an array of images waiting to be presented to the surface.

Swapchain (2/2)
• Present modes: FIFO, Immediate, Mailbox
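One common way to pick among these present modes is to prefer Mailbox and fall back to FIFO, which the Vulkan spec requires every implementation to support. The sketch below assumes that policy; it is not stated on the slide. `PresentMode` and `choosePresentMode` are hypothetical names, with enum values mirroring `VK_PRESENT_MODE_IMMEDIATE_KHR`, `VK_PRESENT_MODE_MAILBOX_KHR`, and `VK_PRESENT_MODE_FIFO_KHR`.

```cpp
#include <algorithm>
#include <vector>

// Values mirror the VkPresentModeKHR constants of the same name.
enum class PresentMode { Immediate = 0, Mailbox = 1, Fifo = 2 };

// Assumed policy: take Mailbox (low latency, no tearing) when the surface
// offers it, otherwise fall back to FIFO, which is always available.
PresentMode choosePresentMode(const std::vector<PresentMode>& available) {
    const bool hasMailbox =
        std::find(available.begin(), available.end(), PresentMode::Mailbox) !=
        available.end();
    return hasMailbox ? PresentMode::Mailbox : PresentMode::Fifo;
}
```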

Memory Types
• GPU bulk data
  • DEVICE_LOCAL without HOST_VISIBLE
  • Framebuffers, buffers & textures
• CPU-to-GPU data flow
  • DEVICE_LOCAL with HOST_VISIBLE
  • Updating constants (MVP matrix)
• GPU-to-CPU data flow
  • HOST_VISIBLE with HOST_COHERENT and HOST_CACHED
  • Screenshots & GPU acceleration
• Texture upload
  • Create a CPU-to-GPU buffer and write the pixels into it, create a GPU-bulk image, copy the buffer data into the image on the transfer queue, then free the buffer.
• NVIDIA devices (at the time of writing) do not support DEVICE_LOCAL with HOST_VISIBLE
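Choosing one of these memory types usually comes down to a bitmask search. The sketch below is a stand-alone illustration with hypothetical names: the flag constants mirror the `VK_MEMORY_PROPERTY_*_BIT` values, `typeFlags` stands in for the `propertyFlags` entries of `VkPhysicalDeviceMemoryProperties`, and `allowedTypeBits` stands in for `VkMemoryRequirements::memoryTypeBits`.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Values mirror the VkMemoryPropertyFlagBits constants of the same name.
constexpr uint32_t kDeviceLocal  = 0x1;
constexpr uint32_t kHostVisible  = 0x2;
constexpr uint32_t kHostCoherent = 0x4;
constexpr uint32_t kHostCached   = 0x8;

// Returns the index of the first memory type that the resource allows
// (bit i set in allowedTypeBits) and that has every required property.
// Returns -1 when no type qualifies, e.g. DEVICE_LOCAL with HOST_VISIBLE
// on a device that does not expose that combination.
int findMemoryType(const std::vector<uint32_t>& typeFlags,
                   uint32_t allowedTypeBits, uint32_t required) {
    // Vulkan reports at most 32 memory types, one per bit of the mask.
    for (std::size_t i = 0; i < typeFlags.size() && i < 32; ++i) {
        const bool allowed = ((allowedTypeBits >> i) & 1u) != 0;
        const bool matches = (typeFlags[i] & required) == required;
        if (allowed && matches) return static_cast<int>(i);
    }
    return -1;
}
```

A caller would typically ask for the preferred combination first and retry with a weaker requirement when the result is -1.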

Memory Allocation (1/2)
• The memory offset must be a multiple of VkPhysicalDeviceLimits::minXxxBufferOffsetAlignment
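When sub-allocating from one large allocation, each offset has to be rounded up to the relevant alignment. A minimal sketch, assuming the alignment is a power of two (which the spec requires of these limits); `alignUp` is a hypothetical helper name:

```cpp
#include <cassert>
#include <cstdint>

// Rounds offset up to the next multiple of alignment.
// alignment must be a nonzero power of two.
uint64_t alignUp(uint64_t offset, uint64_t alignment) {
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (offset + alignment - 1) & ~(alignment - 1);
}
```

For example, with an alignment of 256, an object that would start at byte 1 is placed at byte 256 instead.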

Memory Allocation (2/2)
Easy to integrate Vulkan memory allocation library: Vulkan Memory Allocator

Render Pass
• Defines the outputs of multiple render tasks and the dependencies between them
• A collection of:
  • Attachments
  • Subpasses
  • Dependencies between subpasses
• Where possible, use OP_DONT_CARE for the load/store ops

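Graphics Pipeline
• Renders 3D graphics into the memory of image objects
• A large number of structs are declared to define the whole graphics pipeline.
• Creating a new graphics pipeline is expensive; reuse pipelines or use a pipeline cache.
• The cache can be saved to disk and reused after the app restarts.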

Vertex Input
• C++ code to bind vertices and their layout
  • Shader
  • C++ class
  • Command
• Describes the layout of the vertex data.
• Interleaved formats are supported.
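For an interleaved layout, the stride and per-attribute offsets can be derived from the C++ vertex struct itself. The sketch below is illustrative only: `Vertex`, `AttributeLayout`, and the attribute locations are assumptions, and in a real application the values would feed `VkVertexInputBindingDescription::stride` and `VkVertexInputAttributeDescription::offset`.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical interleaved vertex: position, normal, and UV stored together.
struct Vertex {
    float position[3];
    float normal[3];
    float uv[2];
};

// Hypothetical stand-in for the location/offset pair of a vertex attribute.
struct AttributeLayout {
    uint32_t location;  // matches layout(location = N) in the vertex shader
    uint32_t offset;    // byte offset of the attribute inside one Vertex
};

// Distance in bytes between consecutive vertices in the buffer.
constexpr uint32_t kVertexStride = static_cast<uint32_t>(sizeof(Vertex));

const AttributeLayout kAttributes[] = {
    {0, static_cast<uint32_t>(offsetof(Vertex, position))},
    {1, static_cast<uint32_t>(offsetof(Vertex, normal))},
    {2, static_cast<uint32_t>(offsetof(Vertex, uv))},
};
```

Deriving the numbers with `sizeof` and `offsetof` keeps the layout description in sync when fields are added or reordered.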

Misc.
• Input Assembly (vertex assembly)
• Viewport & Scissor
• Rasterization
• Blend

Shader Stage
• SPIR-V format
  • Vertex
  • Fragment
  • Compute
  • Geometry
  • Tessellation_control
  • Tessellation_evaluation

Descriptor Set
• Descriptor Set
  • Shader and C++ struct
  • Descriptor Set Layout
  • Uniform Buffer
• Describes the buffers or images bound to a shader, other than vertex data and render targets.

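Descriptor Types
• Samplers support filtering and use normalized coordinates.
• Images do not support filtering and use non-normalized coordinates.
• Uniform buffers have a small size limit (tens of KB) but are faster than storage buffers.
• A texel buffer is a buffer with a color format.
• Storage descriptors support writes; storage buffers support atomic operations such as add/min/max (on uint or int types).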

Offset and Stride Assignment

Push Constant
• C++ array
• Shader
• Pipeline Layout
• Command
• A small block of uniform data.
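Because the block is small, it helps to check its size at compile time. The sketch below is an assumption-laden illustration: `PushConstants` and `fitsPushConstants` are hypothetical, 128 bytes is the minimum the spec guarantees for `VkPhysicalDeviceLimits::maxPushConstantsSize`, and push constant ranges must be a multiple of 4 bytes.

```cpp
#include <cstddef>

// Hypothetical per-draw data: a 4x4 matrix plus an RGBA tint (80 bytes).
struct PushConstants {
    float mvp[16];
    float tint[4];
};

// Smallest maxPushConstantsSize any conforming device may report.
constexpr std::size_t kGuaranteedPushConstantBytes = 128;

// A range is usable when its size is a multiple of 4 and within the limit.
constexpr bool fitsPushConstants(std::size_t size, std::size_t deviceLimit) {
    return size % 4 == 0 && size <= deviceLimit;
}

static_assert(fitsPushConstants(sizeof(PushConstants),
                                kGuaranteedPushConstantBytes),
              "PushConstants must fit the guaranteed push constant range");
```

Data that does not fit belongs in a uniform buffer instead.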

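Uniform Buffer vs. Push Constant
• How data is transferred
  • A uniform buffer binds the memory of a buffer.
  • A push constant copies the data into the command buffer; it is written to dedicated GPU memory when the command executes.
• When updates happen
  • The host can write new data into a uniform buffer at any time.
  • A push constant can only be updated through a command buffer.
• Restrictions
  • A shader can bind multiple uniform buffers.
  • A shader has only one push_constant block.
• Push constants are faster than uniform buffers.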

Command Buffers: pre-recorded commands that can later be submitted to a queue for execution
Spec: "Command buffer submissions to a single queue respect submission order and other implicit ordering guarantees, but otherwise may overlap or execute out of order."

Parallelization
• Multithreading at the CPU level
• Multiple queues at the GPU level
• Multiple devices…

Synchronization
• Fence
  • The host waits for the device to finish executing command buffers
• Semaphore
  • Different queues wait on each other
• Event
  • The device waits for the host, or for commands within the same queue

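Pipeline Barrier
• Guarantees that a given stage of the next pipeline waits until a given stage of the previous pipeline has completed.
• There must be a dependency between the pipelines.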

Memory Barrier
• Adds synchronization of memory data on top of a pipeline barrier.
  • Global Memory Barrier
  • Buffer Memory Barrier
  • Image Memory Barrier
    • Layout optimization

Image Layout
• Create Image => VK_IMAGE_LAYOUT_PREINITIALIZED
• Transfer Data => VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL
• Normal Texture => VK_IMAGE_LAYOUT_SHADER_READ_ONLY_OPTIMAL
• Framebuffer => VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL
• Before Present => VK_IMAGE_LAYOUT_PRESENT_SRC_KHR
• Before Acquire Next Image => VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL

Pipeline Statistics
• Create a query pool
• Reset it
• Begin collecting
• End collecting
• Fetch the results
• Collects quantitative data for a given range of commands

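Compute Shader
• A compute pipeline has only one stage; you only need to specify a descriptor set that binds the images or buffers it reads and writes.
• You specify the number of work groups and the local work size; the number of work-items (threads) created is the product of the two.
• Work-items in the same group share the same compute shared memory (local memory).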
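The relationship between item count, local size, and group count can be written out as a short helper. This is a sketch with hypothetical function names; in a real application the group count would be passed to `vkCmdDispatch` and the local size would come from the shader's `local_size_x` declaration.

```cpp
#include <cassert>
#include <cstdint>

// Number of work groups needed so that every item gets a work-item.
// Rounds up, so the last group may contain idle work-items.
uint32_t groupCountFor(uint32_t itemCount, uint32_t localSize) {
    assert(localSize != 0);
    return itemCount / localSize + (itemCount % localSize != 0 ? 1u : 0u);
}

// Total work-items launched: group count times local size.
uint64_t totalInvocations(uint32_t groupCount, uint32_t localSize) {
    return uint64_t{groupCount} * localSize;
}
```

With 1000 items and a local size of 64, 16 groups are dispatched and 1024 work-items run, so the shader needs a bounds check to skip the 24 extra invocations.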

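• Compute shared memory is small, on the order of kilobytes (about 4KB is cited here; the exact limit is VkPhysicalDeviceLimits::maxComputeSharedMemorySize), and is roughly 10x to 20x faster than global memory.
• The number of work groups should at least exceed the number of compute units.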

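• Shared memory cannot have an initial value; usually one work-item is designated to initialize it.
• barrier() synchronizes the shader's execution flow. It can also be used inside flow control, but only under a dynamically uniform expression.
• memoryBarrierShared() ensures that modifications to shared memory are visible to all work-items in the same work group.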

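Clarifications
• A physical device is not necessarily a GPU; "GPU" is used only to simplify the explanation.
• On Android, host memory and device-local memory are shared.
• Drivers are very tolerant of errors, so turn on the validation layer during development to make sure your code conforms to the spec and works across platforms.
• The validation layer checks strictly, but it can also be wrong; consult the spec or the issue list on GitHub.
  • https://github.com/KhronosGroup/Vulkan-ValidationLayers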

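Optimization Guidelines
• When creating the debug report callback, enable the VK_DEBUG_REPORT_PERFORMANCE_WARNING_BIT_EXT flag.
• CPU-bound
  • Prebake first: partition the work, maximize reuse of command buffers and graphics pipelines, and minimize how often new command buffers and graphics pipelines are created inside the render loop.
  • Use multiple threads to create multiple command buffers.
  • Port CPU algorithms that can run as massively parallel computation to the compute pipeline (e.g. physics and 3D audio).
• GPU-bound (here Vulkan has less of an advantage over OpenGL 4.5)
  • Give different queues different priorities, and put non-real-time work on lower-priority queues.
  • Do data transfers on a dedicated transfer queue, in parallel with rendering (synchronized with semaphores).
  • Keep the memory used by related objects tightly packed.
• On Qualcomm platforms, Snapdragon Profiler can be used for analysis.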

Self-Study
• Vulkan Tutorial
  • https://vulkan-tutorial.com/Introduction
• Vulkan Samples
  • https://github.com/SaschaWillems/Vulkan
• GPU Info
  • http://www.vulkan.gpuinfo.org/

