June 07, 17
Jean-François F Fortin Graphics Expert, Field Engineer, Unity Technologies firstname.lastname@example.org
Starting Point • I started at Unity recently and was looking for ideas to learn the engine. • Worked on many projects where we were limited in draw calls. • Inspired by the work I’ve done for the Shinra Technologies cloud gaming platform. More specifically: • - Engine architecture focusing on “drawing many things”. • - “Living World” demo.
Shinra Technologies’ Living World Demo
Starting Point • “Data oriented designs” to work well both on the CPU and GPU. • CPU was potentially expensive as it was required to do specific tasks. • GPU is cheap as work could be shared between multiple players. • Same ideas can be adapted to current games: • - Games can often be limited by the CPU. • - GPU can often execute more work.
Starting Point • What could I bring from this into Unity as a learning project? • I’ll take you through the mind of a graphics programmer, through my experiments and thought process to optimize instancing.
Why Instancing? • Game performance is currently usually limited by the CPU. • The world is filled with things; games usually look empty by comparison. • Examples: • - Dense forests with many species of trees, cities filled with buildings. • - The real world feels alive, filled with different animals and people. • Problems: • - CPUs are not as powerful as GPUs. • - Complex and dense scenes mean lots of work on the CPU. • - Most of the scene traversal is not GPU friendly.
First Steps… Learning Unity Instancing • Instancing is typically used to render identical objects: • - Same mesh. • - Same material. • - No animations. • Helps to reduce CPU usage, as objects are grouped together and fewer draw calls need to be issued. • Available on most platforms.
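To make the draw-call saving concrete, here is a minimal CPU-side sketch in Python (not Unity code). It assumes one draw call per object without instancing and one per unique (mesh, material) pair with instancing, and it ignores any per-batch instance limit an engine may impose. The scene contents and function names are made up for illustration.

```python
def draw_calls_without_instancing(objects):
    """One draw call per object."""
    return len(objects)

def draw_calls_with_instancing(objects):
    """One draw call per unique (mesh, material) pair (simplifying assumption)."""
    return len({(o["mesh"], o["material"]) for o in objects})

# Hypothetical scene: 1000 objects, but only three distinct mesh/material pairs.
scene = ([{"mesh": "tree", "material": "bark"}] * 500
         + [{"mesh": "rock", "material": "stone"}] * 300
         + [{"mesh": "tree", "material": "snowy_bark"}] * 200)

print(draw_calls_without_instancing(scene), draw_calls_with_instancing(scene))
```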
First Steps… Learning Unity Instancing To enable instancing: 1. Create a new material. 2. Check “Enable Instancing”. 3. Done.
Demo: Unity’s GPU Instancing
First Steps… Per-Instance Data • Custom Shader example:
First Steps… Per-Instance Data • MaterialPropertyBlock
Analysis • Renders faster than individual instances but still slow. • Time spent on the CPU processing the scene. • Solution? • - Remove all the objects from the scene! • - Literally!
GPU Instancing using Graphics.DrawMeshInstancedIndirect(…)
Walking… Use scripts to tweak rendering 1. Remove the objects from the scene (disabling their mesh renderers is enough). 2. Create a MaterialPropertyBlock to hold any per-instance data. 3. Render the instances using Graphics.DrawMeshInstancedIndirect(…), a new addition in Unity 5.6.
Walking… Use scripts to tweak rendering
Analysis • Better performance on the CPU. • Can usually render more instances of the same model than in the previous test. • GPU starts to have trouble when using more complex models. • Solution? • - GPU should do the visibility testing. • - Feed the result into the Graphics.DrawMeshInstancedIndirect(…) call.
Running… Indirect calls • Same as the regular calls in concept. Could in fact implement the functionality of the regular call using them. • Difference? • - Takes the draw call parameters from a GPU buffer. • - See Graphics.DrawProceduralIndirect • - See Graphics.DrawMeshInstancedIndirect • - See ComputeShader.DispatchIndirect
Running… Indirect calls • Very useful for linking work done in compute shaders with regular rendering. • No need to fetch the results back to the CPU… • - Doing so could introduce significant latency. • This enables compute shaders to write the draw arguments. • The GPU later reads the draw arguments from the buffer.
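As a rough illustration of what “draw arguments in a GPU buffer” means, the sketch below packs the five 32-bit unsigned integers that Unity documents for Graphics.DrawMeshInstancedIndirect (index count per instance, instance count, start index location, base vertex location, start instance location) into a byte buffer, then overwrites only the instance count, which is the part a compute shader would fill in. This is a CPU-side Python stand-in, not Unity code, and the helper names are hypothetical.

```python
import struct

# Five little-endian uint32 values, in the documented argument order.
ARGS_FORMAT = "<5I"
INSTANCE_COUNT_OFFSET = 4  # byte offset of the second uint32

def make_draw_args(index_count, instance_count=0, start_index=0,
                   base_vertex=0, start_instance=0):
    """Return a mutable byte buffer holding one set of draw arguments."""
    return bytearray(struct.pack(ARGS_FORMAT, index_count, instance_count,
                                 start_index, base_vertex, start_instance))

def write_instance_count(args, visible_count):
    """Stand-in for the GPU writing the visible-instance counter into the buffer."""
    struct.pack_into("<I", args, INSTANCE_COUNT_OFFSET, visible_count)

def read_draw_args(args):
    """What the GPU would read when it executes the indirect draw."""
    return struct.unpack(ARGS_FORMAT, args)

args = make_draw_args(index_count=36)  # e.g. a cube: 12 triangles * 3 indices
write_instance_count(args, 1500)
print(read_draw_args(args))
```

The CPU never learns the value 1500 here; it only hands the buffer to the draw call. In Unity, the counter copy is typically done on the GPU with ComputeBuffer.CopyCount.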
Running… Visibility Testing • It can be very simple and cheap to do visibility testing on the GPU. • Mostly dot products, and the GPU is fast at them. • Let’s build a data oriented version of the scene… • - A point cloud is the perfect structure for simple instances.
Running… Visibility Testing Steps: 1. A shader processes the objects and filters out those that are not visible. 2. Visible objects get added to a “VisibleList” to be rendered. 3. The counter from the VisibleList is then used to update the buffer holding the draw arguments. 4. Graphics.DrawMeshInstancedIndirect(…)
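The steps above can be sketched on the CPU as follows. This is a Python stand-in for what the compute shader would do, under stated assumptions: each instance is a point with a bounding-sphere radius, the view volume is given as planes (nx, ny, nz, d) with normals pointing inward, and an axis-aligned box stands in for a real camera frustum. All names are hypothetical.

```python
def signed_distance(plane, point):
    """Dot product of the plane normal with the point, plus the plane offset."""
    nx, ny, nz, d = plane
    x, y, z = point
    return nx * x + ny * y + nz * z + d

def is_visible(planes, center, radius):
    """Bounding-sphere test: visible unless fully behind at least one plane."""
    return all(signed_distance(p, center) >= -radius for p in planes)

def cull(planes, positions, radius):
    """Step 1 and 2: filter the point cloud and build the VisibleList (indices)."""
    return [i for i, pos in enumerate(positions) if is_visible(planes, pos, radius)]

# Six inward-facing planes of the box |x|, |y|, |z| <= 10 (frustum stand-in).
box_planes = [
    (1, 0, 0, 10), (-1, 0, 0, 10),
    (0, 1, 0, 10), (0, -1, 0, 10),
    (0, 0, 1, 10), (0, 0, -1, 10),
]
positions = [(0, 0, 0), (9, 0, 0), (11.5, 0, 0), (30, 0, 0), (0, -10.4, 0)]

visible_list = cull(box_planes, positions, radius=0.5)

# Step 3: the VisibleList counter becomes the instance count in the draw arguments.
draw_args = [36, 0, 0, 0, 0]
draw_args[1] = len(visible_list)
print(visible_list, draw_args)
```

On the GPU each instance would be tested by its own thread and appended to the list concurrently, so the order of the VisibleList is not guaranteed there as it is in this sequential sketch.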
Running… Visibility Testing
Running… Instance Setup
Demo: Visibility testing
Analysis • Best performance so far. • The shader can be flexible about what gets rendered as instances. • Can even support dynamic and animated instances.
Ideas to push this further…
Animated Data • Regular instancing won’t work with animated objects. • Skinning is often done on the CPU or as a separate pass. • - “Stream output” from geometry shader or compute shader. • - In both cases it is essentially the same as having a separate model for each animated instance. • Could re-implement skinning in the vertex shader and store the matrices in a buffer like we did for our other parameters… • - This is a lot of data and could require a lot of VRAM. • - Not straightforward to implement as the required information is not easy to get.
Animated Data Solution: 1. Bake animations as vertex animations and store the data in textures. 2. Set the animation texture as a property on the material. 3. Update the frame number and store it alongside the other instance data.
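A minimal sketch of the baking and lookup idea, written as CPU-side Python rather than shader code. The layout is an assumption, since the deck's baking code is not reproduced here: one texture row per frame, one texel per vertex, with the animation looping and no blending between frames. Function names are hypothetical.

```python
def bake_animation(frames):
    """Flatten per-frame vertex positions into a row-major 'texture'.

    frames: list of frames, each a list of (x, y, z) vertex positions.
    Returns (texels, vertex_count, frame_count).
    """
    frame_count = len(frames)
    vertex_count = len(frames[0])
    texels = []
    for frame in frames:
        if len(frame) != vertex_count:
            raise ValueError("every frame must have the same vertex count")
        texels.extend(frame)
    return texels, vertex_count, frame_count

def sample_vertex(texels, vertex_count, frame_count, vertex_id, frame):
    """What the vertex shader would fetch: row = frame, column = vertex ID."""
    row = int(frame) % frame_count  # loop the animation
    return texels[row * vertex_count + vertex_id]

# Two vertices rising along Y over three frames.
frames = [
    [(0, 0, 0), (1, 0, 0)],
    [(0, 1, 0), (1, 1, 0)],
    [(0, 2, 0), (1, 2, 0)],
]
texels, vertex_count, frame_count = bake_animation(frames)

# Per-instance data only needs to carry the current frame number.
print(sample_vertex(texels, vertex_count, frame_count, vertex_id=1, frame=4))
```

Because each instance stores just a frame number, many instances can play the same baked animation at different offsets while sharing one texture.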
Animated Data: Baking
Animated Data: Vertex Shader
Animated Data: Binding on Material
Moving Objects • How could we extend this to support birds, small animals, etc.? • A compute shader updates the data structure. • Objects can move in the scene without hurting performance much. 1. Create a buffer containing “update commands”. 2. A compute shader processes the individual commands. Example: update object #103 to position [100, 10, 500].
LODs and Billboards • Use the shader to calculate which LOD to use. • Create multiple draw calls from the argument buffer (IndirectArgs), one for each LOD. • Billboards are essentially just a separate LOD.
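The update-command idea can be sketched like this. It is a sequential Python stand-in for the compute shader; the command layout (an object index plus a new position) is an assumption based on the example on the slide, and the names are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class UpdateCommand:
    object_index: int   # which entry of the instance buffer to overwrite
    position: tuple     # new (x, y, z) position

def apply_updates(positions, commands):
    """Apply each command in place; out-of-range indices are ignored."""
    for cmd in commands:
        if 0 <= cmd.object_index < len(positions):
            positions[cmd.object_index] = cmd.position
    return positions

# Instance buffer for 200 objects, all starting at the origin.
positions = [(0.0, 0.0, 0.0)] * 200

commands = [UpdateCommand(103, (100.0, 10.0, 500.0))]
apply_updates(positions, commands)
print(positions[103])
```

Only the command buffer is uploaded each frame, so the cost scales with the number of objects that moved rather than with the size of the scene. On the GPU each command would be handled by its own thread, so two commands targeting the same object in one batch would have no guaranteed order.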
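One way to picture the per-LOD draw calls is the sketch below: CPU-side Python standing in for the shader. It assumes LODs are chosen purely by distance from the camera, with the final bucket acting as the billboard LOD. The thresholds and names are made up for illustration.

```python
import math

def select_lod(distance, lod_distances):
    """Return the first LOD whose maximum distance covers `distance`.

    Anything beyond the last threshold falls into the final (billboard) LOD.
    """
    for lod, max_distance in enumerate(lod_distances):
        if distance <= max_distance:
            return lod
    return len(lod_distances)

def bucket_by_lod(camera, positions, lod_distances):
    """Build one visible list per LOD, including the billboard bucket."""
    buckets = [[] for _ in range(len(lod_distances) + 1)]
    for i, pos in enumerate(positions):
        lod = select_lod(math.dist(camera, pos), lod_distances)
        buckets[lod].append(i)
    return buckets

camera = (0, 0, 0)
lod_distances = [20, 60]  # LOD0 up to 20 units, LOD1 up to 60, billboard beyond
positions = [(5, 0, 0), (0, 0, 30), (100, 0, 0), (0, 20, 0)]

buckets = bucket_by_lod(camera, positions, lod_distances)

# One instance count per LOD, i.e. one set of draw arguments per LOD mesh.
instance_counts = [len(b) for b in buckets]
print(buckets, instance_counts)
```

Each count would be written into its own slot of the argument buffer, and one indirect draw would be issued per LOD mesh.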
Conclusions • Unity instancing options are varied and work well. • Can be improved using new features such as the indirect calls. • Can go around limitations of instancing by using shader tricks.