👨🏼‍💻 Exercises

m4::e0 and m4::e1 combined were used as a hand-in in the course version. m4::e0 is a gentle starting point for architectural analysis, and its template can also serve as a starting point for projects. m4::e1 seeks to bridge the gap between what has been taught so far and your own area of interest.

m4::e0 - Architectural Analysis

Describe the base architecture of the egui-winit-wgpu template, which is found in m4_real_time_systems::code::egui-winit-wgpu-template or online.

Which elements are in play?
Who owns what data?
Think back to what you have learned in this and the previous module.
Use words and diagrams!

If you need some inspiration to get the ball rolling and interact with the code, here are a few exercises you could try out:

Launch the app in GUI mode, use it to generate a config.toml file, then load that config file in nogui mode.
Replace the automatic rotation of the triangle with rotations based on keyboard input events.
Could you delay the resize of the window until the window is done resizing?
Could you get log messages only from the user-written parts of the code and not from crates like wgpu?
Can you find out how many frames per second are rendered with the GUI running and without it (using --nogui)?
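
For the last exercise, one possible starting point is a small frame counter that you tick once per rendered frame. This is only a sketch under assumptions: `FpsCounter` and its methods are hypothetical names that do not exist in the template, and where you call `tick` (for example at the end of the render function) is up to you. It uses nothing beyond the Rust standard library.

```rust
use std::time::{Duration, Instant};

/// Hypothetical helper, not part of the template: counts frames and reports
/// an average frames-per-second figure once per reporting interval.
pub struct FpsCounter {
    window_start: Instant,
    frames: u32,
    interval: Duration,
}

impl FpsCounter {
    /// `now` is passed in rather than read internally so the counter is easy to test.
    pub fn new(now: Instant, interval: Duration) -> Self {
        Self { window_start: now, frames: 0, interval }
    }

    /// Call once per rendered frame. Returns `Some(fps)` when at least one
    /// full interval has elapsed, then starts a new measurement window.
    pub fn tick(&mut self, now: Instant) -> Option<f64> {
        self.frames += 1;
        let elapsed = now.duration_since(self.window_start);
        if elapsed < self.interval {
            return None;
        }
        let fps = f64::from(self.frames) / elapsed.as_secs_f64();
        self.window_start = now;
        self.frames = 0;
        Some(fps)
    }
}
```

Averaging over an interval, rather than inverting a single frame time, smooths out noisy frames. Keep in mind that a present mode with vertical sync may cap the number you measure, so compare the GUI and --nogui runs under the same settings.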

🧬 m4::e1 - Interpretation

Pick items worth a total of 3 points or more and write an interpretation of each. Each interpretation should be at least 10 lines per point, so an item worth 2 points requires a description of at least 20 lines.

Suggestions for things to talk about:

A description of the proposed solution
Which elements you have learned about in m1 and m2 are at play?
What performance implications result from the item?
What needs to be the bottleneck for this technique to be relevant (if it is an optimization technique)?
What will likely be the bottleneck after this technique has been implemented?
What is the weakness of the method/design?
In which cases would the proposed method/design be less useful?

You don't need to be correct; in many cases you can't be without profiling. The point is the process of putting your analysis into words from a systems programming perspective.

General

1 - Entity component systems - post 1, post 2
1 - Array-of-Structs, Structs-of-Arrays, Arrays-of-Structs-of-Arrays, Auto-Vectorization - blog post
1 - Branch Prediction
1 - Eytzinger Binary Search
2 - Custom memory allocators
2 - SIMD optimization
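
To give a feel for what one of these items looks like in code, here is a minimal sketch of the Eytzinger layout and a branch-light lower-bound search over it. The function names are made up for illustration, the element type is fixed to `u32` for brevity, and nothing here is tuned or profiled (no prefetching, for instance), so treat it as a reading aid rather than a reference implementation.

```rust
/// Rearranges a sorted slice into Eytzinger (breadth-first) order.
/// Slot 0 is unused padding so that the children of node k sit at 2k and 2k + 1.
pub fn eytzinger_layout(sorted: &[u32]) -> Vec<u32> {
    // In-order traversal of the implicit tree, consuming the sorted input.
    fn fill(sorted: &[u32], out: &mut [u32], next: &mut usize, k: usize) {
        if k < out.len() {
            fill(sorted, out, next, 2 * k); // left subtree: smaller values
            out[k] = sorted[*next];
            *next += 1;
            fill(sorted, out, next, 2 * k + 1); // right subtree: larger values
        }
    }
    let mut out = vec![0; sorted.len() + 1];
    let mut next = 0;
    fill(sorted, &mut out, &mut next, 1);
    out
}

/// Returns the smallest element that is >= `target`, or `None` if there is none.
pub fn eytzinger_lower_bound(tree: &[u32], target: u32) -> Option<u32> {
    let mut k = 1;
    while k < tree.len() {
        // Go right (2k + 1) if the current node is too small, else left (2k).
        k = 2 * k + usize::from(tree[k] < target);
    }
    // Undo the trailing right turns plus the final left turn to land on the
    // last node where we went left, which holds the answer.
    k >>= k.trailing_ones() + 1;
    if k == 0 { None } else { Some(tree[k]) }
}
```

For the sorted input `[1, 3, 5, 7, 9, 11, 13]` the layout is `[0, 7, 3, 11, 1, 5, 9, 13]`, and searching for 6 returns `Some(7)`. When interpreting the technique, consider why the first few levels of the tree sharing cache lines matters, and what the loop body compiles to compared to a classic binary search.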

Deep Learning

1 - Data Distributed Parallelism - Post 1, Post 2
1 - Model Distributed Parallelism
1 - Optimizing Inference
2 - BAGUA: Scaling up Distributed Learning with System Relaxations
2 - Flash Attention
2 - Gyro Dropout, Reference 2
2 - JAX
2 - Fast as CHITA: Neural Network Pruning with Combinatorial Optimization
2 - QMoE: Practical Sub-1-Bit Compression of Trillion-Parameter Models

Computer Graphics

1 - Multiresolution Ambient Occlusion
1 - Fast Hierarchical Culling
1 - Octree Textures on the GPU
2 - On Ray Reordering Techniques for Faster GPU Ray Tracing
2 - Mesh Compression
2 - Work Graphs in DX12
4 - Nanite - Video 1, Video 2, Video 3

Computer Vision

4 - ORB-SLAM - Paper 1, Paper 2, Paper 3, Simply Explained

🧬 m4::e2 - Group discussion and presentation

Pick one of the following topics.
Read and understand it, then present and discuss the topic with one or more other people.
You are encouraged to find additional literature on your own.

Bit tricks, atomic operators, packing normals and colors
Morton codes / Z-order curves, tiling and GPU textures
PyTorch 2.0 Compiler
Graph Sampling
DLSS
Real-Time Texture Decompression and Upsampling, such as this
2:4 sparsity with Tensor Cores
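
If you pick the Morton code topic, a small sketch like the following can help anchor the discussion. It interleaves the bits of a 2D coordinate into a single Z-order index using masks and shifts. The function names are illustrative and coordinates are assumed to fit in 16 bits each; actual GPU texture tiling schemes are vendor specific and may differ from this plain interleaving.

```rust
/// Spreads the lower 16 bits of `x` so a zero bit sits between each original bit.
fn part1by1(x: u32) -> u32 {
    let mut x = x & 0x0000_ffff;
    x = (x | (x << 8)) & 0x00ff_00ff;
    x = (x | (x << 4)) & 0x0f0f_0f0f;
    x = (x | (x << 2)) & 0x3333_3333;
    x = (x | (x << 1)) & 0x5555_5555;
    x
}

/// Inverse of `part1by1`: gathers every second bit back into the lower 16 bits.
fn compact1by1(x: u32) -> u32 {
    let mut x = x & 0x5555_5555;
    x = (x | (x >> 1)) & 0x3333_3333;
    x = (x | (x >> 2)) & 0x0f0f_0f0f;
    x = (x | (x >> 4)) & 0x00ff_00ff;
    x = (x | (x >> 8)) & 0x0000_ffff;
    x
}

/// Interleaves x (even bits) and y (odd bits) into one Z-order index.
pub fn morton_encode(x: u32, y: u32) -> u32 {
    part1by1(x) | (part1by1(y) << 1)
}

/// Recovers (x, y) from a Z-order index produced by `morton_encode`.
pub fn morton_decode(code: u32) -> (u32, u32) {
    (compact1by1(code), compact1by1(code >> 1))
}
```

With this encoding, codes 0 through 3 visit (0, 0), (1, 0), (0, 1) and (1, 1), which is the "Z" shape the curve is named after. A good discussion question is why nearby texels ending up close together in memory matters for cache behavior during texture sampling.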