This is a four-part blog post series on the NVIDIA Graphics Processing Unit (GPU) Technology Conference (GTC).
The GTC runs on Pacific Daylight Time (PDT), so converted into NZDT the applicable sessions start at approx. 3 am each day. My first session was on Tuesday 22nd March at 3 am. In NZDT the conference runs from Tuesday 22nd to Friday 25th March 2022.
Intro to "Learning Deep Learning"
by Magnus Ekman
3:00 AM - 5:00 AM NZDT
This session took a brief dive into Deep Learning (DL) and was an active, practical course. Aside from perhaps taking some notes, you'd do a little bit of programming too. If you weren't quite fluent with Python you could just read along, and Ekman would eventually show how to do things. The course was half promotion for Ekman's book Learning Deep Learning and half applying what was in the book. It was a fast-paced class, meaning that if you missed one thing, or could not connect to the assets, you could end up lost. The course had a capacity limit, so if you were late your chances of getting in were very low. Unfortunately, my computer crashed partway through and kicked me from the room. So at 4:15 am (45 minutes before the next session I wanted to attend), I had to sit and wait.
Another student posted a note in the chat explaining normalization, which I found helpful and was able to get down before my computer crashed:
Normalization is needed because inputs can be in different units, on very different scales. For example, the weight of a human could be 100lbs to 300lbs, whereas the height of a human could range from ~50in to 80in. If you didn't normalize these two input categories (pounds and inches), the neural network model would more heavily emphasize the human weight, since it has a higher range of values than height in inches. Therefore you need to normalize the inputs.
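To make that note concrete, here is a minimal sketch of one common way to normalize (standardizing to zero mean and unit standard deviation). It is not taken from Ekman's course; the function name `standardize` and the sample weights and heights are made up for illustration.

```python
from statistics import mean, pstdev

def standardize(values):
    """Rescale values to zero mean and unit standard deviation (z-scores)."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # All values identical: nothing to scale, so return zeros.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]

# Made-up sample data: weight in pounds, height in inches.
weights_lb = [100.0, 150.0, 200.0, 300.0]
heights_in = [50.0, 60.0, 70.0, 80.0]

# Before normalization the raw ranges differ a lot (200 vs 30).
print(max(weights_lb) - min(weights_lb), max(heights_in) - min(heights_in))

weights_norm = standardize(weights_lb)
heights_norm = standardize(heights_in)

# After normalization both features sit on a comparable scale.
print(weights_norm)
print(heights_norm)
```

After this step neither feature dominates simply because of the unit it was measured in.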
While this is very interesting, I was quite lost as I have no Python experience and very little game programming experience either. It was a shame I couldn't continue attending the class, but Ekman's book can be found on the Pearson website.
Making a Connector for Omniverse
by Lou Rohan
5:00 AM - 5:50 AM NZDT
To connect Omniverse to apps via servers, NVIDIA has created Omniverse Nucleus. This talk used the Unreal Engine (UE4/UE5) plugin for Nucleus as an example and showed how it runs in real time. Unreal has a bi-directional connector, which means that real-time updates are shared between apps. There are three other connector types: uni-directional (real-time updates are reflected in Omniverse but not shared back), 'Export and Import' (conversion to USD via direct import or a third-party app), and OmniDrive (similar to uni-directional but requires no connector). Both bi-directional and uni-directional connectors support material conversion, 'Export and Import' requires materials to be converted manually, and OmniDrive allows for USD or texture exports. Omniverse uses the USD (Universal Scene Description) file format to send data between apps, and USD can be converted to and from certain other formats (fbx, obj, etc.). USD is mainly used for the models, materials, stage items, and animations. MDL (Material Definition Language) will be used for materials as well.
Omniverse uses both C++ and Python, and both Windows and Linux are supported.
The OmniCLI is the Omniverse Client Library and is great for moving files.
If an app already uses USD there can be conflicts, but NVIDIA has a short-term workaround.
This talk is actually from last year's November GTC. Below are some screenshots of what connectors there are, Omniverse URLs, and two versions of exporting (shared and modular).
Acceleration Structures in Ray Traced Open World Games
by Peter Morley
7:00 AM - 7:25 AM NZDT
This session jumped right into the deep end, so much so that it is hard for me to remember how it actually started. My notes dive right in too, starting with acronyms and definitions.
AS - Acceleration Structure
BVH - Bounding Volume Hierarchy
BLAS - Bottom Level AS
TLAS - Top Level AS
Compaction - Defragmenting the AS to reduce memory size
The talk then covered what Acceleration Structures are and what they do. Acceleration Structures partition a scene spatially. A BLAS holds an individual geometry, while a TLAS holds instances of geometries by referencing BLASes. Both are constructed on the GPU, and together they form a two-level scene graph.
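The two-level idea can be sketched in plain Python. This is only an illustration of how a TLAS references BLASes, not a real graphics API; the class names `Blas`, `Instance`, and `Tlas` and the triangle counts are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class Blas:
    """Bottom-level AS: one piece of geometry (hypothetical stand-in)."""
    name: str
    triangle_count: int

@dataclass
class Instance:
    """A placement of a BLAS in the scene; only a translation is stored here."""
    blas: Blas
    translation: tuple

@dataclass
class Tlas:
    """Top-level AS: a list of instances that reference BLASes."""
    instances: list = field(default_factory=list)

    def add(self, blas, translation):
        self.instances.append(Instance(blas, translation))

    def unique_blas_count(self):
        # Count distinct BLAS objects by identity, not by value.
        return len({id(inst.blas) for inst in self.instances})

    def instanced_triangles(self):
        # Triangles visible in the scene, counting every instance.
        return sum(inst.blas.triangle_count for inst in self.instances)

tree = Blas("tree", 5_000)
rock = Blas("rock", 800)

tlas = Tlas()
for x in range(3):
    tlas.add(tree, (x * 10.0, 0.0, 0.0))  # three trees share one BLAS
tlas.add(rock, (5.0, 0.0, 2.0))

print(len(tlas.instances), tlas.unique_blas_count(), tlas.instanced_triangles())
```

The point of the sketch is that three trees in the scene only need one tree BLAS; the TLAS just stores three references to it.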
What characterizes an open world game:
- Zero interruption of loading screens
- LoDs to reduce triangle counts
- Streaming data to the GPU on demand
- Prevent overallocation of GPU Memory
Some Acceleration Structure challenges include memory growth as more objects come into view, BLASes being built per frame, and LoD management. BLAS build time scales linearly, while ray traversal scales logarithmically.* The caveat, however, is that different hardware vendors have different memory costs for TLAS and BLAS.
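As a rough illustration of linear versus logarithmic scaling, the sketch below assumes a perfectly balanced binary BVH with one triangle per leaf. That is a simplification of my own; real builders and hardware differ, and the function name `bvh_depth` is made up.

```python
import math

def bvh_depth(triangle_count):
    """Depth of a balanced binary BVH with one triangle per leaf (idealized)."""
    return max(1, math.ceil(math.log2(triangle_count)))

for n in (1_000, 1_000_000):
    # Build work grows with n itself; traversal steps grow with the depth.
    print(n, bvh_depth(n))
# prints:
# 1000 10
# 1000000 20
```

Under these assumptions, a thousand times more triangles means roughly a thousand times more build work, but only about twice as many traversal steps per ray.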
Ways to reduce memory and improve performance:
- Consider merging tiny BLASes into a single larger BLAS when it makes sense
- Do not create individual BLASes based on materials but rather based on spatial locality
- Partition the TLAS space and build up the geometries into larger BLASes.
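One simple way to picture grouping by spatial locality is to bucket small meshes into grid cells and merge everything in a cell into one batch. This is a sketch under my own assumptions, not the method from the talk; `merge_by_cell`, the mesh names, and the cell size are hypothetical.

```python
from collections import defaultdict

def merge_by_cell(meshes, cell_size):
    """Group meshes into one merged batch per (x, z) grid cell."""
    cells = defaultdict(list)
    for name, (x, y, z) in meshes:
        # Meshes whose positions fall in the same cell end up in the same batch.
        key = (int(x // cell_size), int(z // cell_size))
        cells[key].append(name)
    return dict(cells)

# Made-up tiny meshes with world positions; materials are ignored on purpose.
meshes = [
    ("pebble_a", (1.0, 0.0, 2.0)),
    ("pebble_b", (3.0, 0.0, 4.0)),
    ("pebble_c", (12.0, 0.0, 1.0)),
    ("twig_a", (14.0, 0.0, 3.0)),
]

batches = merge_by_cell(meshes, cell_size=10.0)
for cell, names in batches.items():
    print(cell, names)
```

Here four tiny meshes become two batches, each of which would be built as a single larger BLAS, and the twig is merged with a nearby pebble even though it would likely use a different material.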
*I will not lie, I understood about as much of this info as I could throw. It is probably best saved for if/when I go to create an open world game, but that will not be for a long time yet*
Some Key Takeaways are:
- Acceleration structure buffers take up nearly as much GPU memory as vertex buffers
- Acceleration structure builds consume GPU cycles so avoid them as much as possible between frames
- Organize acceleration structures spatially and not necessarily based on materials
- Converting a scene graph into the TLAS/BLAS two-level hierarchy isn't trivial and takes special considerations
- Cache prebuilt BLAS in system memory so you only need to copy them when referenced again
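The caching takeaway can be sketched as a small least-recently-used cache. This is only an illustration in plain Python, not engine code: the class `BlasCache`, its `_build` method, and the byte string standing in for a built BLAS are all hypothetical.

```python
from collections import OrderedDict

class BlasCache:
    """Keeps prebuilt BLAS blobs so repeat requests skip the build."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._entries = OrderedDict()
        self.builds = 0  # how many expensive builds actually happened

    def get(self, mesh_id):
        if mesh_id in self._entries:
            # Cache hit: mark as most recently used and reuse the blob.
            self._entries.move_to_end(mesh_id)
            return self._entries[mesh_id]
        blob = self._build(mesh_id)
        self._entries[mesh_id] = blob
        if len(self._entries) > self.capacity:
            # Evict the least recently used entry to bound memory.
            self._entries.popitem(last=False)
        return blob

    def _build(self, mesh_id):
        # Stand-in for an expensive acceleration structure build.
        self.builds += 1
        return f"blas:{mesh_id}".encode()

cache = BlasCache(capacity=2)
for mesh_id in ["rock", "tree", "rock", "rock", "tree"]:
    cache.get(mesh_id)

print(cache.builds)
```

Five requests only trigger two builds here, because the rock and the tree are reused once they are in the cache.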
Questions asked/answered:
Q: How does BLAS compaction help to improve open-world game performance?
A: Compaction reduces the total memory size of BLASes and thus helps to prevent overallocation of GPU memory which leads to performance degradation.
Q: What triggers the need to rebuild a BLAS?
A: The idea is that highly deformable geometry such as particle physics needs to be rebuilt every couple frames in order to not degrade traversal performance. The TLAS is built every frame as another example to prevent overlapping instances.
Q: In the example where building the acceleration structure takes 5 ms, and doing rt shadows takes 2 ms, how long would the rt shadows take without the acceleration structure?
A: You need to build the acceleration structures in order to do the traversal. The point here is that we want to balance build times and ray traversal times to be proportional.
Q: Is it a good idea to load all LOD levels in the BLAS and just update the TLAS selecting the correct LOD each frame?
A: That is fine but if you are memory constrained then loading a low-resolution LoD versus loading all of them can save quite a bit of memory.
Q: Is compacting an automatic process or does it need to be carried out manually?
A: Compaction needs to be carried out manually. It is an operation that happens after the build is complete and involves copying to the smaller compacted memory allocation.
Something that was noted is that Unreal Engine has many ray tracing capabilities, including an RTXGI plugin. Unfortunately, you would need an RTX graphics card to access most of these features.
Bringing GPU-accelerated Computing and AI to the Classroom
by Andy Cook and Joe Bungo
9:00 AM - 9:25 AM NZDT
This session mainly covered the Deep Learning and AI courses and kits available from NVIDIA. There is a range of courses that educators can sign up for, some free and some paid, and students would access them via their professors or the GTC itself. There are free courses for students throughout the year, the next one being in April for students in Europe, the Middle East, and Africa. Even non-students can attend, as there are self-paced and instructor-led courses online.
How to Design Collaborative AR and VR worlds in Omniverse
by Omer Shapira
10:00 AM - 10:50 AM NZDT
First and foremost, OmniverseXR is coming out in April 2022. It is a VR client for Omniverse with a fully ray-traced VR renderer, a VR extension for all Omniverse apps, and a VR Python SDK.
Although it was the last session of the day, it was the first one to go over USD (what it is and how it's being used by NVIDIA); an image is below.
Just before explaining what USD is and how Omniverse uses it, Shapira discussed the standard game engine "Waterfall" philosophy, where every step is isolated. The image below compares it with the Hyperscale philosophy, which Shapira moved on to after the USD slide.
Features from OmniverseXR include:
Effortless raytraced VR:
- Same RTX renderer as the rest of the Omniverse Kit
- Log(n) render complexity; Limited only by VRAM
- Work directly on production models
- Use of Hydra instancing: 70M -> 10B triangles in this scene (from the example)
Making Things Fast:
- High-performance USD is an active development topic at NVIDIA
- MultiGPU supported out of the box
- Warped (continuous) foveated rendering - reduces drawn pixels by 70%**
** Bringing Ray Traced Visualization to Collaborative Workflows by Jeroen Stinstra and Kevin Parker from GTC Fall 2020.
VR support for OmniverseXR is through SteamVR.
Foveation - rendering more pixels where it matters, where the fovea rests.
Because of ray tracing, it has come a long way from its original form.***
***I'm not sure what the context for 'it' was, but I believe it would have been OmniverseXR; if not, it probably had to do with foveation in VR.
Questions asked/answered:
Q: What headsets will Omniverse XR support?
A: Initial support is for SteamVR - and we test with the most popular SteamVR HMDs.
Q: What is the relationship with CloudXR?
A: CloudXR is a streaming protocol for AR and VR devices. We use CloudXR to stream to tablets; It's on our roadmap to support CloudXR for HMDs as well.
Q: Since USD models can have different units, how can I make sure I'm importing something in the appropriate size?
A: We have a "Scaling Factor Override" option in Omniverse XR's "Advanced Settings" section. It changes the size of the *human* participant in the session - so your models are the correct size they were imported in. If you want to change a specific object's "Units To Meters" factor, the "Layer Settings" metadata panel will allow you to do that - per layer or stage.
Q: Is there a limit on the number of (VR) users interacting with the same Omniverse XR scene?
A: We haven't reached one yet :)
Q: Does OmniverseXR work with Oculus Quest 1/2? And do you need to make a build in order to access it or can it be streamed like the Oculus Link technology?
A: We did test it with Oculus Quest 2!
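The units question comes down to simple arithmetic. USD stages carry a `metersPerUnit` value, which I assume is what the "Units To Meters" setting refers to. The sketch below is plain Python rather than the USD API, and the function name `unit_scale` and the chair example are made up.

```python
def unit_scale(source_meters_per_unit, target_meters_per_unit):
    """Factor that converts lengths from the source unit to the target unit."""
    return source_meters_per_unit / target_meters_per_unit

CENTIMETERS = 0.01  # a stage authored in centimeters has metersPerUnit = 0.01
METERS = 1.0        # a stage authored in meters has metersPerUnit = 1.0

# Assumed example: a chair modeled as 90 units tall in a centimeter stage.
scale = unit_scale(CENTIMETERS, METERS)
chair_height_units = 90.0
chair_height_m = chair_height_units * scale

print(scale, chair_height_m)
```

If the factor were ignored, the 90-unit chair would be read as 90 meters tall in a meter-based scene, which is why the per-layer or per-stage setting matters.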
This session had a Post-Session Hangout in the NVIDIA Omniverse Discord, where they continued taking questions. Things that were mentioned:
- No other available ray tracing app for VR right now
- Baked lighting helps for engine-based VR and makes it feel better.
- It immediately has the quality you want
- "If you have an RTX card you now have ray tracing in VR for free."
- Latency and Bandwidth are things people are sensitive to in VR but it's not just VR
- Expanding on Foveation: it has a smooth warping function on one canvas rather than on nine canvases that have slight incremental differences
- The collaborative aspect of Omniverse:
  - Consensus - Google docs are the usual practice.
  - SDK Layer - no limitations on edits.
  - Design - is per application, everyone contributes something big, you can block out sections/lock them so that only [you] may enter and change that space.
  - Roles - can be harder to do
  - Different Collaboration Addons - could help and start an organizational culture
A wow moment for Kevin Parker was looking around the space and seeing the soft shadows that made it feel like the object was actually there.
And that concludes Tuesday's GTC sessions. The last session was by far my favorite, as it touched on things I know a little bit more about and have an interest in at this stage. From Wednesday onwards both the GTC and the Game Developers Conference (GDC) are on at the same time, so there are clashes between sessions, but thankfully I have two devices to run them from and a limited access pass from GDC.