Further reading
While that’s all you need to overclock the Strix 2080, we’ve taken the time to take a deeper look at Turing and ray tracing. If you’re interested in either subject, the sections below provide a host of useful information.
NVIDIA Turing (TU104)
The RTX 2080 (TU104) is made up of 13.6 billion TSMC 12nm FinFET transistors occupying 545mm² of die space. Despite not being the largest sibling in the Turing line-up, that still eclipses NVIDIA's previous flagship, the 1080 Ti (GP102), by 15%.
On closer inspection, we can see that Turing closely resembles the Volta architecture. A total of six processing clusters with 3072 combined CUDA cores do most of the work, pumping out 9.2 TFLOPs, compared with the GTX 1080’s 2560 cores delivering a throughput of 8.9 TFLOPs. For AI and deep learning, 384 Tensor Cores are divided over 48 Streaming Multiprocessors (SM), which are paired two apiece across the 24 Texture Processing Clusters (TPC). Every SM also contains an RT (Ray Tracing) Core, intended to relieve the rest of the GPU of the immense computational work required to cast rays in real time.
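For anyone who likes to sanity-check the headline numbers, the counts above break down neatly per SM, and the TFLOPs figures follow from cores x 2 FLOPs per clock x clock speed (the clock speeds below are assumed round numbers for illustration only).

```python
# Rough sanity check of the shader figures quoted above.
# Clock speeds are assumed round numbers for illustration, not measured values.

def tflops(cuda_cores, clock_ghz):
    # Each CUDA core can issue one fused multiply-add (2 FLOPs) per clock.
    return cuda_cores * 2 * clock_ghz / 1000

print(3072 // 48)                    # CUDA cores per SM on TU104 -> 64
print(384 // 48)                     # Tensor Cores per SM -> 8
print(round(tflops(3072, 1.50), 1))  # ~9.2 TFLOPs at an assumed ~1.5 GHz
print(round(tflops(2560, 1.73), 1))  # ~8.9 TFLOPs for the GTX 1080 at ~1.73 GHz
```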

The frame buffer gets an upgrade to 8GB of GDDR6 (from Micron), delivering higher bandwidth and a 40% reduction in signal crosstalk. It is fed by eight 32-bit memory controllers, supplying 448GB/s of bandwidth over a 256-bit bus. Despite having the same bus width as the GTX 1080, that's a 40% increase in throughput thanks to GDDR6 and the redesigned architecture. The Turing memory subsystem is now unified, allowing the L1 cache and shared memory to be combined into a single pool. Along with an increase in L1 and L2 cache size, this results in twice the bandwidth per Texture Processing Cluster in comparison to Pascal.
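The headline bandwidth figure is easy to sanity-check from the bus width and the per-pin data rate; the 14 Gbps and 10 Gbps speeds below are the GDDR6 and GDDR5X rates implied by the quoted figures rather than anything measured here.

```python
# Peak memory bandwidth = bus width (bits) x per-pin data rate / 8 bits per byte.
bus_width_bits = 256          # eight 32-bit controllers
data_rate_gbps = 14           # GDDR6 per-pin speed implied by the 448GB/s figure
print(bus_width_bits * data_rate_gbps / 8)   # 448.0 GB/s, matching the quoted number

# The GTX 1080's 10 Gbps GDDR5X on the same 256-bit bus works out to 320 GB/s,
# hence the roughly 40% uplift.
print(256 * 10 / 8)                          # 320.0 GB/s
```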

Other Improvements
Deep Learning Super Sampling (DLSS)
Legacy supersampling renders frames at two or four times the native resolution and then downsamples them to fit the display. However, the downside of this approach is that it requires a considerable amount of video memory and bandwidth, generating a large performance penalty. As a viable alternative, NVIDIA developed a temporal antialiasing method (TXAA) that uses a shader-based algorithm to sample pixels from current and previous frames. This approach yields mixed results because it blends multi-sampling over the top of temporal accumulation, rendering a soft or blurred image.
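To make the cost of legacy supersampling concrete, here is a minimal sketch (using NumPy purely for illustration) of rendering at twice the native resolution in each axis and box-filtering back down - four times as many pixels have to be shaded and stored for every frame displayed.

```python
import numpy as np

def downsample_2x(supersampled):
    """Box-filter a 2x supersampled RGB frame back down to native resolution."""
    h, w, c = supersampled.shape
    return supersampled.reshape(h // 2, 2, w // 2, 2, c).mean(axis=(1, 3))

# A 2x2 supersampled 1080p frame: four times the pixels of the native output.
frame = np.random.rand(2160, 3840, 3).astype(np.float32)
native = downsample_2x(frame)
print(native.shape)                   # (1080, 1920, 3)
print(frame.nbytes / native.nbytes)   # 4.0 - the extra memory and bandwidth cost
```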
On Turing, a technique referred to as DLSS (Deep Learning Super Sampling) is employed, making use of NVIDIA's new Tensor Cores to produce a high-quality output while using a lower sample count than traditional antialiasing methods. Deep learning of this kind is typically used for image processing, where a network learns to distinguish on-screen objects simply by looking at raw pixels, and that same ability can be put to work in games. NVIDIA renders reference images with an incredible 64x supersampling and trains its neural network to match the target-resolution output frame to them. Because this is all done on the Tensor Cores using AI, it avoids blurring and other visual artefacts. NVIDIA is keen to get everyone on board, as developers need only send their game code to be processed on a DGX supercomputer, which generates the data needed by the NVIDIA driver. With a growing list of supported titles, Turing’s DLSS is ripe for experimentation.
There are currently two ways for us to test DLSS, and one of them is not fit for use as a comparison. Although Final Fantasy's implementation works fine, its Temporal AA alternative doesn't, producing severe stuttering that makes for an unfair comparison. That leaves us with Epic's Infiltrator demo, which, with a recent update, allows a fair comparison against TXAA.

With DLSS enabled, the performance numbers speak for themselves. Although there are some frame timing issues due to scene transitions, DLSS provides a near 33% advantage over TXAA. Moreover, there wasn’t a difference in obtainable clock speed or power consumption between the two. Keep in mind that this is a single example, so results may change once the technology is used outside of a canned environment.
The image quality between these modes is currently a hot topic and fundamentally troublesome to convey. DLSS is far better in motion than it is in still images, meaning one must see the demo running to make an informed decision. Of course, there are YouTube videos available, but these are heavily compressed, resulting in discernible differences being lost. The other problem is that Infiltrator puts NVIDIA at an advantage: by being repetitive, the demo aids the learning process, producing a more accurate result than a scene that’s rendered based on player input. Personally, I feel there is a visual trade-off when compared with TXAA, but it's marginal, so the performance advantage cannot be ignored. DLSS is set to arrive soon in a growing list of available and upcoming games, so it’ll be interesting to see if performance mirrors the gains seen in this test.
Variable Rate Shading (VRS)
Turing now has full support for Microsoft's Variable Rate Shading (VRS), a new rendering method that lets developers reduce shading detail in areas such as the player's peripheral vision while keeping full fidelity where they are looking. We have seen similar techniques on previous generations in the form of Multi-Resolution Shading, but with VRS the granularity is far superior, as developers can now use a different shading rate for every 16 x 16-pixel region. Forgoing out-of-view detail in a given scene means developers can focus GPU power where it matters, something that has far-reaching implications for virtual reality, where maintaining high framerates is crucial.
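As a rough sketch of the idea (this is not the actual D3D12 API, and the tile radii and rates are invented for illustration), the snippet below assigns a coarser shading rate to 16x16 screen tiles the further they sit from the centre of the view - essentially what a foveated use of VRS does.

```python
import math

def shading_rate_for_tile(tile_x, tile_y, tiles_x, tiles_y):
    """Pick a per-16x16-tile shading rate based on distance from the screen centre.
    Rates are expressed as 'pixels per shading sample' in each axis."""
    dx = (tile_x + 0.5) / tiles_x - 0.5
    dy = (tile_y + 0.5) / tiles_y - 0.5
    dist = math.hypot(dx, dy)         # 0 at the centre, ~0.7 in the corners
    if dist < 0.25:
        return (1, 1)                 # full rate where the player is looking
    if dist < 0.45:
        return (2, 2)                 # a quarter of the shading work
    return (4, 4)                     # 1/16th of the shading work at the periphery

# Build the rate map for a 1920x1080 target: 120 x 68 tiles of 16x16 pixels.
tiles_x, tiles_y = 1920 // 16, 1080 // 16 + 1
rates = [[shading_rate_for_tile(x, y, tiles_x, tiles_y) for x in range(tiles_x)]
         for y in range(tiles_y)]
print(rates[len(rates) // 2][tiles_x // 2])   # centre tile -> (1, 1)
print(rates[0][0])                            # corner tile -> (4, 4)
```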
Turing also furthers support for DX12, with architectural changes that allow integer and floating-point calculations to be performed concurrently, alongside more efficient asynchronous compute than Pascal. With its Tensor Cores handling antialiasing and post-processing workloads, these new technologies promise to reduce CPU overhead and produce higher frame rates.
RTX – Ray Tracing
Ray tracing isn't a brand-new technology if we talk about it in an all-encompassing way. Many of us have witnessed ray tracing at some point in our lives, even if we were unaware of it. In the movie industry, CGI relies upon rays being cast into a scene to portray near photorealism, with rendering often taking months to complete. Transitioning this technique into games is no simple task, but this technological leap must begin somewhere, and that's what NVIDIA has promised to deliver with Turing.
RTX is ray tracing at a fundamental level, using a modest count of one or two samples per pixel to simulate anything from global illumination to detailed shadows. By comparison, CGI in a Hollywood blockbuster would cast thousands at an individual pixel, something that is currently impossible for real-time graphics. To compensate, NVIDIA is utilising the Deep Learning Tensor Cores to fill in the blanks by applying various denoising methods, whilst the RT Cores cast the rays required to bounce and refract light across the scene. Of course, it is not solely the samples per pixel that account for the heavy computational workload.
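NVIDIA's denoisers are learned, AI-driven filters and their internals aren't public, so the snippet below is only a crude stand-in - a plain neighbourhood average applied to a synthetic noisy estimate - included purely to show what "filling in the blanks" on a low-sample image means in principle.

```python
import numpy as np

def box_denoise(noisy, radius=2):
    """Average each pixel with its neighbours - a crude stand-in for a real denoiser."""
    out = np.zeros_like(noisy)
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            out += np.roll(np.roll(noisy, dy, axis=0), dx, axis=1)
    return out / (2 * radius + 1) ** 2

# A clean gradient corrupted by heavy per-pixel noise, as a 1-spp estimate might be.
clean = np.linspace(0, 1, 256).reshape(1, -1).repeat(256, axis=0)
noisy = np.clip(clean + np.random.normal(0, 0.3, clean.shape), 0, 1)
print(np.abs(noisy - clean).mean())               # large error before filtering
print(np.abs(box_denoise(noisy) - clean).mean())  # noticeably smaller afterwards
```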
Recently, I've been speaking with Andy Eder (SharkyUK), a professional software developer with over 30 years of experience in both real-time and non-real-time rendering. Knowing Andy quite well, I was eager to hear his take on RTX and ask why ray tracing is so incredibly hardware intensive.
What is Ray Tracing?
Ray tracing has been around for many years now, with the first reported algorithm originating back in the 1960s. This was a limited technique akin to what we now refer to as ray casting. It was a decade later when John Turner Whitted introduced his method for recursive ray tracing to the computer graphics fraternity. This was a breakthrough that produced results forming the basis of ray tracing as we understand it today. For each pixel, a ray was cast from the camera (viewer) into the scene. If that ray hit an object, further rays would be calculated depending on the surface properties. If the surface was reflective, then a new ray would be generated in the reflected direction and the process would start again, with each 'hit' contributing to the final pixel colour. If the surface was refractive, then another ray would be needed as it entered the object, again considering the material properties and potentially changing its direction. It is the directional change of the ray through the object that gives rise to the distortion effects we see when looking at things submerged in water or behind glass. Of course, materials like glass have both reflective and refractive properties, hence additional rays would need to be created to take this into account. There is also a need to determine whether the current position is being lit or in shadow. To do this, shadow rays are cast from the current hit point (the point of intersection) towards each light source in the scene. If a shadow ray hits another object on the way, then the point of intersection is in shadow. This process is often referred to as direct lighting. It is this recurrent nature that affords greater realism, albeit at the expense of significant computational cost.
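To make the recursion described above concrete, here is a deliberately tiny, sphere-only Whitted-style tracer written in Python purely for illustration (the scene, material values, and structure are invented for the example and bear no relation to any production renderer): a primary ray is cast for a pixel, a shadow ray is fired towards the light at each hit, and reflective surfaces spawn a further ray that recurses in the same way. A refractive surface would spawn an additional transmitted ray at the same point.

```python
import math

# A deliberately tiny scene: two spheres and a single point light.
# Each sphere is (centre, radius, base colour, reflectivity) - values invented for the example.
SPHERES = [
    ((0.0, -0.5, 3.0), 0.5, (0.9, 0.2, 0.2), 0.3),
    ((1.0,  0.0, 4.0), 1.0, (0.2, 0.2, 0.9), 0.6),
]
LIGHT = (-2.0, 3.0, 0.0)
BACKGROUND = (0.05, 0.05, 0.1)

def sub(a, b): return tuple(x - y for x, y in zip(a, b))
def add(a, b): return tuple(x + y for x, y in zip(a, b))
def scale(a, s): return tuple(x * s for x in a)
def dot(a, b): return sum(x * y for x, y in zip(a, b))
def norm(a): return scale(a, 1.0 / math.sqrt(dot(a, a)))

def intersect(origin, direction):
    """Return (distance, sphere) of the nearest hit along a normalised ray, or (None, None)."""
    best, hit = None, None
    for sphere in SPHERES:
        centre, radius, _, _ = sphere
        oc = sub(origin, centre)
        b = 2.0 * dot(oc, direction)
        c = dot(oc, oc) - radius * radius
        disc = b * b - 4.0 * c
        if disc < 0.0:
            continue
        t = (-b - math.sqrt(disc)) / 2.0
        if t > 1e-4 and (best is None or t < best):
            best, hit = t, sphere
    return best, hit

def trace(origin, direction, depth):
    """Whitted-style recursion: direct lighting via a shadow ray, plus a reflected ray.
    A refractive material would spawn a second (transmitted) ray here in the same manner."""
    t, sphere = intersect(origin, direction)
    if sphere is None:
        return BACKGROUND
    centre, _, colour, reflectivity = sphere
    point = add(origin, scale(direction, t))
    normal = norm(sub(point, centre))

    # Shadow ray: is anything sitting between this hit point and the light source?
    to_light = sub(LIGHT, point)
    light_dist = math.sqrt(dot(to_light, to_light))
    to_light = scale(to_light, 1.0 / light_dist)
    shadow_t, _ = intersect(point, to_light)
    lit = shadow_t is None or shadow_t > light_dist
    local = scale(colour, max(dot(normal, to_light), 0.0) if lit else 0.0)

    # Reflected ray: spawn a new ray and recurse until the bounce limit is reached.
    if depth > 0:
        reflected_dir = norm(sub(direction, scale(normal, 2.0 * dot(direction, normal))))
        reflected = trace(point, reflected_dir, depth - 1)
    else:
        reflected = (0.0, 0.0, 0.0)
    return add(scale(local, 1.0 - reflectivity), scale(reflected, reflectivity))

# One primary ray cast from the camera through (roughly) the centre of the view.
print(trace((0.0, 0.0, 0.0), norm((0.05, -0.05, 1.0)), depth=4))
```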
What does this mean for games?
Ray tracing provides the basis for realistic simulation of light transport through a 3D scene. With traditional real-time rendering methods, it is difficult to generate realistic shadows and reflections without resorting to various techniques that approximate, at best. The beauty of ray tracing is that it mimics natural phenomena. The algorithm also allows pixel colour/ray computation to have a high degree of independence - thus lending itself well to running in parallel on suitable architectures, i.e. modern-day GPUs.
The basic Whitted ray tracing algorithm isn't without limitations, though. The computation requirements are an order of magnitude greater than those of rasterization. It also fails to generate a truly photorealistic image because it only represents a partial implementation of the so-called rendering equation. Photorealism only occurs when we can fully (or closely) approximate this equation - which considers all physical effects of light transport.
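For reference, the rendering equation mentioned above is usually written (in Kajiya's formulation) as:

```latex
L_o(x, \omega_o) = L_e(x, \omega_o)
  + \int_{\Omega} f_r(x, \omega_i, \omega_o)\, L_i(x, \omega_i)\, (\omega_i \cdot n)\, d\omega_i
```

The outgoing radiance L_o at point x in direction ω_o is the emitted radiance plus the incoming radiance L_i gathered from every direction ω_i over the hemisphere Ω, weighted by the surface's BRDF f_r and the cosine term. Whitted ray tracing only samples a handful of those incoming directions (reflection, refraction, and shadow rays), which is why it falls short of full photorealism.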
To generate more photorealistic results, additional ray tracing techniques were developed, such as path tracing. This is a technique that NVIDIA has already demonstrated on their RTX cards, and it is popular in offline rendering (for example, the likes of Pixar). It's one thing to use path tracing for generating movie-quality visuals, but to use it in real-time is a remarkable undertaking. Until those levels of performance arrive we must rely on assistance from elsewhere and employ novel techniques that can massively reduce the costs of generating acceptable visuals. This is where AI, deep-learning, and Tensor Cores can help.
Generating quality visuals with ray tracing requires a monumental number of rays to be traced through the scene for each pixel being rendered. Shooting a single ray per pixel (a single sample) is not enough and results in a noisy image. To improve the quality of the generated image, ray tracing algorithms typically shoot multiple rays through the same pixel (multiple samples per pixel) and then use a weighted average of those samples to determine the final pixel colour. The more samples taken per pixel, the more accurate the final image. This increase in visual fidelity comes at a processing cost and, consequently, causes a significant performance hit.
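A toy illustration of that averaging process (not taken from the path tracer described below; the "true" pixel value and the noise model are made up for the example) shows how the estimate tightens as the sample count rises, with the error shrinking roughly as 1/sqrt(samples).

```python
import random

def shade_sample(rng):
    """Stand-in for tracing one ray through a pixel: returns a noisy colour estimate."""
    true_value = 0.6                      # the 'correct' pixel brightness, chosen for the example
    return true_value + rng.uniform(-0.5, 0.5)

def render_pixel(samples, rng):
    """Average many per-ray estimates - more samples, less noise, more work."""
    return sum(shade_sample(rng) for _ in range(samples)) / samples

rng = random.Random(42)
for spp in (1, 10, 100, 1000):
    estimate = render_pixel(spp, rng)
    print(f"{spp:5d} spp -> estimate {estimate:.3f} (error {abs(estimate - 0.6):.3f})")
```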
The images below are from my own real-time path tracer, giving an indication of how sample rates affect visual quality and how expensive it is to generate higher-quality results. The algorithm runs in parallel across the full array of CUDA cores on a previous-generation NVIDIA GTX 1080 Ti GPU, rendering at a Full HD resolution of 1920x1080. Please be aware that the generated metrics are for indicative purposes only.
Image 1
Samples per pixel: 1
Render time: <0.1s

In this first image, the scene is rendered using only a single sample per pixel. The result is an incredibly noisy image with lots of missing colour information, and ugly artefacts. As gamers, we would not be happy if games looked like this. However, it was fairly quick to render thanks to the low sample rate.
Image 2
Samples per pixel: 10
Render time: ~1.9s

This time the scene is rendered using 10 samples per pixel. The result is an improvement over the first image, but still shows a lot of noise. We are still some way short of being able to say that the image quality is sufficient.
Image 3
Samples per pixel: 100
Render time: ~20.4s

The scene is now rendered at 100 samples per pixel. The image quality is much better. However, it's still not quite as good as we'd perhaps want. The improved quality comes with a cost, too - taking over 20 seconds to reach this level of quality (for a single frame).
Image 4
Samples per pixel: 1000
Render time: ~206.5s

The image quality shown here is improved compared to the examples generated using fewer samples. It could be argued that the quality is now good enough to be used in a game. Reflections, refractions, depth-of-field, global illumination, ambient occlusion, caustics, and other effects can be observed, and all appear to be at a reasonably good quality level. The downside is that this single frame took over 206 seconds to render, or roughly 0.005fps.
Gaming GPUs must be able to generate ray traced imagery at interactive frame rates, which is no small feat. At 60fps, the GPU must render the image (including ray tracing) within approximately 16 milliseconds. This is far from trivial given the current level of technology. Pixar, who employ fundamentally similar ray tracing algorithms to generate movie-quality CGI for their blockbuster movies, have render farms dedicated to the task. Even with a render farm consisting of 24,000 CPU cores, the movie Monsters Inc. took two years to render (with an average frame render time of over 20 hours). That's the reality of ray tracing and how astronomically intensive it can be to generate beautiful images.
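Plugging the render times quoted above into a quick back-of-the-envelope script (all figures come straight from the captions; the rest is simple arithmetic, and the 1-spp time is only an upper bound) makes both points concrete - cost scales almost linearly with sample count, and every setting shown sits well outside a 60fps frame budget.

```python
# Render times quoted for the 1920x1080 path-traced scene above (1 spp is an upper bound).
render_times = {1: 0.1, 10: 1.9, 100: 20.4, 1000: 206.5}   # samples per pixel: seconds

for spp, seconds in render_times.items():
    print(f"{spp:5d} spp: {seconds:7.1f}s  (~{seconds / spp:.2f}s per spp, {1 / seconds:.4f} fps)")

# A 60fps game has roughly a 16.7ms budget for the entire frame.
frame_budget_s = 1 / 60
print(f"60fps budget: {frame_budget_s * 1000:.1f}ms per frame")
print(f"The 1000-spp frame is ~{206.5 / frame_budget_s:.0f}x over that budget")
```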
Recursion Depth
In addition to the number of samples per pixel, it is also important to have a sufficient number of bounces (recursion depth) for the ray being traced through the scene. Each time the ray hits an object it bounces off in another direction (depending on the material properties of the object hit). Eventually, it will terminate when it crosses a given threshold or hits no further objects, at which point it escapes the environment. The bounced rays provide the colour information needed to realise reflections, refractions, and global illumination, amongst other visual elements. The more bounces a ray is permitted to make, the more realistic the pixel colour that is generated. A good example is a ray traced through a room full of mirrors, which may bounce around the scene many times. Whilst this will result in great-looking reflections, it also incurs a large performance hit. To alleviate this issue, a ray tracer will typically set a limit on the number of bounces a ray can make - hence providing a means to trade visual quality against computational cost. However, too few bounces and rendering issues may be seen; for example, incomplete reflections, poor global illumination, and artefacts across object surfaces where insufficient colour information has been generated to accurately realise the surface material. On the other hand, too many bounces can result in extreme performance costs with minimal increases in visual quality - after a given number of bounces, the difference that additional bounces make may not even be noticeable.
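As a final illustration of why the bounce cap matters, consider the worst case where every hit spawns both a reflection and a refraction ray plus a shadow ray (an assumption made purely for the sake of the example): the number of rays needed for a single pixel grows geometrically with the permitted depth.

```python
def worst_case_rays(max_depth, children_per_hit=2, shadow_rays_per_hit=1):
    """Worst-case rays traced for one primary ray when every hit spawns a reflection
    and a refraction ray, plus a shadow ray towards the light."""
    rays, frontier = 1, 1                           # start with the primary ray
    for _ in range(max_depth):
        rays += frontier * shadow_rays_per_hit      # shadow rays fired at this depth
        frontier *= children_per_hit                # reflection + refraction children
        rays += frontier
    return rays

for depth in (1, 2, 4, 8, 16):
    print(f"max depth {depth:2d}: up to {worst_case_rays(depth):,} rays per pixel")
```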
This incredible technological breakthrough has barely hit home yet, but it's the beginning of an exciting revolution in both the graphics and gaming industries. Battlefield 5, Shadow of the Tomb Raider, and Metro Exodus are set to use RTX technologies for various rendering purposes, so being able to test this for ourselves is only a short while away.