
On November 8, 2006, Nvidia officially launched its first unified shader architecture and first DirectX 10-compliant GPU, the G80. The new chip debuted in two new cards, the $599 GeForce 8800 GTX and the $449 GeForce 8800 GTS. Today, the 8800 GTX's specs seem modest, even low-end, with 128 shader cores, 32 texture mapping units, and 24 render outputs (ROPs), backed by 768MB of RAM. But back in 2006, the G80 was a titan. It swept both Nvidia's previous GeForce 7-series generation and ATI's Radeon X19xx series completely off the table, even in games where Team Red had previously enjoyed a significant performance advantage.

But the G80 didn't just rewrite performance headlines: it redefined what GPUs were, and what they were capable of.

For this retrospective, we spoke with two Nvidia engineers who did a great deal of work on G80: Jonah Alben, Senior VP of GPU Engineering, and John Danskin, VP of GPU Architecture. Before we dive in, however, we want to give a bit of context on what made G80 so different from what came before. Starting with the GeForce 3 and Radeon 8500 in 2001, both ATI and Nvidia cards could execute small programs via specialized, programmable vertex and pixel shaders. Nvidia's last desktop architecture to use this approach was the G71, released on March 9, 2006. It looked like this:

7900_blockdiagram

G71 block diagram.

In this diagram, the vertex shaders are the eight dedicated blocks at the top, above the "Cull / Clip / Setup" section. The 24 pixel shaders are the large group of six blocks in the middle of the diagram, where each block corresponds to four pixel pipelines (24 pixel shaders in total). If you aren't familiar with how pre-unified shader GPUs were built, this diagram probably looks a bit odd. G80, in contrast, is rather more familiar:

block-g80

Nvidia's GeForce 8800 family was the first group of consumer graphics cards to swap dedicated pixel and vertex shaders for a broad array of simpler stream processors (SPs, later referred to as CUDA cores). While previous GPUs were vector processors that could operate concurrently on the red, green, blue, and alpha color components of a single pixel, Nvidia designed the G80 as a scalar processor, in which each streaming processor handled one color component. At a high level, Nvidia had switched from a GPU architecture with dedicated hardware for specific types of shader programs to an array of relatively simple cores that could be programmed to perform whatever types of shader calculations the application required at that particular moment.
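
To make the distinction concrete, here is a minimal, hypothetical CUDA sketch of the scalar, SIMT-style model G80 introduced (our own illustration, not Nvidia's shader pipeline or compiler output): pixel colors sit in a flat array of scalar components, and each thread scales exactly one component rather than a whole RGBA vector.

```cuda
// Hypothetical illustration of the scalar (SIMT) model: one thread per scalar
// color component, the kind of work a single G80 stream processor performs,
// rather than one vec4 operation per pixel. All names here are our own.
#include <cstdio>
#include <cuda_runtime.h>

__global__ void scale_components(float *components, float gain, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // one thread per scalar component
    if (i < n)
        components[i] *= gain;                      // plain scalar multiply, no vector lanes
}

int main()
{
    const int kPixels = 1024;
    const int kComponents = kPixels * 4;            // RGBA: four scalars per pixel
    float *d_components = nullptr;

    cudaMalloc(&d_components, kComponents * sizeof(float));
    cudaMemset(d_components, 0, kComponents * sizeof(float));

    // 128 threads per block is a nod to G80's 128 stream processors, though the
    // mapping of blocks to hardware is entirely up to the scheduler.
    scale_components<<<(kComponents + 127) / 128, 128>>>(d_components, 1.5f, kComponents);
    cudaDeviceSynchronize();

    cudaFree(d_components);
    printf("scaled %d components\n", kComponents);
    return 0;
}
```

On a vec4 design like G71's pixel shaders, the same work would be expressed as one four-wide operation per pixel; the scalar model instead lets the hardware keep every core busy regardless of how many components a given shader actually touches.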

The simpler cores could also be clocked much faster. The GeForce 7900 GTX was built on a 90nm process and hit 650MHz, while the GeForce 8800 GTX, also a 90nm chip, ran its shader cores at 1.35GHz. But as with any brand-new architecture, there were significant risks involved.

Our interview has been lightly edited for clarity.

ET: G80 debuted more-or-less simultaneously with DirectX 10 and was the first fully programmable GPU to debut for PCs. It was also much larger than previous Nvidia chips (the GeForce 7900 GTX had 278 million transistors, while G80 was a 681 million transistor design). What were some of the challenges associated with making this jump, either in terms of managing the design or choosing which features to include and support?

Jonah Alben: I think that one of the biggest challenges with G80 was the creation of the brand new "SM" processor design at the core of the GPU. We pretty much threw out the entire shader architecture from NV30/NV40 and made a new one from scratch with a new general processor architecture (SIMT) that also introduced new processor design methodologies.

This HL2 benchmark data from Anandtech's review shows just how huge the G80's performance leap was. HL2 was an ATI-friendly title.

ET: Were there any features or capabilities of G80 that represented a risk for Nvidia, in terms of die cost / difficulty, but that you included because you felt the risk was worth it?

Jonah Alben: We definitely felt that compute was a risk. Both in terms of area – we were adding area that all of our gaming products would have to carry even though it wouldn't be used for gaming – and in terms of the complexity of taking on compute as a parallel operating mode for the chip.

John Danskin: This was carefully metered. We gave John Nickolls' compute team fixed area and engineering budgets. Within their budgets, they did an incredible job. G80 was designed to run much more complicated pixel shaders, with more branching, dependencies, and resource requirements than previous chips. Our poster-child shader was called oil slick, and it was close to 1,000 instructions long. These complex shaders didn't exist in games at the time, but we saw that programmable graphics was just beginning.

ET: Did all of these bets pay off? Were there capabilities that weren't heavily adopted, or any features that were more successful than you expected?

Jonah Alben: Geometry shaders [introduced in G80 and DirectX 10] didn't end up being very heavily adopted at the time. But they were a first step towards other investments in programmable geometry (tessellation in Fermi, multi-projection in Pascal) that have proven to be extremely useful. Compute ended up being even more important than I thought it would be at the time – in particular, it has been exciting to see it become important to gaming.

John Danskin: In the short run, we overshot on the flexibility of our shader cores. For most of the games G80 ran, something simpler might have been more efficient. In the long run, G80 encouraged the development of more realistic, more exciting content, which was a win. We had high hopes [for GPU computing] but it was similar to a startup. Most startups fail. Some startups change the world. GPU computing started out exciting but small. Now it's the foundation of deep learning. Few could have foreseen that.

ET: How much of G80's lineage still remains in modern Nvidia cards? Have there been any follow-up designs (Tesla – Pascal) that you would say represent an even larger generational shift than G80 was compared to G71?

Jonah Alben: While we've definitely made major architectural changes since then (Fermi was a major system architecture change and Maxwell was another large change to the processor design), the basic structure that we introduced in G80 is still very much there today.

Pascal-Diagram

Nvidia's Pascal packs vastly more cores than G80, and arranges its features differently, but you can see certain similarities between the two.

ET: The G80 debut predates the official launch of CUDA by roughly eight months. How much of the GPU's design was driven by what Nvidia wanted to achieve with CUDA? Was this a case of starting with a programmable GPU and realizing you could accomplish much more, or did NV start from the outset with a plan to offer a GPU that could deliver both excellent game performance and superior compute performance?

John Danskin: We were primarily driven by games, but we saw that gaming and computing performance were complementary. We made the most programmable graphics engine we could design, and then we made sure that it could do compute well, too. John Nickolls' vision was that we would address general High Performance Computing problems.

ET: Looking back with the benefit of hindsight, did the launch of G80 and NV's subsequent efforts take you where you thought they would?

Jonah Alben: It has taken us well beyond what I expected at the time. In particular, CUDA has proven to be a great success. The core design of CUDA that John Nickolls, John Danskin, and others defined at the very beginning is still there today, both in CUDA and in similar programming languages (DirectX Compute, OpenCL, etc.). It turned out that we were right to believe that the world needed a new programming model that was designed thoughtfully for parallel programming – both in terms of how to express the workload and in terms of how to constrain the programmer so that their code would be structured in a way that was likely to perform well.

The speedups that people got with CUDA on GPUs were downright astonishing.

It was the right call to put compute support in all of our GPUs. We built a huge product base and made GPU computing accessible to anyone with an idea that needed more performance than traditional CPUs could handle. Recent developments like the explosion of deep learning are, I think, directly connected to that decision.
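
Alben's point about constraining the programmer is easiest to see in code. Below is a minimal, hypothetical CUDA sketch of our own (not an example from Nvidia or the interview): the workload is expressed as a grid of independent thread blocks, and threads may share data and synchronize only within their own block.

```cuda
// Hypothetical sketch of the programming model described above: a grid of
// independent thread blocks, each cooperating through per-block shared memory
// and block-wide barriers. Blocks never synchronize with one another, which is
// the constraint that keeps the code mapping cleanly onto the hardware.
#include <cstdio>
#include <cuda_runtime.h>

__global__ void block_sums(const float *in, float *out, int n)
{
    __shared__ float tile[256];                 // scratch memory visible to one block only
    int tid = threadIdx.x;
    int i = blockIdx.x * blockDim.x + tid;

    tile[tid] = (i < n) ? in[i] : 0.0f;
    __syncthreads();                            // barrier across this block, nothing wider

    // Tree reduction within the block.
    for (int stride = blockDim.x / 2; stride > 0; stride /= 2) {
        if (tid < stride)
            tile[tid] += tile[tid + stride];
        __syncthreads();
    }
    if (tid == 0)
        out[blockIdx.x] = tile[0];              // one partial sum per block
}

int main()
{
    const int n = 1 << 20;
    const int threads = 256;                    // must match the shared-memory tile size
    const int blocks = (n + threads - 1) / threads;
    float *d_in = nullptr, *d_out = nullptr;

    cudaMalloc(&d_in, n * sizeof(float));
    cudaMalloc(&d_out, blocks * sizeof(float));
    cudaMemset(d_in, 0, n * sizeof(float));

    block_sums<<<blocks, threads>>>(d_in, d_out, n);
    cudaDeviceSynchronize();

    cudaFree(d_in);
    cudaFree(d_out);
    printf("computed %d per-block partial sums\n", blocks);
    return 0;
}
```

Nothing about this structure is specific to G80, but it is the shape CUDA has imposed on parallel code from the beginning, and it is part of why kernels written for a 128-core G80 still scale onto GPUs with thousands of cores.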

Conclusion

True revolution only happens occasionally in the PC industry. Most products are iterative and evolutionary, rather than wholesale reinventions or radical performance leaps. The debut of G80, however, was arguably one of those moments, and 10 years on, we tip our hat to the GPU that launched Nvidia's HPC ambitions and kickstarted much of the GPGPU business. Today, more TOP500 supercomputer systems use Fermi or Kepler-derived accelerators than use AMD Radeon or Intel Xeon Phi hardware combined (66 systems vs. 26 Xeon Phi or Radeon-equipped supercomputers). That's a testament to Nvidia's work on CUDA and its overall support of GPGPU computing.

Note: Technically, ATI's Xenos GPU in the Xbox 360 was the first unified shader GPU in consumer hardware, but Xenos wasn't a DirectX 10-capable GPU and it was never deployed in ATI's PC business. ATI's first unified shader architecture for the PC was the R600, which arrived on May 14, 2007.