What kind of card is it?
NVIDIA announced its next-generation graphics-card architecture at the GPU Technology Conference in San Jose last night.
A graphics card?
Codenamed Fermi, presumably after the renowned Italian physicist, the new GPU underscores NVIDIA's shift towards a more compute-centric design that, on paper, is as much at home in the high-performance computing space as it is rendering pretty-looking pixels for games.
Here's a very high-level overview of the 'GT300' chip that will go on sale in a few months' time. On a fundamental level, Fermi packs in 512 CUDA processing cores arranged in 16 streaming multiprocessors (SMs) - the green rectangles - that each hold 32 execution cores. As a comparison, GeForce GTX 285's stream-processor (SP) count is 240.
Pure processing grunt is allied to a 384-bit memory interface - six 64-bit-wide channels - that connects to GDDR5 memory. The chip is fabricated on TSMC's 40nm process and packs in around 3bn transistors.
NVIDIA hasn't divulged clock-speeds at GTC, but it's reasonably safe to assume that Fermi's memory bandwidth will be higher than that of AMD's Radeon HD 5870, thanks to the wider bus (384 bits versus 256), with compute power somewhere in the vicinity of AMD's best single-GPU card. NVIDIA continues to use split clock-speeds for the front-end and shaders.
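To put rough numbers on the bandwidth argument, here's a minimal sketch of the usual peak-bandwidth sum: bus width in bytes multiplied by the effective per-pin data rate. The Radeon HD 5870 figures (256-bit bus, 4.8Gbps GDDR5) come from AMD's published specification rather than from NVIDIA's presentation, and the Fermi data rate is purely a placeholder of ours, because NVIDIA hasn't announced one.

```python
def memory_bandwidth_gb_s(bus_width_bits: int, data_rate_gbps: float) -> float:
    """Peak bandwidth in GB/s: bytes moved per transfer times transfers per second."""
    return bus_width_bits / 8 * data_rate_gbps


# Radeon HD 5870: 256-bit bus, GDDR5 at 4.8Gbps effective per pin (AMD's spec).
hd5870_gb_s = memory_bandwidth_gb_s(256, 4.8)

# Fermi: six 64-bit channels, as described above.
fermi_bus_bits = 6 * 64

# ASSUMPTION: a placeholder data rate, deliberately slower than HD 5870's memory.
assumed_fermi_rate_gbps = 4.0
fermi_gb_s = memory_bandwidth_gb_s(fermi_bus_bits, assumed_fermi_rate_gbps)

# Data rate at which the wider bus would merely match HD 5870.
break_even_gbps = hd5870_gb_s / (fermi_bus_bits / 8)

print(f"HD 5870: {hd5870_gb_s:.1f} GB/s")
print(f"Fermi at an assumed {assumed_fermi_rate_gbps} Gbps: {fermi_gb_s:.1f} GB/s")
print(f"Fermi break-even data rate: {break_even_gbps:.1f} Gbps")
```

The point of the break-even figure is that the 384-bit bus gives Fermi a lead with any memory running faster than two-thirds of HD 5870's data rate.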
A compute card?
NVIDIA's recent company-wide strategy has been to actively promote its products in the high-performance computing (HPC) space, where margins are far healthier than for desktop and mobile parts. This is where you'd find cards such as the display-less Tesla C1060 - ostensibly a tweaked GeForce GT200 with far better software support - selling for thousands of dollars.
Fermi has been designed with the needs of HPC customers firmly in mind, then. The GPU supports the IEEE 754-2008 floating-point standard, including native support for the fused multiply-add (FMA) operation, and is able to process double-precision calculations at half its single-precision speed; the incumbent GT200 manages only one-eighth of its 32-bit arithmetic throughput. AMD's Radeon HD 5870 supports the same standard as Fermi, too.
Looking towards HPC again, Fermi adds provision for ECC memory, 64KB of configurable L1 cache and shared memory per streaming multiprocessor and a unified 768KB L2 cache under the banner of Parallel DataCache, and an updated GigaThread scheduler. The present GT200 architecture executes programs sequentially, with each one taking up the entire GPU whilst being computed, yet HPC-oriented kernels may not be large enough to 'fill' the broad architecture.
Fermi's scheduling enables up to 16 kernels (programs) to run concurrently, keeping the GPU at near-maximum efficiency, says NVIDIA, and is complemented by faster context switching between different kinds of kernels, which matters because Fermi can only run multiple iterations of one type at a time. Faster switching will also help in the concurrent running of GPGPU and gaming code.
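Why does FMA matter to HPC types? A fused multiply-add computes a*b + c with a single rounding at the end, whereas a separate multiply and add rounds twice and can lose the low-order bits of the product. The sketch below emulates the fused result on the CPU with exact rational arithmetic; the helper names and the test values are ours, chosen for illustration, and none of this is NVIDIA code.

```python
from fractions import Fraction


def fused_multiply_add(a: float, b: float, c: float) -> float:
    """Emulate FMA: form a*b + c exactly, then round once to a double."""
    return float(Fraction(a) * Fraction(b) + Fraction(c))


def separate_multiply_add(a: float, b: float, c: float) -> float:
    """Ordinary arithmetic: the product is rounded before the add happens."""
    return a * b + c


# Illustrative values: the exact product is 1 - 2**-60, which is too close
# to 1.0 for a double to tell apart, so the separate multiply rounds it to 1.0.
a = 1.0 + 2.0**-30
b = 1.0 - 2.0**-30
c = -1.0

fused = fused_multiply_add(a, b, c)
separate = separate_multiply_add(a, b, c)

print("fused:   ", fused)     # keeps the tiny residual, -2**-60
print("separate:", separate)  # the residual is rounded away, leaving 0.0
```

The separate version throws away the whole answer here, which is the sort of error that builds up across the long chains of multiply-adds found in scientific code.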
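The benefit of running kernels side by side can be illustrated with a toy model. It assumes each kernel keeps a fixed number of SMs busy for a fixed time and that the scheduler packs kernels greedily into 'waves'. That is a gross simplification of our own making, not a description of how GigaThread actually works, and the kernel sizes are invented.

```python
from dataclasses import dataclass

SM_COUNT = 16        # streaming multiprocessors on the chip
MAX_CONCURRENT = 16  # concurrent-kernel limit quoted by NVIDIA


@dataclass
class Kernel:
    sms_needed: int  # hypothetical: SMs this kernel can keep busy
    duration: float  # hypothetical: run time in arbitrary units


def sequential_time(kernels):
    """GT200-style model: each kernel has the whole GPU to itself."""
    return sum(k.duration for k in kernels)


def concurrent_time(kernels):
    """Toy Fermi-style model: pack kernels, in order, into waves until
    either the SMs or the concurrent-kernel slots run out."""
    total = 0.0
    wave_sms, wave_count, wave_duration = 0, 0, 0.0
    for k in kernels:
        wave_full = (wave_sms + k.sms_needed > SM_COUNT
                     or wave_count == MAX_CONCURRENT)
        if wave_count and wave_full:
            total += wave_duration
            wave_sms, wave_count, wave_duration = 0, 0, 0.0
        wave_sms += k.sms_needed
        wave_count += 1
        wave_duration = max(wave_duration, k.duration)
    return total + wave_duration


def utilisation(kernels, elapsed):
    """Fraction of available SM-time that did useful work."""
    busy = sum(k.sms_needed * k.duration for k in kernels)
    return busy / (SM_COUNT * elapsed)


# Eight small kernels that can each fill only a quarter of the chip.
kernels = [Kernel(sms_needed=4, duration=1.0) for _ in range(8)]

seq = sequential_time(kernels)
con = concurrent_time(kernels)
print(f"sequential: {seq} units, {utilisation(kernels, seq):.0%} busy")
print(f"concurrent: {con} units, {utilisation(kernels, con):.0%} busy")
```

In this contrived case the sequential model leaves three-quarters of the chip idle, while packing four kernels at a time finishes the same work in a quarter of the time.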
How will it compete?
A kind of CPU?
On top of the presently supported C, via CUDA, there will also be support for standards-based OpenCL, DirectCompute, and DX11, as well as C++. The latter is made possible by a new, low-level instruction set, known as Parallel Thread eXecution 2.0 (PTX 2.0), that caters for a 40-bit, 1TB unified address space.
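The 40-bit and 1TB figures are two ways of saying the same thing, as a quick sum shows. The comparison with a 32-bit address space is ours, added for scale.

```python
GIB = 1024**3
TIB = 1024**4


def addressable_bytes(address_bits: int) -> int:
    """Number of distinct byte addresses an address of this width can name."""
    return 2**address_bits


fermi_space = addressable_bytes(40)
legacy_space = addressable_bytes(32)  # for comparison only

print(f"40-bit: {fermi_space // TIB} TiB")
print(f"32-bit: {legacy_space // GIB} GiB")
print(f"ratio:  {fermi_space // legacy_space}x")
```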
NVIDIA says that the GPU will also run the likes of Python and Java, although just how effective coding via a 'wrapper' will be is, perhaps, debatable.
Jen-Hsun Huang, NVIDIA's CEO, has spoken about Fermi in a parlance that's more familiar to the dissemination of CPU architecture. With constant references to cores, caches, and hierarchy, one could be forgiven for thinking he was describing an Intel Larrabee-esque design, and we'd be very interested to see how Intel's effort shapes up when formally announced.
Better than Radeon HD 5870?
From what we know thus far, it's difficult to determine whether Fermi will be a better 'card' than AMD's RV870-based Radeon HD 5870, which is in shops today. Given reasonable clock-speeds analogous to the GTX 2xx series, Fermi should have a memory-bandwidth advantage and, perhaps, win out in the double-precision stakes, but we'd be surprised if it trounced AMD's best in a gaming environment.
NVIDIA's making much of the compute design, driven on by the massive investment in CUDA, and Fermi can be thought of as much an HPC tool as a gaming card. AMD's design, however, isn't just a basic pixel-pusher. It, too, supports the IEEE 754-2008 standard, runs double-precision at a mighty fast rate and can, we believe, handle multiple kernels.
The big difference is that Radeon HD 5870 is out today, etailing for £300. The higher-end Fermi cards will cost at least as much, we imagine, because fitting 3bn transistors - and yes, NVIDIA and AMD count them differently - must mean a die-size that's appreciably larger than HD 5870's 334mm². Yields are inextricably linked to die-sizes, so producing a 50 per cent larger die (than HD 5870) will, ceteris paribus, mean fewer candidate chips per wafer and a greater chance that any one of them contains a flaw. Going big(ger) is bad for business.
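A back-of-the-envelope model shows how quickly die-size bites. The sketch below uses a common dies-per-wafer approximation and the simple Poisson yield model, Y = exp(-area x defect density). The 300mm wafer and, especially, the defect density are assumptions of ours (neither TSMC nor NVIDIA has published figures), so treat the output as illustrative of the trend and nothing more.

```python
import math


def dies_per_wafer(die_area_mm2: float, wafer_diameter_mm: float = 300.0) -> int:
    """Common approximation: wafer area over die area, less an edge-loss term."""
    radius = wafer_diameter_mm / 2
    gross = math.pi * radius**2 / die_area_mm2
    edge_loss = math.pi * wafer_diameter_mm / math.sqrt(2 * die_area_mm2)
    return int(gross - edge_loss)


def poisson_yield(die_area_mm2: float, defects_per_mm2: float) -> float:
    """Poisson model: probability that a die contains zero defects."""
    return math.exp(-die_area_mm2 * defects_per_mm2)


# ASSUMPTION: hypothetical defect density (0.3 per cm^2), not a TSMC figure.
DEFECTS_PER_MM2 = 0.003

hd5870_area = 334.0             # mm^2, from the article
fermi_area = hd5870_area * 1.5  # the article's '50 per cent larger' scenario

good_dies = {}
for name, area in (("HD 5870", hd5870_area), ("50% larger", fermi_area)):
    candidates = dies_per_wafer(area)
    good = candidates * poisson_yield(area, DEFECTS_PER_MM2)
    good_dies[name] = good
    print(f"{name}: {candidates} candidates, ~{good:.0f} good dies per wafer")
```

Under these assumptions the larger die loses out twice: fewer candidates fit on the wafer, and each one is more likely to catch a defect, so the number of saleable chips falls by far more than the 50 per cent area increase alone would suggest.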
AMD's hit the ground running with its high-end GPUs and will see healthy sales before Fermi ever gets packaged inside a retail box. Time to market is telling, especially in the run-up to the festive season, and NVIDIA's procrastination may well cause financial pain - in the short term at least. We expect the card to hit the shelves no earlier than February 2010, giving AMD a clear four-month-plus run with DX11 parts in the channel.
Ultimately, NVIDIA has detailed Fermi as a forward-looking, programmable architecture that's aimed at enlarging the company's footprint in the lucrative HPC space - somewhat pre-empting Intel's Larrabee - whilst keeping desktop and mobile 'gaming' customers happy. We'll find out more when NVIDIA divulges the intricacies of the design in coming weeks and months.
Source: Hexus.net