The Dark Knight: Intel's Core i7
by Anand Lal Shimpi & Gary Key on November 3, 2008 12:00 AM EST - Posted in CPUs
Is Nehalem Efficient?
At this year's IDF in San Francisco, Intel revealed a little-discussed but extremely important aspect of Nehalem's circuit design:
Nehalem is Intel's first microprocessor design in the past two decades to feature absolutely no domino logic; it's a fully static CMOS design. I've explained the differences between dynamic domino and static CMOS design in the past, but simply put: domino logic is a clock speed play. It's incredibly useful for implementing very high-speed circuit paths on a chip, and Intel's use of it hit an all-time peak in the Pentium 4 days. The downside to using such high-speed logic is that it requires a lot of power, but in microprocessor design there are always tradeoffs to be made.
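The power cost of domino logic can be made concrete with the standard first-order model of dynamic (switching) power. This is a textbook approximation, not a figure Intel has published for Nehalem:

```latex
P_{\text{dyn}} \approx \alpha \, C \, V_{DD}^{2} \, f
```

Here $\alpha$ is the activity factor (the fraction of the capacitance that switches each cycle), $C$ the switched capacitance, $V_{DD}$ the supply voltage and $f$ the clock frequency. A domino gate's output node is precharged every clock cycle and discharges whenever the gate evaluates true, so its $\alpha$ is typically much higher than that of an equivalent static CMOS gate, which only switches when its inputs change. The precharge transistors also add load to the clock network, raising the $C$ that toggles every cycle.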
(Image: There are many other energy efficiency plays within Nehalem.)
With Nehalem, Intel took the new architecture as an opportunity to revamp its circuit design and removed all remaining domino logic, without impacting the architecture's peak clock speed. The tradeoff here is die size: by using more parallel logic, Intel was able to convert some serial, high-speed paths into larger, slower circuits that removed the need for domino logic. Details are unfortunately light and a bit beyond the scope of this review, but the move to an all-static CMOS design is bound to reduce power consumption. Do you smell a comparison coming?
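Intel hasn't detailed which circuits were reworked, so the following is only a generic textbook illustration of the "more parallel logic, fewer serial levels" trade described above, not Nehalem's actual logic. It compares a serial (ripple) carry chain with a parallel-prefix (Kogge-Stone) carry network for a 64-bit adder. All function names are hypothetical.

```python
# Generic textbook illustration, NOT Nehalem's actual circuitry: a serial (ripple)
# carry chain vs. a parallel-prefix (Kogge-Stone) carry network for an n-bit adder.
# All function names here are hypothetical.


def combine(hi, lo):
    """Associative carry operator on (generate, propagate) pairs; `hi` is the more significant group."""
    g_hi, p_hi = hi
    g_lo, p_lo = lo
    return (g_hi | (p_hi & g_lo), p_hi & p_lo)


def ripple_carries(gp):
    """Serial prefix: every bit waits for the one below it (n-1 cells, n-1 levels deep)."""
    out = [gp[0]]
    for i in range(1, len(gp)):
        out.append(combine(gp[i], out[-1]))
    return out, len(gp) - 1, len(gp) - 1


def kogge_stone_carries(gp):
    """Parallel prefix: log2(n) levels of cells working side by side, at the cost of more cells."""
    cur, n = list(gp), len(gp)
    cells = depth = 0
    dist = 1
    while dist < n:
        nxt = list(cur)
        for i in range(dist, n):
            nxt[i] = combine(cur[i], cur[i - dist])
            cells += 1
        cur, depth, dist = nxt, depth + 1, dist * 2
    return cur, cells, depth


def add(a, b, n, prefix):
    """Add two n-bit unsigned ints with the given carry network; returns (sum mod 2**n, cells, depth)."""
    gp = [(((a >> i) & (b >> i)) & 1, ((a >> i) ^ (b >> i)) & 1) for i in range(n)]
    groups, cells, depth = prefix(gp)
    total, carry = 0, 0
    for i in range(n):
        total |= (gp[i][1] ^ carry) << i
        carry = groups[i][0]  # carry out of bit i = group generate of bits i..0
    return total, cells, depth


for name, network in (("ripple", ripple_carries), ("Kogge-Stone", kogge_stone_carries)):
    _, cells, depth = add(0, 0, 64, network)
    print(f"{name}: {cells} carry cells, {depth} levels deep")
# ripple: 63 carry cells, 63 levels deep
# Kogge-Stone: 321 carry cells, 6 levels deep
```

The parallel network spends about five times as many carry cells but cuts the longest chain from 63 levels to 6. That kind of timing slack is what lets slower static gates stand in for fast dynamic ones; real designs will differ in the details.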
Both Nehalem and Penryn are built on the same 45nm process, are available at the same clock speeds and can run the very same applications. In theory, thanks to its static CMOS design, Nehalem should be more power efficient than Penryn across the board at the same clock speed. To find out, I measured average power consumption over the duration of a handful of the benchmarks used in this review.
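The review doesn't say how the power readings were logged. As a minimal sketch, assuming a meter that reports `(seconds, watts)` samples at possibly uneven intervals, a time-weighted average over a benchmark run can be computed with the trapezoidal rule. `average_power` is a hypothetical helper, not part of any tool used here.

```python
def average_power(samples):
    """Return (average watts, energy in joules) for time-sorted (seconds, watts) samples.

    Uses the trapezoidal rule so unevenly spaced readings are weighted by the
    time interval they cover.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    energy_j = 0.0
    for (t0, w0), (t1, w1) in zip(samples, samples[1:]):
        energy_j += (t1 - t0) * (w0 + w1) / 2.0
    duration_s = samples[-1][0] - samples[0][0]
    if duration_s <= 0:
        raise ValueError("samples must span a positive duration")
    return energy_j / duration_s, energy_j


# Hypothetical meter readings, for illustration only (not data from this review).
readings = [(0.0, 200.0), (1.0, 204.0), (3.0, 202.0)]
avg_w, joules = average_power(readings)
print(f"{avg_w:.2f} W average, {joules:.0f} J")  # 202.67 W average, 608 J
```

Returning the energy alongside the average is a deliberate choice: when two CPUs finish the same job in different amounts of time, energy per run is often a fairer efficiency measure than average watts alone.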
Performance | POV-Ray 3.7 | Cinebench XCPU | x264 HD | Crysis |
Intel Core 2 Quad Q9450 (Penryn - 2.66GHz) | 2238 PPS | 11502 CBMarks | 61.5 fps | 34.0 fps |
Intel Core i7-920 (Nehalem - 2.66GHz) | 3528 PPS | 16211 CBMarks | 74.8 fps | 33.2 fps |
Nehalem Performance Advantage | 57.6% | 40.9% | 21.6% | -2.4% |
I picked these four benchmarks because they show us the range of Nehalem's performance, going from no performance improvement all the way up to a gain of nearly 60%. Now let's look at the power consumption in each of these four benchmarks:
Power Consumption | POV-Ray 3.7 | Cinebench XCPU | x264 HD | Crysis |
Intel Core 2 Quad Q9450 (Penryn - 2.66GHz) | 168.1W | 175.2W | 167.5W | 220.8W |
Intel Core i7-920 (Nehalem - 2.66GHz) | 202.2W | 208.6W | 176.6W | 230.8W |
Nehalem Power Disadvantage | +34.1W | +33.4W | +9.1W | +10W |
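If you actually go through and do the math, you'll find that Nehalem, despite using more power, is more efficient than Penryn. Measured against Penryn, Nehalem's performance per watt is around 31% higher in POV-Ray, 18% higher in Cinebench and 15% higher in the x264 HD test; put the other way, Penryn delivers roughly 24%, 15.5% and 13% less performance per watt than Nehalem. Crysis is the only benchmark where Nehalem actually falls behind, and because it also draws more power there, Nehalem loses the efficiency battle in that test, with roughly 7% lower performance per watt.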
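The performance-per-watt figures can be reproduced from the two tables above. This is a small sketch using only the published numbers; the dictionary names and `efficiency_ratio` are illustrative, not anything from the review's own tooling.

```python
# (score, average watts) pairs copied from the 2.66GHz tables above.
Q9450 = {"POV-Ray 3.7": (2238, 168.1), "Cinebench XCPU": (11502, 175.2),
         "x264 HD": (61.5, 167.5), "Crysis": (34.0, 220.8)}
I7_920 = {"POV-Ray 3.7": (3528, 202.2), "Cinebench XCPU": (16211, 208.6),
          "x264 HD": (74.8, 176.6), "Crysis": (33.2, 230.8)}


def efficiency_ratio(new, old):
    """Performance per watt of `new` divided by performance per watt of `old`."""
    (new_score, new_watts), (old_score, old_watts) = new, old
    return (new_score / new_watts) / (old_score / old_watts)


for bench in Q9450:
    r = efficiency_ratio(I7_920[bench], Q9450[bench])
    # (r - 1): Nehalem measured against Penryn; (1/r - 1): Penryn measured against Nehalem.
    print(f"{bench}: ratio {r:.3f}, Nehalem {100 * (r - 1):+.1f}%, Penryn {100 * (1 / r - 1):+.1f}%")
# POV-Ray 3.7: ratio 1.311, Nehalem +31.1%, Penryn -23.7%
# Cinebench XCPU: ratio 1.184, Nehalem +18.4%, Penryn -15.5%
# x264 HD: ratio 1.154, Nehalem +15.4%, Penryn -13.3%
# Crysis: ratio 0.934, Nehalem -6.6%, Penryn +7.0%
```

Which baseline you divide by matters: the 24%, 15.5% and 13% figures correspond to Penryn measured against Nehalem, while measuring Nehalem against Penryn gives roughly 31%, 18% and 15%.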
It seems Nehalem is even more polarizing than I had thought. Despite the move to a fully static CMOS design, the changes aren't enough to make up for the cases where Nehalem can't offer more performance: power consumption still goes up, albeit not by much.
It's also worth noting that the power comparison really depends on the CPU used. Here is the same comparison with the Core i7-965 vs. the Core 2 Extreme QX9770, both clocked at 3.2GHz:
Performance | POV-Ray 3.7 | Cinebench R10 - XCPU | x264 HD | Crysis |
Intel Core 2 Extreme QX9770 (Penryn - 3.2GHz) | 2641 PPS | 14065 CBMarks | 73.2 fps | 41.7 fps |
Intel Core i7-965 (Nehalem - 3.2GHz) | 4202 PPS | 18810 CBMarks | 85.8 fps | 40.5 fps |
Power Consumption | POV-Ray 3.7 | Cinebench R10 - XCPU | x264 HD | Crysis |
Intel Core 2 Extreme QX9770 (Penryn - 3.2GHz) | 230.7W | 227.6W | 230.3W | 293.6W |
Intel Core i7-965 (Nehalem - 3.2GHz) | 233.7W | 230.7W | 196.2W | 248.5W |
It's tough to draw firm conclusions from just two pairs of CPUs, but it is possible that Nehalem's efficiency advantage becomes more pronounced at higher clock speeds. The QX9770 has always been a bit high on power consumption, whereas the i7-965 offers better power efficiency here even in situations where it is slower than the QX9770.
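The same arithmetic applied to the 3.2GHz tables shows how the picture shifts. Again this is only a sketch over the published numbers, with illustrative names.

```python
# (score, average watts) pairs copied from the 3.2GHz tables above.
QX9770 = {"POV-Ray 3.7": (2641, 230.7), "Cinebench R10 - XCPU": (14065, 227.6),
          "x264 HD": (73.2, 230.3), "Crysis": (41.7, 293.6)}
I7_965 = {"POV-Ray 3.7": (4202, 233.7), "Cinebench R10 - XCPU": (18810, 230.7),
          "x264 HD": (85.8, 196.2), "Crysis": (40.5, 248.5)}


def efficiency_gains(new, old):
    """Percent change in performance per watt of `new` relative to `old`, per benchmark."""
    gains = {}
    for bench, (old_score, old_watts) in old.items():
        new_score, new_watts = new[bench]
        gains[bench] = 100 * ((new_score / new_watts) / (old_score / old_watts) - 1)
    return gains


for bench, gain in efficiency_gains(I7_965, QX9770).items():
    print(f"{bench}: i7-965 perf/W {gain:+.1f}% vs. QX9770")
# POV-Ray 3.7: i7-965 perf/W +57.1% vs. QX9770
# Cinebench R10 - XCPU: i7-965 perf/W +31.9% vs. QX9770
# x264 HD: i7-965 perf/W +37.6% vs. QX9770
# Crysis: i7-965 perf/W +14.7% vs. QX9770
```

Worked through this way, the i7-965 comes out ahead on performance per watt in all four tests, including Crysis, where it is about 3% slower than the QX9770 but draws roughly 45W less.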
73 Comments
Clauzii - Thursday, November 6, 2008
I still use PS/2. None of the USB keyboards I've borrowed or tried out would work at boot. I also think PS/2 keyboards and mice don't lag as much, maybe because they have their own non-shared interrupt line. But I can see a problem with PS/2 in the future, with keyboards like the Art Lebedev ones. When that technology gets more pocket-friendly, I'd gladly see upgraded, but still dedicated, keyboard/mouse connectors.
The0ne - Monday, November 3, 2008
Yes. I keep a PS/2 keyboard on hand in case my USB keyboard can't get in :)
Strid - Monday, November 3, 2008
Ahh, makes sense. Thanks for clarifying!
Genx87 - Monday, November 3, 2008
After living through the hell that was ATI's drivers back in 2003-2004 on a 9600 Pro AIW, I didn't learn: I plopped money down on a 4850 and have had terrible driver quality since. I've had more BSODs from the ATI driver than from anything else in Windows in the past 5 years combined. Back to Nvidia for me when I get a chance.
That said, this review is pretty much what I expected after reading the preview article in August. They are really trying to recapture market share in the 4-socket space, a place where AMD has been able to do well. This chip is designed for server work. I'll pick one up after my E8400 runs out of steam.
Griswold - Tuesday, November 4, 2008
You're just not clever enough to set up your system properly. I have two identical systems sitting here side by side, the only difference being the video card (an HD3870 in one and an 8800GT in the other), and the box with the Nvidia card gives me an order of magnitude more headaches due to driver crashes. While that also happens on the 3870 machine now and then, it's nowhere near as often. But the best part: neither of them produces a BSOD. That is why I know you're most likely the culprit (the alternative is faulty hardware or a pathetic overclock).
Lord 666 - Monday, November 3, 2008
The stock speed of a Q9550 is 2.83GHz, not 2.66GHz. Why the handicap?
Anand Lal Shimpi - Monday, November 3, 2008
My mistake; it was a Q9450 that was used. The Q9550 label was from an earlier version of the spreadsheet that got canned due to time constraints. I wanted a clock-for-clock comparison with the i7-920, which runs at 2.66GHz.
Take care,
Anand
faxon - Monday, November 3, 2008
Tom's Hardware published an article detailing that there would be a cap on how high you are allowed to clock your part before it downclocks back to stock. Since this is an integrated part of the core, you can only turn it off/up/down if they unlock it. The limit was supposedly a 130-watt thermal dissipation mark. What effect did this have in your tests on overclocking the 920?
Gary Key - Monday, November 3, 2008
We have not had any problems clocking our 920 to the 3.6GHz~3.8GHz level with proper cooling. The 920, 940, and 965 will all clock down as core temps increase above the 80C level. We noticed half-step decreases above 80C or so and watched our core multipliers throttle down to as low as 5.5 when core temps exceeded 90C, then increase back to normal as temperatures were lowered.
This occurred with stock voltages or with the VCore set to 1.5V; in our tests it depended on thermals, not voltages or clock speeds. That said, I am still running a battery of tests on the 920 right now, but I have not seen an artificial cap yet. That does not mean it doesn't exist, just that we have not triggered it yet.
This morning I will try the 920 on the Intel board that Tom's used to see if it behaves any differently than the ASUS and MSI boards.
Th3Eagle - Monday, November 3, 2008
I wonder how close you came to those temperatures while overclocking these processors. Taking the 920 to 3.6/3.8GHz is a nice overclock, but I wonder what you mean by proper cooling and how close you came to crossing the 80C "boundary"?