The Dark Knight: Intel's Core i7
by Anand Lal Shimpi & Gary Key on November 3, 2008 12:00 AM EST - Posted in CPUs
Is Nehalem Efficient?
At this year's IDF in San Francisco, Intel revealed a little-discussed but extremely important aspect of Nehalem's circuit design:
The Nehalem design is Intel's first microprocessor in the past two decades to feature absolutely no domino logic; it's a fully static CMOS design. I've explained the differences between dynamic domino and static CMOS design in the past, but simply put: domino logic is used as a clock speed play. It's incredibly useful in implementing very high speed circuit paths on a chip, and Intel's use of it hit its all-time peak in the Pentium 4 days. The downside to using such high speed logic is that it requires a lot of power, but in microprocessor design there are always tradeoffs to be made.
There are many other energy efficiency plays within Nehalem
In Nehalem, Intel took the new architecture as an opportunity to revamp its design and removed all remaining domino logic, without impacting the peak clock speed of the architecture. The tradeoff here is one of die size: by using more parallel logic, Intel was able to convert some serial, high speed paths into larger, slower circuits that removed the need for domino logic. Details are unfortunately light and a bit beyond the scope of this review, but the move to an all static CMOS design is bound to reduce power consumption. Do you smell a comparison coming?
Both Nehalem and Penryn are built on the same 45nm process, available at the same clock speeds and capable of running the very same applications. In theory, Nehalem should be more power efficient, at the same clock speed, across the board thanks to its static CMOS design. To find out I measured average power consumption over the duration of a handful of benchmarks I used in this review.
Performance | POV-Ray 3.7 | Cinebench XCPU | x264 HD | Crysis |
Intel Core 2 Quad Q9450 (Penryn - 2.66GHz) | 2238 PPS | 11502 CBMarks | 61.5 fps | 34.0 fps |
Intel Core i7-920 (Nehalem - 2.66GHz) | 3528 PPS | 16211 CBMarks | 74.8 fps | 33.2 fps |
Nehalem Performance Advantage | 57.6% | 40.9% | 21.6% | -2.4% |
I picked these four benchmarks because they show us the range of Nehalem's performance, going from no performance improvement all the way up to a gain of nearly 60%. Now let's look at the power consumption in each of these four benchmarks:
Power Consumption | POV-Ray 3.7 | Cinebench XCPU | x264 HD | Crysis |
Intel Core 2 Quad Q9450 (Penryn - 2.66GHz) | 168.1W | 175.2W | 167.5W | 220.8W |
Intel Core i7-920 (Nehalem - 2.66GHz) | 202.2W | 208.6W | 176.6W | 230.8W |
Nehalem Power Disadvantage | +34.1W | +33.4W | +9.1W | +10W |
If you actually go through and do the math you'll find that Nehalem, despite using more power, is more efficient than Penryn. Power drawn per unit of performance is around 24% lower in POV-Ray, 15.5% lower in Cinebench and 13% lower in the x264 HD test; put the other way, performance per watt is roughly 31%, 18% and 15% higher. Crysis, the only benchmark where Nehalem actually falls behind, still requires more power, and thus Nehalem loses the efficiency battle there.
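For anyone who wants to check that math, here is a minimal sketch that recomputes the efficiency figures from the two 2.66GHz tables above. The scores and wattages are copied straight from those tables; the names `SCORES`, `WATTS` and `efficiency_delta` are made up for this illustration. It reports both the change in performance per watt and the change in watts per unit of performance, since the two are not the same percentage.

```python
# Benchmark scores and average power draw copied from the 2.66GHz tables above.
# The dictionary and function names are illustrative only.
SCORES = {
    "POV-Ray 3.7":    {"Q9450": 2238.0,  "i7-920": 3528.0},
    "Cinebench XCPU": {"Q9450": 11502.0, "i7-920": 16211.0},
    "x264 HD":        {"Q9450": 61.5,    "i7-920": 74.8},
    "Crysis":         {"Q9450": 34.0,    "i7-920": 33.2},
}
WATTS = {
    "POV-Ray 3.7":    {"Q9450": 168.1, "i7-920": 202.2},
    "Cinebench XCPU": {"Q9450": 175.2, "i7-920": 208.6},
    "x264 HD":        {"Q9450": 167.5, "i7-920": 176.6},
    "Crysis":         {"Q9450": 220.8, "i7-920": 230.8},
}


def efficiency_delta(bench, old="Q9450", new="i7-920"):
    """Return (perf-per-watt change, watts-per-unit-of-work change) in percent."""
    ppw_old = SCORES[bench][old] / WATTS[bench][old]
    ppw_new = SCORES[bench][new] / WATTS[bench][new]
    perf_per_watt = (ppw_new / ppw_old - 1.0) * 100.0
    # Watts per unit of work is the reciprocal of performance per watt.
    watts_per_perf = (ppw_old / ppw_new - 1.0) * 100.0
    return perf_per_watt, watts_per_perf


if __name__ == "__main__":
    for bench in SCORES:
        ppw, wpp = efficiency_delta(bench)
        print(f"{bench:15s} perf/W {ppw:+6.1f}%   W/perf {wpp:+6.1f}%")
```

With these inputs the 24%, 15.5% and 13% figures line up with the drop in watts per unit of performance, while the same data expressed as performance per watt comes out somewhat higher. Crysis is negative either way.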
It seems as if Nehalem is even more polarizing than I had thought. Despite the move to a fully static CMOS design, the changes aren't enough to make up for the scenario where Nehalem can't offer more performance; power consumption still goes up, albeit not terribly.
It's also worth noting that the power comparison really depends on the CPU used. Here we've got the same comparison but with the Core i7-965 vs. the Core 2 Extreme QX9770, both clocked at 3.2GHz:
Performance | POV-Ray 3.7 | Cinebench R10 - XCPU | x264 HD | Crysis |
Intel Core 2 Extreme QX9770 (Penryn - 3.2GHz) | 2641 PPS | 14065 CBMarks | 73.2 fps | 41.7 fps |
Intel Core i7-965 (Nehalem - 3.2GHz) | 4202 PPS | 18810 CBMarks | 85.8 fps | 40.5 fps |
Power Consumption | POV-Ray 3.7 | Cinebench R10 - XCPU | x264 HD | Crysis |
Intel Core 2 Extreme QX9770 (Penryn - 3.2GHz) | 230.7W | 227.6W | 230.3W | 293.6W |
Intel Core i7-965 (Nehalem - 3.2GHz) | 233.7W | 230.7W | 196.2W | 248.5W |
It's tough to draw any conclusions based on two CPUs, but it is possible that at higher clock speeds Nehalem's efficiency advantage kicks in. The QX9770 has always been a bit high on the power consumption side, whereas the i7-965, even in situations where it is slower than the QX9770, offers better power efficiency here.
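The 3.2GHz tables don't include advantage rows, so here is a rough sketch that derives them. The inputs are copied from the two tables above; `RESULTS_3_2GHZ`, `pct_change` and `compare` are names invented for this example, not anything used in the review's test setup.

```python
# Scores and average power draw copied from the 3.2GHz tables above.
# Each entry: (QX9770 score, i7-965 score, QX9770 watts, i7-965 watts).
# All names here are illustrative only.
RESULTS_3_2GHZ = {
    "POV-Ray 3.7":          (2641.0, 4202.0, 230.7, 233.7),
    "Cinebench R10 - XCPU": (14065.0, 18810.0, 227.6, 230.7),
    "x264 HD":              (73.2, 85.8, 230.3, 196.2),
    "Crysis":               (41.7, 40.5, 293.6, 248.5),
}


def pct_change(old, new):
    """Percentage change going from old to new."""
    return (new / old - 1.0) * 100.0


def compare(bench):
    """Return (performance change %, power change %, perf-per-watt change %)."""
    old_score, new_score, old_watts, new_watts = RESULTS_3_2GHZ[bench]
    perf = pct_change(old_score, new_score)
    power = pct_change(old_watts, new_watts)
    ppw = pct_change(old_score / old_watts, new_score / new_watts)
    return perf, power, ppw


if __name__ == "__main__":
    for bench in RESULTS_3_2GHZ:
        perf, power, ppw = compare(bench)
        print(f"{bench:22s} perf {perf:+6.1f}%  power {power:+6.1f}%  perf/W {ppw:+6.1f}%")
```

On these numbers the i7-965 comes out ahead on performance per watt in all four tests, including Crysis, where it is slightly slower but draws noticeably less power.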
73 Comments
anand4happy - Sunday, February 8, 2009 - link
saw many things but this is something different: sd4us.blogspot.com/2009/01/intel-viivintel-975x-express-955x.html
nidhoggr - Monday, November 10, 2008 - link
I cant find that information on the test setup page.
nidhoggr - Monday, November 10, 2008 - link
test not text :)
puffpio - Wednesday, November 5, 2008 - link
would you guys consider rebenchmarking?
from the x264 changelog since the nehalem specific optimizations:
"Overall speed improvement with Nehalem vs Penryn at the same clock speed is around 40%."
anartik - Wednesday, November 5, 2008 - link
Good review and better than Tom's overall. However, Tom's stumbled on something that changed my mind about gaming with Nehalem. While Anand's testing shows minimal performance gains (and came to the not good for games conclusion), Tom's approached it with 1-4 GPUs in SLI or Crossfire. All I can say is the performance gains with Nvidia cards in SLI were stunning. Maybe the platform favors SLI or Nvidia had a driver advantage in licensing SLI to Intel. Either way Nehalem and SLI smoked ATI and the current 3.2 extreme quad across the board.
dani31 - Wednesday, November 5, 2008 - link
I know it wouldn't change any conclusion, but since we discuss bleeding edge Intel hardware it would have been nice to see the same in the AMD testbed.
Using a SB600 mobo (instead of the acclaimed SB750) and an old set of drivers makes it look like the AMD numbers were simply pasted from an old article.
Casper42 - Tuesday, November 4, 2008 - link
Something I think you guys missed in your article/conclusion is the fact that we're now able to pair a great CPU with a pretty damn good North/South Bridge AND SLI.
I found that the 680/780/790 featureset is plainly lacking and that the Intel ICH9R/10R seems to always perform better and has more features. If any doubt, look at Matrix RAID vs nVidia's RAID. Night and day difference, especially with RAID5.
The problem with the X38/X48 was you got a great board but were effectively locked into ATI for high end Gaming.
Now we have the best of both worlds. You get ICH10R, a very well performing CPU (even the 920 beats most of the Intel Quad Core lineup) AND you can run 1/2/3 nVidia GPUs on the machine. In my opinion, this is a winning combination.
The only downside I see is board designs seem to suck more and more.
With socket 1366 being so massive and 6 DIMM slots on the Enthusiast/Gamer boards, we're seeing not only 6 expansion slots (down from the standard of 7) but in most boards I have seen pics of, the top slot is an x1 so they can wedge it next to the X58 IOH, which means you're left with only 5 slots for other cards. Using 3 dual slot cards is out of the question without a massive 10 slot case (of which there are only like 3-5 on the market) and even if you can wedge 2 or 3 dual slot cards into the machine, you have almost zero expansion card slots should you ever need them.
Then we get to all the cooling crap surrounding the CPU. ALL these designs rely on a top down traditional cooler and if you decide to use a highly effective tower cooling solution, all the little heatsink fins on the Northbridge and power regulators around the CPU get very little or no airflow. Now you're in there adding puny little 40/60mm fans that produce more noise than airflow, not to mention that the DIMMs are hardly ever cooled in today's board designs.
Call me a cooling purist if you will, but I much prefer traditional front to back airflow and all this side intake top exhaust stuff just makes me cringe. I personally run a Tyan Thunder K8WE with 2 Hyper6+ coolers and the procs and RAM are all cooled front to back. Intake and exhaust are 120mm and I have a bit of an air channel in which that airflow never goes near the expansion card slots below, which by the way have a 92mm fan up front pushing air in across the drives and another 92mm fan clipped onto the expansion slots in the back pulling it back out.
I don't know how to resolve these issues, but I think someone surely needs to because IMHO it's getting out of control.
lemonadesoda - Tuesday, November 4, 2008 - link
"Looking at POV-Ray we see a 30% increase in performance for a 12% increase in total system power consumption, that more than exceeds Intel's 2:1 rule for performance improvement vs. increase in power consumption."
You can't use "total system power", but must make the best estimate of CPU power draw. Why? Because imagine if you had a system with 6 sticks of RAM, 4 HDDs, etc.: you would have ever-increasing power figures that would make the ratio of increased power consumption (a/b) smaller and smaller!
If you take your figures and subtract (a guesstimate of) 100W for non-CPU power draw, then you DON'T get the Intel 2:1 ratio at all!
The figures need revisiting.
AnnonymousCoward - Thursday, November 6, 2008 - link
Performance vs power appears to linearly increase with HT. Using the 100W figure for non-CPU draw means a 25% power increase, which is close to the 30% performance.
Unless we're talking about servers, I think looking at power draw per application is silly. Just do idle power, load power, and maybe some kind of flops/watt benchmark just for fun.
silversound - Tuesday, November 4, 2008 - link
great article, tomshardware reviews are always pro intel and nvidia, not sure if they get paid $ to support them. anandtech is always neutral, thx!