Thread It Like Its Hot

Hyper Threading was a great technology, simply first introduced on the wrong processor. The execution units of any modern day microprocessor are power hungry and consume a lot of die space, the last thing you want is to have them be idle with nothing to do. So you implement various tricks to keep them fed and working as often as possible. You increase cache sizes to make sure they never have to wait on main memory, you integrate a memory controller to ensure that trips to main memory are as speedy as possible, you prefetch data that you think you'll need in the future, you predict branches, etc...

Enabling simultaneous multi-threaded (SMT) execution is one of the most power efficient uses of a microprocessor's transistor budget, as it requires a very minimal increase in die size but can easily double the utilization of a CPU's execution units. SMT, or as Intel calls it, Hyper Threading does this by simply dispatching two threads of instructions to an individual processor core at the same time without increasing the available execution resources. Parallelism is paramount to extracting peak performance out of any out of order core, double the number of instructions being looked at to extract parallelism from and you increase your likelihood of getting work done without waiting on other instructions to retire or data to come back from memory.

In the Pentium 4 days enabling Hyper Threading required less than a 5% increase in die size but resulted in anywhere from a 0 - 35% increase in performance. On the desktop we rarely saw a big boost in performance except in multitasking scenarios, but these days multithreaded software is far more common than it was six years ago when Hyper Threading first made its debut.


This table shows what needed to be added, partitioned, shared or unchanged to enable Hyper Threading on Intel's Core microarchitecture

When the Pentium 4 made its debut however all we really had to worry about was die size, power consumption had yet to become a big issue (which the P4 promptly changed). These days power efficiency, die size and performance all go hand in hand and thus the benefits of Hyper Threading must also be looked at from the power perspective.

I took a small sampling of benchmarks ranging from things like POV-Ray which scales very well with more threads to iTunes, an application that couldn't care less if you had more than two cores. What we're looking at here are the performance and power impact due to Hyper Threading:

Intel Core i7-965 (Nehalem 3.2GHz) POV-Ray 3.7 Beta 29 Cinebench R10 1CPU Race Driver GRID
HT Disabled 3239 PPS 207W 4671 CBMarks 161.8W 103 fps 300.7W
HT Enabled 4202 PPS 233.7W 4452 CBMarks 159.5W 102.9 fps 302W

 

Looking at POV-Ray we see a 30% increase in performance for a 12% increase in total system power consumption, that more than exceeds Intel's 2:1 rule for performance improvement vs. increase in power consumption. The single threaded Cinebench test shows a slight decrease in both performance and power consumption (negligible) and the same can be said for Race Driver GRID.

When Hyper Threading improves performance, it does so at a reasonable increase in power consumption. When performance isn't impacted, neither is power consumption. This time around Hyper Threading has no drawbacks, while before the only way to get it was with a processor that was too hot and barely competitive, today Intel offers it on an architecture that we actually like. Hyper Threading is actually the first indication of Nehalem's true strength, not performance, but rather power efficiency...

Intel's Warning on Memory Voltage Is Nehalem Efficient?
Comments Locked

73 Comments

View All Comments

  • anand4happy - Sunday, February 8, 2009 - link

    saw many thing but this is the thing something dfferent

    sd4us.blogspot.com/2009/01/intel-viivintel-975x-express-955x.html
  • nidhoggr - Monday, November 10, 2008 - link

    I cant find that information on the test setup page.
  • nidhoggr - Monday, November 10, 2008 - link

    test not text :)
  • puffpio - Wednesday, November 5, 2008 - link

    would you guys consider rebenchmarking?
    from the x264 changelog since the nehalem specific optimizations:
    "Overall speed improvement with Nehalem vs Penryn at the same clock speed is around 40%."
  • anartik - Wednesday, November 5, 2008 - link

    Good review and better than Tom's overall. However Tom's stumbled on something that changed my mind about gaming with Nehalem. While Anand's testing shows minimal performance gains (and came to the not good for games conclusion) Tom's approached it with 1-4 GPU's SLI or Crossfire. All I can say is the performance gains with Nvidia cards in SLI was stunning. Maybe the platform favors SLI or Nvidia had a driver advantage in licensing SLI to Intel. Either way Nehalem and SLI smoked ATI and the current 3.2 extreme quad across the board.
  • dani31 - Wednesday, November 5, 2008 - link

    I know it would't change any conclusion, but since we discuss bleeding edge Intel hardware it would have been nice to see the same in the AMD testbed.

    Using a SB600 mobo (instead of the acclaimed SB750) and an old set of drivers makes it look like the AMD numbers were simply pasted from an old article.
  • Casper42 - Tuesday, November 4, 2008 - link

    Something I think you guys missed in your article/conslusion is the fact that we're now able to pair a great CPU with a pretty damn good North/South Bridge AND SLI.

    I found that the 680/780/790 featureset is plainly lacking and that the Intel ICH9R/10R seems to always perform better and has more features. If any doubt, look at Matrix RAID vs nVidia's RAID. Night and day difference, especially with RAID5.

    The problem with the X38/X48 was you got a great board but were effectively locked into ATI for high end Gaming.

    Now we have the best of both worlds. You get ICH10R, a very well performing CPU (even the 920 beats most of the Intel Quad Core lineup) AND you can run 1/2/3 nVidia GPUs on the machine. In my opinion, this is a winning combination.


    The only downside I see is board designs seem to suck more and more.

    With socket 1366 being so massive and 6 DIMM slots on the Enthusiast/Gamer boards, we're seeing not only 6 expansion slots (down from the standard of 7) but in most boards I have seen pics of, the top slot is an x1 so they can wedge it next to the x58 IOH which means your left with only 5 slots for other cards. Using 3 dual slot cards is out of the question without a massive 10 slot case (of which there are only like 3-5 on the market) and even if you can wedge 2 or 3 dual slot cards into the machine, you have almost zero expansion card slots should you ever need them.

    Then we get to all the cooling crap surrounding the CPU. ALL these designs rely on a top down traditional cooler and if you decide to use a highly effective tower cooling solution, all the little heatsink fins on the Northbridge and pwer regulators around the CPU get very little or no airflow. Now your in there adding puny little 40/60mm fans that produce more noise than airflow, not to mention that the DIMMs are hardly ever cooled in today's board designs.
    Call me a cooling purist if you will, but I much prefer traditional front to back airflow and all this side intake top exhaust stuff just makes me cringe. I personally run a Tyan Thunder K8WE with 2 Hyper6+ coolers and the procs and RAM are all cooled front to back. Intake and exhaust are 120mm and I have a bit of an air channel in which that airflow never goes near the expansion card slots below, which by the way have a 92mm fan up front pushing air in across the drives and another 92mm fan clipped onto the expansion slots in the back pulling it back out.

    I dont know how to resolve these issues, but I think someone surely needs to because IMHO its getting out of control.
  • lemonadesoda - Tuesday, November 4, 2008 - link

    "Looking at POV-Ray we see a 30% increase in performance for a 12% increase in total system power consumption, that more than exceeds Intel's 2:1 rule for performance improvement vs. increase in power consumption."

    You cant use "total system power", but must make the best estimate of CPU power draw. Why? Because imagine if you had a system with 6 sticks of RAM, 4 HDDs, etc. you would have ever increasing power figures that would make the ratio of increased power consumption (a/b) smaller and smaller!

    If you take your figures and subtract (a guestimate of) 100W for non CPU power draw, then you DONT get the Intel 2:1 ratio at all!

    The figures need revisiting.
  • AnnonymousCoward - Thursday, November 6, 2008 - link

    Performance vs power appears to linearly increase with HT. Using the 100W figure for non-CPU draw means a 25% power increase, which is close to the 30% performance.

    Unless we're talking about servers, I think looking at power draw per application is silly. Just do idle power, load power, and maybe some kind of flops/watt benchmark just for fun.
  • silversound - Tuesday, November 4, 2008 - link

    great article, tomsharware reviews always pro intel and nvidia, not sure if they got pay $ to suppot them. anandtech is always neutral, thx!

Log in

Don't have an account? Sign up now