Final Words

First, keep in mind that these performance numbers are early, and they were run on a partly crippled, very early platform. With that preface, the fact that Nehalem is still able to post these 20 - 50% performance gains says only one thing about Intel's tick-tock cadence: they did it.

We've been told to expect a 20 - 30% overall advantage over Penryn, and it looks like Intel is on track to deliver just that in Q4. At 2.66GHz, Nehalem is already faster than the fastest 3.2GHz Penryns on the market today. At 3.2GHz, I'd feel comfortable calling it baby Skulltrail in all but the most heavily threaded benchmarks. This thing is fast, and this is on a very early platform; keep in mind that Nehalem doesn't launch until Q4 of this year.

One valid concern is performance in applications that don't scale well beyond two or four cores: what will Nehalem offer us then? Our DivX test doesn't scale well beyond four cores, and even then Nehalem's performance was in the 20 - 30% faster range that we've been expecting. The other thing to keep in mind is that none of these tests are really stressing Nehalem's integrated memory controller. When AMD made the move to an IMC, we saw an instant 20% performance boost in most applications. I suspect that the applications that don't benefit from Hyper Threading will at least benefit from the IMC. We've only scratched the surface of Nehalem here, looking at the benefits of Hyper Threading and its lower latency unaligned cache accesses. We've hinted at what's to come with the extremely well balanced and low latency memory hierarchy of Intel's new baby. Once this thing gets closer to launch, we should be able to fill in the rest of the puzzle.

Over six years ago I had dinner with Intel's Pat Gelsinger (back when he was Intel's CTO), and I asked him the same question I always do: "what are you excited about?" Back then his response was "threading". Intel was about to launch Hyper Threading, and Pat was convinced that it was absolutely necessary for the future of microprocessors.

It was at the same dinner that Pat mentioned Intel may do a chip with an integrated memory controller much like AMD, but that an IMC wouldn't solve the problem of idle execution units - only indirectly mitigate it. With Nehalem, Intel managed to combine both - and it only took six years to pull it off.

Pat also brought up another very good point at that dinner. He turned to me and said that you can only integrate a memory controller once; what do you do next to improve performance? Intel has managed to keep increasing performance, but what I really want to see is what happens at the next tock. Intel proved its ability with Conroe, and with Nehalem it shows that the tick-tock model can work, but more than anything, looking at Nehalem today makes me excited about what Sandy Bridge will bring.

The fact that we're able to see these sorts of performance improvements despite being faced with a dormant AMD says a lot. In many ways Intel is doing more to improve performance today than when AMD was on top during the Pentium 4 days.

AMD never really caught up to the performance of Conroe. Through some aggressive pricing we got competition in the low end, but it could never touch the upper echelon of Core 2 performance. With Penryn, Intel widened the gap. And now with Nehalem it's going to be even tougher to envision a competitive high-end AMD CPU at the end of this year. 2009 should hold a new architecture for AMD, which is the only thing that could possibly come close to achieving competition here. It's months before Nehalem's launch and there's already no equal in sight; it will take far more than Phenom to make this thing sweat.


108 Comments


  • SiliconDoc - Monday, July 28, 2008

    lol- Buddy you are thinking.
  • magreen - Thursday, June 5, 2008

    Thanks for the amazing preview, Anand!

    I hope you and Gary will get us more Nehalem information quick like bunnies.
  • yottabit - Thursday, June 5, 2008

    Great Article Anand! I'm so excited for this new technology. But that socket and triple channel memory architecture makes me want to puke in my mouth a little bit. It's very reminiscent to me of the Socket 423/RDRAM era. I have the feeling that they are going to release this setup for a lot of the early adopters and then screw them over by dropping the socket completely, when they decide that Dual Channel DDR3 is fast enough. I can't picture two platforms running side by side, with two entirely different sockets. People who want a Nehalem but need 4 gigs of ram will end up buying 6 Gigs of ram... and DDR3 ain't exactly cheap.

    I wish they had plans to throw this into the mainstream faster. I'd love to have one of these, in dual channel variety. I'm still running an old early A64, and I'm holding out for these next gen processors in the next year or two.

    It's awesome to see that nice performance per clock increase, but the triple channel memory is a real slap in the face to me. It's like Intel saying "look, we increase clock for clock performance, but we also decided to use some brute force and raise our power consumption and motherboard complexity for no reason by adding another impractical memory channel". I don't see it as elegant at all. I think they are overcompensating for their lack of memory bandwidth in recent times. :-

    Maybe AMD will have a chance to jump in with some nicer Phenoms before Nehalem comes out and actually capture some quad core market?
  • npp - Thursday, June 5, 2008

    I'm tired of all those people who just can't live with the fact that the world is spinning and the CPUs that were reviewed here are simply far faster than the Penryn or Phenom you just bought yesterday... Get used to the fact, this is how things happen today. Nehalem will probably be the most advanced x86 (x64) CPU when launched, and it just happened that Intel developed it - it could have been anybody else, say AMD, or nVidia, or whoever you prefer, no difference to me. Things go ahead, and some vendors simply get the job done first; in the grand scheme of things, it is all the same. All those fanboys I see around sound like some 3 year old children fighting for candy to me. It's amusing to see how AMD or Intel PR locked you up, guys.

    Now a brief question, aimed directly at Anand, I guess: I still can't figure out why memory performance is so low even via an advanced controller such as Nehalem's. As far as I can tell, 3-channel DDR3-1066 should be able to deliver up to 25.5 GB/s of bandwidth, far from the figures we see. How does this happen? And once more: you measured some 46ms latency altogether, how was that obtained? Assuming a memory clock of 133MHz, this should yield something like CAS4 (~30ms) latencies for the memory, am I right?
  • fitten - Thursday, June 5, 2008

    30us

    As far as single/dual/triple channel, it seems that Anand and gang were able to test with all three modes (you'll notice the comment about WinRAR being 10% faster with triple channel compared to single channel on the pre-release motherboard)... so you don't *have* to buy 3 sticks of memory... if you want 4GiB, you should be able to get 1x4 or 2x2 and leave the other slot(s) empty.
  • npp - Thursday, June 5, 2008

    It's all nanoseconds, of course, not milli- or micro, my fault. Never mind, I'm still awaiting some reasonable explanation about the "modest" bandwidth measured. 12GB/s copy is by no means little - I can't say if it's achievable via overclocking today, I'm not into that kind of business - but still I would guess no. Still, it seems little compared to the max. theoretical values.
  • Anand Lal Shimpi - Thursday, June 5, 2008

    I think we may have to wait for a final Nehalem platform before we can make any calls on memory bandwidth figures, but do keep in mind that the amount of usable memory bandwidth will depend largely on how it's being measured. If the algorithm is even slightly compute bound we won't see perfect scaling with theoretical memory bandwidth.

    I'm not sure how Everest measures bandwidth so I can't tell you exactly what numbers we should be seeing there, but it is useful for comparing a relative increase in bandwidth between Penryn and Nehalem.

    Take care,
    Anand
  • npp - Thursday, June 5, 2008

    Thank you very much, very kind of you to bother answering my question! Keep up the good work here at Anandtech.
  • NINaudio - Thursday, June 5, 2008

    I'm not sure why everyone is so concerned about DDR3 prices being high. A quick check shows that you can get a 2gig kit of DDR3-1600 for under $150 already. By the time Nehalem is out for mass consumption DDR3 will be even cheaper. I would say that it's pretty realistic to expect to be able to get a 3gig triple channel kit for under $100 and a 6gig triple channel kit for around $175 by the time Nehalem is available to us.
  • Anand Lal Shimpi - Thursday, June 5, 2008

    What I'm really interested in is why Intel felt that Nehalem needed a three channel DDR3 memory controller. Will it really be necessary for higher clocked Nehalem (or is it Nehalems)? It'd be great for the versions of Nehalem with integrated graphics but I figured those would mostly be pushed into the mainstream, dual channel SKUs anyways. Looks like we'll have to wait at least a few more months before we can find out for sure.

    -A
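
As a rough sanity check on the memory numbers discussed in the comments above, here is a minimal sketch of the arithmetic behind them. It assumes a 64-bit wide DDR3 channel and the nominal 1066 MT/s transfer rate; the function names are illustrative only, and the 12GB/s copy figure is simply the one quoted in the thread, not a new measurement.

```python
def peak_bandwidth_gb_s(transfers_per_sec, channels, bus_width_bits=64):
    """Theoretical peak bandwidth in GB/s (1 GB = 1e9 bytes).

    Assumes every channel moves bus_width_bits per transfer with no overhead.
    """
    bytes_per_transfer = bus_width_bits // 8
    return transfers_per_sec * bytes_per_transfer * channels / 1e9


def cycles_to_ns(cycles, clock_hz):
    """Convert a latency given in clock cycles to nanoseconds."""
    return cycles / clock_hz * 1e9


# Nominal DDR3-1066 transfer rate (assumption: 1066 million transfers/s).
DDR3_1066 = 1066e6

single = peak_bandwidth_gb_s(DDR3_1066, channels=1)
triple = peak_bandwidth_gb_s(DDR3_1066, channels=3)

# Copy bandwidth figure quoted in the comments, as a share of the peak.
quoted_copy_gb_s = 12.0
share_of_peak = quoted_copy_gb_s / triple

# Four cycles at a 133MHz clock, as in the CAS4 estimate from the thread.
cas4_ns = cycles_to_ns(4, 133e6)

print(f"single channel peak: {single:.2f} GB/s")
print(f"triple channel peak: {triple:.2f} GB/s")
print(f"12GB/s copy as share of peak: {share_of_peak:.0%}")
print(f"4 cycles at 133MHz: {cas4_ns:.1f} ns")
```

The triple channel result lands at roughly the 25.5GB/s mentioned in the thread, and four cycles at 133MHz works out to about 30 nanoseconds. These are ceilings under ideal assumptions; as noted in the replies, a benchmark that is even slightly compute bound won't reach them.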
