The Nehalem Preview: Intel Does It Again
by Anand Lal Shimpi on June 5, 2008 12:05 AM EST - Posted in CPUs
Final Words
First, keep in mind that these performance numbers are early, and they were run on a partly crippled, very early platform. With that preface, the fact that Nehalem is still able to post these 20 - 50% performance gains says only one thing about Intel's tick-tock cadence: they did it.
We've been told to expect a 20 - 30% overall advantage over Penryn, and it looks like Intel is on track to deliver just that in Q4. At 2.66GHz, Nehalem is already faster than the fastest 3.2GHz Penryns on the market today. At 3.2GHz, I'd feel comfortable calling it baby Skulltrail in all but the most heavily threaded benchmarks. This thing is fast, and this is on a very early platform. Keep in mind that Nehalem doesn't launch until Q4 of this year.
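As a rough sanity check on those figures, here is a minimal sketch of the clock-for-clock arithmetic. It assumes performance scales linearly with clock speed, which real workloads only approximate, and the function name is made up for illustration:

```python
def required_per_clock_gain(old_clock_ghz: float, new_clock_ghz: float) -> float:
    """Minimum per-clock speedup the new chip needs just to tie the old one.

    Assumption (a simplification): performance = work per clock * clock speed.
    """
    return old_clock_ghz / new_clock_ghz - 1.0


# A 2.66GHz Nehalem matching a 3.2GHz Penryn
gain = required_per_clock_gain(old_clock_ghz=3.2, new_clock_ghz=2.66)
print(f"Per-clock gain needed to tie: {gain:.1%}")
```

Under that assumption, simply tying a 3.2GHz Penryn at 2.66GHz takes roughly a 20% per-clock advantage, which sits at the low end of the 20 - 30% range quoted above.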
One valid concern is performance in applications that don't scale well beyond two or four cores: what will Nehalem offer us then? Our DivX test doesn't scale well beyond four cores, and even then Nehalem's performance was in the 20 - 30% faster range that we've been expecting. The other thing to keep in mind is that none of these tests are really stressing Nehalem's integrated memory controller. When AMD made the move to an IMC, we saw an instant 20% performance boost in most applications. I suspect that the applications that don't benefit from Hyper Threading will at least benefit from the IMC.
We've only scratched the surface of Nehalem here, looking at the benefits of Hyper Threading and its lower-latency unaligned cache accesses. We've hinted at what's to come with the extremely well balanced and low latency memory hierarchy of Intel's new baby. Once this thing gets closer to launch, we should be able to fill in the rest of the puzzle.
Over six years ago I had dinner with Intel's Pat Gelsinger (back when he was Intel's CTO), and I asked him the same question I always do: "what are you excited about?" Back then his response was "threading". Intel was about to launch Hyper Threading, and Pat was convinced that it was absolutely necessary for the future of microprocessors.
It was at the same dinner that Pat mentioned Intel may do a chip with an integrated memory controller much like AMD, but that an IMC wouldn't solve the problem of idle execution units - only indirectly mitigate it. With Nehalem, Intel managed to combine both - and it only took 6 years to pull it off.
Pat also brought up another very good point at that dinner. He turned to me and said that you can only integrate a memory controller once, so what do you do next to improve performance? Intel has managed to keep increasing performance, but what I really want to see is what happens at the next tock. Intel proved its ability with Conroe, and with Nehalem it shows that the tick-tock model can work, but more than anything, looking at Nehalem today makes me excited about what Sandy Bridge will bring.
The fact that we're able to see these sorts of performance improvements despite being faced with a dormant AMD says a lot. In many ways Intel is doing more to improve performance today than when AMD was on top during the Pentium 4 days.
AMD never really caught up to the performance of Conroe. Through some aggressive pricing we got competition in the low end, but it could never touch the upper echelon of Core 2 performance. With Penryn, Intel widened the gap. And now with Nehalem it's going to be even tougher to envision a competitive high-end AMD CPU at the end of this year. 2009 should hold a new architecture for AMD, which is the only thing that could possibly come close to achieving competition here. It's months before Nehalem's launch and there's already no equal in sight. It will take far more than Phenom to make this thing sweat.
108 Comments
mkruer - Thursday, June 5, 2008 - link
Not a problem. I tend not to take most things at face value. Looking at the Nehalem, its focus was to increase the multi threaded performance, not the single thread app per se. This would put it more in line with what AMD is offering on per core scalability. The Nehalem will get Intel back into the big iron scalability that it lost to AMD.
My guess is that the Nehalem will not give users any real advantage playing games or other single threaded apps, unless the game or app supports more than one thread.
The final question is posed back to AMD. If AMD gets their single threaded IPC and clock speed up, then both platforms should be near identical from a performance standpoint. Then it is just down to price, manufacturing and distribution. I just hope that AMD's claims of 15-20% improvement in per core IPC are true. This should make this holiday season much more interesting.
Anand Lal Shimpi - Thursday, June 5, 2008 - link
Nehalem most definitely had a server focus coming up, but I wouldn't underestimate what the IMC will do for CPU-bound gaming performance. Don't forget what the IMC did for the K8 vs. Athlon XP way back when...
As far as AMD goes, clock speed issues should get resolved with the move to 45nm. The IPC stuff should get taken care of with Bulldozer, the question is when can we expect Bulldozer?
JumpingJack - Saturday, June 7, 2008 - link
Don't count on 45 nm clocking up much higher than 65 nm, maybe another bin or so.... gate leakage and SCE are limiting and the reason for the sideways move from 90 to 65 nm to begin with (traditional gate ox, SiO2, did not scale 90 to 65 nm) ... the next chance for a decent clock bump will come with their inclusion of HKMG. Which from the rumor mill isn't until 1H09.
fitten - Friday, June 6, 2008 - link
AMD hasn't really resolved any clock speed issues from the move from 130nm -> 90nm -> 65nm (look at the top speed 130nm parts compared to the top speed 65nm parts). During some of those transitions, the introductory parts actually were slower clocked than the higher clocked of the previous process and didn't even catch up for some time.
bcronce - Thursday, June 5, 2008 - link
Does anyone know why Intel is claiming NUMA on these? I'm assuming you need a multi-cpu system for such uses, but how is the memory segmented that it's NUMA?
bcronce - Thursday, June 5, 2008 - link
Seems Ars Technica (http://arstechnica.com/articles/paedia/cpu/what-yo...) has info on NUMA.
Assuming more than 1 node being used, each node connects to the memory hub and gets assigned its own *default* memory bank. A one node computer won't see any diff, but a 2-4 node will get a default memory bank and reduced latencies. A node can interleave the data among the 2-4 memory banks, but DDR3 is freak'n fast and it's probably best just streaming from your own bank to reduce contention among the nodes.
RobberBaron - Thursday, June 5, 2008 - link
I think there are going to be other issues revolving around this chip. For example: http://www.fudzilla.com/index.php?option=com_conte...
Nvidia's Director of PR, Derek Perez, has told Fudzilla that Intel actually won't let Nvidia make its Nforce chipset that will work with Intel's Nehalem generation of processors.
We confirmed this from Intel’s side, as well as other sources. Intel told us that there won't be an Nvidia chipset for Nehalem. Nvidia will call this a "dispute between companies that they are trying to solve privately," but we believe it's much more than that.
AmberClad - Thursday, June 5, 2008 - link
That still leaves you with CrossFire and cards with multiple GPUs like the 9800 X2. It's a tiny fraction of the market that actually uses SLI anyway.
Eh, who knows, maybe Nvidia will finally cave and grant that SLI license, and we'll finally have decent chipsets with SLI.
chizow - Thursday, June 5, 2008 - link
Agreed, as much as I love NV GPUs, I'm tired of having SLI tied to NV's buggy chipsets. Realistically I'd probably just get an Intel chipset with Nehalem even if there was an Nforce SLI variant and just go with the fastest single-GPU processor.
Baked - Thursday, June 5, 2008 - link
Maybe I can finally grab that E8400 when it drops to $50.