CrossFire Xpress 3200: RD580 for AM2
by Wesley Fink on June 1, 2006 12:05 AM EST - Posted in Motherboards
Disk Controller Performance
With the variety of disk drive benchmarks available, we needed a means of comparing the true performance of a wide selection of controllers. The logical choice was Anand's storage benchmark, first described in Q2 2004 Desktop Hard Drive Comparison: WD Raptor vs. the World. The iPEAK test was designed to measure "pure" hard disk performance; here we keep the hard drive as consistent as possible while varying the hard drive controller, so that the differences we measure reflect the controller rather than the drive.
We used Anand's raw trace files, which recorded the I/O operations generated while running a real-world benchmark: the entire Winstone 2004 suite. Intel's iPEAK utility was then used to play back the trace of all I/O operations that took place during a single run of Business Winstone 2004 and MCC Winstone 2004. To isolate performance differences to the controllers under test, we used the Maxtor 120GB 7200RPM 8MB cache IDE drive in all IDE tests. SATA1 tests used the 60GB 7200RPM 8MB DiamondMax Plus 9, and SATA2 tests used the Hitachi 250GB SATA2 drive, with SATA2 mode enabled through the Hitachi utility. The drive was formatted before each test run, and a composite average of 5 tests on each controller interface was tabulated to ensure consistency.
iPEAK reports a mean service time in milliseconds: the average time the drive took to complete each I/O operation. To make the data easier to read, we report the scores as an average number of I/O operations per second, so that higher scores indicate better performance.
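Converting a mean service time into a throughput score is a simple reciprocal. The sketch below rests on two assumptions: that the score is 1000 divided by the mean service time in milliseconds, and that the five runs are combined by averaging their service times before converting. Neither is a documented description of our exact method, and the run values are hypothetical rather than measured results.

```python
from statistics import mean


def iops_from_service_time(mean_service_ms: float) -> float:
    """Convert a mean service time (ms per I/O) into I/O operations per second."""
    if mean_service_ms <= 0:
        raise ValueError("service time must be positive")
    return 1000.0 / mean_service_ms


def composite_iops(run_service_times_ms: list[float]) -> float:
    """Average the per-run mean service times, then convert the composite to IOPS.

    Averaging service times (rather than per-run IOPS) is an assumption of this sketch.
    """
    return iops_from_service_time(mean(run_service_times_ms))


# Hypothetical mean service times (ms) from five runs on one controller.
runs = [8.0, 8.2, 7.9, 8.1, 7.8]
print(f"{composite_iops(runs):.1f} IO/s")
```

Because the conversion is a reciprocal, a shorter service time always produces a higher score, which is why the charts read "higher is better."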
These tests should put to rest any concerns about SB600. IDE, SATA, and SATA2 results are very competitive with NVIDIA, ULi, and Silicon Image. The performance pattern holds steady across both Multimedia Content I/O and Business I/O: the ULi, ATI, and Silicon Image based disk controllers provide the fastest I/O, followed by the on-board NVIDIA nForce4 SATA controllers. IDE performance from the ULi and ATI controller logic is particularly strong, and the SATA performance of both is up to 12% better than the nForce4 chipset. SATA performance of the Silicon Image 3132 is also very competitive with the core logic chipsets in our tests.
Memory Testing - Optimum tRAS
As expected, DDR2 memory behaves quite differently from DDR in tRAS testing. As the results below show, a 2GB kit of Corsair 8500 (DDR2-1066) delivered bandwidth that held flat from tRAS 6 to 10 and then increased steadily, in steps, up to the maximum tRAS setting of 18.
Memtest86 Bandwidth: ATI CrossFire Xpress 3200 AM2 with Athlon 64 X2 4800+
tRAS | Bandwidth (MB/s) |
6 | 2047 |
7 | 2047 |
8 | 2047 |
9 | 2047 |
10 | 2047 |
11 | 2140 |
12 | 2140 |
13 | 2191 |
14 | 2191 |
15 | 2242 |
16 | 2242 |
17 | 2298 |
18 | 2298 |
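To put numbers on the steps in the table, the sketch below stores the Memtest86 figures and reports each setting's gain over tRAS 6. The values are copied from the table. The percentage calculation and the choice of tRAS 6 as the baseline are our own illustration.

```python
# Memtest86 bandwidth figures from the table above, keyed by tRAS setting.
BANDWIDTH = {
    6: 2047, 7: 2047, 8: 2047, 9: 2047, 10: 2047,
    11: 2140, 12: 2140, 13: 2191, 14: 2191,
    15: 2242, 16: 2242, 17: 2298, 18: 2298,
}


def gain_over_baseline(table: dict[int, int], baseline_tras: int) -> dict[int, float]:
    """Percentage bandwidth gain of each tRAS setting relative to a baseline setting."""
    base = table[baseline_tras]
    return {tras: 100.0 * (bw - base) / base for tras, bw in sorted(table.items())}


def is_non_decreasing(table: dict[int, int]) -> bool:
    """True if bandwidth never drops as tRAS increases."""
    values = [table[t] for t in sorted(table)]
    return all(a <= b for a, b in zip(values, values[1:]))


gains = gain_over_baseline(BANDWIDTH, baseline_tras=6)
print(is_non_decreasing(BANDWIDTH))   # True
print(f"tRAS 13: +{gains[13]:.1f}%")  # tRAS 13: +7.0%
print(f"tRAS 18: +{gains[18]:.1f}%")  # tRAS 18: +12.3%
```

By this measure, tRAS 13 keeps a little over half of the bandwidth gain that tRAS 18 offers over the tRAS 6 baseline.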
This is a very different pattern from DDR tRAS testing, where maximum bandwidth was reached at an intermediate tRAS setting and bandwidth decreased as tRAS moved lower or higher from that optimum. With all else equal, tRAS 18 did deliver the highest bandwidth (tied with tRAS 17), but the tRAS 18 setting was unstable, causing memory failures and random reboots.
We did further memory testing using Sandra 2007 unbuffered results and found that the best combination of bandwidth and stability was achieved at a tRAS setting of 13. We saw similar results with the Corsair DDR2 8500 memory on the nForce 590 chipset. We have shared our test results with Corsair and asked for more information on tRAS settings, performance, and stability with high-speed DDR2 memory. All stock benchmarking was performed with the Corsair 8500 set to DDR2-800 at 3-3-3-13 timings and 2.147V.
Memory Bandwidth
Memory bandwidth performance was verified using Sandra 2007. Both buffered and unbuffered tests were run with the stock 4800+ and memory at DDR2-800, 3-3-3-13, 2.147V.
Sandra 2007 buffered and unbuffered memory performance are almost identical on the ATI RD580 and NVIDIA nForce 590 chipsets. That is what we would expect: the AM2 memory controller is integrated into the CPU, so with the same memory and the same processor, both platforms perform about the same. Any differences between the ATI and NVIDIA AM2 memory scores are likely the result of memory tweaking.
The AM2 processor clearly shows dramatically higher memory bandwidth than the Socket 939 Athlon 64 running DDR memory. Unfortunately, that much-improved memory bandwidth does not currently translate into similarly improved performance.
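Much of the gap follows from peak theoretical bandwidth. As a back-of-the-envelope sketch, assume a dual-channel configuration with a 64-bit (8-byte) data bus per channel. Peak bandwidth is then the transfer rate times the bytes per transfer times the number of channels. These are theoretical ceilings, not the Sandra results in our charts.

```python
def peak_bandwidth_gbs(mega_transfers_per_s: int, channels: int = 2, bus_bytes: int = 8) -> float:
    """Theoretical peak memory bandwidth in GB/s (decimal, 10**9 bytes per second)."""
    return mega_transfers_per_s * 1e6 * bus_bytes * channels / 1e9


ddr400 = peak_bandwidth_gbs(400)    # Socket 939, dual-channel DDR-400
ddr2_800 = peak_bandwidth_gbs(800)  # Socket AM2, dual-channel DDR2-800
print(f"DDR-400: {ddr400:.1f} GB/s, DDR2-800: {ddr2_800:.1f} GB/s, ratio {ddr2_800 / ddr400:.1f}x")
```

A doubled theoretical ceiling does not imply doubled application performance, which is in line with the results discussed above.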
71 Comments
peternelson - Thursday, June 1, 2006
a) Don't keep saying that Gigabit Ethernet performance is just for bragging rights because broadband is only 100Mbps tops!
GbE is great for peer-to-peer LAN gaming, for network backup of files or communicating with a file server, or for streaming media files.
b) The competitor to CrossFire Xpress 3200 is the NVIDIA 590, i.e. nForce 5, not nForce4.
Repeated comparisons with nForce4 (e.g. networking, USB, SATA performance) are thus irrelevant.
c) USB performance of NVIDIA is NOT equal to ATI; it is OVER 20% faster than ATI.
d) Although you hype up Intel's CSA architecture, I suspect that a GOOD PCIe implementation might give better performance.
If so, the remote machine in your networking tests could be bottlenecking the test, which would explain why the high results all flatline at virtually the same value.
Repeat with a high-end card (like Myrinet 10GE on x8 PCIe, or at least a good PCIe 1Gbps target) to establish whether I'm right. If so, stop using CSA cards.
Say whether you are using direct wiring or a switch in your network tests.
Configure the NVIDIA dual GbE LAN in teaming mode and bounce it off another dual GbE setup in teaming mode. What are your speed and CPU usage then?
It's unfair to compare CPU usage at high network traffic with CPU usage at low network traffic.
You should be running the network tests against nForce 5, AND turn ON the hardware offload for a true comparison.
e) To benchmark SATA II properly, are you maxing out all ports or just testing against one hard drive, as I suspect? RAID performance may differ significantly from a single drive. In any case, also compare with a high-end RAID card like the Areca 1260 to show how lame chipset solutions are in comparison.
f) The review summary should highlight that the rival 590 has two more SATA ports and an extra GbE port.
g) Since ATI CrossFire and NVIDIA SLI are not interchangeable (I wish they were), the choice of chipset comes down more to graphics preference, which for me at the moment gives NVIDIA an advantage.
h) So you have discussed ATI and NVIDIA chipsets. WHAT ABOUT INTEL? A review of the similar-generation Intel 965 chipset on an equivalent processor would show whether it is a valid competitor, e.g. in terms of SATA and USB throughput.
i) Also, you are not mentioning the speed boosting (auto overclocking) NVIDIA offers on an all-NVIDIA platform. ATI does not appear to have an equivalent feature. On the other hand, the manual overclocking capabilities seem to be there.
Trisped - Thursday, June 1, 2006
A) The only time you will ever get enough network traffic in a LAN game to saturate a 1Gb/s connection is if you are running a server for over 100 PCs. Most games are designed for online multiplayer, which means the MAX they will use for standard game play is 6Mb/s from host to client and 512Kb/s from client to host. With so many people still on 56k and cheap DSL (1.5Mb/s down and 128Kb/s up), that is a bit generous. So even with 100 computers using the 6Mb/s by 512Kb/s connection speeds, you will still not saturate a 1Gb/s full-duplex connection. Plus, with that many connected computers, you would be hard pressed to find a powerful enough processor and enough RAM (that works in an nForce 500 board) to drive a server receiving that much data.
B) Yes, they didn’t have enough nForce 500 stats, but if you read the nForce 500 review you would find [...], which implies that there is no performance increase. Yes, it would be nice to have the actual numbers, but since they just released the nForce 500 stats, they probably didn’t have any boards available when doing that part of the testing. Plus, since the ATI board was a reference board, they may have asked that AT not compare it to the nForce until actual boards are available.
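The arithmetic behind point A can be checked directly. The sketch below multiplies the per-client rates quoted in the comment (6 Mb/s host to client, 512 Kb/s client to host) by 100 clients. It then compares each total with one direction of a full-duplex 1 Gb/s link. The rates are the commenter's figures, and treating 1 Kb/s as 1000 b/s is an assumption of this sketch.

```python
def link_utilization(clients: int, per_client_mbps: float, link_mbps: float = 1000.0) -> float:
    """Fraction of one direction of a full-duplex link used by the aggregate traffic."""
    return clients * per_client_mbps / link_mbps


down = link_utilization(100, 6.0)   # host -> clients
up = link_utilization(100, 0.512)   # clients -> host
print(f"downstream {down:.0%}, upstream {up:.1%}")  # downstream 60%, upstream 5.1%
```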
C) Actually, the nForce4 is 16.4% faster (or, equivalently, the ATI SB600 is 14.2% slower), so the ATI solution is noticeably slower, but not by 20%. It should also be noted that the only other southbridge USB controller tested was the ULi 1575, which has pretty much the same performance as the SB600.
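The two percentages in point C differ because they use different baselines. "A is X% faster than B" divides the difference by B, while "B is Y% slower than A" divides it by A. The sketch below uses hypothetical throughput figures, since the underlying scores are not repeated here. Under these definitions, 16.4% faster corresponds to about 14.1% slower, close to the 14.2% quoted.

```python
def percent_faster(a: float, b: float) -> float:
    """How much faster a is than b, as a percentage of b."""
    return 100.0 * (a - b) / b


def percent_slower(a: float, b: float) -> float:
    """How much slower b is than a, as a percentage of a."""
    return 100.0 * (a - b) / a


def slower_from_faster(faster_pct: float) -> float:
    """Convert 'a is X% faster than b' into 'b is Y% slower than a'."""
    return 100.0 * faster_pct / (100.0 + faster_pct)


# Hypothetical throughputs: controller A = 100, controller B = 80.
print(percent_faster(100, 80), percent_slower(100, 80))  # 25.0 20.0
print(f"{slower_from_faster(16.4):.1f}")                 # 14.1
```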
D) True, the results max out at about 950Mb/s, but they are still progressing, so max bandwidth has not been reached. Also, the fact that you can get to 95% of the theoretical max transfer rate is amazing enough when you consider that USB and FireWire are at 61% and 57% of their theoretical max transfer speeds. And don’t think that using a 10Gb card is going to fix the problem. The 10Gb card will just switch to 1Gb mode when connected to a network where the slowest device is 1Gb, unless you connect it to a switch, which would have to buffer everything, clean up, reformat, and retransmit it, meaning that you would risk testing the max speed of the switch rather than the max speed of the server. I think, if anything, these tests are becoming obsolete. Before, they were important because not all Gb Ethernet connections were the same. Some were PCI, some were PCIe, some were well done, some were quite poor. Now that we are seeing both ATI and NVIDIA solutions performing very close to the max possible, it doesn’t seem likely that a .02% difference is going to be all that important to anyone.
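The efficiency figures in point D are measured throughput divided by the interface's theoretical maximum. The sketch assumes signaling rates of 1000 Mb/s for Gigabit Ethernet, 480 Mb/s for USB 2.0 Hi-Speed, and 400 Mb/s for FireWire 400. The measured USB and FireWire figures are not given in the comment, so the sketch works backward from the quoted percentages to the throughput they imply.

```python
# Assumed theoretical signaling rates, in Mb/s.
THEORETICAL_MBPS = {"Gigabit Ethernet": 1000, "USB 2.0": 480, "FireWire 400": 400}


def efficiency(measured_mbps: float, interface: str) -> float:
    """Measured throughput as a fraction of the interface's theoretical maximum."""
    return measured_mbps / THEORETICAL_MBPS[interface]


def implied_throughput(fraction: float, interface: str) -> float:
    """Throughput implied by a quoted fraction of the theoretical maximum."""
    return fraction * THEORETICAL_MBPS[interface]


print(f"GbE at 950 Mb/s: {efficiency(950, 'Gigabit Ethernet'):.0%}")                # 95%
print(f"USB 2.0 at 61%: {implied_throughput(0.61, 'USB 2.0'):.0f} Mb/s")            # 293 Mb/s
print(f"FireWire 400 at 57%: {implied_throughput(0.57, 'FireWire 400'):.0f} Mb/s")  # 228 Mb/s
```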
E) Yes, RAID performance will differ, but most people do not have a RAID setup, so AT keeps it simple. I would like to voice my support for comprehensive RAID testing, though. It is important to know if you are buying a board that has poor RAID 5 or JBOD performance. I would also like to see what kind of transfer rates we could get out of a Raptor RAID compared to other, more economical solutions.
F) I can see the point about the 2 SATA ports, but the 590 has only 6, whereas the ATI reference board had 4+4, which makes the matter more complicated. AT was probably waiting for retail boards before comparing a feature that may well turn out to be a non-issue. The extra Gb Ethernet port is useless for everyone except the small server market.
G) They pointed out the CrossFire/SLI exclusivity in the review. They also pointed out that in half of the games tested the CrossFire solution was better than SLI, and in the other half the situation was reversed. Since the 7900GTX costs $460+ (most closer to $500), the X1900XT $420+ (most around $460), and the X1900XTX $470+ (most around $500), ATI is the better solution for the money. AT also encourages you to buy ATI because they want to encourage competition. If people keep buying NVIDIA just because they like it, prices on NVIDIA cards will go up while innovation stagnates (just like what the iPod has done).
H) Yes, a new Intel benchmark would be nice, but Intel doesn’t make AMD boards.
I) Yes, they did. As noted in this review and in the one on auto overclocking, there is barely any benefit. In this review they pointed out: [...]
Wesley Fink - Thursday, June 1, 2006
As we stated in the nF5 review, in comments, and in follow-up answers to questions on the nF5 launch review, the CORE of nF5 is the same as nF4. This is why feature performance of SATA, USB, and other items is exactly the same as nF4, including some issues with nF4. We were not favoring ATI by using nF4 scores in some of our feature charts; the fact is that the results for nF4 and nF5 are the SAME on those particular features. If you consider that the south "chip" is the same C51 that was used in nForce4 dual X16, it should be obvious that basic performance is the same. Anyone who reviewed the 590 also showed that "LinkBoost" makes almost no difference at all in performance. To me LinkBoost matters because nForce 590 can now do almost 1500 HTT or more across the board, which means you don't have to lower from 5X HT until well above a 300 clock speed. That is a feature that was already there in RD580 939 and is continued in RD580 AM2.
Spoelie - Thursday, June 1, 2006
a) Peer-to-peer GAMING does not even need 10Mbps. Only file serving from memory or RAID can reach the ceiling of GbE cards.
b) I agree that the nForce 5 seems to be missing from some comparison graphs.
c) What I would like to see as well is CPU usage during USB transfers. How does it compare to FireWire (is a USB 2.0 connection to a hard drive preferable to FireWire)?
d) The speeds are at about their theoretical ceiling; what difference is an extra 1-2Mbps going to make in the end? Dual GbE is not going to be of any use to any home desktop; a single GbE port is hardly getting used. Didn't hardware offloading prevent firewalls from being used?
f) The review is of the reference board and not a shipping one. The difference is already highlighted earlier in the review. And it won't matter if shipping boards provide the extra SATA and GbE controllers, equaling or surpassing the 590 feature set.
g) The X1900XT is faster than the 7900GTX in most scenarios, though the difference isn't huge. Besides, this was already highlighted in the summary.
h) How about you ask that question again when Intel MAKES AN AMD CHIPSET! This is comparing chipsets for the AM2 platform, and last I checked, Intel isn't a competitor there.
i) You clearly haven't read the article.
peternelson - Thursday, June 1, 2006
a) Well, a LAN gaming SERVER might have that much traffic going to EACH LAN client.
My own interest is in computational cluster interconnects, or network rendering of graphics/3D. Such apps can use it.
Even a gamer playing Half-Life might want to save his DVB TV card stream onto a NAS storage device, or download some new Linux distro in the background, so multiple apps use the network interface. With virtualisation, such multiple uses could become more widespread.
c) FireWire and USB 2.0 are both slow compared to eSATA. FireWire should be used for DV camcorders, and USB 2.0 for peripherals like printers. Neither is preferred for disks, as USB is HALF duplex, and FireWire has trouble if you put multiple devices on one channel.
d) Dual GbE would help my clustering and rendering. There may also be some future NAS with dual GbE in teaming mode, which would increase storage-to-PC bandwidth.
I would be using a hardware firewall anyway, but I think a software workaround may come in the future, i.e. one purposely written for the NF5.
f) Agreed, but if ATI can add more features externally, so can NVIDIA. I am just thinking of comparing what is built in.
g) ATI and NVIDIA GPUs leapfrog each other with each new launch. At the moment I personally like NVIDIA for the generation I would buy into.
h) True, Intel is unlikely ever to make a chipset for an AMD processor. BUT I mean: compare how good USB, SATA RAID, etc. are on Intel. If Intel is, say, double, that means ATI and NVIDIA have scope to improve up to that level.
Missing Ghost - Thursday, June 1, 2006
e-SATA is half duplex too (like almost all HDD interfaces except SAS), but I agree that it's the way to go. Anybody who would use USB 2.0 as an HDD interface does not know much about HDD interfaces.
Stele - Friday, June 2, 2006
or can't find an eSATA case where he/she is, and/or has other PCs/laptops on which the external HDD is also used but which don't have eSATA ports, and/or simply can't afford eSATA cases.
Careful with sweeping statements ;)
Having said that, USB is a bad choice for external HDDs. Until eSATA becomes more commonplace, Firewire is still arguably the better option in terms of overall price/performance/availability for now.
lopri - Thursday, June 1, 2006
I've raised this issue a couple of times in the forum, but it didn't get many opinions; I'm guessing that's because of a general lack of interest in the AM2 platform. So I'd like to hear from AT staff: Wesley, Jarred, Gary, anyone please.
1. In the comparison of Socket 939 vs. Socket AM2, why are DDR400 DIMMs used for the Socket 939 platforms? I understand you guys are using very fast timings (2-2-2) and 1GB sticks, but I think the majority of Socket 939 users still have 512MB sticks. More than anything, I doubt many AMD "enthusiasts" run their memory at 200MHz/2-2-2. Think about how many memory/motherboard reviews AT has conducted. They are countless. TCCD sticks running 270MHz+ at 2.5-3-3 timings are very common. Even DDR500 1GB sticks with decent timings (like 2.5-3-2/3-3-2) can be purchased for very, very low prices. I myself have 2 Socket 939 rigs: the main one has 2 x 1GB Infineon sticks running DDR500/2.5-3-2-6, and the second one has 2 x 512MB TCC5 sticks running DDR600/2.5-3-3-7 (2.5-4-3-7 for 3D). Wouldn't it be fairer and more realistic to compare a more common DDR configuration with the DDR2 configuration? We've had manufacturers pumping out PC3700, PC4000, PC4200, PC4800, etc. memory sets for years.
2. This raises another very serious question. Have you checked the prices of decent DDR2-800 sticks? I did for the first time today using RTPE, and they raised my eyebrows. Basically, DDR2-800 at 3-3-3 SPD (or even EPP) is non-existent, and DDR2-800 at 4-4-4 sticks are incredibly expensive. Before, I never paid attention to DDR2, like many here, and just assumed it should be cheap considering how long it has been out. Boy, was I wrong. I'd say sticks like the ones this review used (DDR2-800/3-3-3) would cost nearly $500. That's insane.
3. Also, I noticed in the review that even with such uber sticks, 2T?! What's up with that? Is it the AM2 CPU, the RD580, or Corsair? In the past, AT's memory/motherboard reviews always checked the command rate for us along with overclocking, but I didn't find any explanation of this (just a small footnote(?) in a table). What are the limitations? Can you at least run DDR2-667 at 1T, or can all current AM2 platforms only run 2T regardless of memory speed? If you can indeed run memory at 1T at reduced speed, what would be the trade-offs and performance difference?
Wesley's reviews are among my favorites on AT, and in the past I have been impressed by his clear (but thorough) explanations. But this review seems to have too many holes (which look rather intentional?) to just skip to the next page.
Other than my above rant (again, I believe there probably were reasons why the review was written the way it was), this review is absolutely superb. I've been waiting for ATI's competitive chipset (NV is getting way too big), and hopefully they can make up for their lateness with quality. Personally, I'm waiting for DFI's incarnation of the RD580 AM2 motherboard.
Thanks again for the hard work, Wesley (and Jarred/Gary).
Gary Key - Thursday, June 1, 2006
We were doing our best to equalize the components and settings between the two platforms. This was done to show the absolute best performance of each platform during the initial chipset testing. As far as showing additional settings we can certainly look at those results in a separate article.
Our Corsair or OCZ PC8500 sticks will easily run at 3-3-3-9 2T at DDR2-800 with a small voltage increase to 2.2V, although the memory is rated at 5-5-5-15 2T. I am working on a single versus dual channel DDR2 article at this time. Cutting to the chase, single channel DDR2 with fast timings will provide up to 98% of the performance of dual channel DDR2 under the same conditions. It might be something to think about when looking at $350~$500 DDR2 2GB kits.
AMD introduced this platform with very conservative timings and tables for the board and memory suppliers to follow. We expect to see 1T timings at DDR2-800 later this year as AMD "massages" the memory controller. I ran tests at DDR2-667 1T, and they were basically the same as or slightly worse than DDR2-800 at 2T with all other settings being equal. The problem is that we currently cannot run tRP and tRCD lower than 3, so any advantage of 1T is wasted due to higher latencies. On a couple of our review boards we could also run DDR2-800 at 4-4-4-15 1T, but the 3-3-3-13/9 2T setting provided better memory bandwidth and lower latencies overall. We are still testing various memory settings, as each board has been a little different in the optimizations made by each supplier.
We will have a separate review on EPP and Memory settings for AM2 in the near future.
Thanks...
JarredWalton - Thursday, June 1, 2006
It's worth mentioning that 2-2-2-7 1T DDR-400 ends up having slightly worse latencies than 4-4-4-14 2T DDR2-800, and I intentionally chose latencies that were exactly double. With the faster base clock speed, DDR2-800 has identical net latencies (2 cycles @ 200 MHz ~= 4 cycles @ 400 MHz) but higher bandwidth, putting it on top. If you throw 3-3-3-9 2T into the equation, DDR2 clearly comes out on top in overall performance. (You would basically need 1.5-1.5-1.5-4.5 1T DDR-400 memory to match it, which obviously doesn't exist.)
That doesn't mean we recommend you drop everything and upgrade to Socket AM2 *right now*, but if you're planning on buying a completely new system anyway, you might as well go with the new platform. Our testing components are chosen to maximize the longevity of the testing platform, and again, we don't recommend you go out and spend $500 on memory if you're building a typical PC. If you're planning on getting an FX-62 and a couple of top-performing graphics cards to go with it, then you'll probably want the best memory available as well, in which case this Corsair RAM is great stuff.
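The cycle comparison in the comment above converts clock cycles into absolute time. Below is a minimal sketch of that arithmetic, assuming DDR-400 uses a 200 MHz memory clock and DDR2-800 a 400 MHz memory clock (both transfer data twice per clock). It ignores command rate (1T/2T) and other effects, so it shows only the cycle-to-nanosecond conversion, not a full latency model.

```python
def cycles_to_ns(cycles: float, clock_mhz: float) -> float:
    """Convert a timing in clock cycles to nanoseconds at the given memory clock."""
    return cycles * 1000.0 / clock_mhz


def equivalent_cycles(cycles: float, from_mhz: float, to_mhz: float) -> float:
    """Cycles needed at to_mhz to match the same absolute latency at from_mhz."""
    return cycles_to_ns(cycles, from_mhz) * to_mhz / 1000.0


# DDR-400 runs a 200 MHz memory clock; DDR2-800 runs a 400 MHz memory clock.
print(cycles_to_ns(2, 200), cycles_to_ns(4, 400))               # 10.0 10.0
print(cycles_to_ns(3, 400))                                     # 7.5
print([equivalent_cycles(c, 400, 200) for c in (3, 3, 3, 9)])   # [1.5, 1.5, 1.5, 4.5]
```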
Your comments earlier about memory speeds (i.e. why do we use DDR-400 with 2-2-2-7 1T timings) are basically looking at overclocked systems. We haven't really gotten into the details of how the systems overclock right now, but the availability of very fast DDR2 memory definitely changes things. My experience so far is that most of the AM2 motherboards are easily breaking 300 MHz HyperTransport speeds, and you can do all that without sacrificing memory timings.
Take care,
Jarred Walton