ATI Radeon Xpress 200: Performance, PCI Express & DX9 for Athlon 64
by Wesley Fink on November 8, 2004 6:00 AM EST - Posted in CPUs
SidePort: On-Board GPU Memory
Just before the launch of the Athlon 64, we found that many chipset manufacturers were a bit worried about the performance of integrated graphics solutions with AMD's new CPU. The worries stemmed from the fact that in previous CPU/chipset architectures, the integrated graphics core resided on the North Bridge and shared access to the system memory controller, which was also located on the North Bridge. With the Athlon 64, however, the memory controller resides in the CPU, increasing memory access latencies from the perspective of the integrated graphics core.
The Radeon Xpress 200 supports a local frame buffer attached to what ATI refers to as its "SidePort". The SidePort is a 32-bit DDR memory interface that the integrated graphics can use either instead of or alongside the Athlon 64's memory controller.
While we assumed that SidePort was included to hide some of the latency of using the Athlon 64's memory controller, that turned out not to be the case: performance in UMA mode (using the Athlon 64's memory controller) was quite respectable. It turns out that most games don't benefit much from lower latency memory accesses (through SidePort). So, why would ATI include support for a local frame buffer with the Radeon Xpress 200? Although performance is improved with SidePort enabled, the biggest reason for supporting the feature is to reduce power consumption in mobile environments. Without SidePort enabled, the CPU needs to be awake to fetch data for refreshing the display, but with SidePort enabled, all memory accesses can occur via the Radeon Xpress 200 and the CPU can remain asleep in power saving modes.
Because of the added cost of supporting SidePort, it isn't a requirement - the Radeon Xpress 200 has four memory operating modes:
- SidePort only - In this mode, the integrated graphics core treats the SidePort memory as its local memory. If more memory is needed, it is allocated dynamically through system memory by the driver, which has significantly higher latency than the local SidePort memory.
- UMA only - In UMA mode, the only memory to which the integrated graphics has access is a dynamically allocated partition of system memory. The size of the partition is selectable from within the BIOS (ATI's reference board allows sizes from 16MB to 128MB). If more memory is needed, it is allocated dynamically through system memory by the driver.
- UMA + SidePort (Interleaving Disabled) - In this mode, the total amount of "local" graphics memory is the size of the UMA partition plus the amount of memory connected to the Radeon Xpress' SidePort. The integrated graphics core will first use SidePort memory until it runs out, and then use system memory. If more memory is needed, it is allocated dynamically through system memory by the driver.
- UMA + SidePort (Interleaving Enabled) - By enabling Interleaving and setting the UMA frame buffer size to the same size as the memory connected to the Radeon Xpress' SidePort, a special Interleaving mode is enabled. In this mode, the integrated graphics core will request data from both the UMA space and SidePort memory. The benefit of Interleaving is that two reads or writes can now occur at the same time, whereas with SidePort alone only a single 32-bit read/write can happen at any given time. Despite the fact that UMA accesses have higher latency, the dual-ported nature of this setup improves overall performance. A SidePort-only configuration can still offer greater performance when the application depends on lower latency memory accesses. If more memory is needed, it is dynamically allocated through system memory by the driver.
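To make the four modes a little more concrete, here is a minimal Python sketch of how much "local" graphics memory each mode exposes before the driver has to spill into dynamically allocated system memory. The names `MemoryMode` and `local_graphics_memory_mb` are hypothetical and exist only for this illustration; they are not part of any ATI driver or BIOS interface. The interleaved total is also an assumption: the description above only says the two sizes must match, so we simply add them.

```python
from enum import Enum


class MemoryMode(Enum):
    """The four Radeon Xpress 200 memory operating modes described above."""
    SIDEPORT_ONLY = "SidePort only"
    UMA_ONLY = "UMA only"
    UMA_SIDEPORT = "UMA + SidePort (Interleaving Disabled)"
    UMA_SIDEPORT_INTERLEAVED = "UMA + SidePort (Interleaving Enabled)"


def local_graphics_memory_mb(mode, sideport_mb, uma_mb):
    """Return the "local" graphics memory in MB for a given mode.

    Anything beyond this amount is allocated dynamically through
    system memory by the driver, in every mode.
    """
    if mode is MemoryMode.SIDEPORT_ONLY:
        return sideport_mb
    if mode is MemoryMode.UMA_ONLY:
        return uma_mb
    if mode is MemoryMode.UMA_SIDEPORT:
        # SidePort memory is used first, then the UMA partition.
        return sideport_mb + uma_mb
    if mode is MemoryMode.UMA_SIDEPORT_INTERLEAVED:
        # Interleaving requires the UMA frame buffer to match the SidePort size.
        if uma_mb != sideport_mb:
            raise ValueError("interleaving needs equal UMA and SidePort sizes")
        # Assumption: the interleaved pool is the sum of both halves.
        return sideport_mb + uma_mb
    raise ValueError(f"unknown mode: {mode!r}")


# Reference board: 16MB on the SidePort, with a 64MB UMA partition picked in the BIOS.
print(local_graphics_memory_mb(MemoryMode.UMA_SIDEPORT, sideport_mb=16, uma_mb=64))  # 80
```

With the reference board's 16MB of SidePort memory, the interleaved mode in this sketch only accepts a 16MB UMA partition, which mirrors the size-matching requirement described above.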
ATI's reference board features 16MB of DDR memory attached to the Radeon Xpress' SidePort. The memory can either run synchronously with the system memory clock (200MHz for DDR400) or asynchronously, where the speed is bound by the type of memory used. In our case, the 2.5ns Samsung DDR located on the board was capable of running at the maximum frequency the BIOS allowed - 350MHz.
As we mentioned before, the SidePort memory interface is a single 32-bit channel, which at 350MHz provides 1.4GB/s of bandwidth to the integrated graphics core. At 200MHz SidePort can only provide 800MB/s of bandwidth, so the additional latency incurred by running the SidePort asynchronously with main memory is well worth the additional bandwidth.
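The arithmetic behind those two figures can be sketched in a few lines of Python. The function name `peak_bandwidth_mb_s` and its `transfers_per_clock` parameter are hypothetical, introduced only for this example. Note that the quoted 1.4GB/s and 800MB/s correspond to one 32-bit transfer per quoted clock; if the quoted clock were instead a base clock with data moving on both edges, setting `transfers_per_clock=2` would double both results.

```python
def peak_bandwidth_mb_s(bus_width_bits, clock_mhz, transfers_per_clock=1):
    """Peak theoretical bandwidth in MB/s (1 MB = 10**6 bytes).

    bus_width_bits: width of the memory interface (32 for SidePort).
    clock_mhz: the quoted memory clock in MHz.
    transfers_per_clock: assumption for this sketch; 1 reproduces the
    figures quoted in the text.
    """
    bytes_per_transfer = bus_width_bits // 8
    return bytes_per_transfer * clock_mhz * transfers_per_clock


asynchronous = peak_bandwidth_mb_s(32, 350)  # 4 bytes x 350MHz = 1400 MB/s
synchronous = peak_bandwidth_mb_s(32, 200)   # 4 bytes x 200MHz = 800 MB/s

print(asynchronous, synchronous)   # 1400 800
print(asynchronous / synchronous)  # 1.75
```

Whichever convention is used for counting transfers, the ratio between the two settings stays at 1.75, which is the bandwidth gain that the extra latency of asynchronous operation buys.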
What's truly interesting is how well the chipset performs in SidePort-only mode. Granted, you are limited to low resolutions, but as you will soon see, the integrated graphics core isn't really designed to run at very high resolutions. In fact, running in SidePort-only mode is faster than running in UMA-only mode with a single-channel Socket-754 Athlon 64.
The charts below do a good job of showing off the performance advantages to the various operating modes of the Radeon Xpress 200.
The first thing we see is that there's a huge performance advantage to the dual channel memory controller of the Socket-939 Athlon 64: 33% in Doom 3 and 27% in UT2004. This is far from unexpected given that the more system memory bandwidth you have, the more graphics memory bandwidth you have.
The performance advantage of the SidePort + UMA configuration isn't insignificant either: 8.5% in Doom 3 and 7.7% in UT2004. However, given the added cost, we would say that SidePort isn't absolutely necessary for desktops (but we understand its usefulness in notebooks).
We compared the graphics performance of the Radeon Xpress 200 to ATI's lowest end discrete PCI Express graphics card: the Radeon X300 SE. The X300 SE is a four-pipe version of the Radeon Xpress 200 but with only a 64-bit DDR memory interface, so the Radeon Xpress 200 actually holds a memory bandwidth advantage over the X300 SE while it is at a fill rate deficit.
We also compared the Radeon Xpress 200 to Intel's Graphics Media Accelerator 900. While the GMA 900 is obviously only available on the Intel-only 915G and the Radeon Xpress 200 is an AMD-only solution, the two offerings are slow enough that most games end up being completely GPU limited and thus the CPU differences become negligible.
You'll notice that not all of the benchmarks have scores for Intel's GMA 900; those that don't have GMA 900 scores are ones where the GMA 900 was not able to either run the game or complete the benchmark without crashing.
45 Comments
Maetryx - Monday, November 8, 2004 - link
Soooo.... given the products that are on the horizon, and the holiday season, would a person be best off waiting until Q1 2005 to do a fairly comprehensive upgrade to their system... or is the stiff holiday competition and Black Friday going to be the right time to do a massive upgrade? I know it's slightly off topic, but every time I start to visualize the right combination of parts and pieces, something new gets announced with a future ship date.... Oh well, at least my expensive hobby is still exciting.
DAPUNISHER - Monday, November 8, 2004 - link
#21 I have the ATi320M in my Compaq 900z and it is good for what it is. The only thing ATi has had trouble with till now is the memory controller's performance and A64 takes that out of the mix so that ATi can really show what they can do :-) For instance, they paired a POS ALi with my 320M that makes the bandwidth, even for 2100DDR, terrible. Don't know why they couldn't use ATi's version? Must have been cheaper=par for the course.
DAPUNISHER - Monday, November 8, 2004 - link
"As you can see, the Halo score for nVidia on nVidia is about the same as our past tests of ATI on ATI. nVidia on ATI is about 3% slower than the nVidia on nF4. Far Cry continues the pattern of best performance on an ATI chipset and/or an ATI graphics card. Doom 3 and Aquamark 3 are also very slightly slower on nVidia/ATI than nVidia/nVidia, but the % change of 2% to 3% is hardly significant. The ATI Bullhead is equivalent to slightly slower with an nVidia PCIe card than an nVidia nForce4 chipset running the same nVidia card. nVidia has claimed that nVidia on nVidia is a faster combination than ATI on nVidia, but we can only conclude that these performance differences are so small as to be negligible. ATI/ATI and nVidia/nVidia are the fastest combinations in our comparisons, but the differences are so tiny that they really don't matter. You can run any of these in combination with each other without any concern that you have to match Athlon 64 chipset to Graphics chipset."
WOW! You can say that yet still push ram with tight timings despite similar small performance differences over slightly more conservatively timed but much cheaper ram?
landrew - Monday, November 8, 2004 - link
What about sound and IDE performance? It seems you totally ignored this! What about nVidia firewall and RAID? Does ATI do anything like that? I want to know because I'll be buying an A64 soon and want the best motherboard.
mczak - Monday, November 8, 2004 - link
"The RX480/RS480 is the first ATI chipset for AMD" - this is not true. IGP 320/320M was a chipset for Athlon XP, especially the mobile version was somewhat successful.
jamawass - Monday, November 8, 2004 - link
Motherboards based on this chipset would be ideal for a cheap htpc. It would have been helpful if the reviewer had looked at video decoding performance for dvd and especially hdtv. Both the fusion gold hdtv and HDTV wonder require dx9 graphics. This forces a lot of people in the htpc community to purchase dx9 cards even though they don't game, just to improve hdtv performance.
Wesley Fink - Monday, November 8, 2004 - link
#14 and #17 - If you check older reviews you will see that ATI and nVidia perform very differently in Specviewperf 7.1 benches. The performance we see here is nothing unusual. We were trying to establish baselines for both PCIe cards for the future and to compare to the past. Comparing ATI and nVidia performance with Specviewperf doesn't really tell you much. Comparing ATI to ATI in specviewperf or nVidia to nVidia can be useful.
We will also be updating to Version 8 of Specviewperf for motherboard tests in the near future.
Wesley Fink - Monday, November 8, 2004 - link
#2 - Corrected.
#6 - We could have used the $1,020 3.46EE to compare to the $856 FX55, but the 3.6 P4 560 performs better in many benches. The 560 costs about $500 these days. As #14 said our goal was to compare top to top. We included the 560 for Reference. We did price/performance comparisons in our last CPU launch article.
#10 - I agree with you. The mfgs don't think ATI when they think AMD chipset. IF they look at RX480 they will change their minds, but that is a big IF.
#11 - We will do this in an upcoming nF4 retail review.
#12 - Cool'n'Quiet appears to be working properly with 2 or 4 dimms, but we did not focus on that feature. We will ask ATI about question 2.
#15 - ATI claims Gigabit PCI Express LAN is just as fast and just as cheap as on-chip Gigabit LAN. PCIe Gigabit LAN is fine, but PCI LAN can also be used with this chipset - an option the low-cost providers will probably exploit. Then again, some Tier 1 mfgs have been using PCI LAN with nForce3 Ultra to save costs.
#16 - ATI tells us the RX480 will be cheaper than the top nForce4 Ultra chipset and more expensive than the VIA K8T890 chipset. ATI wants to undercut nVidia prices but still be a premium compared to the cheaper VIA boards.
ALL - some boards are already shipping. I have seen pics from Germany of a retail board exactly like the Bullhead except it is red. When we have more availability data we will post an update.
Ecmaster76 - Monday, November 8, 2004 - link
Anyone know what is up with the benchmarks on page 15? They look a little strange.
knitecrow - Monday, November 8, 2004 - link
what will make or break this product is the price. If mobos based on the said chipsets are cheaper than nforce3/4 ... it can be a good budget overclocker.
No frills, just performance.