Unified Memory Architecture
Challenges for Short-Term Solutions
Fashions run in cycles, and computer design is no exception: manufacturers today are rethinking the unified memory architecture, or UMA, a technology used nearly twenty years ago in the original Apple II computer. For the short term, Rambus believes, UMA makes sense for only one segment of the computer marketplace. Perhaps more importantly, Rambus sees several challenges ahead for developing practical UMA solutions that address the memory, performance and cost constraints of this market segment.
The allure of UMA is simple -- save money and boost performance by reducing redundancies and increasing the level of integration. Instead of separate and redundant memory systems for the CPU and graphics controller, and the relatively slow path between them, build a single, faster memory system that can serve both. By doing so, it is believed, the total amount of memory in a PC can be reduced without sacrificing performance. Further, a higher level of integration and performance can be achieved, and further savings realized, by combining the system memory controller with the graphics controller on a single chip.
But while the idea is simple, the reality is much more complex, and cost and performance concerns cloud the UMA picture. Designers are concerned about the potential performance losses due to memory contention, reduced operating system memory and limited memory bandwidth. And they wonder how they can meet the conflicting requirements of higher performance memory and lower cost.
UMA requires an arbitration mechanism to let the CPU and graphics controller share the memory bus peacefully, enabling CPU access, screen refresh, video processing, and graphics operations to occur in an orderly fashion. If not handled correctly, contention can seriously affect performance. Contention can be significantly reduced with a "tightly coupled," single-chip, graphics/system memory controller. But only "loosely coupled" UMA implementations using separate system memory and graphics controllers will be available throughout most of 1996 and 1997. With a loosely coupled UMA environment, however, performance may suffer by as much as 10 percent, according to some estimates.
Setting aside a portion of main memory for graphics reduces the amount left for the operating system and applications, and Windows 95 performance is particularly sensitive to memory size in the 4 to 12 megabyte range. In an 8-megabyte system, for example, reducing the available memory for the OS by one megabyte results in a performance loss of about 15 percent by some estimates. Microsoft is so concerned about the potential effects UMA might have on system performance that it is reported to have circulated a document to systems vendors warning of this problem. According to EE Times, Microsoft called UMA a "PC-performance killer that will potentially degrade video and 3D graphics performance by as much as 30 percent."
By most estimates, users can expect to save about $50 to $100 on a PC using UMA. These savings will mean little to most business users, who are much more concerned with performance. On the consumer side, where price is very much an issue, the need for at least 16 megabytes of system memory leads to the conclusion that UMA, at least in the short run, is best suited only to mid-range consumer PCs. Low-end consumer PCs, typically those costing below about $1500, ship with only 8 megabytes of memory, while buyers of consumer PCs costing over $2500 generally don't want to make any compromises and wouldn't be interested in UMA.
Designers of consumer-oriented UMA PCs face a dilemma, however. While Pentium-class processors and multimedia applications such as 3D graphics and MPEG video have moved into mainstream use, they place very high performance demands on systems. In fact, the memory bandwidth requirements for these consumer PCs have been consistently underestimated. Yet, this is the most price-sensitive segment of the market. How can designers meet the dual requirements of high performance and low cost?
A realistic evaluation shows that consumer PCs, in many cases, require higher performance memory systems than business PCs. The three greatest demands on memory bandwidth are CPU access, screen refresh and drawing operations.
An analysis performed by Rambus shows that a Pentium-class CPU typically requires 1 megabyte per second (MB/s) of memory bandwidth per megahertz of clock rate. For example, a 133-MHz Pentium CPU, which will likely be the standard for mid-range consumer PCs by the end of 1996, requires 133 MB/s of sustained memory bandwidth. But the peak bandwidth requirement, such as when a cache miss occurs, is 533 MB/s for zero-wait-state performance.
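The CPU figures above can be reproduced with a short calculation. The following is a minimal sketch, not Rambus's own analysis: the function names are hypothetical, and the derivation of the 533 MB/s peak from a 64-bit data bus clocked at 66.67 MHz is an assumption, since the text gives the peak figure without saying how it was obtained.

```python
def cpu_sustained_mb_s(clock_mhz, mb_s_per_mhz=1.0):
    """Sustained bandwidth using the article's rule of thumb:
    about 1 MB/s of memory bandwidth per MHz of CPU clock rate."""
    return clock_mhz * mb_s_per_mhz


def bus_peak_mb_s(bus_width_bits, bus_clock_mhz):
    """Peak bandwidth of a bus that moves one full-width word per clock.
    ASSUMPTION: this is how the 533 MB/s peak figure arises."""
    return (bus_width_bits / 8) * bus_clock_mhz


sustained = cpu_sustained_mb_s(133)   # 133-MHz Pentium-class CPU
peak = bus_peak_mb_s(64, 200 / 3)     # 64-bit bus at 66.67 MHz (assumed)

print(f"sustained: {sustained:.0f} MB/s")
print(f"peak:      {peak:.0f} MB/s")
```

Under these assumptions, the peak requirement is about four times the sustained requirement, which is why a memory system sized only for average CPU traffic would add wait states on a cache miss.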
The graphics controller, whether loosely or tightly coupled, requires about 160 MB/s to refresh the screen for a typical 1024 by 768 by 16 display. 2D graphics operations average about 50 MB/s while 3D graphics operations need about 300 MB/s. MPEG video needs about 150 MB/s, but 3D graphics and video rarely occur at the same time, and so for estimation purposes, the video component can be ignored. The total bandwidth requirement for a mid-range consumer PC is summarized in the following table.
Function                     Sustained Bandwidth (MB/s)   Peak Bandwidth (MB/s)
CPU Access                   133                          533
Screen Refresh               160                          160
2D Graphics                  50                           50
3D Graphics                  300                          300
MPEG Video (not included)    [150]                        [150]
Total Bandwidth              643                          1043
Given these requirements, how do the available memory technologies -- EDO DRAM, SDRAM and RDRAM -- measure up? 64-bit-wide EDO memory doesn't even come close to meeting the UMA bandwidth requirement, topping out with a 200 MB/s sustained bandwidth. While a small, high-speed cache could reduce the sustained bandwidth requirement somewhat, it also adds complexity and cost. And it would never meet the need for 3D graphics and video.
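The totals in the bandwidth table, and the size of the EDO shortfall, follow from simple addition. The sketch below uses only figures quoted in the text; the dictionary layout and variable names are illustrative assumptions. MPEG video is left out of the sums, as in the table, because it rarely runs at the same time as 3D graphics.

```python
# (sustained MB/s, peak MB/s) per function, taken from the article's table.
REQUIREMENTS = {
    "CPU access": (133, 533),
    "Screen refresh": (160, 160),
    "2D graphics": (50, 50),
    "3D graphics": (300, 300),
}

# Sustained bandwidth quoted for 64-bit-wide EDO DRAM.
EDO_SUSTAINED_MB_S = 200

sustained_total = sum(s for s, _ in REQUIREMENTS.values())
peak_total = sum(p for _, p in REQUIREMENTS.values())
edo_shortfall = sustained_total - EDO_SUSTAINED_MB_S

print(f"sustained total: {sustained_total} MB/s")
print(f"peak total:      {peak_total} MB/s")
print(f"EDO shortfall:   {edo_shortfall} MB/s sustained")
```

On these numbers, EDO supplies less than a third of the sustained requirement, and the 3D graphics load alone (300 MB/s) exceeds what EDO can sustain.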
A two-channel, 16-MB Rambus implementation using two 64-Mb RDRAMs provides 1066 MB/s peak and about 850 MB/s sustained bandwidth, clearly meeting the UMA requirements without the need for a cache. The 31-pin interface reduces controller and expansion costs, while the simple two-chip solution requires no glue logic, uses less board real estate and less power, and offers up to a tenfold reduction in electromagnetic interference. Cost-wise, 64-Mb RDRAM achieves parity in die size with 32-bit-wide SDRAM or EDO DRAM.
64-bit-wide synchronous DRAM operating at 66 MHz provides 533 MB/s peak and 425 to 430 MB/s sustained bandwidth. But SDRAM also requires 115 to 118 pins on a controller chip and a 168-pin expansion interface. Thus, SDRAM provides only half the bandwidth of RDRAM while using twice the number of pins and, more importantly, doesn't meet the UMA requirement. The ripple effect of all those pins also keeps overall costs relatively high.
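The comparison among the three memory technologies can be laid out as a simple pass/fail check against the 643 MB/s sustained and 1043 MB/s peak requirement. This is a sketch under stated assumptions: the class and function names are hypothetical, the more favorable 430 MB/s figure is used for SDRAM, and EDO's peak is left unset because the text quotes only its sustained bandwidth.

```python
from dataclasses import dataclass
from typing import Optional

REQUIRED_SUSTAINED_MB_S = 643
REQUIRED_PEAK_MB_S = 1043


@dataclass
class MemoryOption:
    name: str
    sustained_mb_s: float
    peak_mb_s: Optional[float]  # None when the article gives no peak figure


def meets_uma_requirement(option: MemoryOption) -> bool:
    """True if the option covers the sustained requirement and, where a
    peak figure is known, the peak requirement as well."""
    if option.sustained_mb_s < REQUIRED_SUSTAINED_MB_S:
        return False
    if option.peak_mb_s is not None and option.peak_mb_s < REQUIRED_PEAK_MB_S:
        return False
    return True


OPTIONS = [
    MemoryOption("64-bit EDO DRAM", 200, None),
    MemoryOption("64-bit SDRAM at 66 MHz", 430, 533),  # upper sustained figure
    MemoryOption("Two-channel RDRAM", 850, 1066),
]

results = {opt.name: meets_uma_requirement(opt) for opt in OPTIONS}
for name, ok in results.items():
    print(f"{name}: {'meets' if ok else 'falls short of'} the requirement")
```

Even with the more favorable SDRAM figure, only the two-channel RDRAM configuration clears both thresholds, which is the conclusion the text draws.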
While UMA may eventually find application in PCs of all types, it will be some time before it is used in anything but mid-range consumer PCs. Given the performance requirements and cost sensitivities of this market, Rambus DRAM clearly provides system designers with the best overall solution. And Rambus DRAM can provide even greater bandwidths easily and cost-effectively for high-end UMA systems when sophisticated, tightly coupled controller solutions make it to market.
# # #