Note: This review originally appeared in the November 1994 issue of the Southern Maine Apple Users Group (SMAUG) Newsletter. It has been modified slightly from the original version, but no effort has been made to update information that has become out-of-date since the original publication.
Well, it’s been over seven months since the first litter of Macintosh computers based on the powerful new PowerPC chip entered the world. As “one of the first on my block to have one,” I thought it might be interesting to take a look back at the experiences I’ve had with my Power Mac and share some of my thoughts with you. I’ll start this month with a discussion of the PowerPC microprocessor and what “RISC” really means. Next month, I’ll talk a little about the ups and downs of being on the (b)leading edge of Apple technology. Finally, a couple months from now, I’ll give you my thoughts on the future of the Power Macintosh and the PowerPC in general.
On March 14, 1994, Apple introduced its long-anticipated computers based on the new PowerPC microprocessor. Two days later, I attended Apple’s New England “unveiling” in Boston. On my way home, I stopped at Computer Town and bought one: my very own Power Macintosh 7100/66. Ah, I could almost smell the power! I rushed home, set it up, and connected it to all my existing peripherals. I held my breath as I pressed the power-on key. But instead of that familiar startup chime, I heard…
But I’m getting ahead of myself. The story really begins back in October of 1991, when Apple, IBM, and Motorola announced a series of cooperative alliances designed to end the Windows/Intel domination of the personal computer industry. One of these was the PowerPC alliance. Motorola and IBM (with some help from Apple) would pool their microprocessor talents to create a new processor architecture to take on the best Intel could dish out. Apple and IBM would use the chips in their own machines and evangelize others to do so as well.
The first chip in the PowerPC family is the 601. Designed by Motorola and IBM and based largely upon IBM’s POWER architecture used in their engineering workstations, the 601 is blisteringly fast. Even in its slowest incarnation (50 MHz), it blows the doors off the fastest 68040 processor used in the Quadra family of Macs. And the 601 is just the first of at least four processors in the PowerPC family. Already available (though not yet incorporated in any finished systems) are the 603, designed for low-power applications such as notebook computers, and the 604, the “next generation” of the 601. The king of the hill, the 620, has just reached the first stages of manufacturing.
The whole PowerPC family of chips uses a Reduced Instruction Set Computing (RISC) architecture to help achieve its spectacular performance. There’s been quite a bit of confusion lately about what RISC really means, thanks in part to less-than-accurate articles in a variety of publications. As long as I’m talking about the PowerPC, I might as well try my hand at clearing things up a bit.
RISC processors differ from CISC (Complex Instruction Set Computing) processors such as Motorola’s 680x0 series and Intel’s 80x86 series in several ways. The most obvious is that they use a reduced instruction set. This doesn’t necessarily mean that the chip supports a reduced number of instructions, but rather that it uses instructions of reduced complexity (RISC processors generally do have a smaller instruction set, but this is not always true). Where a CISC chip might have a single instruction that loads two numbers from memory, adds them together, and writes the result back out to memory, a RISC chip would likely accomplish the same task with a series of four instructions: read one number from memory, read the other, add the numbers, and write out the result. This approach may seem like a giant step backwards in technology, but combined with some other RISCy features, it can make for blazing speed.
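To make the contrast concrete, here’s a hypothetical sketch in C, treating an array as “memory” and local variables as “registers.” The CISC-style version does the whole job in one memory-to-memory step; the RISC-style version breaks it into the four simple steps described above.

```c
#include <stddef.h>

/* CISC style: one complex operation reads both operands from memory
   and writes the result back, all in a single instruction. */
void cisc_add(int mem[], size_t a, size_t b, size_t dest) {
    mem[dest] = mem[a] + mem[b];  /* one instruction does it all */
}

/* RISC style: the same work as four simple steps, each touching
   memory at most once. */
void risc_add(int mem[], size_t a, size_t b, size_t dest) {
    int r1 = mem[a];     /* 1: load first operand into a register */
    int r2 = mem[b];     /* 2: load second operand */
    int r3 = r1 + r2;    /* 3: add, entirely inside the processor */
    mem[dest] = r3;      /* 4: store the result back to memory */
}
```

Both produce the same answer, of course; the difference is that each RISC step is simple and quick, which is what makes the pipelining tricks below possible.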
Think about the task of washing laundry. There are many ways you could get the job done. One option would be to have a single mega-machine wash and dry. You’d load it up, press a button, and come back an hour and a half later to unload your clean, dry clothes. You’ll finish one load about every hour and a half. To get more laundry done in the same amount of time, you’d have to speed up the machine, or make it larger. Either way, you would soon run into some rather restrictive limits.
On the other hand, what about the more common situation of a separate washer and dryer? You load the washer and press a button; half an hour later, you transfer the clothes to the dryer, start it, and fill the washer with another load. Simply by splitting the process into two, you’ve increased the throughput. You can wash and dry at the same time, something you couldn’t do with the “mega-washer.” Note that a single load takes no less (or more) time than in the first case, but if you’ve got more than will fit in a single load, you will come out ahead (a load will be completed every hour instead of every hour and a half). In the world of microprocessors, this approach is called “pipelining,” and it’s used to some extent in every microprocessor.
But you may notice another slow-down with the typical washer-dryer combo. While a “dry” cycle may take an hour or more, a “wash” generally takes much less time. As a result, the washer will often sit idle waiting for the dryer to finish its cycle. In processor-speak, this is known as a “stall” or “bubble” in the pipeline. So what’s a laundry designer to do? Splitting the whole process in half helped, so why not split the drying process into two stages, let’s say “drying part 1” and “drying part 2.” If we plan it right, we’ll now have three separate stages that take the same amount of time to complete, eliminating the stall. Assuming you have plenty of laundry to do (in other words, you are able to “keep the pipeline full”) you can complete a load every half hour. Further division of the wash and dry cycles would result in further improvements in throughput.
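The arithmetic behind the laundry analogy fits in a few lines of C (a sketch, with stage times in minutes): the first load must pass through every stage, and after that a new load finishes every time the slowest stage completes.

```c
#include <stddef.h>

/* Total minutes to finish n_loads through a pipeline: fill the pipe
   once (sum of all stage times), then one more load completes every
   "slowest stage" interval thereafter. */
int pipeline_total(int n_loads, size_t n_stages, const int stage_min[]) {
    int fill = 0, slowest = 0;
    for (size_t i = 0; i < n_stages; i++) {
        fill += stage_min[i];
        if (stage_min[i] > slowest)
            slowest = stage_min[i];
    }
    return fill + (n_loads - 1) * slowest;
}
```

Plugging in the numbers from the text: three loads through the 90-minute mega-washer take 270 minutes; through the 30-minute washer plus 60-minute dryer, 210 minutes (one load per hour after the first); through three balanced 30-minute stages, 150 minutes (one load per half hour).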
Our new, improved laundry system has an interesting property I mentioned in the last paragraph. Each stage of the process takes the same amount of time to accomplish. This is generally the case with RISC microprocessors as well and can help improve performance even further. How? By making it easier for compiler designers to optimize the code they produce. When a programmer compiles her program (written in a “high-level” language such as C or Pascal), the compiler generally does more than just translate the high-level code into machine code. It also performs a variety of optimizations to help the program run faster. In the case above, a non-optimizing compiler would likely spit out the instructions: “load, wash, dry 1, dry 2, unload” for each load. With optimization, however, the compiler would instead produce: “load batch 1, wash batch 1, load batch 2, dry batch 1, wash batch 2, load batch 3, dry batch 2, etc.” By interleaving instructions, the compiler is able to more efficiently use the available resources. But to do so, the timings of each step must be well defined and predictable. This is one of the most important differences between RISC and CISC processors. Although most CISC processors use a variety of techniques normally associated with RISC processors, this difference persists.
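Here’s a toy version of that scheduling in C (the three-stage setup and naming are my own, not any real compiler’s output): at each time slot it emits whatever work each batch is ready for, later stages first, which reproduces the interleaved order above.

```c
#include <stdio.h>
#include <string.h>

/* Emit an interleaved schedule for a balanced 3-stage pipeline.
   Stages: L = load, W = wash, D = dry. Batch b reaches stage s at
   time slot s + b, so we walk the slots in order, later stages first. */
void interleave(int n_batches, char *out, size_t outsize) {
    const char stage_name[3] = {'L', 'W', 'D'};
    out[0] = '\0';
    for (int slot = 0; slot < n_batches + 2; slot++) {
        for (int s = 2; s >= 0; s--) {       /* later stages first */
            int b = slot - s;                /* batch at stage s now */
            if (b >= 0 && b < n_batches) {
                char buf[16];
                snprintf(buf, sizeof buf, "%c%d ", stage_name[s], b + 1);
                strncat(out, buf, outsize - strlen(out) - 1);
            }
        }
    }
}
```

For three batches this produces “L1 W1 L2 D1 W2 L3 D2 W3 D3” — the same load-wash-load-dry-wash-load pattern as the optimized compiler output described above.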
I think I can make one more point with the laundry analogy before I have to abandon it. Think about the length of the instruction you would have to issue to the “mega-washer” to get it to do what you desire. That single instruction would have to include all of the settings for the entire wash/dry process. As such, it would probably be fairly complex. On the other hand, the individual instructions for the improved laundry system could be much shorter. This is another way in which RISC processors are “reduced.” Whereas CISC instructions can vary in length tremendously depending on the complexity of the instruction, RISC instructions are uniform in length. This further simplifies the design of the processor by making things easier on the “decode unit,” the portion of the chip that determines what an instruction means and passes that information on to the rest of the chip. CISC decoders must first determine how long the instruction is before they can even get started, wasting valuable nanoseconds.
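A sketch of why fixed-length encoding helps, in C (the one-byte “length prefix” encoding here is invented purely for illustration): with uniform 4-byte instructions the decoder can jump straight to the Nth instruction with one multiply, while a variable-length stream must be walked one instruction at a time.

```c
#include <stddef.h>

/* Fixed-length (RISC-style): every instruction is 4 bytes, so the
   byte offset of instruction n is just n * 4. */
size_t risc_offset(size_t n) {
    return n * 4;
}

/* Variable-length (CISC-style, hypothetical encoding where the
   first byte of each instruction holds its total length): the
   decoder must read each length before it can find the next one. */
size_t cisc_offset(const unsigned char *code, size_t n) {
    size_t off = 0;
    for (size_t i = 0; i < n; i++)
        off += code[off];
    return off;
}
```

The fixed-length case is one step no matter how far into the program you look; the variable-length case takes one step per preceding instruction.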
The last big difference I’ll talk about between RISC and CISC processors involves how they interact with memory. A processor can access memory contained on the chip itself far faster than a computer’s main memory. As such, it’s much quicker to add two numbers that are stored on the processor than two numbers that are stored in main memory. RISC chips generally have much more of this “internal” memory than CISC chips. In addition, RISC processors generally lack instructions that act directly upon the contents of “external” memory. Instead, as I mentioned earlier, they have separate instructions that read the contents of memory, act upon them internally, and write them back out. This “load & store” architecture is another of the most important RISC traits.
The internal processor memory is divided into two parts: registers and cache. Registers are where all the action is, so to speak. This is where the processor loads things it needs to act upon. It’s also where the processor puts the results of all of its computations. The more registers a processor has, the more it can perform without having to wait for information to be read from “external” memory. RISC processors are loaded with registers, especially when compared to CISC processors. For example, the PowerPC 601 has 32 “general purpose” registers for use with integers and another 32 “floating point” registers used for non-integer math. In comparison, the 68000 series has just 16 “general purpose” registers and, with the exception of the 68040, no on-chip floating point registers (earlier 680x0 machines relied on a separate math coprocessor for floating-point work).
Cache memory sits between a computer’s main memory and the microprocessor. It is much like normal RAM, but much faster to access. RISC processors use a large internal cache to help reduce the impact of accessing external memory. By keeping commonly used data (or the next bunch of instructions) in the cache, and loading or unloading the cache at the same time as other processes are occurring on the processor, the impact of slow main memory can be greatly reduced. Many RISC processors can go a step further and support an external, or “L2” cache. This is like another layer between the processor and main memory, faster than main memory but slower than internal cache memory. It is up to the computer manufacturer to decide whether to include an external cache. In the case of the Power Macs, all three support an L2 cache, but only the 8100 includes it as standard equipment.
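The payoff of this layering can be sketched with a simple average-access-time model in C (the latencies and hit rates below are made-up numbers for illustration, not measurements of any real chip): the more often a request is satisfied by a faster layer, the lower the average cost of a memory access.

```c
/* Average cost of one memory access through the hierarchy described
   above: internal cache, external (L2) cache, then main RAM.
   Latencies are hypothetical, in nanoseconds. */
double avg_access_ns(double l1_hit_rate, double l2_hit_rate) {
    const double L1_NS = 10.0, L2_NS = 30.0, RAM_NS = 100.0;
    double l1_miss = 1.0 - l1_hit_rate;
    return l1_hit_rate * L1_NS
         + l1_miss * l2_hit_rate * L2_NS
         + l1_miss * (1.0 - l2_hit_rate) * RAM_NS;
}
```

With, say, 90% of accesses caught by the internal cache and half the remainder by the L2, the average access costs 15.5 ns in this model, versus the full 100 ns it would cost to go all the way to main memory every time.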
It is important to remember that the line between RISC and CISC is getting more blurry every day. The Intel Pentium is a CISC processor that uses many RISC-like techniques internally to help improve its performance. The same is true for the successor to Motorola’s 68040, the 68060. One RISC feature, parallelism, is so common in recent CISC chips that it barely qualifies as a difference anymore. Parallelism means that the chip has separate “pipelines” executing at the same time (perhaps one pipeline handles integer instructions, another handles floating-point instructions, and yet another handles all program branches).
So how does the PowerPC 601 stack up as a RISC processor? Well, as might be expected, it has a classic RISC design. It is quite fast, but by no means is it the fastest RISC chip available. But as I mentioned earlier, it is just the first in a family. The 620, for example, has the potential to be one of the fastest processors available. The PowerPC family offers another advantage over some RISC chips and many high-performance CISC chips: it’s cheap.
Well, I hope I haven’t bored anyone. If you found yourself reaching for a cup of strong coffee while reading this article, wait until next month, when I promise to keep the discussion less technical. I’ll be writing about some of the high and low points of my life with a Power Mac. I’ll start with what I heard when I pressed that power-on key…