Microprogram

From Free net encyclopedia

A microprogram implements a CPU instruction set. Just as a single high-level language statement is compiled to a series of machine instructions (load, store, shift, etc.), each machine instruction is in turn implemented by a series of microinstructions, sometimes called a microprogram. The more common term for this is microcode.

On most computers the microcode is not produced by a compiler, but exists in a special high speed memory. The microcode is written by the CPU engineer during the design phase. In some computers the microcode is in RAM and can be altered to correct bugs in the instruction set, or to implement new machine instructions. Microcode can also allow one computer microarchitecture to emulate another, usually more-complex architecture.

The elements composing the microprogram exist on a lower conceptual level than the more familiar assembler instructions. Each element is differentiated by the "micro" prefix to avoid confusion: microprogram, microcode, microinstruction, microassembler, etc.

Microprograms are carefully designed and optimized for the fastest possible execution, since a slow microprogram would yield a slow machine instruction, which would in turn slow down every program that uses that instruction. The microprogrammer must have extensive low-level knowledge of the computer's circuitry, because the microcode controls that circuitry directly.

The memory in which the CPU's microcode resides is called a control store. The microcode may be stored in ROM (as a form of firmware) or loaded into RAM as part of the initialization of the central processing unit.

Microprograms consist of a series of microinstructions. These microinstructions control the computer's central processing unit (CPU) at a very fundamental level. For example, a single typical microinstruction might specify the following operations:

Connect Register 1 to the "A" side of the ALU
Connect Register 7 to the "B" side of the ALU
Set the ALU to perform two's-complement addition
Set the ALU's carry input to zero
Store the result value in Register 8
Update the "condition codes" with the ALU status flags ("Negative", "Zero", "Overflow", and "Carry")
Microjump to MicroPC nnn for the next microinstruction

To simultaneously control all of these features, the microinstruction is often very wide, for example, 56 bits or more.
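
As a rough illustration of why such a word becomes wide, the operations listed above can be packed into one control word, one bit field per operation. The field names and widths below are assumptions chosen for this sketch and do not describe any particular CPU; real control words carry more fields, which is how widths of 56 bits or more arise.

```python
# Hypothetical field layout for one microinstruction (names and widths are
# illustrative assumptions, not taken from a real machine).
FIELDS = [
    ("alu_a_reg", 4),   # register connected to the "A" side of the ALU
    ("alu_b_reg", 4),   # register connected to the "B" side of the ALU
    ("alu_op", 4),      # ALU function select
    ("carry_in", 1),    # ALU carry input
    ("dest_reg", 4),    # register that receives the result
    ("update_cc", 1),   # 1 = update the condition codes
    ("next_upc", 12),   # microjump target (next MicroPC)
]

WORD_BITS = sum(width for _, width in FIELDS)


def pack(values):
    """Pack a dict of field values into a single control-word integer."""
    word = 0
    for name, width in FIELDS:
        value = values[name]
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name}={value} does not fit in {width} bits")
        word = (word << width) | value
    return word


def unpack(word):
    """Split a control-word integer back into its named fields."""
    values = {}
    for name, width in reversed(FIELDS):
        values[name] = word & ((1 << width) - 1)
        word >>= width
    return values


# The microinstruction from the list above; ALU opcode 0 is assumed to mean
# two's-complement addition, and 0x123 stands in for the MicroPC "nnn".
example = {
    "alu_a_reg": 1, "alu_b_reg": 7, "alu_op": 0, "carry_in": 0,
    "dest_reg": 8, "update_cc": 1, "next_upc": 0x123,
}
```

Every field is present in every word, so all of the operations are specified at once, in the same clock cycle.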

1 The reason for microprogramming
2 Other benefits
3 History
4 Implementation
5 Writable control stores
6 Microcode versus VLIW and RISC
7 See also
8 External links

The reason for microprogramming

Microcode was originally developed as a simpler method of developing the control logic for a computer. Initially CPU instruction sets were "hard wired". Each machine instruction (add, shift, move) was implemented directly with circuitry. This provided fast performance, but as instruction sets grew more complex, hard-wired instruction sets became more difficult to design and debug.

Microcode alleviated that problem by allowing CPU design engineers to write a microprogram to implement a machine instruction rather than design circuitry for it. Even late in the design process, microcode could easily be changed, whereas hard-wired instructions could not. This greatly facilitated CPU design and led to more complex instruction sets.

Another advantage of microcode was that it made more complex machine instructions practical to implement. From the 1960s through the late 1970s, much programming was done in assembly language, a symbolic equivalent of machine instructions. The more abstract and higher-level the machine instruction, the greater the programmer productivity. The ultimate extension of this idea was the "Directly Executable High Level Language" design, in which each statement of a high-level language such as PL/I would be entirely and directly executed by microcode, without compilation. The IBM Future Systems project and the Data General Fountainhead Processor were examples. Those systems were never produced, but elements of the IBM project were implemented in the System/38 and AS/400, which used extensive microprogramming to implement high-level constructs. For example, the System/38 and AS/400 could perform a relational SQL join in a machine instruction.

Microprogramming also helped alleviate the memory bandwidth problem. During the 1970s, CPU speeds grew more quickly than memory speeds. Numerous acceleration techniques, such as memory block transfer, memory pre-fetch and multi-level caches, helped narrow this gap. However, high-level machine instructions (made possible by microcode) helped further: fewer, more complex machine instructions require less memory bandwidth. For example, complete operations on character strings could be done as a single machine instruction, thus avoiding multiple instruction fetches.

Architectures using this approach included the IBM System/360 and the DEC VAX family, both of which used complex microprograms. The IBM System/38 and AS/400 took this concept even further. The approach of using increasingly complex microcode-implemented instruction sets was later called CISC.

Other benefits

A processor's microprograms operate on a more primitive, totally different and much more hardware-oriented architecture than the assembly instructions visible to normal programmers. In coordination with the hardware, the microcode implements the programmer-visible architecture. The underlying hardware need not have a fixed relationship to the visible architecture. This makes it possible to implement a given instruction set architecture on a wide variety of underlying hardware micro-architectures.

Doing so is important if binary program compatibility is a priority. That way previously existing programs can run on totally new hardware without requiring revision and recompilation. However there may be a performance penalty for this approach. The tradeoffs between application backward compatibility vs CPU performance are hotly debated by CPU design engineers.

The IBM System/360 has a 32-bit architecture with 16 general-purpose registers, but most System/360 implementations actually used hardware implementing a much simpler underlying microarchitecture. The 360 Model 30, the slowest model in the line, used an 8-bit microarchitecture with only a few hardware registers; everything that the programmer saw was emulated by the microprogram. Other, faster models used 16-bit or 32-bit underlying microarchitectures that more closely resembled the programmer-visible architecture; this allowed much faster execution speeds.

In this way, microprogramming enabled IBM to design many System/360 models with substantially different hardware and spanning a wide range of cost and performance, while making them all architecturally compatible. This dramatically reduced the amount of unique system software that had to be written for each model.
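
The idea of a narrow datapath presenting a wider architecture can be sketched in a few lines. The function below is only an illustration of the principle and is not the Model 30's actual microcode: it assumes a hypothetical 8-bit adder and produces a 32-bit sum by running four byte-wide additions, passing the carry from one step to the next, much as a microprogram would over several microinstructions.

```python
def add32_on_8bit_datapath(x, y):
    """Add two 32-bit values using only 8-bit additions.

    Returns (32-bit result, final carry out). Each loop iteration stands
    for one pass through a hypothetical 8-bit ALU.
    """
    result = 0
    carry = 0
    for byte_index in range(4):          # least significant byte first
        shift = 8 * byte_index
        a = (x >> shift) & 0xFF          # one byte of each operand
        b = (y >> shift) & 0xFF
        total = a + b + carry            # 8-bit add with carry in
        result |= (total & 0xFF) << shift
        carry = total >> 8               # carry out feeds the next byte
    return result, carry
```

The programmer sees a single 32-bit add, while the hardware needs four passes. This is the kind of cost and performance trade-off that separated the slower and faster models.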

A similar approach was used by Digital Equipment Corporation in their VAX family of computers. Initially a 32-bit TTL processor in conjunction with supporting microcode implemented the programmer-visible architecture. Later VAX versions used different microarchitectures, yet the programmer-visible architecture didn't change.

Microprogramming also reduced the cost of field changes to correct defects (bugs) in the processor; a bug could often be fixed by replacing a portion of the microprogram rather than by changes being made to hardware logic and wiring.

History

Before 1951, the control logic for central processing units was designed by ad hoc methods. One of the simplest was to use rings of flip-flops to sequence the computer's control logic.

In 1951 Maurice Wilkes had a fundamental insight. He realized that the control signals of a computer can be understood as being played out much like the notes on a player piano roll: they are driven by a sequence of very wide words made up of bits.

Implementation

A microprogram provides the bits that drive a CPU's control signals. The fundamental advance is that CPU control becomes a specialized form of a computer program. It thus transforms a complex electronic design challenge (the control of a CPU) into a less-complex programming challenge.

To take advantage of this, computers were divided into several parts:

A microsequencer picked the next word of the control store. A sequencer is mostly a counter, but usually also has some way to jump to a different part of the control store depending on some data, usually data from the instruction register and always some part of the control store. The simplest sequencer is just a register loaded from a few bits of the control store.

A register set is a fast memory containing the data of the central processing unit. It may include the program counter, stack pointer, and other numbers that are not easily accessible to the application programmer. Often the register set is triple-ported, that is, two registers can be read, and a third written at the same time.

An arithmetic and logic unit performs calculations, usually addition, logical negation, a right shift, and logical AND. It often performs other functions, as well.

There may also be a memory address register and a memory data register, used to access the main computer storage.

Together, these elements form an "execution unit." Most modern CPUs have several execution units. Even simple computers usually have one unit to read and write memory, and another to execute user code.

These elements could often be bought together as a single chip. This chip came in a fixed width which would form a 'slice' through the execution unit. These were known as 'bit slice' chips.

The parts of the execution units, and the execution units themselves are interconnected by a bundle of wires called a bus.
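
The microsequencer described above can be sketched as a next-address function. This is a minimal model under stated assumptions: the operation names NEXT, JMP and DISPATCH, the parameter names, and the dispatch table keyed by opcode are all hypothetical and are used here only to show the three sources of the next control-store address (the counter, a field of the control word, and data from the instruction register).

```python
def next_micro_address(upc, seq_op, jump_field, opcode, dispatch_table):
    """Choose the next control-store address.

    upc            -- current control-store address (the counter)
    seq_op         -- sequencer operation taken from the control word
    jump_field     -- jump address taken from the control word
    opcode         -- opcode bits from the instruction register
    dispatch_table -- hypothetical map from opcode to microroutine address
    """
    if seq_op == "NEXT":        # behave as a plain counter
        return upc + 1
    if seq_op == "JMP":         # address comes from the control store
        return jump_field
    if seq_op == "DISPATCH":    # address depends on the instruction register
        return dispatch_table[opcode]
    raise ValueError(f"unknown sequencer operation: {seq_op}")
```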

Programmers develop microprograms. The basic tools are software: A microassembler allows a programmer to define the table of bits symbolically. A simulator program executes the bits in the same way as the electronics (hopefully), and allows much more freedom to debug the microprogram.

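A typical micromachine's control word has a field, a range of bits, to control each piece of electronics in the CPU. For example, one simple arrangement might be:

| register source A | register source B | destination register | arithmetic and logic unit operation | type of jump | jump address |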

For this type of micromachine to implement a jump instruction with the address following the jump op-code, the microassembly would look something like:

# Any line starting with a number-sign is a comment
# This is just a label, the ordinary way assemblers symbolically represent a
# memory address.
InstructionJUMP:
# To prepare for the next instruction, the instruction-decode microcode has already
# moved the program counter to the memory address register. This instruction fetches
# the target address of the jump instruction from the memory word following the
# jump opcode, by copying from the memory data register to the memory address register.
# This gives the memory system two clock ticks to fetch the next
# instruction to the memory data register for use by the instruction decode.
# The sequencer instruction "next" means just add 1 to the control word address.
MDR, NONE, MAR, COPY, NEXT, NONE
# This places the address of the next instruction into the PC.
# This gives the memory system a clock tick to finish the fetch started on the
# previous microinstruction.
# The sequencer instruction is to jump to the start of the instruction decode.
MAR, 1, PC, ADD, JMP, InstructionDecode
# The instruction decode is not shown, because it's usually a mess, very particular
# to the exact processor being emulated. Even this example is simplified.
# Many CPUs have several ways to calculate the address, rather than just fetching
# it from the word following the op-code. Therefore, rather than just one
# jump instruction, those CPUs have a family of related jump instructions.

The above is an example of "horizontal" microcode. This is microcode that sets all the bits of the CPU's controls on each tick of the clock that drives the sequencer.
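
The two-word jump microroutine can be traced with a small simulator. This is a sketch under several assumptions rather than a model of any real machine: registers are 16 bits wide, the control-store addresses are arbitrary, the instruction decode is left out, and the memory fetches that overlap with these steps are not modelled. Only the register transfers and the sequencing are shown.

```python
from typing import NamedTuple


class MicroOp(NamedTuple):
    src_a: str    # register feeding ALU input A
    src_b: str    # register name, the literal "1", or "NONE"
    dest: str     # register that receives the ALU result
    alu: str      # "COPY" (pass A through) or "ADD"
    seq: str      # "NEXT" or "JMP"
    target: str   # label for JMP, "NONE" otherwise


# Assumed control-store addresses for the two labels.
LABELS = {"InstructionDecode": 0, "InstructionJUMP": 1}

CONTROL_STORE = [
    None,  # address 0: instruction decode, not modelled in this sketch
    MicroOp("MDR", "NONE", "MAR", "COPY", "NEXT", "NONE"),
    MicroOp("MAR", "1", "PC", "ADD", "JMP", "InstructionDecode"),
]


def step(regs, upc):
    """Execute one microinstruction and return the next control-store address."""
    op = CONTROL_STORE[upc]
    a = regs[op.src_a]
    if op.src_b == "1":
        b = 1
    elif op.src_b == "NONE":
        b = 0
    else:
        b = regs[op.src_b]
    regs[op.dest] = a if op.alu == "COPY" else (a + b) & 0xFFFF
    return LABELS[op.target] if op.seq == "JMP" else upc + 1


def run_jump(target_address):
    """Run the JUMP microroutine with the jump target already in the MDR."""
    regs = {"PC": 0, "MAR": 0, "MDR": target_address}
    upc = LABELS["InstructionJUMP"]
    while upc != LABELS["InstructionDecode"]:
        upc = step(regs, upc)
    return regs
```

Calling run_jump(0x0200) leaves 0x0200 in the memory address register and 0x0201 in the program counter, after which control returns to the instruction decode.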

Note how many of the bits in horizontal microcode contain fields to do nothing. Some CPUs use a completely different design called "vertical" microcode to reduce cost. Some vertical microcodes are just the assembly language of a simple conventional computer that is emulating a more complex computer. This technique was popular in the time of the PDP-8. Another form of vertical microcode has two fields:

| field select | field value |

The "field select" selects which part of the CPU will be controlled by this word of the control store. The "field value" actually controls that part of the CPU. With this type of microcode, a designer explicitly chooses to make a slower CPU to save money by reducing the unused bits in the control store; however, the reduced complexity may increase the CPU's clock frequency, which lessens the effect of an increased number of cycles per instruction.
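
The trade-off can be made concrete with a small sketch. The field names and bit widths below are assumptions chosen for illustration. A horizontal word carries every field on every cycle, whereas the two-field vertical form carries one field per word and so needs one word for each field that actually does something.

```python
# Hypothetical control fields and their widths in bits.
FIELD_WIDTHS = {
    "src_a": 4, "src_b": 4, "dest": 4, "alu": 4, "seq": 2, "target": 12,
}
FIELD_NAMES = list(FIELD_WIDTHS)


def horizontal_width():
    """Bits per control word when every field is present in every word."""
    return sum(FIELD_WIDTHS.values())


def vertical_width():
    """Bits per control word in the | field select | field value | form."""
    select_bits = (len(FIELD_WIDTHS) - 1).bit_length()
    return select_bits + max(FIELD_WIDTHS.values())


def to_vertical(horizontal_word):
    """Split one horizontal word into (field select, field value) pairs.

    Idle fields (value "NONE") produce no vertical word at all.
    """
    return [
        (FIELD_NAMES.index(name), value)
        for name, value in horizontal_word.items()
        if value != "NONE"
    ]


# A horizontal word with two idle fields.
word = {
    "src_a": "MDR", "src_b": "NONE", "dest": "MAR",
    "alu": "COPY", "seq": "NEXT", "target": "NONE",
}
```

With these assumed widths the horizontal word is 30 bits and the vertical word is 15 bits, but the single horizontal word above turns into four vertical words, and therefore four steps instead of one.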

Because transistors are becoming cheaper, horizontal microcode is coming to dominate the design of hardware control units. As of the early 2000s, CPUs no longer use vertical microcode except perhaps in emulator software designed to run on a standard computer.

After the microprogram is finalized, and extensively tested, it is sometimes used as the input to a computer program that constructs logic to produce the same data. This program is similar to those used to optimize a programmable logic array. No known computer program can produce optimal logic, but even pretty good logic can vastly reduce the number of transistors from the number required for a ROM control store. This reduces the cost and power used by a CPU.

Writable control stores

A few computers were built using "writable microcode": rather than storing the microcode in ROM or hard-wired logic, they stored it in a RAM called a Writable Control Store, or WCS. Many of these machines were experimental laboratory prototypes, but there were also commercial machines that used writable microcode, such as early Xerox workstations, the DEC VAX 8800 ("Nautilus") family, and a number of IBM System/370 implementations. Many more machines offered user-programmable writable control stores as an option (including the HP 2100 and DEC PDP-11/60 minicomputers). WCS offered several advantages, including the ease of patching the microprogram and, for certain hardware generations, faster access than ROMs could provide. User-programmable WCS allowed the user to optimize the machine for specific purposes.

A CPU that uses microcode generally takes several clock cycles to execute a single instruction, one clock cycle for each step in the microprogram for that instruction. Some CISC processors include instructions that can take a very long time to execute. Such variations in instruction execution time complicate pipelining and increase interrupt latency.

Microcode versus VLIW and RISC

The design trend toward heavily microcoded processors with complex instructions began in the early 1960s and continued until roughly the mid-1980s. At that point the RISC design philosophy started becoming more prominent. Its arguments included the following points:

Analysis shows complex instructions are rarely used, hence the machine resources devoted to them are largely wasted.
Programming has largely moved away from assembly level, so it's no longer worthwhile to provide complex instructions for productivity reasons.
The machine resources devoted to rarely-used complex instructions are better used for expediting performance of simpler, commonly-used instructions.
Complex microcoded instructions requiring many, varying clock cycles are difficult to pipeline for increased performance.
Simpler instruction sets allow direct execution by hardware, avoiding the performance penalty of microcoded execution.

Many RISC and VLIW processors are designed to execute every instruction (as long as it is in the cache) in a single cycle. This is very similar to the way CPUs with microcode execute one microinstruction per cycle. VLIW processors have instructions that behave like very wide horizontal microcode, although typically VLIW instructions do not have as fine-grained control over hardware as microcode. RISC processors can have instructions that look like narrow vertical microcode.
