Magicians do their magic with smoke and mirrors. Computer architects do their magic with multiplexers.

Much of ALU design comes down to the design of a 1-bit ALU. Several 1-bit ALUs can then be combined into a multibit ALU with a small amount of additional circuitry.

The diagram to the left captures the basic idea for a 1-bit ALU: produce the different results that you want in parallel, then select the one you need with a multiplexer.

Operation

Each of the OPi boxes computes 1 bit for an operation. For example,

The multiplexer selects the output of the appropriate 1-bit operation. Its control input Op is derived from the operation and function bits of a machine instruction.
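The compute-everything-then-select idea can be sketched in a few lines of Python. This is only an illustration: the function names and the Op encoding (0 = AND, 1 = OR, 2 = XOR) are assumptions, not taken from the diagram.

```python
def mux(inputs, select):
    """Return the input picked by the select value, like a hardware multiplexer."""
    return inputs[select]


def one_bit_unit(x, y, op):
    """Compute every candidate result for one bit, then select one with op."""
    # All results are produced regardless of op, mirroring the parallel OPi boxes.
    # Assumed encoding: 0 = AND, 1 = OR, 2 = XOR.
    results = [x & y, x | y, x ^ y]
    return mux(results, op)
```

In hardware the unused results cost gates but no time, because all of the boxes work simultaneously and only the multiplexer delay is added.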

The diagram to the left shows a 1-bit ALU with multiple functions. When the YInv control signal is 0, it performs an addition or a bitwise XOR, AND, or OR operation, depending on the value of the Op control input.

When the YInv signal is 1, it performs a subtraction or a bitwise XOR, AND, or OR operation with inverted Y.
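A minimal software model of such a 1-bit ALU might look like the following. The Op encoding and the name `alu_1bit` are hypothetical, and the adder is modeled as an ordinary full adder, which the diagram may or may not match gate for gate.

```python
ADD, XOR, AND, OR = 0, 1, 2, 3  # hypothetical Op encoding


def alu_1bit(x, y, c_in, y_inv, op):
    """Return (result, carry_out) for one bit position."""
    y_eff = y ^ y_inv                     # YInv = 1 inverts Y
    xor = x ^ y_eff
    total = xor ^ c_in                    # sum bit of a full adder
    c_out = (x & y_eff) | (xor & c_in)    # carry out of a full adder
    results = [total, xor, x & y_eff, x | y_eff]
    return results[op], c_out             # the multiplexer picks one result
```

With `y_inv = 1` and `op = ADD`, the unit adds X to the inverted Y bit; this becomes a two's-complement subtraction once the low-order bit of the full ALU also receives a carry-in of 1.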

Additional circuitry is needed to generate the carry inputs (Ci) for addition and subtraction and to detect errors due to limited word size.
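The simplest way to supply the carry inputs is to chain the bits so that each carry-out feeds the next carry-in (ripple carry). The sketch below assumes that scheme, an 8-bit default width, and that the low-order carry-in is tied to YInv so that subtraction computes X + ~Y + 1. The names `full_add` and `add_sub` are illustrative.

```python
def full_add(x, y, c_in):
    """One-bit full adder: return (sum_bit, carry_out)."""
    s = x ^ y ^ c_in
    c_out = (x & y) | ((x ^ y) & c_in)
    return s, c_out


def add_sub(x, y, y_inv, width=8):
    """Add (y_inv=0) or subtract (y_inv=1) two width-bit values.

    Returns (result, carry_out of the high-order bit).
    """
    carry = y_inv                          # assumed: C0 = YInv
    result = 0
    for i in range(width):                 # low-order bit first
        xi = (x >> i) & 1
        yi = ((y >> i) & 1) ^ y_inv        # invert Y when subtracting
        s, carry = full_add(xi, yi, carry)
        result |= s << i
    return result, carry
```

For example, `add_sub(5, 3, 1)` yields a result of 2. The loop makes the drawback visible: the high-order bit cannot finish until every lower carry has been produced.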

A small amount of additional circuitry can also be added to the high-order bit for error detection.

To handle carries quickly, a high-performance ALU needs to use carry lookahead. The diagram below shows how carry lookahead units are added to generate carries for a 16-bit ALU. Inputs, outputs, and control signals are omitted for clarity.
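The following sketch models a two-level lookahead arrangement for 16 bits. It assumes four 4-bit groups under one root unit, which may differ from the fan-in used in the diagram, and it assumes the usual definitions generate = X AND Y and propagate = X XOR Y. The loop inside `lookahead_unit` evaluates the same Boolean equations that hardware would flatten and compute in parallel.

```python
def lookahead_unit(g, p, c_in):
    """Combine child (g, p) signals, lowest child first.

    Returns (carry into each child, group generate G, group propagate P).
    """
    carries, c, G, P = [], c_in, 0, 1
    for gi, pi in zip(g, p):
        carries.append(c)
        c = gi | (pi & c)      # carry into the next child
        G = gi | (pi & G)      # group generates a carry on its own
        P = P & pi             # group passes an incoming carry through
    return carries, G, P


def carries_16(x, y, c0=0):
    """Return (carry into each of 16 bits, root G, root P)."""
    g = [(x >> i) & (y >> i) & 1 for i in range(16)]
    p = [((x >> i) ^ (y >> i)) & 1 for i in range(16)]
    # First level: each 4-bit unit reports G and P without knowing its carry-in.
    groups = [lookahead_unit(g[k:k + 4], p[k:k + 4], 0)[1:] for k in range(0, 16, 4)]
    # Root unit: turns the group signals into a carry for each group.
    group_c, G, P = lookahead_unit([a for a, _ in groups], [b for _, b in groups], c0)
    # The group carries flow back to the 4-bit units to produce the bit carries.
    carries = []
    for k, c in zip(range(0, 16, 4), group_c):
        carries += lookahead_unit(g[k:k + 4], p[k:k + 4], c)[0]
    return carries, G, P
```

The point of the structure is that G and P travel up the tree and carries travel back down, so the delay grows with the depth of the tree rather than with the number of bits.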

This circuitry can also generate comparison outputs when the YInv ALU control signal is 1. The G output of the root carry lookahead unit (the bottom one in the diagram) indicates when X > Y, treating X and Y as unsigned values. The P output indicates when X = Y.
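To see why the root signals act as comparison outputs, note that with Y inverted a bit generates when X is 1 and Y is 0, and propagates when the two bits are equal. The sketch below assumes propagate is defined as XOR and that the operands are unsigned; `compare` is a hypothetical name, and the loop stands in for the tree of lookahead units.

```python
def compare(x, y, width=16):
    """Return (G, P) of the root lookahead unit when YInv = 1.

    G == 1 means x > y (unsigned); P == 1 means x == y.
    """
    G, P = 0, 1
    for i in range(width):                 # low-order bit first
        xi = (x >> i) & 1
        yi_inv = ((y >> i) & 1) ^ 1        # YInv = 1 inverts Y
        g = xi & yi_inv                    # X bit is 1 and Y bit is 0
        p = xi ^ yi_inv                    # X bit equals Y bit
        G = g | (p & G)                    # higher bits override lower ones
        P = P & p
    return G, P
```

G ends up set exactly when the highest bit position where X and Y differ has X = 1, which is the definition of unsigned X > Y.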

A small amount of added circuitry in the high-order 1-bit ALU is needed to detect overflow, that is, incorrect results due to limited word size.
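One common way to detect this condition for two's-complement operands is to XOR the carry into the high-order bit with the carry out of it. The sketch below assumes that method, signed operands, and a ripple-carry adder for brevity; the text does not say which detection circuit the diagram uses.

```python
def add_with_overflow(x, y, y_inv=0, width=8):
    """Add or subtract width-bit values; return (result, overflow flag)."""
    carry = y_inv                          # assumed: carry-in of 1 for subtraction
    carry_into_high = 0
    result = 0
    for i in range(width):                 # low-order bit first
        xi = (x >> i) & 1
        yi = ((y >> i) & 1) ^ y_inv
        if i == width - 1:
            carry_into_high = carry        # remember the carry into the sign bit
        result |= (xi ^ yi ^ carry) << i
        carry = (xi & yi) | ((xi ^ yi) & carry)
    overflow = carry_into_high ^ carry     # the two carries disagree on overflow
    return result, overflow
```

With 8-bit operands, `add_with_overflow(100, 100)` sets the flag because 200 does not fit in a signed byte, while `add_with_overflow(5, 3, 1)` computes 2 with the flag clear.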

The basic idea for ALU design presented here is not suitable for complex operations such as multiplication, division, and floating-point operations. In modern processors, these operations need to be split across multiple cycles so that they do not lengthen the cycle time for all operations.

Dealing with multicycle operations requires some significant changes to the processor architecture. This is especially true with pipelining. The Register Renaming web page describes the organization used by modern processors.