Magicians do their magic with smoke and mirrors. Computer architects do their magic with multiplexers.

A large part of ALU design is captured by the design of a 1-bit ALU. Then 1-bit ALUs can be combined to form a multibit ALU with a small amount of additional circuitry.

The diagram to the left captures the basic idea for a 1-bit ALU: produce the different results that you want in parallel, then select the one you need with a multiplexer.

Each of the **OP i** boxes computes 1 bit for an
operation.
For example,

- A full adder computes one bit of an addition.
- An AND gate computes one bit of a bitwise AND operation.
- An OR gate computes one bit of a bitwise OR operation.

The multiplexer selects the output of the appropriate 1-bit operation.
Its control input `Op` is derived from the operation and
function bits of a machine instruction.
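The "compute everything, then select" idea can be sketched in a few lines of Python. This is only an illustration: the names `full_adder` and `alu_bit` and the numeric `Op` encoding are assumptions made for the sketch, not taken from the diagram.

```python
def full_adder(x, y, ci):
    """Return (sum, carry_out) for one bit position."""
    s = x ^ y ^ ci
    co = (x & y) | (ci & (x ^ y))
    return s, co

# Hypothetical Op encoding, chosen only for this sketch.
OP_AND, OP_OR, OP_ADD = 0, 1, 2

def alu_bit(x, y, ci, op):
    """1-bit ALU: compute every result, then let a multiplexer pick one."""
    s, co = full_adder(x, y, ci)
    results = [x & y, x | y, s]  # outputs of the OP i boxes, in Op order
    return results[op], co       # list indexing plays the role of the multiplexer
```

All three results are computed on every call, just as the hardware computes them in parallel; `op` only decides which one reaches the output.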

The diagram to the left shows a 1-bit ALU with multiple functions.
When the **YInv** control signal is 0, it performs an addition or a
bitwise XOR, AND, or OR operation, depending on the value of the
**OP** control input.

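When the **YInv** signal is 1, it performs a subtraction or a
bitwise XOR, AND, or OR operation with inverted Y.
Subtraction works because, in two's complement, X - Y = X + ~Y + 1;
the extra 1 is supplied as the carry input to the low-order bit.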
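A minimal sketch of such a 1-bit ALU follows. The function name and the `Op` encoding are hypothetical, and the sketch assumes that **YInv** is applied by XORing it with the Y input, which is one common way to build a conditional inverter.

```python
# Hypothetical Op encoding, chosen only for this sketch.
OP_AND, OP_OR, OP_XOR, OP_ADD = 0, 1, 2, 3

def alu_bit(x, y, ci, yinv, op):
    """1-bit ALU with an optional inverter on the Y input."""
    y = y ^ yinv               # XOR with YInv inverts y only when YInv is 1
    p = x ^ y                  # the bitwise XOR result, reused by the adder
    s = p ^ ci                 # sum bit of the full adder
    co = (x & y) | (ci & p)    # carry out of the full adder
    return [x & y, x | y, p, s][op], co
```

With `yinv=1`, `op=OP_ADD`, and a carry input of 1 at the low-order bit, a chain of these bits computes X - Y.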

Additional circuitry is needed to generate the carry inputs
(**Ci**) for addition and subtraction and to detect errors due to
limited word size (overflow).
The error detection takes only a small amount of additional circuitry in the high-order bit.
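As a rough illustration, the sketch below chains the bits with the simplest carry scheme, a ripple carry, and flags signed overflow by comparing the carry into the high-order bit with the carry out of it. The function name, the 8-bit default width, and the choice of ripple carry are assumptions made for this example; it also assumes two's-complement operands.

```python
def alu_add_sub(x, y, yinv, width=8):
    """Add (yinv=0) or subtract (yinv=1) two width-bit values with a ripple carry.

    Returns (result, carry_out, overflow).
    """
    carry = yinv  # carry into bit 0 supplies the +1 needed for subtraction
    result = 0
    for i in range(width):
        xi = (x >> i) & 1
        yi = ((y >> i) & 1) ^ yinv
        carry_in = carry
        result |= (xi ^ yi ^ carry_in) << i
        carry = (xi & yi) | (carry_in & (xi ^ yi))
    # After the loop, carry_in is the carry into the high-order bit and
    # carry is the carry out of it; they differ exactly on signed overflow.
    overflow = carry_in ^ carry
    return result, carry, overflow
```

For example, `alu_add_sub(100, 28, 0)` produces the bit pattern for 128 with the overflow flag set, because 128 does not fit in a signed 8-bit word.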

To handle carries, a high-performance ALU needs to use carry lookahead. The diagram below shows how carry lookahead units are added to generate carries for a 16-bit ALU. Inputs, outputs, and control signals are omitted for clarity.
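The following sketch models a two-level lookahead tree for 16 bits. It assumes 4-input lookahead units (four leaf units feeding one root unit) and a propagate signal defined as `x XOR y`; the diagram may be organized differently, and the function names are hypothetical.

```python
def lookahead_unit(g, p, c_in):
    """Model one carry lookahead unit.

    g, p: generate and propagate bits of the unit's inputs, low-order first.
    Returns the carry into each input position and the unit's own (G, P).
    The loop is a software convenience; hardware evaluates the expanded
    expressions for all carries at once.
    """
    carries, G, P, c = [], 0, 1, c_in
    for gi, pi in zip(g, p):
        carries.append(c)
        c = gi | (pi & c)
        G = gi | (pi & G)   # the group generates a carry on its own
        P &= pi             # the group passes an incoming carry through
    return carries, G, P

def carries16(x, y, c0):
    """Return the 16 carry inputs plus the root unit's G and P outputs."""
    g = [(x >> i) & (y >> i) & 1 for i in range(16)]
    p = [((x >> i) ^ (y >> i)) & 1 for i in range(16)]
    groups = [(g[k:k + 4], p[k:k + 4]) for k in range(0, 16, 4)]
    # Leaf units: G and P do not depend on the carry input.
    leaf_gp = [lookahead_unit(gs, ps, 0)[1:] for gs, ps in groups]
    # Root unit: turns the leaf (G, P) pairs into a carry for each group.
    group_carries, G, P = lookahead_unit(
        [gp[0] for gp in leaf_gp], [gp[1] for gp in leaf_gp], c0)
    carries = []
    for (gs, ps), c in zip(groups, group_carries):
        carries += lookahead_unit(gs, ps, c)[0]
    return carries, G, P
```

Each sum bit is then `x_i XOR y_i XOR carries[i]`, so no bit has to wait for a carry to ripple through all the lower positions.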

This circuitry can also generate comparison outputs when the
**YInv** ALU control signal is 1.
The **G** output of the root carry lookahead unit (the bottom one
above) indicates when `X > Y`.
The **P** output indicates when `X = Y`.
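A small sketch can make the comparison outputs concrete. It assumes that X and Y are treated as unsigned values and that each bit's propagate signal is `x XOR y`; with those assumptions, the root **G** is 1 exactly when X + ~Y carries out without any carry input, and the root **P** is 1 exactly when every bit of X matches the corresponding bit of Y. The function name is hypothetical.

```python
def compare_outputs(x, y, width=16):
    """Return the root (G, P) outputs for x and inverted y (YInv = 1)."""
    G, P = 0, 1
    for i in range(width):
        xi = (x >> i) & 1
        yi = ((y >> i) & 1) ^ 1       # YInv = 1 inverts every bit of y
        gi, pi = xi & yi, xi ^ yi     # per-bit generate and propagate
        G = gi | (pi & G)             # same recurrence a lookahead tree expands
        P &= pi
    return G, P
```

For instance, `compare_outputs(5, 3)` gives `G = 1`, `compare_outputs(7, 7)` gives `P = 1`, and `compare_outputs(3, 5)` gives both outputs as 0, meaning X < Y.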

A small amount of added circuitry in the high-order 1-bit ALU is needed to detect incorrect results due to limited word size.

The basic idea for ALU design presented here is not suitable for complex operations such as multiplication, division, and floating-point operations. In modern processors, these operations are split across multiple cycles so that they do not force a long cycle time on every operation.

Dealing with multicycle operations requires some significant changes to the processor architecture. This is especially true with pipelining. The Register Renaming web page describes the organization used by modern processors.