Unsigned Integer Coding

Our decimal representation of numbers is an example of a more general radix notation in which a number is represented as a sum of small multiples of powers of a base or radix integer. The multipliers are represented by symbols called digits. For decimal numbers the base or radix is 10 and the digits represent values from 0 to 9.

The ability to use different bases is important in computer science because computers do not work efficiently with decimal numbers. Computers circuits are better at working with data that has two possible values, so they do arithmetic with base 2 numbers, which are also called binary numbers. For reasons that will become clear later, using any base that is a power of 2 gives us a representation for numbers that is closely related to binary. Because binary notation is so bulky and hard to read, most computer scientists prefer base 16 (hexadecimal or hex), or occasionally, base 8 (octal). For hexadecimal numbers the digits symbols need to represent values from 0 to 15.

Since computer professionals often need to work with different bases, they need the ability to convert numbers from one base to another. Computers also need subprograms for converting between bases in order to deal with the decimal numbers that users expect for input and output.

Although radix notation can represent arbitrarily large numbers, for practical reasons computers perform arithmetic operations using a fixed length representation. This imposes a limitation on the size of the numbers that can be represented.

The numbers that we deal with daily are usually written in decimal notation. In decimal notation, the value of a number is a sum of small multiples of powers of 10. There is nothing special about the number 10 except that we have 10 fingers. Numbers can be expressed in a similar form using powers of any integer (called the base or radix) as long as it is greater than 1.

In base 10 the decimal number 5703 is expressed as a sum of small integers times powers of 10. For example,

5703 = 5*10³ + 7*10² + 0*10¹ + 3*10⁰.

The multipliers (5, 7, 0, and 3 in the example) are called base 10 digits, or just digits if there are no other bases being considered. They must be nonnegative integers that are less than 10.

In base 4 the decimal number 75 can be expressed as a sum

75 = 1*4³ + 0*4² + 2*4¹ + 3*4⁰.

Here, the multipliers (1, 0, 2, and 3) are called the base 4 digits of the decimal number 75. They must be nonnegative integers that are less than 4. If you worked with base 4 on a daily basis, you would want to shorten the notation for numbers, so that decimal 75 is written as 1023. Then 1023 is the base 4 (or radix 4) notation for the decimal number 75.

When you are working with different bases at the same time a sequence of digits, by itself, is ambiguous - it is not clear what base you are using. To avoid the ambiguity, the bases can be indicated with a subscript on the number. Then you can make sense of equating numbers with different bases. For example, we write 75₁₀ = 1023₄.

To represent numbers in base or radix r you need symbols for each digit value from 0 to r - 1. If r is less than or equal to 10 you can just use the first r decimal digits. If r is greater than 10 you can use alphabetic characters as symbols for digit values greater than 9. For example, for hexadecimal we use the following symbols.

hex	dec	hex	dec
0	0	8	8
1	1	9	9
2	2	A	10
3	3	B	11
4	4	C	12
5	5	D	13
6	6	E	14
7	7	F	15

For any base r>1, a nonnegative integer n can be written in the form

n = d_k*r^k + d_k-1*r^k-1 + ... + d₀*r⁰, with each d_i satisfying 0 <= d_i < r - 1.

Factoring out r from all of the terms on the right except the last, we get

n = r*(d_k*r^k-1 + d_k-1*r^k-2 + ... + d₁*r⁰) + d₀.

This shows that if we divide n by r we get d₀, the last digit of n in base r, as a remainder. Furthermore, the quotient for the division is

d_k*r^k-1 + d_k-1*r^k-2 + ... + d₁*r⁰.

Thus we can divide again by r to get another base r digit, d₁, as a remainder, and a new quotient. We can get all of the base r digits of n by repeating the division process as described in the division algorithm.

The following repeated division algorithm produces the base r digits of n from right to left.

    do
      divide n by r
      replace n by the quotient
      the remainder is the next base r digit
    while n > 0

Since you do arithmetic in you favorite base, the remainders will be numbers in that base. You may need to convert them to base r digits. For example, if your favorite base is ten and you are converting to hexadecimal then you need to convert remainder values for 0 to 15 to hexadecimal digits. To do this, you can use the table of hexadecimal digits shown in the "Digit Symbols" section.

The repeated division algorithm works no matter what your favorite base is. For humans, the favorite base is usually ten, so humans can use this algorithm to convert from decimal to base r.

Since, computers do arithmetic in binary, they use this algorithm to convert numbers to strings of decimal characters. This conversion requires converting each remainder to a character. If digits characters are consecutive and in order in the character coding (this is true in ASCII) then the conversion can be obtained by adding the character '0' to each remainder.

In this algorithm, the base r digits are taken from left to right.

    set n to 0
    while there are more base r digits
      multiply n by r
      add the next base r digit to n
    end while

The answer n appears in whatever base you use to do arithmetic. As in the repeated division algorithm, you may need to convert base r digits to your favorite base.

In this example we convert DE3₁₆ to decimal.

Or, in short,

    16 × 0 + 13 = 13

    16 × 13 + 14 = 222

    16 × 222 + 3 = 3555

Comparing the arithmetic in this example with the arithmetic in the repeated division algorithm example illustrates how the repeated multiplication algorithm reverses the steps of the repeated division algorithm.

Note that when multiplying by 16, the 16 is placed on top. If you do this, the partial sums will be 16 times a decimal digit (0 to 9). After doing several examples, you will start remembering these multiples. They are also useful for the repeated division algorithm.

The repeated multiplication algorithm works no matter what your favorite base is. For humans, the favorite base is usually ten, so humans can use this algorithm to convert from base r to decimal.

Since, computers do arithmetic in binary, they use this algorithm to convert strings of decimal characters to numbers. This conversion requires converting each decimal character to a binary numeric value. If digits characters are consecutive and in order in the character coding (this is true in ASCII) then the conversion can be obtained by subtracting the character '0' from each digit character.

`d`	`d`×16
0	0
1	16
2	32
3	48
4	64
5	80
6	96
7	112
8	128
9	144

When one base is a power of another, conversions between the two are a simple matter of grouping or ungrouping the digits. The grouping and ungrouping techniques are recommended for conversions between decimal and binary.

For hexadecimal and octal, the grouping and ungrouping is based on the following tables which show how an octal or hexadecimal digit converts to 3 or 4 binary digits. The tables can also be used for conversions from 3 or 4 binary digits to octal or hexadecimal digits.

oct	bin	hex	bin	hex	bin
0	000	0	0000	8	1000
1	001	1	0001	9	1001
2	010	2	0010	A	1010
3	011	3	0011	B	1011
4	100	4	0100	C	1100
5	101	5	0101	D	1101
6	110	6	0110	E	1110
7	111	7	0111	F	1111

Using the fact that 8 = 2³ and 16 = 2⁴, we can convert binary numbers to octal or hexadecimal by forming groups of 3 (for octal) or 4 (for hexadecimal) bits, starting with the rightmost bit. Then convert each group of 3 or 4 bits to an octal or hexadecimal digit. For example,

10110110100011₂ = 10 1101 1010 0011 = 2DA3₁₆,

and

10110110100011₂ = 10 110 110 100 011 = 26643₈.

To convert from octal or hexadecimal to binary, you expand each octal digit to 3 bits, or each hexadecimal digit to 4 bits. The you can remove the groupings. You can also drop leading 0 bits. For example,

3A4F₁₆ = 0011 1010 0100 1111 = 11101001001111₂,

and

64702₈ = 110 100 111 000 010 = 110100111000010₂.

Computers generally use fixed-length representations for numbers. For example, in C and C++, there are three types for unsigned integers: unsigned int (usually 32 bits), unsigned short (usually 16 bits), and unsigned char (8 bits). Recent compilers also support unsigned long long (64 bits).

At times, it is necessary to coerce an unsigned integer from one type to another, resulting in a change in the number of bits used to represent the number. When an unsigned integer is coerced from a larger length to a smaller length, high order (leftmost) bits are discarded. If any of the discarded bits are 1 then the coerced value is incorrect. Compilers will often warn you of this possibility.

When an unsigned integer is coerced from a smaller length to a larger length the original bits are padded on the left with 0 bits. This is called zero extension.