A Comparison of Matrix Multipication Code
In this web page, a RISC machine (MIPS) and a CISC machine (VAX-11)
are compared with regard to assembly language code for matrix multipication.
Like many RISC machines, the MIPS has fixed length 32-bit instructions.
C version of matrix multipication code
The following C code will multiply two 100 x 100 matrices A and
B, placing the product in matrix C.
for (i = 0; i < 100; i++) {
for (j = 0; j < 100; j++) {
C[i, j] = 0.0;
for (k = 0; k < 100; k++) {
C[i, j] = C[i, j] + A[i, k]*B[k, j];
}
}
}
Comments on the assembly language versions
To keep the code for comparison small, only the innermost loop is coded in
assembly language.
For both versions, R8 or $f8 is used as a register copy of C[i, j].
When the innermost loop is entered, R8 or $f8 contains 0.0, R5 or $5
points to the first entry in row i of A, R6 or $6
points to the first entry in column j of B, and R7
or $7 contains 100.
In the innermost loop, the numbers that R5 ($5) and R6 ($6) point to are
multiplied and the product is added to R8 or $f8.
R5 ($5) is advanced to point to the next entry in row i of
A by adding 4.
R6 ($6) is advanced to point to the next entry in column j of
B by adding 400.
R7 ($7) is decremented and the loop is repeated if it remains positive.
R9 (VAX) or $f10 and $f12 (MIPS) are used as scratch registers.
After the innermost loop completes, the contents of R8 or $f8 need to be
moved to C[i, j].
MIPS assembly language code
loop3:
l.s $f10, 0($5) ; $f10 <- A[i, k]
l.s $f12, 0($6) ; $f12 <- B[k, j]
mul.s $f10, $f10, $f12 ; $f10 <- A[i, k]*B[k, j]
add.s $f8, $f8, $f10 ; $f8 <- $f8 + A[i, k]*B[k, j]
addi $5, $5, 4 ; increment pointer for A[i, k]
addi $6, $6, 400 ; increment pointer for B[k, j]
addi $7, $7, -1 ; decrement loop count
test:
bnez $7, loop3 ; Continue if loop count != 0
VAX-11 assembly language code
loop3:
MULF3 (R5)+, (R6), R9 ; R9 <- A[i, k]*B[k, j] and
; increment pointer for A[i, k]
ADDL2 #400, R6 ; increment pointer for B[k, j]
ADDF2 R9, R8 ; R8 <- R8 + A[i, k]*B[k, j]
SOBGTR R7, loop3 ; decrement loop count, continue if > 0
Questions
-
How many bytes of code are used in the MIPS version?
-
How many bytes of code are used in the VAX-11 version?
The opcode for each instruction requires 1 byte.
The operand specifier for each register, register indirect, or register
autoincrement operand requires 1 byte.
Branch displacements for most branch instructions, including SOBGTR,
require 1 byte.
The operand specifier for most immediate operands requires 1 byte plus the
size of the data as specified by the instruction (all operands of an ADDL3
instruction are 4 bytes).
-
Classify the potential data hazards in the MIPS code.
-
How many machine cycles are required for one iteration of the loop
in the MIPS version?
Make the dreamworld assumption that all ALU operations can be
completed in a single cycle.
-
How many machine cycles are required for one iteration of the loop
in the VAX-11 version?
This requires some guessing as to how pipelining works on the VAX.
However, you can get a reasonable lower bound on the number of cycles
required by assuming that the VAX can do any number of things in parallel,
subject only to the restriction that operand specifiers must be decoded
sequentially.
Due to the fact that the VAX uses variable length operand specifier coding,
it cannot begin decoding a specifier until it has determined the length of
the preceding specifier.
Page URL: http://www.d.umn.edu/~gshute/arch/mmult-mips.html
Page Author: Gary Shute
Last Modified: Tuesday, 26-Jun-2007 12:28:59 CDT
Comments to: gshute@d.umn.edu