A Comparison of Matrix Multipication Code

In this web page, a RISC machine (MIPS) and a CISC machine (VAX-11) are compared with regard to assembly language code for matrix multipication. Like many RISC machines, the MIPS has fixed length 32-bit instructions.

C version of matrix multipication code

The following C code will multiply two 100 x 100 matrices A and B, placing the product in matrix C.
    for (i = 0; i < 100; i++) {
        for (j = 0; j < 100; j++) {
            C[i, j] = 0.0;
            for (k = 0; k < 100; k++) {
                C[i, j] = C[i, j] + A[i, k]*B[k, j];
            }
        }
    }

Comments on the assembly language versions

To keep the code for comparison small, only the innermost loop is coded in assembly language. For both versions, R8 or $f8 is used as a register copy of C[i, j]. When the innermost loop is entered, R8 or $f8 contains 0.0, R5 or $5 points to the first entry in row i of A, R6 or $6 points to the first entry in column j of B, and R7 or $7 contains 100. In the innermost loop, the numbers that R5 ($5) and R6 ($6) point to are multiplied and the product is added to R8 or $f8. R5 ($5) is advanced to point to the next entry in row i of A by adding 4. R6 ($6) is advanced to point to the next entry in column j of B by adding 400. R7 ($7) is decremented and the loop is repeated if it remains positive. R9 (VAX) or $f10 and $f12 (MIPS) are used as scratch registers. After the innermost loop completes, the contents of R8 or $f8 need to be moved to C[i, j].

MIPS assembly language code

    loop3:
            l.s     $f10, 0($5)      ; $f10 <- A[i, k]
            l.s     $f12, 0($6)      ; $f12 <- B[k, j]
            mul.s   $f10, $f10, $f12 ; $f10 <- A[i, k]*B[k, j]
            add.s   $f8, $f8, $f10   ; $f8 <- $f8 + A[i, k]*B[k, j]
            addi    $5, $5, 4        ; increment pointer for A[i, k]
            addi    $6, $6, 400      ; increment pointer for B[k, j]
            addi    $7, $7, -1       ; decrement loop count
    test:
            bnez    $7, loop3        ; Continue if loop count != 0

VAX-11 assembly language code

    loop3:
            MULF3   (R5)+, (R6), R9  ; R9 <- A[i, k]*B[k, j] and
                                     ; increment pointer for A[i, k]
            ADDL2   #400, R6         ; increment pointer for B[k, j]
            ADDF2   R9, R8           ; R8 <- R8 + A[i, k]*B[k, j]
            SOBGTR  R7, loop3        ; decrement loop count, continue if > 0

Questions

  1. How many bytes of code are used in the MIPS version?
  2. How many bytes of code are used in the VAX-11 version? The opcode for each instruction requires 1 byte. The operand specifier for each register, register indirect, or register autoincrement operand requires 1 byte. Branch displacements for most branch instructions, including SOBGTR, require 1 byte. The operand specifier for most immediate operands requires 1 byte plus the size of the data as specified by the instruction (all operands of an ADDL3 instruction are 4 bytes).
  3. Classify the potential data hazards in the MIPS code.
  4. How many machine cycles are required for one iteration of the loop in the MIPS version? Make the dreamworld assumption that all ALU operations can be completed in a single cycle.
  5. How many machine cycles are required for one iteration of the loop in the VAX-11 version? This requires some guessing as to how pipelining works on the VAX. However, you can get a reasonable lower bound on the number of cycles required by assuming that the VAX can do any number of things in parallel, subject only to the restriction that operand specifiers must be decoded sequentially. Due to the fact that the VAX uses variable length operand specifier coding, it cannot begin decoding a specifier until it has determined the length of the preceding specifier.

Page URL: http://www.d.umn.edu/~gshute/arch/mmult-mips.html
Page Author: Gary Shute
Last Modified: Tuesday, 26-Jun-2007 12:28:59 CDT
Comments to: gshute@d.umn.edu