Bounds on the MAL (minimal average latency):

Lower bound of the MAL = the maximum number of checkmarks in any row of the reservation table. Upper bound of the MAL = the number of 1s in the initial collision vector plus 1.
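Both bounds can be computed directly. The following is a minimal Python sketch. It assumes a reservation table encoded as one string per stage, with `X` marking a busy clock cycle, and `mal_bounds` is a hypothetical helper name. The example uses the modified table of Problem 6.7, whose forbidden latencies 2 and 6 give the collision vector (100010).

```python
def mal_bounds(table, cv):
    """Return (lower, upper) bounds on the MAL.

    table: one string per pipeline stage, 'X' marks a clock cycle in use.
    cv:    initial collision vector written C_m ... C_1, e.g. "100010".
    """
    lower = max(row.count("X") for row in table)  # marks in the busiest stage
    upper = cv.count("1") + 1                     # number of 1s in C0, plus 1
    return lower, upper


# Modified reservation table of Problem 6.7 ('.' = stage idle).
table_6_7 = [
    "X.....X",  # S1
    ".X.....",  # S2
    "..X....",  # S3
    "...X.X.",  # S4
    "....X..",  # D
]
print(mal_bounds(table_6_7, "100010"))  # (2, 3)
```

For this table the greedy cycle (1,3) has an average latency of 2, so it meets the lower bound.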

Problem 6.6:

(a) Forbidden latencies: 1, 2, and 5. Initial collision vector = (10011).
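The collision vector follows mechanically from the forbidden latencies: bit C_p is 1 exactly when latency p is forbidden, and the vector is written from C_m (m = the largest forbidden latency) down to C_1. A small Python sketch; the function name is illustrative.

```python
def collision_vector(forbidden):
    """Initial collision vector C_m ... C_1, with C_p = 1 iff latency p is forbidden."""
    m = max(forbidden)  # the largest forbidden latency fixes the length
    return "".join("1" if p in forbidden else "0" for p in range(m, 0, -1))


print(collision_vector({1, 2, 5}))  # 10011
```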

(b) State transition diagram.
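The diagram can be regenerated from the collision vector alone. From state s, a permitted latency p (C_p = 0) leads to s shifted right by p places and ORed with C0, and any latency greater than m returns to C0. The following Python sketch assumes the bit-string encoding used in (a); `state_diagram` is a hypothetical helper. For (10011) it finds a single state with self-loops at latencies 3, 4, and 6 or more. The shortest greedy cycle is therefore (3), which is consistent with the throughput of 1/3 in (d).

```python
def state_diagram(cv):
    """Map each reachable state to {latency: next state}.

    cv is the initial collision vector written C_m ... C_1 (e.g. "10011").
    Latencies greater than m all lead back to the initial state; they are
    listed under the single key m + 1, meaning "m + 1 or more".
    """
    m = len(cv)
    c0 = int(cv, 2)                          # bit p-1 of the integer holds C_p
    diagram = {}
    todo = [c0]
    while todo:
        s = todo.pop()
        if s in diagram:
            continue
        edges = {}
        for p in range(1, m + 1):
            if ((s >> (p - 1)) & 1) == 0:    # C_p == 0: latency p is permitted
                nxt = (s >> p) | c0          # shift right p places, OR with C0
                edges[p] = nxt
                todo.append(nxt)
        edges[m + 1] = c0                    # any latency > m resets to C0
        diagram[s] = edges

    def bits(x):
        return format(x, f"0{m}b")

    return {bits(s): {p: bits(t) for p, t in e.items()} for s, e in diagram.items()}


print(state_diagram("10011"))
# {'10011': {3: '10011', 4: '10011', 6: '10011'}}
```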

(d) Throughput = 1/MAL = 1/3 operation per clock cycle, which is 16.67 million operations per second (MOPS) at a 50 MHz (20 ns) clock.

(e) Lower bound of the MAL = 2. The state diagram gives an MAL of 3, which is above this bound, so the optimal latency is not achieved.

Problem 6.7:

(a) Modified reservation table

	1	2	3	4	5	6	7
S1	X						X
S2		X
S3			X
S4				X		X
D					X
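Forbidden latencies can be read off the table. Any two marks in the same row that are d cycles apart make latency d forbidden. The following Python sketch encodes the table as stage → busy cycles (the encoding and names are illustrative). For the modified table it yields latencies 2 (row S4) and 6 (row S1), which give the collision vector (100010).

```python
from itertools import combinations

# Modified reservation table: stage -> clock cycles in which it is busy.
TABLE = {
    "S1": [1, 7],
    "S2": [2],
    "S3": [3],
    "S4": [4, 6],
    "D": [5],
}


def forbidden_latencies(table):
    """Distances between any two marks in the same row."""
    return sorted({b - a
                   for cols in table.values()
                   for a, b in combinations(sorted(cols), 2)})


def collision_vector(forbidden):
    """C_m ... C_1 with C_p = 1 iff latency p is forbidden."""
    m = max(forbidden)
    return "".join("1" if p in forbidden else "0" for p in range(m, 0, -1))


f = forbidden_latencies(TABLE)
print(f, collision_vector(f))  # [2, 6] 100010
```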

(b) State transition diagram:

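(c) Simple cycles (a latency of 7 stands for any latency of 7 or more): (4), (5), (7), (3,1), (3,4), (3,5,4), (3,5,7), (1,7), (5,4), (5,7), (3,7), (1,3,4), (1,3,5,4), (1,3,5,7), (1,3,7), (1,4,3), (1,4,4), (1,4,7), (5,3,4), (5,3,7), (5,3,1,7), (3,5), (3,1,7), (1,4,3,4), (1,4,3,7), (3,1,4,4), (3,1,4,7).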

Greedy cycle: (1,3)

(d) MAL = (1+3)/2 = 2. This equals the lower bound (two checkmarks in rows S1 and S4), so the MAL is optimal.
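The greedy cycle and the list of cycles can be checked mechanically from the collision vector (100010). The following Python sketch uses hypothetical helper names. It always picks the smallest permitted latency until a state repeats. It also tests whether a latency sequence leads some reachable state back to itself, with latency 7 standing for "7 or more". Starting from C0, greedy initiation follows latencies 1, 3, 1, 3, and so on. The repeating part comes out as [3, 1], which is the same alternation as the cycle (1,3), with an average latency of 2.

```python
from fractions import Fraction


def next_state(state, p, c0, m):
    """State after an initiation at latency p, or None if p is forbidden."""
    if p > m:                                # latencies beyond m reset to C0
        return c0
    if (state >> (p - 1)) & 1:               # C_p == 1: collision
        return None
    return (state >> p) | c0                 # shift right p places, OR with C0


def greedy_cycle(cv):
    """Always take the smallest permitted latency until a state repeats."""
    m, c0 = len(cv), int(cv, 2)
    seen, path, state = {}, [], c0
    while state not in seen:
        seen[state] = len(path)
        p = next(q for q in range(1, m + 2)
                 if next_state(state, q, c0, m) is not None)
        path.append(p)
        state = next_state(state, p, c0, m)
    return path[seen[state]:]                # drop the transient prefix


def is_closed_cycle(cv, latencies):
    """True if the latencies lead some reachable state back to itself."""
    m, c0 = len(cv), int(cv, 2)
    reachable, todo = set(), [c0]
    while todo:                              # collect all reachable states
        s = todo.pop()
        if s in reachable:
            continue
        reachable.add(s)
        for q in range(1, m + 2):
            t = next_state(s, q, c0, m)
            if t is not None:
                todo.append(t)
    for start in reachable:
        s = start
        for p in latencies:
            s = None if s is None else next_state(s, p, c0, m)
        if s == start:
            return True
    return False


cycle = greedy_cycle("100010")
print(cycle, Fraction(sum(cycle), len(cycle)))  # [3, 1] 2
print(is_closed_cycle("100010", (5, 3, 1, 7)))  # True
```

Any cycle from the (c) list can be passed to `is_closed_cycle`. A sequence that contains the forbidden latency 2 is rejected.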

(e) Throughput = 1/MAL = 1/2 operation per clock cycle.

Problem 6.8:

(a)

*Cycle 1: Compute A(1) + 0. Feed A(1) to X and 0 to Y. Connect X and Y to the inputs of the adder.

*Cycle 2: Compute A(2)+0. Feed A(2) to X and 0 to Y.

*Cycle 3: Compute A(3)+0. Feed A(3) to X and 0 to Y.

*Cycle 4: Compute A(4)+0. Feed A(4) to X and 0 to Y.

*Cycle 5: Compute A(1)+A(5). Set the lower switch to feed Z to the lower input of S1 from now on, and feed A(5) to the upper input.

*Cycle 6: Compute A(2)+A(6). Feed A(6) to the upper input of S1.

*Cycle 7: Compute A(3)+A(7). Feed A(7) to the upper input of S1.

*Cycle 8: Compute A(4)+A(8). Feed A(8) to the upper input of S1.

*Cycle 9: Compute A(1)+A(5)+A(9). Feed A(9) to the upper input of S1.

*Cycle 10: Compute A(2)+A(6)+A(10). Feed A(10) to the upper input of S1.

*Cycle 11: Compute A(3)+A(7)+A(11). Feed A(11) to the upper input of S1.

*Cycle 12: Compute A(4)+A(8)+A(12). Feed A(12) to the upper input of S1.

........

*Cycle N-1: Compute A(3)+A(7)+A(11)+...+A(N-1). Feed A(N-1) to the upper input of S1.

*Cycle N: Compute A(4)+A(8)+A(12)+...+A(N). Feed A(N) to the upper input of S1.

*Cycle N+1: Store Z (=A(1)+A(5)+....+A(N-3)) to R, and set the upper switch to feed R to the upper input of S1 from now on.

*Cycle N+2: Compute A(1)+A(5)+A(9)+...+A(N-3)+A(2)+A(6)+A(10)+...+A(N-2).

*Cycle N+3: Store Z (=A(3)+A(7)+....+A(N-1)) to R.

*Cycle N+4: Compute A(3)+A(7)+A(11)+...+A(N-1)+A(4)+A(8)+A(12)+...+A(N).

*Cycle N+5: Idle (the sum issued in cycle N+2 is still in the pipeline).

*Cycle N+6: Store Z (=A(1)+A(2)+A(5)+A(6)+....+A(N-2)+A(N-1)) to R.

*Cycle N+7: Idle (the sum issued in cycle N+4 reaches Z in cycle N+8).

*Cycle N+8: Compute A(1)+A(5)+A(9)+...+A(N-3)+A(2)+A(6)+A(10)+...+A(N-2)+A(3)+A(7)+A(11)+...+A(N-1)+A(4)+A(8)+A(12)+...+A(N).

*Cycles N+9 to N+11: Idle while the last addition moves through the pipeline.

*Cycle N+12: The result, the sum of all elements of A, is output from Z.
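The following Python sketch replays the schedule cycle by cycle to check it. It makes the same assumptions as the schedule: k = 4 stages, N a multiple of 4, and the addition issued in cycle t appears on Z in cycle t + 4 (so A(1)+0, issued in cycle 1, is on Z in cycle 5). `pipelined_sum` is a hypothetical name and is not part of the original problem.

```python
K = 4  # pipeline depth assumed by the schedule


def pipelined_sum(a):
    """Replay the Problem 6.8(a) schedule on a 4-stage adder with feedback.

    Returns (final sum, number of additions issued, cycle in which the sum is on Z).
    """
    n = len(a)
    if n < K or n % K:
        raise ValueError("schedule assumes N is a positive multiple of 4")
    result = {}                 # issue cycle -> value of the addition issued then

    def z(t):
        # Z during cycle t holds the addition issued K cycles earlier.
        return result[t - K]

    # Cycles 1..N: A(t) + 0 for the first K cycles, A(t) + Z afterwards.
    for t in range(1, n + 1):
        result[t] = a[t - 1] + (0 if t <= K else z(t))

    # Drain: latch Z into R in one cycle, add R to Z in a later cycle.
    for store, add in ((n + 1, n + 2), (n + 3, n + 4), (n + 6, n + 8)):
        r = z(store)
        result[add] = r + z(add)

    return z(n + 12), len(result), n + 12


print(pipelined_sum(list(range(1, 65))))  # (2080, 67, 76)
```

For N = 64 the replay gives the correct sum 2080. It issues N+3 = 67 additions and delivers the result on Z in cycle N+12 = 76.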

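(b) Without pipelining, the N values are fed sequentially to a nonpipelined adder and each addition takes k cycles, so Nk cycles are needed. The speedup is the ratio of Nk to the number of cycles taken by the pipelined schedule in (a).

For N = 64 and k = 4, the nonpipelined adder needs Nk = 256 cycles.

Among the N+11 cycles, 8 cycles (N+1, N+3, N+5, N+6, N+7, N+9, N+10, N+11) issue no useful addition, so N+3 useful additions are performed. The efficiency is therefore (N+3)/(N+11), which is 67/75 ≈ 0.89 for N = 64.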
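The following Python snippet gives a quick numeric check for N = 64 and k = 4. It assumes the pipelined run is charged N + 11 cycles, which is the span used in the efficiency count. If N + 12 cycles were charged instead (up to the cycle in which the sum appears on Z), the speedup would be 256/76.

```python
from fractions import Fraction

N, k = 64, 4
nonpipelined_cycles = N * k      # N additions, k cycles each, one at a time
pipelined_cycles = N + 11        # ASSUMPTION: span used in the efficiency count
useful_adds = N + 3              # additions that do useful work

speedup = Fraction(nonpipelined_cycles, pipelined_cycles)
efficiency = Fraction(useful_adds, pipelined_cycles)
print(speedup, round(float(speedup), 2))        # 256/75 3.41
print(efficiency, round(float(efficiency), 3))  # 67/75 0.893
```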