Bounds on the MAL:
Lower bound of MAL = the maximum number of checkmarks in any row of the reservation table.
Upper bound of MAL = the number of 1's in the initial collision vector plus 1.
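The two bounds can be checked mechanically. The sketch below is only an illustration: the function names and the three-stage example table are hypothetical, and the table is represented as a mapping from stage name to the set of clock cycles marked in that row.

```python
def mal_lower_bound(reservation_table):
    """Maximum number of checkmarks in any row (stage) of the table."""
    return max(len(columns) for columns in reservation_table.values())


def mal_upper_bound(collision_vector):
    """Number of 1's in the initial collision vector, plus 1."""
    return collision_vector.count("1") + 1


# Hypothetical table (not from the problems): S1 is used in cycles 1 and 4,
# so latency 3 is forbidden and the initial collision vector is (100).
example_table = {"S1": {1, 4}, "S2": {2}, "S3": {3}}
lower = mal_lower_bound(example_table)
upper = mal_upper_bound("100")
print(lower, upper)
```

For this hypothetical table both bounds equal 2, so the MAL is pinned to 2 without drawing the state diagram.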
Problem 6.6
(a) Forbidden latencies: 1,2, and 5. Initial collision vector=(10011)
(b) State transition diagram.
(c) MAL=3
(d) Throughput = 1/(3τ) = 16.67 million operations per second (MOPS) for a clock period τ = 20 ns.
(e) Lower bound of MAL = 2. Since the MAL is 3, the optimal latency is not achieved.
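As a cross-check of parts (a), (d), and (e), the following sketch builds the collision vector from the forbidden latencies and converts the MAL into a throughput. The function name is hypothetical, and the 20 ns clock period is an assumption inferred from the 16.67 MOPS figure.

```python
def collision_vector(forbidden_latencies):
    """Return the bit string (C_m ... C_2 C_1); C_i is 1 if latency i is forbidden."""
    m = max(forbidden_latencies)
    return "".join("1" if i in forbidden_latencies else "0" for i in range(m, 0, -1))


cv = collision_vector({1, 2, 5})

# Upper bound from the collision vector; the lower bound of 2 is taken from part (e).
upper_bound = cv.count("1") + 1

mal = 3                  # from part (c)
tau = 20e-9              # assumed clock period in seconds
throughput_mops = 1 / (mal * tau) / 1e6
print(cv, upper_bound, round(throughput_mops, 2))
```

The MAL of 3 lies between the lower bound of 2 and the upper bound of 4, which is consistent with part (e).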
Problem 6.7:
(a) Modified reservation table
1 | 2 | 3 | 4 | 5 | 6 | 7 | |
S1 | X | X | |||||
S2 | X | ||||||
S3 | X | ||||||
S4 | X | X | |||||
D | X |
(b) State transition diagram:
(c) Simple cycles: (4), (5), (7), (3,1), (3,4), (3,5,4), (3,5,7), (1,7), (5,4), (5,7), (3,7), (1,3,4), (1,3,5,4), (1,3,5,7), (1,3,7), (1,4,3), (1,4,4), (1,4,7), (5,3,4), (5,3,7), (5,3,1,7).
Greedy Cycle: (1,3)
(d) MAL=(1+3)/2=2
(e) Throughput = 1/MAL = 1/2 initiation per clock cycle.
Problem 6.8:
(a)
*Cycle 1: Compute A(1) + 0. Feed A(1) to X and 0 to Y. Connect X and Y to the inputs of the adder.
*Cycle 2: Compute A(2)+0. Feed A(2) to X and 0 to Y.
*Cycle 3: Compute A(3)+0. Feed A(3) to X and 0 to Y.
*Cycle 4: Compute A(4)+0. Feed A(4) to X and 0 to Y.
*Cycle 5: Compute A(1)+A(5). Switch the lower switch to feed Z to the lower input of S1 from now on, and feed A(5) to the upper input.
*Cycle 6: Compute A(2)+A(6). Feed A(6) to the upper input of S1.
*Cycle 7: Compute A(3)+A(7). Feed A(7) to the upper input of S1.
*Cycle 8: Compute A(4)+A(8). Feed A(8) to the upper input of S1.
*Cycle 9: Compute A(1)+A(5)+A(9). Feed A(9) to the upper input of S1.
*Cycle 10: Compute A(2)+A(6)+A(10). Feed A(10) to the upper input of S1.
*Cycle 11: Compute A(3)+A(7)+A(11). Feed A(11) to the upper input of S1.
*Cycle 12: Compute A(4)+A(8)+A(12). Feed A(12) to the upper input of S1.
........
*Cycle N-1: Compute A(3)+A(7)+A(11)+...+A(N-1). Feed A(N-1) to the upper input of S1.
*Cycle N: Compute A(4)+A(8)+A(12)+...+A(N). Feed A(N) to the upper input of S1.
*Cycle N+1: Store Z (=A(1)+A(5)+....+A(N-3)) to R and switch the upper switch to input R to the upper input of S1 from now on.
*Cycle N+2: Compute A(1)+A(5)+A(9)+...+A(N-3)+A(2)+A(6)+A(10)+...+A(N-2).
*Cycle N+3: Store Z (=A(3)+A(7)+....+A(N-1)) to R.
*Cycle N+4: Compute A(3)+A(7)+A(11)+...+A(N-1)+A(4)+A(8)+A(12)+...+A(N).
*Cycle N+5:
*Cycle N+6: Store Z (=A(1)+A(2)+A(5)+A(6)+....+A(N-2)+A(N-1)) to R.
*Cycle N+7:
*Cycle N+8: Compute A(1)+A(5)+A(9)+...+A(N-3)+A(2)+A(6)+A(10)+...+A(N-2)+A(3)+A(7)+A(11)+...+A(N-1)+A(4)+A(8)+A(12)+...+A(N).
....
*Cycle N+12: The result is output from Z, which is the sum of all elements of A.
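The schedule above amounts to forming four interleaved partial sums and then merging them pairwise. The sketch below reproduces only that arithmetic; it does not model the switches, the register R, or the exact cycle timing, and the function name `pipelined_sum` is hypothetical.

```python
def pipelined_sum(values, k=4):
    """Sum `values` in the order a k-stage adder with feedback would combine them."""
    # Phase 1: k interleaved partial sums. Slot j accumulates
    # A(j+1), A(j+1+k), A(j+1+2k), ... because a result re-enters the
    # adder k cycles after it was issued.
    partial = [0] * k
    for index, value in enumerate(values):
        partial[index % k] += value

    # Phase 2: merge neighbouring partial sums pairwise until one remains.
    while len(partial) > 1:
        merged = [partial[i] + partial[i + 1] for i in range(0, len(partial) - 1, 2)]
        if len(partial) % 2:
            merged.append(partial[-1])
        partial = merged
    return partial[0]


a = list(range(1, 65))      # stand-in data for A(1)..A(64)
total = pipelined_sum(a)
print(total)
```

With k = 4 the merge phase first adds slots 1 and 2, then slots 3 and 4, and finally the two intermediate sums, matching the order of cycles N+2, N+4, and N+8.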
(b) The N values are fed sequentially to a nonpipelined adder. Therefore, Nk cycles are needed. The speed up is
For N=64 and k=4
Among the N+11 cycles, 8 cycles (N+1,N+3,N+5,N+6,N+7,N+9,N+10,N+11) issue useless instructions. Therefore, N+3 useful add instructions are performed. The efficiency is
(c)
(d)
Problem 6.9
(a) Forbidden latency: 3; collision vector=(100).
(b) State transition diagram.
(c) Simple cycles: (2), (4), (1,4), (1,1,4), and (2,4); greedy cycles: (2) and (1,1,4)
(d) Optimal constant latency cycle: (2); MAL=2.
(e) Throughput=1/(2*20ns)=25 MOPS.
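The state diagram and the simple cycles for this problem can be generated from the collision vector alone. The sketch below is one way to do it, with hypothetical function names; it stores each state as an integer whose least significant bit is C_1 and uses latency m+1 to stand for every latency longer than the collision vector.

```python
def state_diagram(cv):
    """Map each reachable state to {latency: next_state}."""
    m = len(cv)
    initial = int(cv, 2)
    transitions = {}
    pending = [initial]
    while pending:
        state = pending.pop()
        if state in transitions:
            continue
        edges = {}
        for latency in range(1, m + 1):
            if not (state >> (latency - 1)) & 1:       # bit C_latency is 0
                edges[latency] = (state >> latency) | initial
        edges[m + 1] = initial                         # latency m+1 or more
        transitions[state] = edges
        pending.extend(edges.values())
    return transitions


def simple_cycles(transitions, initial):
    """List latency sequences of cycles in which no state repeats."""
    order = [initial] + sorted(s for s in transitions if s != initial)
    rank = {s: i for i, s in enumerate(order)}
    cycles = []

    def extend(start, state, path, visited):
        for latency, nxt in transitions[state].items():
            if nxt == start:
                cycles.append(tuple(path + [latency]))
            elif rank[nxt] > rank[start] and nxt not in visited:
                extend(start, nxt, path + [latency], visited | {nxt})

    # Each cycle is reported once, starting from its lowest-ranked state.
    for start in order:
        extend(start, start, [], {start})
    return cycles


diagram = state_diagram("100")
cycles = simple_cycles(diagram, int("100", 2))
print(sorted(cycles))
```

For the collision vector (100) this yields four states and the five simple cycles listed in part (c).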
Problem 6.10
(a) Forbidden latencies: 3,4,5; collision vector=(11100).
(b) State transition diagram:
(c) Simple cycles: (1,1,6), (2,6), (6), and (1,6).
(d) Greedy cycles: (1,1,6)
(e) MAL=(1+1+6)/3=2.67.
(f) Minimum allowed constant cycle: (6)
(g) Maximum throughput = 1/MAL = 3/8 initiations per clock cycle.
(h) Throughput with the constant cycle (6) = 1/6 initiation per clock cycle.
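Parts (d) through (h) can be reproduced with a short script. This is a sketch with hypothetical function names: `greedy_cycle` follows only the minimum-latency edges starting from the initial state, which is sufficient here because this diagram has a single greedy cycle, and `min_constant_latency` relies on the fact that a constant cycle (L) is collision-free exactly when no multiple of L is a forbidden latency.

```python
from fractions import Fraction


def greedy_cycle(cv):
    """Follow minimum-latency edges from the initial state until a state repeats."""
    m, initial = len(cv), int(cv, 2)
    seen, latencies, state = {}, [], initial
    while state not in seen:
        seen[state] = len(latencies)
        # Smallest permissible latency; m + 1 if every bit of the state is 1.
        latency = next(
            (l for l in range(1, m + 1) if not (state >> (l - 1)) & 1), m + 1
        )
        latencies.append(latency)
        state = initial if latency > m else (state >> latency) | initial
    return tuple(latencies[seen[state]:])


def min_constant_latency(cv):
    """Smallest L such that no multiple of L is a forbidden latency."""
    m = len(cv)
    forbidden = {i for i in range(1, m + 1) if cv[m - i] == "1"}
    latency = 1
    while any(multiple in forbidden for multiple in range(latency, m + 1, latency)):
        latency += 1
    return latency


cycle = greedy_cycle("11100")
mal = Fraction(sum(cycle), len(cycle))
constant = min_constant_latency("11100")
print(cycle, mal, 1 / mal, constant, Fraction(1, constant))
```

Using `Fraction` keeps 8/3 and 3/8 exact, so the throughput values in (g) and (h) can be compared without rounding.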