
Overall performance statistics

 

RSIM displays the total execution time and the IPC (instructions per cycle) achieved by the program on the simulated system. To better characterize the bottlenecks in application performance, the total execution time is further divided into busy time and stall time attributed to various classes of instructions. These classes include ALU, FPU, data reads, data writes, exceptions, branches, synchronization, and up to 9 user-defined aggregate classes discussed in Section 5.4. Data read and write stalls are further split according to the level of the memory hierarchy at which the memory operations were resolved: L1 cache, L2 cache, local memory, or remote memory.

With ILP processors, the components of execution time described above are not easily separable, because multiple instructions can execute in parallel and out of order on such systems. We use the following policy, also used in other studies (e.g., [14, 15, 18]), to attribute execution time to the various components. If, in a given cycle, the processor retires the maximum allowable number of instructions, we count that cycle as busy time. Otherwise, we charge that cycle to the stall time component corresponding to the first instruction that could not be retired. Thus, the stall time for a class of instructions represents the number of cycles that instructions of that class spend at the head of the active list before retiring.
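
The attribution policy above can be illustrated with a minimal Python sketch. This is not RSIM code: the function name attribute_cycles, the per-cycle trace format, the class labels, and the retire width of 4 are all assumptions made for illustration. Each trace entry is assumed to give the number of instructions retired in that cycle and the class of the first instruction that could not retire.

```python
from collections import Counter

def attribute_cycles(cycles, max_retire):
    """Charge each cycle to busy time or to one stall class.

    cycles: iterable of (retired, blocked_class) pairs, one per cycle
            (a hypothetical trace format, not RSIM's own).
    max_retire: maximum number of instructions retired per cycle.
    """
    breakdown = Counter()
    for retired, blocked_class in cycles:
        if retired >= max_retire:
            # Full retirement bandwidth was used: the cycle is busy time.
            breakdown["busy"] += 1
        else:
            # Otherwise the cycle is charged to the class of the first
            # instruction that could not be retired.
            breakdown[blocked_class] += 1
    return breakdown

# Hypothetical five-cycle trace for a processor that retires up to 4 per cycle.
trace = [
    (4, None),
    (2, "data_read_L2"),
    (0, "data_read_L2"),
    (4, None),
    (1, "FPU"),
]

breakdown = attribute_cycles(trace, max_retire=4)
total_retired = sum(retired for retired, _ in trace)
ipc = total_retired / len(trace)  # instructions per cycle
```

Because every cycle is charged to exactly one component, the busy and stall counts always sum to the total number of cycles, so the breakdown can be reported as fractions of total execution time.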



Vijay Sadananda Pai
Thu Aug 7 14:18:56 CDT 1997