- ...functions
- Currently, the only tested platform that does not
use ELF is the Convex Exemplar with HP-UX.
- ...scheduling
- An option for in-order scheduling is provided as a straightforward modification to the out-of-order scheduling pipeline, but is not well tested. Details of the implementation of this
feature are provided in Section 10.2.
- ...size
- Code that models a smaller pool of renaming registers,
and the register stalls that result, is included but has
not been tested and is not exposed to the user.
Chapter 10 gives a more detailed explanation.
- ...3.2.3
- A
command-line option is also provided to specify that non-memory instructions
should also be dispatched to an issue window; by default, these instructions
are issued directly from the active list. More details about this option
are provided in Section 4.1.1.
- ...renaming
- Single-precision floating-point operations experience
WAW (output) dependences because floating-point registers are mapped and
renamed on double-precision boundaries. This is further discussed in Chapter 10.
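The effect of renaming on double-precision boundaries can be illustrated with a minimal sketch. It assumes that single-precision registers 2k and 2k+1 share one rename unit k; the function names are hypothetical and are not RSIM identifiers.

```python
def rename_unit(fp_reg: int) -> int:
    """Map a single-precision register number to its rename unit.

    Assumption for illustration: registers 2k and 2k+1 share unit k,
    i.e. renaming happens at double-precision granularity.
    """
    return fp_reg // 2


def has_waw_dependence(dest_a: int, dest_b: int) -> bool:
    """Two single-precision writes conflict when they target the same rename unit."""
    return rename_unit(dest_a) == rename_unit(dest_b)


# Writes to %f0 and %f1 fall in the same unit, so the later write
# has an output dependence on the earlier one.
print(has_waw_dependence(0, 1))  # True
# %f1 and %f2 belong to different units, so no dependence arises.
print(has_waw_dependence(1, 2))  # False
```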
- ...memory
- RSIM also does not include the ability to mark certain
regions of memory uncached, a feature commonly associated with
virtual memory.
- ...below
- We do not expect applications to use this type of MEMBAR.
It is currently used in RSIM only in the RSIM system trap library.
- ...performed
- A
store is globally performed when its value is visible to all processors; i.e.
all other caches with a copy of the line have been invalidated. In RSIM, this is indicated when an acknowledgment for the store is received by the processor.
A load is globally performed when its return value is bound and when the store whose value it returns is globally performed. In RSIM, this is detected when the load returns its value to the processor.
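The two definitions above can be restated as a small model. This is only a sketch of the definitions, not of RSIM's mechanism (which relies on the acknowledgment and on the load returning its value); the class and function names are hypothetical.

```python
class StoreTracker:
    """Tracks which other caches still hold a copy of the line being written."""

    def __init__(self, sharers):
        # Caches whose copies must be invalidated before the store
        # is visible to all processors.
        self.pending = set(sharers)

    def invalidated(self, cache_id):
        """Record that one sharer has invalidated its copy."""
        self.pending.discard(cache_id)

    def globally_performed(self):
        """The store is globally performed once no stale copies remain."""
        return not self.pending


def load_globally_performed(value_bound, source_store):
    """A load is globally performed when its return value is bound and
    the store that supplied the value is itself globally performed."""
    return value_bound and source_store.globally_performed()


store = StoreTracker({1, 2})
store.invalidated(1)
print(store.globally_performed())            # False
store.invalidated(2)
print(store.globally_performed())            # True
print(load_globally_performed(True, store))  # True
```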
- ...exceptions
- We use the terms
exception and trap interchangeably.
- ...boundary
- The SPARC architecture also allows word-alignment for
quadruple-precision floating-point loads and stores, but RSIM does
not support such instructions.
- ...subtraction)
- RSIM
does not yet raise any exception on some unsupported instructions, such as
64-bit integer operations or quadruple-precision
floating-point accesses. It is the user's responsibility to ensure that
such instructions are not used. The compiler options we provide in
Section 2.4 inform the compiler not to generate these
instructions.
- ...network
- The potential
for supporting other network models is discussed in Section 15.3.
- ...filename
- SpecialInitOutput
is simply an fopen followed by an fwrite.
- ...hierarchy
- Note that in this chapter, the terms issue and complete usually refer to issuing to the memory hierarchy and completion at the memory hierarchy. These are different from the issue and completion stages of the processor pipeline.
- ...length
- For some
operations, the minimum alignment requirement specified in the ISA is
smaller than the actual length of data transferred. However, we
simulate a processor that traps and emulates instructions that are not
aligned on a boundary equal to their length, as this seems more
appropriate for a high-performance implementation. That is, the
possibility of having multiple cache line accesses and multiple page
faults for a single instruction seems to be an undesirably difficult
problem.
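A minimal sketch of the alignment rule described above, assuming a 64-byte cache line purely for illustration; the function names and the line size are hypothetical and are not taken from RSIM.

```python
def is_naturally_aligned(address: int, length: int) -> bool:
    """True when the access starts on a boundary equal to its own length."""
    return address % length == 0


def needs_trap(address: int, length: int) -> bool:
    """Model of the simulated policy: trap (and emulate) any access that is
    not aligned on a boundary equal to its length."""
    return not is_naturally_aligned(address, length)


def crosses_line(address: int, length: int, line_size: int = 64) -> bool:
    """True when the access touches more than one cache line.

    line_size=64 is an assumed value for this example.
    """
    return address // line_size != (address + length - 1) // line_size


# An 8-byte access at address 60 is 4-byte aligned but not 8-byte aligned;
# without the trap it would straddle two cache lines.
print(needs_trap(60, 8), crosses_line(60, 8))  # True True
# The same access at address 64 is naturally aligned and stays in one line.
print(needs_trap(64, 8), crosses_line(64, 8))  # False False
```

For power-of-two lengths no larger than the line size, a naturally aligned access never spans two lines, which is the case the trap-and-emulate policy guarantees for the hardware path.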
- ...barriers
- Because of the
single-precision floating-point problems discussed in
Chapter 10, single-precision loads do not issue until
their output dependences are resolved. With static scheduling, subsequent
loads will also be prevented from issuing.
- ...message
- In the code, the data structure allocated for
such a message is called the REQ data structure. This data
structure is used for requests and reply messages, for both data and
coherence transactions. We do not use the term REQ in this
manual, to prevent any possible confusion with the REQUEST message
type.
- ...READ_SH
- If no other processors
sharing
- ...UPGRADE
- In certain
race conditions, UPGRADE is converted to READ_OWN, as described in Section 14.
- ...UPGRADE
- In certain
race conditions, UPGRADE is converted to READ_OWN, as described in Section 14.
- ...request
- In the L2 cache, this
will return NOMSHR_STALL_COHE if there is currently a
COHE request pending on the line. This indicates that no MSHR has
been consumed, but this REQUEST must wait for the pending
COHE first.
- ...stall
- WAR stalls
are not usually seen with straightforward implementations of
consistency models, as stores following loads to the same line
generally depend on the values of those loads. Consequently, such
stores cannot issue in straightforward implementations until the loads
complete.
- ...used
- Recall that the simulator code refers
to the MESI states as PR_DY, PR_CL, SH_CL, and
INVALID, respectively.
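The naming correspondence can be written out as a lookup table. This sketch assumes the conventional MESI order (Modified, Exclusive, Shared, Invalid); the dictionary and helper names are hypothetical, and only the four state identifiers come from the simulator code.

```python
# MESI state name -> identifier used in the simulator code.
MESI_TO_RSIM = {
    "Modified": "PR_DY",    # private, dirty
    "Exclusive": "PR_CL",   # private, clean
    "Shared": "SH_CL",      # shared, clean
    "Invalid": "INVALID",
}

# Reverse lookup, useful when reading simulator traces or source.
RSIM_TO_MESI = {rsim: mesi for mesi, rsim in MESI_TO_RSIM.items()}


def is_private(rsim_state: str) -> bool:
    """True for the two states in which no other cache holds the line."""
    return rsim_state in ("PR_DY", "PR_CL")


print(RSIM_TO_MESI["PR_CL"])  # Exclusive
print(is_private("SH_CL"))    # False
```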
- ...studies [21]
- Note that our architecture does permit
"forwarding" of values in the processor memory unit.