
Processing L2 tag array accesses

 

Source files: src/MemSys/l2cache.c, src/MemSys/mshr.c, src/MemSys/cachehelp.c, src/MemSys/setup_cohe.c

Header files: incl/MemSys/cache.h, incl/MemSys/mshr.h

The function L2ProcessTagReq is called for accesses that have reached the head of an L2 tag array pipeline. It is largely similar to L1ProcessTagReq but has several key differences, described below.

Difference 1: Presence of a data array

The first difference between L1ProcessTagReq and L2ProcessTagReq deals with the data array. REQUESTs that hit, the data responses for REPLYs, the copyback portions of COHE messages, and write-backs (whether replacements at the L2 or unsolicited fills from the L1) all require data array accesses.
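
The sketch below summarizes which tag-array accesses must also be scheduled on the data array. It is not taken from l2cache.c; all type and helper names are invented for illustration.

  /* Minimal sketch: which L2 tag-array accesses also need the data array.
     All names here are hypothetical, not the l2cache.c interface. */

  enum MsgType { REQUEST, REPLY, COHE, COHE_REPLY, WRB };

  struct L2Access {
      enum MsgType type;
      int hit_in_l2;        /* REQUEST found the line in the tag array */
      int carries_data;     /* message includes a data copyback/fill   */
  };

  /* Return nonzero if this access must also pass through the data array. */
  static int needs_data_array(const struct L2Access *a)
  {
      switch (a->type) {
      case REQUEST:
          return a->hit_in_l2;        /* hits read data out of the L2   */
      case REPLY:
          return 1;                   /* fill data written into the L2  */
      case COHE:
      case COHE_REPLY:
      case WRB:
          return a->carries_data;     /* copybacks and write-backs only */
      default:
          return 0;
      }
  }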

Difference 2: Servicing COHE messages

For COHE messages, the L2 cache marks the line in question with a ``pending-coherence'' bit and then forwards possible actions to the L1 cache first.

The actual actions for the message are processed at the time of the COHE_REPLY; however, NACK_PEND responses from the L1 cache are forwarded to the directory immediately for non-WRB messages. The pending-coherence bit is also cleared upon receiving the COHE_REPLY.
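
The following sketch illustrates the pending-coherence handshake described above. The structures and helper names are assumptions made for illustration, not the actual interface of l2cache.c.

  /* Hedged sketch of the pending-coherence handshake. */

  struct cache_line { int pend_cohe; };

  enum l1_resp { L1_ACK, L1_COPYBACK, L1_NACK_PEND };

  /* Step 1: on receiving a COHE message, mark the line and forward to L1. */
  static void l2_receive_cohe(struct cache_line *line,
                              void (*send_to_l1)(void))
  {
      line->pend_cohe = 1;     /* stalls later REQUESTs to this line */
      send_to_l1();            /* L1 sees the coherence action first */
  }

  /* Step 2: on the response from L1, process the action (or, for a
     NACK_PEND on a non-WRB message, forward it to the directory at once)
     and clear the pending-coherence bit. */
  static void l2_receive_cohe_reply(struct cache_line *line,
                                    enum l1_resp resp, int is_wrb,
                                    void (*forward_to_dir)(enum l1_resp))
  {
      if (resp == L1_NACK_PEND && !is_wrb) {
          forward_to_dir(resp);   /* immediate NACK_PEND to the directory */
      } else {
          /* ... apply the coherence action to the L2 line here,
             then acknowledge or copy back to the directory ... */
          forward_to_dir(resp);
      }
      line->pend_cohe = 0;        /* cleared at COHE_REPLY time */
  }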

Additionally, the L2 cache is responsible for resolving cache-to-cache transfer requests. On a successful cache-to-cache transfer, the L2 cache not only sends a COHE_REPLY acknowledgment or copyback to the directory, but also sends a REPLY to the requesting processor with the desired data. The cache-to-cache transfer policy follows that depicted in Figure 3.3.
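
A successful cache-to-cache transfer therefore produces two outgoing messages, as in the illustrative sketch below; the message structure and constants are hypothetical, not those of setup_cohe.c.

  /* Sketch of the two messages produced on a successful
     cache-to-cache transfer. */

  struct msg { int dest; int type; int has_data; };

  enum { TO_DIRECTORY, TO_REQUESTER };
  enum { MSG_COHE_REPLY, MSG_REPLY };

  /* On success, the L2 answers both the directory and the requester. */
  static void cache_to_cache_transfer(int dirty, struct msg out[2])
  {
      /* acknowledgment (or copyback, if the line was dirty) to the home */
      out[0].dest = TO_DIRECTORY;
      out[0].type = MSG_COHE_REPLY;
      out[0].has_data = dirty;

      /* data reply straight to the requesting processor */
      out[1].dest = TO_REQUESTER;
      out[1].type = MSG_REPLY;
      out[1].has_data = 1;
  }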

Difference 3: Additional conditions for stalling REQUEST messages

The notpres_mshr function adds two further conditions for stalling REQUEST messages in the case of the L2 cache.

REQUESTs that hit a line with its pending-coherence bit set receive a return value of NOMSHR_STALL_COHE from notpres_mshr. This indicates that no MSHR has been consumed, but that this REQUEST must wait for the pending COHE first. This case does not appear in the L1 cache because that cache does not have pending-coherence bits for lines.

If the REQUEST is not able to reserve a space in the write-back buffer for its expected REPLY, the notpres_mshr function returns NOMSHR_STALL_WRBBUF_FULL. This indicates to the function L2ProcessTagReq that no MSHR will be allocated for this REQUEST until space in the write-back buffer becomes available.
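
The two additional stall outcomes can be summarized as in the following sketch. Only the return-value names come from the source; the structures and surrounding checks are illustrative assumptions.

  /* Sketch of the two additional L2 stall outcomes of notpres_mshr. */

  enum mshr_outcome {
      MSHR_OK,                    /* MSHR available; REQUEST may proceed    */
      NOMSHR_STALL_COHE,          /* line has its pending-coherence bit set */
      NOMSHR_STALL_WRBBUF_FULL    /* no write-back buffer space reservable  */
  };

  struct l2_line  { int present; int pend_cohe; };
  struct l2_cache { int wrb_buf_free; };

  static enum mshr_outcome l2_extra_stall_checks(const struct l2_cache *c,
                                                 const struct l2_line *line)
  {
      /* REQUEST hits a line currently being handled by a COHE message:
         no MSHR is consumed, and the REQUEST waits for the COHE first. */
      if (line->present && line->pend_cohe)
          return NOMSHR_STALL_COHE;

      /* A miss will eventually need a write-back buffer entry for its
         REPLY; if none can be reserved, do not allocate an MSHR yet. */
      if (!line->present && c->wrb_buf_free == 0)
          return NOMSHR_STALL_WRBBUF_FULL;

      return MSHR_OK;
  }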

Difference 4: Handling retries

The L2 cache accepts REPLYs from the directory with the s.reply field set to RAR (request a retry). This indicates that the directory could not process the REQUEST and is returning it to avoid deadlock. In this case, the cache must reissue the original REQUEST. The cache uses the MSHR originally allocated by the REQUEST as a resource from which to reissue the REQUEST, thus allowing the retry message to be accepted even if the outbound REQUEST port is blocked.
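
A minimal sketch of this retry path is shown below; the MSHR structure and helper names are hypothetical stand-ins for the real code in mshr.c and l2cache.c.

  /* Sketch of the retry path for REPLYs marked RAR. */

  enum reply_kind { REPLY_DATA, REPLY_RAR };

  struct mshr {
      int valid;
      int original_request;        /* the REQUEST that allocated this MSHR */
  };

  /* The RAR reply is always accepted: the REQUEST is reissued from its
     existing MSHR, so a blocked outbound port cannot refuse the RAR. */
  static void l2_handle_reply(struct mshr *m, enum reply_kind kind,
                              int (*try_send_request)(int req))
  {
      if (kind == REPLY_RAR) {
          if (!try_send_request(m->original_request)) {
              /* port blocked: the MSHR keeps the REQUEST and
                 retries again on a later cycle */
          }
          return;                  /* MSHR stays allocated for the retry */
      }
      /* normal data REPLY: fill the line and release the MSHR */
      m->valid = 0;
  }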

Difference 5: Handling replacements caused by replies

The most significant differences between L1ProcessTagReq and L2ProcessTagReq deal with the handling of replacements. The L2 cache has a write-back buffer for victimization messages to the directory. Before a REQUEST can be sent out, the notpres_mshr function must ensure that there will be a write-back buffer space available for the reply.

When the REPLY returns to the cache, a write-back buffer entry is tentatively booked.
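
Viewed across Differences 5, 5a, and 5b, a write-back buffer entry passes through roughly the following states; the state names below are invented for illustration.

  /* Sketch of the write-back buffer entry life cycle implied above. */
  enum wrbbuf_state {
      WRB_FREE,        /* unreserved                                        */
      WRB_RESERVED,    /* counted against the REQUEST before it is sent out */
      WRB_BOOKED,      /* tentatively booked when the REPLY returns         */
      WRB_IN_USE       /* holds a victim WRB/REPL (and possibly its data)   */
  };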

Difference 5a: Replies that replace no line or a shared line

If the REPLY causes no replacement, the write-back buffer space is freed. If the REPLY replaces a shared line, the cache sends a subset-enforcement invalidation to the L1 cache, possibly using the write-back buffer entry as a resource from which to send the invalidation; in that case, the entry is added to the smart MSHR list simulator abstraction.
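
The sketch below, using hypothetical helper names, summarizes these two cases.

  /* Sketch of the 5a cases: the REPLY either victimizes nothing
     or victimizes a shared line. */

  enum victim_kind { VICTIM_NONE, VICTIM_SHARED, VICTIM_EXCL };

  static void l2_reply_replacement_shared(enum victim_kind v,
                                          void (*free_wrbbuf)(void),
                                          int (*send_l1_inval)(void),
                                          void (*add_to_smart_mshr_list)(void))
  {
      if (v == VICTIM_NONE) {
          free_wrbbuf();                 /* reservation no longer needed */
          return;
      }
      if (v == VICTIM_SHARED) {
          /* subset enforcement: the L1 must not keep a line the L2 drops */
          if (!send_l1_inval())
              add_to_smart_mshr_list();  /* wb-buffer entry holds the
                                            invalidation until it issues */
      }
  }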

Difference 5b: Replies that replace an exclusive line

However, if the REPLY replaces an exclusive line, the write-back buffer space may also be used to hold data.

If the L1 cache above is write-through, an invalidation message is sent up to the L1 cache. If it cannot be sent immediately, the cache uses the write-back buffer entry as a resource from which to send the message. After the invalidation message is sent up, the write-back or replacement message tries to issue from the write-back buffer entry to the port below it. Again, if this message cannot be sent immediately, the write-back buffer entry will be used as a resource to hold the message until it can be sent. In both cases, the write-back buffer entry is added to the smart MSHR list simulator abstraction.

If the L1 cache is write-back, the WRB first passes through the L2 data array (if the line is held in dirty state in the L2) and is then sent to the L1 cache as a subset-enforcement COHE message. The subsequent WRB coherence reply from the L1 is used either to replace the data currently held for the line in the write-back buffer (on a positive acknowledgment) or to inform the cache that the L1 did not have the desired data (on a NACK). A variety of races must be handled in these cases, since the L1 may send an unsolicited write-back at nearly the same time as the L2 requests one. The details of these races are explained in the inline documentation accompanying the code.

If neither the L2 nor the L1 held the line in dirty state, a REPL message, rather than a WRB, is sent to the directory. The directory-bound write-back or replacement message uses its write-back buffer entry as a resource from which to send the message, ensuring that an inability to send out a WRB does not stall the REPLY that caused the replacement or the COHE_REPLY from the subset-enforcement WRB. Once the write-back or replacement message issues to the port below, its space in the write-back buffer is cleared.
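
The sketch below summarizes how an exclusive victim results in either a WRB or a REPL, under the assumption of hypothetical names and a simplified view that ignores the race conditions mentioned above.

  /* Sketch of the 5b decision for an exclusive victim.  The real logic
     in l2cache.c is considerably more involved (unsolicited L1 WRBs,
     port contention, and so on). */

  #include <stdbool.h>

  struct victim {
      bool l1_is_write_through;
      bool dirty_in_l2;       /* line held dirty in the L2                 */
      bool dirty_in_l1;       /* set when the L1's WRB COHE_REPLY has data */
  };

  enum dir_msg { SEND_WRB, SEND_REPL };

  /* Decide what goes to the directory once the L1 has been dealt with:
     a WRB carries data; a REPL merely gives up the line.  The message
     issues from the write-back buffer entry, which is freed when the
     message leaves for the port below. */
  static enum dir_msg victim_message(const struct victim *v)
  {
      if (v->l1_is_write_through)
          return v->dirty_in_l2 ? SEND_WRB : SEND_REPL;

      /* write-back L1: either cache may hold the only dirty copy */
      return (v->dirty_in_l2 || v->dirty_in_l1) ? SEND_WRB : SEND_REPL;
  }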


