RSIM Bug Report #1
Version of RSIM | 1.0
Bug number | 1
Bug class | 2
Date | 9/15/97
Reported by | authors
Affects | Configurations with L1 write-back caches |
Files | l2cache.c |
Problem Description
This bug can arise in configurations with both L1 and L2 write-back
caches. Specifically, if both caches try to replace the same line at
the same time, deadlock can occur. The L2 cache already included code
to handle this race if the write-back from L1 was received at the L2
cache after the L2 cache sent its subset-enforcement message to the L1
cache. However, the code did not properly handle the race if the
write-back from L1 was received at the L2 cache after the L2 cache
started its data array access for its own write-back but before the L2
had a chance to send the subset-enforcement message to the L1. In that
case, the race was detected in both the MoveWRBToL1 function and the
L2ProcessTagReq function (line 1843, case COHE_REPLY, which prints
the message "8L2: race on WRB %ld in wrb_buf\n" through fprintf
when DEBUG_SECONDARY is defined), but it was not properly resolved in
either place.
This bug did not arise during testing because none of the
application/configuration combinations being tested exposed this
particular race (although the programs did expose the version of the
race in which the write-back from the L1 was received after
the L2 sent its subset-enforcement message). This race can be induced
by increasing the length (in cycles) of the L2 data access.
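The three orderings described above can be made concrete with a small timing sketch. Everything in it is hypothetical and is not part of RSIM: the names race_kind and classify_race, the cycle values, and the assumption that the L2 sends its subset-enforcement message in the same cycle its data access finishes.

```c
#include <assert.h>

/* Hypothetical classification of when the L1 write-back reaches the L2. */
typedef enum {
    NO_RACE,           /* arrives before the L2 starts its own write-back */
    RACE_BEFORE_SEND,  /* arrives during the L2 data access: the case in this report */
    RACE_AFTER_SEND    /* arrives after the subset-enforcement message: already handled */
} race_kind;

/* All times are in cycles. Assumption: the subset-enforcement message is
   sent in the cycle in which the L2 data access completes. */
static race_kind classify_race(long l2_data_start, long l2_data_cycles,
                               long l1_wrb_arrival)
{
    long l2_send = l2_data_start + l2_data_cycles;

    assert(l2_data_cycles >= 0);
    if (l1_wrb_arrival < l2_data_start)
        return NO_RACE;
    if (l1_wrb_arrival < l2_send)
        return RACE_BEFORE_SEND;
    return RACE_AFTER_SEND;
}
```

With an L1 write-back arriving at cycle 103 and an L2 data access starting at cycle 100, a 2-cycle data access gives RACE_AFTER_SEND, while a 5-cycle data access gives RACE_BEFORE_SEND. This is why lengthening the L2 data access makes the unhandled ordering easier to hit.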
Problem Fix
The simplest fix is to have the L2 cache still send its
subset-enforcement message to the L1 cache, even though the L1 cache
has already sent a write-back with the most current data. A
higher-performance solution is possible if the L2 cache controller is
assumed to be sophisticated enough to suppress the subset-enforcement
message when this race is detected.
A new function called mark_undone_l1_in_wrb_buf is added.
This function is called in the MoveWRBToL1 function whenever this race
is detected.
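The effect of the fix on a write-back buffer entry can be illustrated with a toy model. This is only a sketch under stated assumptions: the types and functions below (toy_wrb_entry, toy_find, toy_finish_data_access, TOY_WRB_ENTRIES) are hypothetical stand-ins, and the only details taken from this report are that an entry carries a tag and a done_l1 flag, and that the race is recognized when the data access completes and the L1 side is already marked done.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_WRB_ENTRIES 4  /* hypothetical buffer size, not RSIM's */

typedef struct {
    int  valid;      /* entry is in use */
    long tag;        /* line being written back */
    int  done_data;  /* L2 data-array access has completed */
    int  done_l1;    /* L1 side is considered complete */
} toy_wrb_entry;

/* Find the in-use entry for a tag, or NULL if there is none. */
static toy_wrb_entry *toy_find(toy_wrb_entry *buf, long tag)
{
    int i;
    for (i = 0; i < TOY_WRB_ENTRIES; i++)
        if (buf[i].valid && buf[i].tag == tag)
            return &buf[i];
    return NULL;
}

/* Called when the L2 data access for a write-back completes.
   Returns 1 if the race was detected (L1 side already done), else 0.
   On a race, the L1 flag is cleared so that the subset-enforcement
   message is still sent to the L1. */
static int toy_finish_data_access(toy_wrb_entry *buf, long tag)
{
    toy_wrb_entry *e = toy_find(buf, tag);

    assert(e != NULL);
    e->done_data = 1;
    if (e->done_l1) {
        e->done_l1 = 0;
        return 1;
    }
    return 0;
}
```

If an entry for tag 42 already has done_l1 set when toy_finish_data_access runs, the call returns 1 and leaves the entry with done_data set and done_l1 cleared, so the message to the L1 is not skipped.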
This fix is included in the current distribution version of
l2cache.c; the code for the new and updated functions follows.
The old version of the MoveWRBToL1 function should no longer be used.
/*****************************************************************************/
/* mark_undone_l1_in_wrb_buf: Used to clear out L1 done flag in some races */
/* (see MoveWRBToL1 for more information.) */
/*****************************************************************************/
static void mark_undone_l1_in_wrb_buf(CACHE *captr, REQ *req)
{
  int i;
  for (i=0; i< ... ; i++) /* loop bound is missing from this listing */
    if (captr->wrb_buf[i] && captr->wrb_buf[i]->tag == req->tag)
      {
        captr->wrb_buf[i]->done_l1 = 0;
        return;
      }
  YS__errmsg("No match in wrb_buf!!!\n");
}
/*****************************************************************************/
/* MoveWRBToL1: Called at completion of L2 data access (if any). Marks L2 */
/* access part of WRB complete. Tries to send message up to L1, using entry */
/* in wrb-buf as a "smart MSHR" if it can't be immediately sent. */
/*****************************************************************************/
static void MoveWRBToL1(CACHE *captr, REQ *req, int withdata)
{
  if (mark_done_data_in_wrb_buf(captr,req,withdata)) /* now both data and l1 paths are done */
    {
#ifdef DEBUG_SECONDARY
      if (YS__Simtime > DEBUG_TIME)
        fprintf(simout,"%s\tWRB_COHE tag %ld-- race condition coming out of Data pipe\n",captr->name,req->tag);
#endif
      /* this is a race condition:
         L2 was in the process of victimizing this line, but in the
         meanwhile, L1 has sent an unsolicited WRB to the same line.
         The data from the unsolicited response from the L1 is the
         correct data, so we must use that for the write-back to
         send out.
         A more complex L2 logic controller would go ahead and send
         down the ACK from this point itself, but we'll assume a less
         complicated controller for now. The L2 should go ahead and send
         a message to the L1 and wait for it to get NACKed. So,
         clear out the "L1 done" bit in order to allow this.
      */
      mark_undone_l1_in_wrb_buf(captr,req);
    }
  /* always be sure to decouple the WRB inclusion request from the previous */
  if (!AddReqToOutQ(captr,req))
    AddToSmartMSHRList(captr,req,-1,NULL); /* consider the reserved WRB buffer to be a smart MSHR */
  captr->stat.wb_inclusions_sent++;
}
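The last three lines of MoveWRBToL1 follow a send-or-park pattern: try to queue the message, and if the queue refuses it, hold it for a later retry. The sketch below illustrates that pattern only. toy_port, toy_add_to_outq, toy_send_or_park, toy_drain_one, and both capacities are hypothetical, and the retry policy is an assumption rather than a description of how AddReqToOutQ or AddToSmartMSHRList behave in RSIM.

```c
#include <assert.h>

#define TOY_OUTQ_CAP 2  /* hypothetical output-queue capacity */
#define TOY_PARK_CAP 4  /* hypothetical number of parking slots */

typedef struct {
    long outq[TOY_OUTQ_CAP];
    int  outq_len;
    long parked[TOY_PARK_CAP];
    int  parked_len;
    long sent;  /* incremented per request, queued or parked, as in the listing */
} toy_port;

/* Returns 1 if the tag was queued, 0 if the output queue is full. */
static int toy_add_to_outq(toy_port *p, long tag)
{
    if (p->outq_len == TOY_OUTQ_CAP)
        return 0;
    p->outq[p->outq_len++] = tag;
    return 1;
}

/* Queue the message if possible; otherwise park it for a later retry. */
static void toy_send_or_park(toy_port *p, long tag)
{
    if (!toy_add_to_outq(p, tag)) {
        assert(p->parked_len < TOY_PARK_CAP);
        p->parked[p->parked_len++] = tag;
    }
    p->sent++;
}

/* Remove the oldest queued message, then retry the oldest parked one. */
static void toy_drain_one(toy_port *p)
{
    int i;
    if (p->outq_len > 0) {
        for (i = 1; i < p->outq_len; i++)
            p->outq[i - 1] = p->outq[i];
        p->outq_len--;
    }
    if (p->parked_len > 0 && toy_add_to_outq(p, p->parked[0])) {
        for (i = 1; i < p->parked_len; i++)
            p->parked[i - 1] = p->parked[i];
        p->parked_len--;
    }
}
```

Sending tags 10, 11, and 12 into an empty toy_port queues the first two and parks the third; one call to toy_drain_one then moves the parked tag into the queue behind tag 11.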