RSIM Bug Report #1

Version of RSIM: 1.0
Bug number: 1
Bug class: 2
Date: 9/15/97
Reported by: authors
Affects: Configurations with L1 write-back caches
Files: l2cache.c

Problem Description

This bug can arise in configurations in which both the L1 and L2 caches are write-back caches. Specifically, if both caches try to replace the same line at the same time, deadlock can occur. The L2 cache code already handled this race when the write-back from L1 was received at the L2 cache after the L2 cache had sent its subset-enforcement message to the L1 cache. However, the code did not properly handle the race when the write-back from L1 was received at the L2 cache after the L2 cache had started the data array access for its own write-back, but before the L2 cache had a chance to send the subset-enforcement message to the L1 cache.

In that case, the race would be detected in both the MoveWRBToL1 function and the L2ProcessTagReq function (line 1843, case COHE_REPLY, which includes an fprintf call with the message "8L2: race on WRB %ld in wrb_buf\n" if DEBUG_SECONDARY is defined). However, the race was not properly resolved in either place.
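The timing windows described above can be illustrated with a small sketch. This is not RSIM code: the enum, the function name, and the three cycle-count parameters are hypothetical, and the treatment of ties (events in the same cycle) is an arbitrary choice made only for illustration.

```c
#include <assert.h>

/* Hypothetical labels for when the L1 write-back reaches the L2 cache. */
typedef enum {
    WRB_BEFORE_L2_DATA_ACCESS,    /* not discussed in this report */
    WRB_DURING_L2_DATA_ACCESS,    /* the window that exposed this bug */
    WRB_AFTER_SUBSET_ENFORCEMENT  /* the window the original code handled */
} wrb_window;

/*
 * Classify the arrival of the L1 write-back relative to two L2 events.
 * All times are in cycles and are assumptions of this sketch:
 *   data_start  - L2 starts the data array access for its own write-back
 *   msg_sent    - L2 sends the subset-enforcement message to L1
 *   wrb_arrival - the L1 write-back is received at L2
 */
static wrb_window classify_wrb_arrival(long data_start, long msg_sent,
                                       long wrb_arrival)
{
    assert(msg_sent >= data_start); /* message follows the data access */

    if (wrb_arrival <= data_start)
        return WRB_BEFORE_L2_DATA_ACCESS;
    if (wrb_arrival < msg_sent)
        return WRB_DURING_L2_DATA_ACCESS;
    return WRB_AFTER_SUBSET_ENFORCEMENT;
}
```

In this model, a longer L2 data access moves msg_sent later and widens the middle window, so a write-back that arrives at a fixed time becomes more likely to fall inside it.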

This bug did not arise during testing because none of the application/configuration combinations being tested exposed this particular race (although the programs did expose the version of the race in which the write-back from the L1 was received after the L2 sent its subset-enforcement message). This race can be induced by increasing the length (in cycles) of the L2 data access.

Problem Fix

The simplest fix is to assume that the L2 cache still sends a subset-enforcement message to the L1 cache, even though the L1 cache has already sent a write-back with the most current data. The L2 cache then waits for the L1 cache to NACK that message. A higher-performance solution is possible if the L2 cache controller is assumed to be complex enough to suppress the subset-enforcement message when this race is detected.

A new function called mark_undone_l1_in_wrb_buf is added. This function is called in the MoveWRBToL1 function whenever this race is detected.
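The effect of this fix on a write-back buffer entry can be sketched with a toy model. This is an illustration under stated assumptions, not the RSIM implementation: the toy_ names, the done_data field, and the return codes are hypothetical. Only the done_l1 flag and the tag match correspond to names used in l2cache.c.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_WRB_BUF_SIZE 4 /* hypothetical buffer size */

/* Toy stand-in for a write-back buffer entry (not RSIM's structure). */
typedef struct {
    int  valid;
    long tag;
    int  done_data; /* assumed flag: L2 data access finished */
    int  done_l1;   /* L1 side of the write-back finished */
} toy_wrb_entry;

/* Find the valid entry with a matching tag, or NULL if there is none. */
static toy_wrb_entry *toy_find(toy_wrb_entry *buf, long tag)
{
    assert(buf != NULL);
    for (int i = 0; i < TOY_WRB_BUF_SIZE; i++)
        if (buf[i].valid && buf[i].tag == tag)
            return &buf[i];
    return NULL;
}

/* An unsolicited L1 write-back arrives: the L1 side now looks complete. */
static int toy_l1_wrb_arrives(toy_wrb_entry *buf, long tag)
{
    toy_wrb_entry *e = toy_find(buf, tag);
    if (e == NULL)
        return -1;
    e->done_l1 = 1;
    return 0;
}

/*
 * The L2 data access completes. If the L1 side is already marked done,
 * the race has occurred: clear done_l1 so that the entry keeps waiting
 * for the L1 response to the subset-enforcement message.
 * Returns 1 if the race was detected, 0 if not, -1 if no entry matches.
 */
static int toy_data_access_done(toy_wrb_entry *buf, long tag)
{
    toy_wrb_entry *e = toy_find(buf, tag);
    if (e == NULL)
        return -1;
    e->done_data = 1;
    if (e->done_l1) {
        e->done_l1 = 0; /* simple fix: wait for L1 again */
        return 1;
    }
    return 0;
}
```

In this model, calling toy_l1_wrb_arrives before toy_data_access_done for the same tag reproduces the race, and the entry ends with done_data set and done_l1 cleared, so it is not treated as complete until the L1 cache responds.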

This bug has been fixed in the current distribution version of l2cache.c. The code for the new and updated functions is given below. The old version of the MoveWRBToL1 function should no longer be used.

/*****************************************************************************/
/* mark_undone_l1_in_wrb_buf: Used to clear out L1 done flag in some races   */
/* (see MoveWRBToL1 for more information.)                                   */
/*****************************************************************************/

static void mark_undone_l1_in_wrb_buf(CACHE *captr, REQ *req)
{
  int i;
  for (i=0; i<wrb_buf_size; i++)
    if (captr->wrb_buf[i] && captr->wrb_buf[i]->tag == req->tag)
      {
        captr->wrb_buf[i]->done_l1 = 0;
        return;
      }
  YS__errmsg("No match in wrb_buf!!!\n");
}

/*****************************************************************************/
/* MoveWRBToL1: Called at completion of L2 data access (if any). Marks L2    */
/* access part of WRB complete. Tries to send message up to L1, using entry  */
/* in wrb-buf as a "smart MSHR" if it can't be immediately sent.             */
/*****************************************************************************/

static void MoveWRBToL1(CACHE *captr, REQ *req, int withdata)
{
  if (mark_done_data_in_wrb_buf(captr,req,withdata)) /* now both data and l1 paths are done */
    {
#ifdef DEBUG_SECONDARY
      if (YS__Simtime > DEBUG_TIME)
        fprintf(simout,"%s\tWRB_COHE tag %ld-- race condition coming out of Data pipe\n",captr->name,req->tag);
#endif
      /* this is a race condition: L2 was in the process of victimizing
         this line, but in the meanwhile, L1 has sent an unsolicited WRB
         to the same line. The data from the unsolicited response from
         the L1 is the correct data, so we must use that for the
         write-back to send out. A more complex L2 logic controller
         would go ahead and send down the ACK from this point itself,
         but we'll assume a less complicated controller for now. The L2
         should go ahead and send a message to the L1 and wait for it to
         get NACKed. So, clear out the "L1 done" bit in order to allow
         this. */
      mark_undone_l1_in_wrb_buf(captr,req);
    }

  /* always be sure to decouple the WRB inclusion request from the previous */
  if (!AddReqToOutQ(captr,req))
    AddToSmartMSHRList(captr,req,-1,NULL); /* consider the reserved WRB buffer to be a smart MSHR */
  captr->stat.wb_inclusions_sent++;
}