The function L2ProcessDataReq contained the following code to handle
write-backs from the L2 cache. The code was intended to send a WRB down
to memory and an INVL to the L1 cache in order to maintain subset
enforcement:
    case COHE: /* this is an upward-bound WRB or possibly INVL/WRB clubbing --
                  they are the only COHEs that come to the Data array */
      {
        /* call either MoveWRBToL1 or MoveINVLToL1 to send the
           request up to the L1 cache, possibly taking advantage of
           the wrb-buf as a smart MSHR. Thus, this access will not
           block the pipe, and cannot cause a deadlock here. */
        if (L1TYPE == FIRSTLEVEL_WB && req->req_type == WRB)
          MoveWRBToL1(captr,req,1);
        else if (L1TYPE==FIRSTLEVEL_WT && req->req_type == WRB)
          MoveINVLToL1(captr,req->invl_req,req);
        else
          YS__errmsg("Unknown COHE type in L2 data pipe.\n");
        return 1; /* always be able to sink this! */
      }
      break;
    case COHE_REPLY:
      {
        /* This can be an unsolicited L1 writeback to L2 (which is
           absorbed), a victimization writeback from L2 that is not
           also sent to the L1 (that is, if the L1 is WT; these are
           sent down to memory), or an external COHE with copyback
           (also sent down). */
        if (req->absorb_at_l2 == ABSORB) /* an L1 writeback to L2, for example */
          {
    #ifdef DEBUG_SECONDARY
            if (YS__Simtime > DEBUG_TIME)
              fprintf(simout,"%s: Digesting a WRB message %ld from L1 in Data pipe\n",captr->name,req->tag);
    #endif
            YS__PoolReturnObj(&YS__ReqPool, req); /* we need to digest it */
            return 1;
          }
        if (req->req_type == WRB)
          {
            /* Has a space in the wrb_buf and we must sink it to
               avoid deadlock. So, if this request cannot be sent
               immediately, put in the smart MSHR list, such that
               it frees the wrb_buf entry when it gets sent out. */
            if (AddReqToOutQ(captr,req))
              remove_from_wrb_buf(captr,posn_in_wrb_buf(captr,req),NULL);
            else
              AddToSmartMSHRList(captr,req,posn_in_wrb_buf(captr,req),remove_from_wrb_buf);
            return 1;
          }
        else
          return AddReqToOutQ(captr,req); /* this returns 0 on failure, 1 on success, same as us */
      }
However, combined write-back/invalidate pairs pass through the L2 data
pipe as COHE_REPLY, not as COHE. Thus, the code in the COHE case that
moves invalidates to the L1 cache is never executed, and the write-backs
propagate downward without ever invalidating the L1 cache. Consequently,
subset enforcement (inclusion) can be lost, and the L1 cache may contain
cache lines that are not present in the L2 cache.
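The consequence described above can be illustrated with a minimal sketch. It does not use any simulator code: `toy_cache`, `l2_evict`, and `inclusion_holds` are hypothetical names introduced only for this example, and each cache is reduced to a flat set of line tags. The `send_invl` flag stands in for whether the INVL actually reaches the L1 cache.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_LINES 8      /* hypothetical capacity of each toy cache */
#define NO_LINE  (-1L)   /* marks an empty slot */

/* Hypothetical stand-in for a cache: just a set of line tags. */
struct toy_cache {
    long tag[TOY_LINES];
};

void toy_init(struct toy_cache *c)
{
    size_t i;
    for (i = 0; i < TOY_LINES; i++)
        c->tag[i] = NO_LINE;
}

int toy_contains(const struct toy_cache *c, long tag)
{
    size_t i;
    for (i = 0; i < TOY_LINES; i++)
        if (c->tag[i] == tag)
            return 1;
    return 0;
}

/* Returns 1 if the line is stored (or was already present), 0 if full. */
int toy_insert(struct toy_cache *c, long tag)
{
    size_t i;
    assert(tag != NO_LINE);
    if (toy_contains(c, tag))
        return 1;
    for (i = 0; i < TOY_LINES; i++) {
        if (c->tag[i] == NO_LINE) {
            c->tag[i] = tag;
            return 1;
        }
    }
    return 0;
}

void toy_remove(struct toy_cache *c, long tag)
{
    size_t i;
    for (i = 0; i < TOY_LINES; i++)
        if (c->tag[i] == tag)
            c->tag[i] = NO_LINE;
}

/* Inclusion holds when every valid L1 line is also present in L2. */
int inclusion_holds(const struct toy_cache *l1, const struct toy_cache *l2)
{
    size_t i;
    for (i = 0; i < TOY_LINES; i++)
        if (l1->tag[i] != NO_LINE && !toy_contains(l2, l1->tag[i]))
            return 0;
    return 1;
}

/* Evict a line from L2. With send_invl == 0 the L1 copy survives,
   which models the write-back going down without the INVL going up. */
void l2_evict(struct toy_cache *l1, struct toy_cache *l2,
              long tag, int send_invl)
{
    toy_remove(l2, tag);
    if (send_invl)
        toy_remove(l1, tag);
}
```

Inserting the same tag into both toy caches and then calling `l2_evict` with `send_invl` set to 0 leaves `inclusion_holds` returning 0; with `send_invl` set to 1 it keeps returning 1.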
Problem Fix
The COHE and COHE_REPLY cases of the function
L2ProcessDataReq are replaced with the following code:
    case COHE: /* this is an upward-bound WRB --
                  they are the only COHEs that come to the Data array */
      {
        /* call MoveWRBToL1 to send the request up to the L1 cache,
           possibly taking advantage of the wrb-buf as a smart
           MSHR. Thus, this access will not block the pipe, and cannot
           cause a deadlock here. */
        if (L1TYPE == FIRSTLEVEL_WB && req->req_type == WRB)
          MoveWRBToL1(captr,req,1);
        else
          YS__errmsg("Unknown COHE type in L2 data pipe.\n");
        return 1; /* always be able to sink this! */
      }
      break;
    case COHE_REPLY:
      {
        /* This can be an unsolicited L1 writeback to L2 (which is
           absorbed), a WRB-INVL clubbing (if the L1 cache is WT,
           the INVL for subset enforcement to the L1 is sent from here,
           as well as the WRB to memory), or an external COHE with
           copyback (sent down to memory). */
        if (req->absorb_at_l2 == ABSORB) /* an L1 writeback to L2, for example */
          {
    #ifdef DEBUG_SECONDARY
            if (YS__Simtime > DEBUG_TIME)
              fprintf(simout,"%s: Digesting a WRB message %ld from L1 in Data pipe\n",captr->name,req->tag);
    #endif
            YS__PoolReturnObj(&YS__ReqPool, req); /* we need to digest it */
            return 1;
          }
        /* This access may be an INVL-WRB clubbing. In that case,
           use MoveINVLToL1 to send the INVL request up to the L1 cache and
           the WRB request down to memory. This is parallel to the
           use of MoveWRBToL1 in the COHE case above. */
        if (L1TYPE==FIRSTLEVEL_WT && req->req_type == WRB)
          {
            MoveINVLToL1(captr,req->invl_req,req);
            return 1;
          }
        else if (req->req_type == WRB)
          {
            YS__errmsg("Invalid COHE_REPLY type for L2ProcessDataReq.");
            return 1;
          }
        else /* ordinary coherence action */
          return AddReqToOutQ(captr,req); /* this returns 0 on failure, 1 on success, same as us */
      }
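To compare the two versions at a glance, the routing of a WRB request can be reduced to a small decision table. The sketch below is a simplified model, not simulator code: the enums, `struct route`, and both functions are hypothetical names, the `absorb_at_l2` path and the smart MSHR handling are left out, and it assumes (following the comment in the replacement code) that MoveINVLToL1 sends the INVL up to the L1 cache and the WRB down to memory.

```c
#include <assert.h>

/* Hypothetical stand-ins for the message class and the L1TYPE setting. */
enum msg_class { MSG_COHE, MSG_COHE_REPLY };
enum l1_policy { L1_WRITE_BACK, L1_WRITE_THROUGH };

/* Where a WRB request ends up; each field is 0 or 1. */
struct route {
    int wrb_to_l1;    /* WRB forwarded up (MoveWRBToL1 in the source)   */
    int invl_to_l1;   /* INVL forwarded up (MoveINVLToL1 in the source) */
    int wrb_to_mem;   /* WRB forwarded down to memory                   */
    int error;        /* the YS__errmsg branch is taken                 */
};

/* Model of the original cases. */
struct route route_wrb_old(enum msg_class m, enum l1_policy p)
{
    struct route r = {0, 0, 0, 0};
    assert(m == MSG_COHE || m == MSG_COHE_REPLY);
    if (m == MSG_COHE) {
        if (p == L1_WRITE_BACK) {
            r.wrb_to_l1 = 1;
        } else {
            /* Assumption: MoveINVLToL1 also sends the WRB down. */
            r.invl_to_l1 = 1;
            r.wrb_to_mem = 1;
        }
    } else {
        /* COHE_REPLY: the WRB goes down, nothing goes up to L1. */
        r.wrb_to_mem = 1;
    }
    return r;
}

/* Model of the replacement cases. */
struct route route_wrb_new(enum msg_class m, enum l1_policy p)
{
    struct route r = {0, 0, 0, 0};
    assert(m == MSG_COHE || m == MSG_COHE_REPLY);
    if (m == MSG_COHE) {
        if (p == L1_WRITE_BACK)
            r.wrb_to_l1 = 1;
        else
            r.error = 1;
    } else {
        if (p == L1_WRITE_THROUGH) {
            /* Assumption: MoveINVLToL1 also sends the WRB down. */
            r.invl_to_l1 = 1;
            r.wrb_to_mem = 1;
        } else {
            r.error = 1;
        }
    }
    return r;
}
```

For the case that matters here, a WRB arriving as COHE_REPLY with a write-through L1 cache, `route_wrb_old` reports `invl_to_l1 == 0` while `route_wrb_new` reports `invl_to_l1 == 1`; both report `wrb_to_mem == 1`.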
Although this problem affects the simulated performance reported, the
effect should be small, because L2 caches are generally much larger than
L1 caches. As a result, data will tend to be replaced from the L1 cache
before being replaced from the L2 cache, so the missing INVL would
seldom find the line still present in the L1 cache.
Further, inclusion was already maintained correctly for external
coherence actions and for victimizations of shared or exclusive data.
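The size argument can be made concrete in a deliberately simple setting. The sketch below assumes direct-mapped caches with equal line sizes, an L2 line count that is a multiple of the L1 line count, and every access filling both levels; none of this is taken from the simulator configuration, and `count_inclusion_violations` is a hypothetical helper. No INVL is ever sent to the L1 cache. Under these assumptions, any line displaced from L2 shares its L1 slot with the line displacing it, so it has already left the L1 cache.

```c
#include <assert.h>
#include <stddef.h>

#define EMPTY (-1L)   /* marks an unused cache slot */

/* Replays a trace of line addresses through a toy direct-mapped
   two-level hierarchy and counts, after each access, the L1 lines
   that are missing from L2. l2_lines must be a multiple of l1_lines.
   The caller provides the l1 and l2 arrays as scratch storage. */
long count_inclusion_violations(const long *trace, size_t n,
                                long *l1, size_t l1_lines,
                                long *l2, size_t l2_lines)
{
    long violations = 0;
    size_t i, s;

    assert(l1_lines > 0 && l2_lines % l1_lines == 0);
    for (s = 0; s < l1_lines; s++)
        l1[s] = EMPTY;
    for (s = 0; s < l2_lines; s++)
        l2[s] = EMPTY;

    for (i = 0; i < n; i++) {
        long addr = trace[i];
        assert(addr >= 0);

        /* Fill both levels; the previous occupant is silently replaced
           and no invalidation is sent from L2 to L1. */
        l2[(size_t)addr % l2_lines] = addr;
        l1[(size_t)addr % l1_lines] = addr;

        /* Check the inclusion property over the whole L1. */
        for (s = 0; s < l1_lines; s++) {
            long t = l1[s];
            if (t != EMPTY && l2[(size_t)t % l2_lines] != t)
                violations++;
        }
    }
    return violations;
}
```

With set-associative caches and independent replacement decisions at each level this guarantee no longer holds, which is where the missing INVL could actually matter; the sketch only shows why a much larger L2 makes such cases uncommon.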