[Intel-gfx] [PATCH] drm/i915/breadcrumbs: Drop request reference for the signaler thread

Chris Wilson chris at chris-wilson.co.uk
Fri Jan 26 09:42:02 UTC 2018


Quoting Chris Wilson (2018-01-24 14:44:01)
> Quoting Tvrtko Ursulin (2018-01-24 13:09:37)
> > 
> > On 22/01/2018 15:41, Chris Wilson wrote:
> > > If we remember to cancel the signaler on a request when retiring it
> > > (after we know that the request has been signaled), we do not need to
> > > carry an additional request in the signaler itself. This prevents an
> > > issue whereby the signaler threads may be delayed and hold on to
> > > thousands of request references, causing severe memory fragmentation and
> > > premature oom (most noticeable on 32b snb due to the limited GFP_KERNEL
> > > and frequent use of inter-engine fences).
> > 
> > What is starving the signaler thread, which is set to SCHED_FIFO, and 
> > can't be tasklets on SNB?
> 
> Interrupts. MI_USER_INTERRUPT to be precise, but we have to check all
> the other sources on snb as well.
> 
> > Before I actually start reviewing the code, which I'd rather avoid :) :
> > 
> > Is it just not able to process enough requests in its time-slice 
> > (need_resched) so is falling behind? It would be surprising since I 
> > would expect it to be much lighter wait processing there, per request, 
> > than on the submission paths.
> 
> The conclusion is a bit odd, but more or less it's just a pathological
> case where interrupts + rt task are contending for one cpu with
> submission proceeding on another. Making the signaler lighter was the
> intention of the rest of the series, but this patch by itself prevents
> the runaway references.

Whilst I'm thinking of this: when I hit oom on snb, there were ~3
million requests allocated (according to /proc/slabinfo) but only ~3
in flight. Tracing the request references gave the clue that the only
outstanding references were those held by the signaler. There are only
2 sources of references, one for the active request and one for the
signaler, and the active references were accounted for since we knew
those requests were being retired.
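
To make that accounting concrete, here is a minimal userspace sketch of
the two schemes. It is a toy model, not the driver code: every name in
it (toy_request, toy_run, live_requests and so on) is made up for
illustration, the signaler is assumed never to get a chance to run, and
live_requests merely stands in for the slabinfo count.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Toy model only; none of these names are i915 symbols. */
struct toy_request {
	int refcount;          /* request is freed when this hits zero */
	bool on_signaler_list; /* tracked by the (stalled) signaler */
};

static long live_requests; /* stand-in for the /proc/slabinfo count */

static struct toy_request *toy_request_alloc(void)
{
	struct toy_request *rq = calloc(1, sizeof(*rq));

	if (!rq)
		abort();
	rq->refcount = 1; /* the "active" reference */
	live_requests++;
	return rq;
}

static void toy_request_put(struct toy_request *rq)
{
	if (--rq->refcount == 0) {
		live_requests--;
		free(rq);
	}
}

/* Scheme A: the signaler carries its own reference to the request. */
static void toy_enable_signaling_with_ref(struct toy_request *rq)
{
	rq->refcount++;
	rq->on_signaler_list = true;
}

/* Scheme B: the signaler only tracks the request, without a reference. */
static void toy_enable_signaling_no_ref(struct toy_request *rq)
{
	rq->on_signaler_list = true;
}

/* Retire drops the active reference; scheme B also cancels signaling. */
static void toy_retire(struct toy_request *rq, bool cancel_signaling)
{
	if (cancel_signaling)
		rq->on_signaler_list = false; /* no reference to drop */
	toy_request_put(rq);
}

/*
 * Submit and retire n requests while the signaler never runs, then
 * report how many requests are still allocated. In scheme A the
 * requests are deliberately left pinned (and leaked by this toy) to
 * model the pile-up.
 */
static long toy_run(long n, bool signaler_holds_ref)
{
	live_requests = 0;
	for (long i = 0; i < n; i++) {
		struct toy_request *rq = toy_request_alloc();

		if (signaler_holds_ref)
			toy_enable_signaling_with_ref(rq);
		else
			toy_enable_signaling_no_ref(rq);
		toy_retire(rq, !signaler_holds_ref);
	}
	return live_requests;
}
```

With a stalled signaler, toy_run(n, true) leaves all n requests
allocated even though every one has been retired, while
toy_run(n, false) leaves none, since retirement drops the only
reference.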
-Chris

