[Intel-gfx] [PATCH] drm/i915: Stop doing writeback from the shrinker

Tvrtko Ursulin tvrtko.ursulin at linux.intel.com
Fri Dec 10 15:36:17 UTC 2021


On 10/12/2021 14:46, Thomas Hellström wrote:
> On Fri, 2021-12-10 at 11:05 +0000, Tvrtko Ursulin wrote:
>> From: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
>>
>> This effectively removes writeback which was added in 2d6692e642e7
>> ("drm/i915: Start writeback from the shrinker").
>>
>> Digging through the history it seems we went back and forth on the
>> topic
>> of whether it would be safe a couple of times. See for instance
>> 5537252b6b6d ("drm/i915: Invalidate our pages under memory pressure")
>> where Hugh Dickins has advised against it. I do not have enough
>> expertise
>> in the memory management area so am hoping for expert input here.
>>
>> Reason for proposing removal is that there are reports from the field
>> which indicate a system wide deadlock (of a sort) implicating i915
>> doing
>> writeback at shrinking time.
>>
>> Signature is a hung task notifier kicking in and task traces such as:
> 
> It would be interesting to see what exactly the find_get_entry is
> blocked on. The other two tasks are blocked on the shrinker_rwsem which
> is held by i915. If it's indeed a deadlock with either of those two,

It may indeed be a livelock rather than a deadlock. I have received a
newer trace and it shows kswapd in the running state. But no progress
in 120s and a dead machine seemed too suspicious to be caused by just
a gaming workload, so I assumed a more serious issue than just severe
memory pressure.

> then the fix Chris is working on for an unrelated issue we discovered
> with shrinking would move out the writeback call from the
> shrinker_rwsem and resolve this, but if i915 is in turn deadlocking
> with another process and these two are just hanging waiting for the
> shrinker_rwsem, we would still have other issues.

Presumably this would involve an extra worker and tracking on a list or 
something?

Otherwise my main hope really was to get a verdict from memory 
management experts on pros & cons of doing writeback from the driver in 
any flavour.

> Do you by any chance have the list of the locks held by the system at
> this point?

No, but Renato, maybe you could also collect the output of "echo d" and
"echo m" written to /proc/sysrq-trigger when things go bad?
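
Something along these lines should do, as a rough sketch only. The helper
name collect_sysrq and its optional path argument are made up here for
illustration; it assumes a root shell, and the "d" (held locks) dump is
only available on a kernel built with lockdep. The "m" command dumps the
current memory information. Both go to the kernel log.

```shell
#!/bin/sh
# Hypothetical helper: write the sysrq commands "d" and "m" to the
# sysrq trigger file. The optional first argument overrides the path,
# which is only useful for trying the function out without root.
collect_sysrq() {
    trigger="${1:-/proc/sysrq-trigger}"
    for key in d m; do
        # Each write asks the kernel to run one sysrq command.
        echo "$key" > "$trigger" || return 1
    done
}
```

After calling collect_sysrq the dumps should be readable with dmesg,
assuming the machine is still responsive enough to run it.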

Regards,

Tvrtko

