[Intel-gfx] [PATCH] drm/i915: Stop doing writeback from the shrinker
Tvrtko Ursulin
tvrtko.ursulin at linux.intel.com
Fri Dec 10 15:36:17 UTC 2021
On 10/12/2021 14:46, Thomas Hellström wrote:
> On Fri, 2021-12-10 at 11:05 +0000, Tvrtko Ursulin wrote:
>> From: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
>>
>> This effectively removes writeback which was added in 2d6692e642e7
>> ("drm/i915: Start writeback from the shrinker").
>>
>> Digging through the history, it seems we went back and forth a couple
>> of times on whether it would be safe. See for instance
>> 5537252b6b6d ("drm/i915: Invalidate our pages under memory pressure")
>> where Hugh Dickins has advised against it. I do not have enough
>> expertise in the memory management area, so I am hoping for expert
>> input here.
>>
>> The reason for proposing removal is that there are reports from the
>> field which indicate a system-wide deadlock (of a sort) implicating
>> i915 doing writeback at shrinking time.
>>
>> The signature is a hung task notifier kicking in, with task traces such as:
>
> It would be interesting to see what exactly the find_get_entry is
> blocked on. The other two tasks are blocked on the shrinker_rwsem which
> is held by i915. If it's indeed a deadlock with either of those two,

It may indeed be a livelock instead of a deadlock. I have received a
newer trace and it does show kswapd in the running state. But no progress
for 120 seconds and a dead machine seemed too suspicious to be explained
by a gaming workload alone, so I assumed a more serious issue than just
severe memory pressure.

> then the fix Chris is working on for an unrelated issue we discovered
> with shrinking would move out the writeback call from the
> shrinker_rwsem and resolve this, but if i915 is in turn deadlocking
> with another process and these two are just hanging waiting for the
> shrinker_rwsem, we would still have other issues.

Presumably this would involve an extra worker and tracking objects on a
list, or something similar?

Otherwise, my main hope really was to get a verdict from memory
management experts on the pros and cons of doing writeback from the
driver in any flavour.
> Do you by any chance have the list of the locks held by the system at
> this point?

No, but Renato, maybe you could also collect the output of "echo d" and
"echo m" to /proc/sysrq-trigger when things go bad?
Regards,
Tvrtko