[Intel-gfx] [PATCH v3 21/22] drm/atomic: Introduce drm_atomic_helper_duplicate_commited_state()

Mon Jul 10 14:47:19 UTC 2017

On Mon, Jul 10, 2017 at 3:26 PM, Maarten Lankhorst
<maarten.lankhorst at linux.intel.com> wrote:
>>> The real fix is not taking struct_mutex during atomic commit, which will mean
>>> no deadlock can happen.
>>>
>>> Is this the bug being fixed here or am I missing something?
>> This would avoid both struct_mutex and modeset locks in the display
>> reset path, so I guess it should help with struct_mutex issues
>> as well.
>>
> I think fixing i915 to not require struct_mutex for vma pinning/unpinning will be a better use of our time, and it should also fix all deadlocks. :)
>
> And it's far better than duplicating drm_atomic_commit functionality in our reset handlers.

Part of the reason I asked is that I originally thought this
regression was introduced in

4680816be336 ("drm/i915: Wait first for submission, before waiting for
request completion")
221fe7994554 ("drm/i915: Perform a direct reset of the GPU from the waiter")

furthermore complicated by all the TDR work to no longer
force-complete requests, but instead resubmit them. The deadlock
is a lot more than struct_mutex, since we can wait for requests
without holding that one (through the recent-ish conversion to
i915_sw_fence of the atomic commit path).

I'm still asking why we can't fix this regression again on the GEM
side where it seems to have been introduced. We might or might not still
want to rewrite atomic to make it work better, and there are additional
races with the nonblocking atomic commits (an oversight on my part, I
also flat-out forgot about gen4 gpu reset), but I still think the
usual way should be
1. minimal regression fix
2. more extensive rework (if needed) of the lessons learned

So am I wrong in blaming this on GEM, or why can't the GEM folks fix
this? I think removing the "this is a regression and blocking adding
more machines to CI" pressure would make the discussion between Ville
and me a lot more constructive too.

Thanks, Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch