[Nouveau] [PATCH 09/17] drm/radeon: use common fence implementation for fences
Daniel Vetter
daniel.vetter at ffwll.ch
Wed Jul 23 00:02:11 PDT 2014
On Wed, Jul 23, 2014 at 8:52 AM, Christian König
<christian.koenig at amd.com> wrote:
>> In the preliminary patches where I can sync radeon with other GPUs I've
>> been very careful in all the places that call into fences, to make sure that
>> radeon wouldn't try to handle lockups for a different (possibly also radeon)
>> card.
>
> That's actually not such a good idea.
>
> In case of a lockup we need to handle the lockup, because otherwise it
> could happen that radeon waits for the lockup to be resolved while the
> lockup handling needs to wait for a fence that's never signaled because
> of the lockup.
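
(To make the circular wait described above concrete, here is a minimal,
purely hypothetical sketch; the completion names and overall structure
are made up for illustration and are not the actual radeon code:)

#include <linux/completion.h>

/* Pretend-global state, for illustration only. */
static DECLARE_COMPLETION(lockup_resolved);	/* completed once the reset is done   */
static DECLARE_COMPLETION(external_fence);	/* a fence owned by the locked-up GPU */

/* Fence wait path: defers to lockup handling before touching the hw. */
static void buggy_fence_wait(void)
{
	wait_for_completion(&lockup_resolved);
	/* ... go on waiting for the fence ... */
}

/* Lockup handler: waits on a fence the lockup prevents from signaling. */
static void buggy_lockup_handler(void)
{
	wait_for_completion(&external_fence);	/* never completes -> deadlock */
	/* ... reset the hardware ... */
	complete(&lockup_resolved);
}
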
I thought the plan for now is that each driver handles lockups itself.
So if any batch gets stuck for too long (whether it's our own GPU that's
stuck or whether we're somehow stuck on a fence from a second GPU doesn't
matter), the driver steps in with a reset and signals completion to all
of its own fences that were caught in that pile-up. As long as each
driver participating in fencing has the means to abort/reset, we'll
eventually get unstuck.
Essentially every driver has to guarantee that, assuming its dependent
fences all complete eventually, it _will_ complete its own fences, no
matter what.
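
A minimal sketch of what that guarantee could look like in a driver's
lockup handler follows; all the types and helpers here (my_driver_device,
the pending-fence list, the reset call) are hypothetical and not the
radeon or common fence API:

#include <linux/list.h>
#include <linux/spinlock.h>

struct my_driver_fence {
	struct list_head	list;		/* entry on dev->pending_fences */
	bool			signaled;
	/* ... driver specific bookkeeping ... */
};

struct my_driver_device {
	spinlock_t		fence_lock;
	struct list_head	pending_fences;
};

static void my_driver_reset_hw(struct my_driver_device *dev);	/* driver specific */

static void my_driver_handle_lockup(struct my_driver_device *dev)
{
	struct my_driver_fence *f, *tmp;

	/* Reset the GPU so no more work can get stuck behind the lockup. */
	my_driver_reset_hw(dev);

	/*
	 * Force-complete every fence caught in the pile-up, so nothing
	 * (including another driver) can wait on us forever.
	 */
	spin_lock(&dev->fence_lock);
	list_for_each_entry_safe(f, tmp, &dev->pending_fences, list) {
		f->signaled = true;
		list_del_init(&f->list);
		/* wake waiters, run callbacks, ... */
	}
	spin_unlock(&dev->fence_lock);
}
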
For now this should be good enough, but for ARB_robustness, or for people
who care about their compute results, we need reliable notification to
userspace that a reset happened. I think we could add a new "aborted"
fence state for that case and then propagate it. But given how tricky the
code to compute reset victims in i915 already is, I think we should leave
this out for now, and even later on make it strictly opt-in.
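
As a rough sketch only (the enum, struct and helper names below are
invented, and nothing like this exists in the series), such an "aborted"
state could look something like:

/* Hypothetical completion status a driver sets before force-signaling
 * fences after a reset, so userspace (e.g. ARB_robustness) can tell a
 * result is invalid. */
enum my_fence_status {
	MY_FENCE_PENDING,
	MY_FENCE_COMPLETED,
	MY_FENCE_ABORTED,	/* only completed because of a reset */
};

struct my_fence {
	enum my_fence_status	status;
	/* ... waitqueue, callback list, ... */
};

static void my_fence_abort(struct my_fence *f)
{
	/* Mark as aborted, then signal the fence as usual. */
	f->status = MY_FENCE_ABORTED;
	/* my_fence_signal(f); */
}

/* A status query ioctl could then report MY_FENCE_ABORTED to userspace
 * as the reset notification, strictly opt-in. */
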
-Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch