[Intel-gfx] [PATCH 2/3] drm/i915: Drop inspection of execbuf flags during evict

Daniel Vetter daniel at ffwll.ch
Fri Nov 8 16:06:20 UTC 2019


On Fri, Nov 8, 2019 at 11:40 AM Chris Wilson <chris at chris-wilson.co.uk> wrote:
>
> Quoting Daniel Vetter (2019-11-08 10:20:23)
> > On Fri, Nov 8, 2019 at 11:11 AM Chris Wilson <chris at chris-wilson.co.uk> wrote:
> > > Quoting Daniel Vetter (2019-11-08 09:54:42)
> > > > On Wed, Nov 6, 2019 at 4:49 PM Chris Wilson <chris at chris-wilson.co.uk> wrote:
> > > > >
> > > > > With the goal of removing the serialisation from around execbuf, we will
> > > > > no longer have the privilege of there being a single execbuf in flight
> > > > > at any time and so will only be able to inspect the user's flags within
> > > > > the carefully controlled execbuf context. i915_gem_evict_for_node() is
> > > > > the only user outside of execbuf that currently peeks at the flag to
> > > > > convert an overlapping softpinned request from ENOSPC to EINVAL. Retract
> > > > > this nicety and only report ENOSPC if the location is in current use,
> > > > > either due to this execbuf or another.
> > > > >
> > > > > Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
> > > > > Cc: Joonas Lahtinen <joonas.lahtinen at linux.intel.com>
> > > > > Reviewed-by: Joonas Lahtinen <joonas.lahtinen at linux.intel.com>
> > > >
> > > > Same reasons as for patch 3, I don't think we have to do this at all.
> > >
> > > This is already undefined behaviour. That field is protected by
> > > struct_mutex and being evaluated outside of that lock.
> >
> > If this can be called on objects involved in execbuf, without
> > struct_mutex, then we already have a correctness problem of vma space
> > (which is super tight on old platforms and rather much required to be
> > well-managed because of that) being lost because concurrent threads
> > thrash it instead of forming an orderly queue. And if that's not the
> > case, and they do form an orderly queue, then there's no problem since
> > even the as-needed-only orderly queue provided by ww_mutex will then
> > be enough locking to keep this working.
>
> It doesn't get called on those objects, those objects may just be
> neighbouring and being inspected for potential eviction candidates. The
> lists themselves are protected by their mutex, it's just the contention
> over the pin_count.

Hm yeah, in a per-bo locked future world this won't work. But today it
should be covered by either vm->mutex or dev->struct_mutex, so not
already broken?

Otoh in the per-bo locked future we only care about conflicts with our
own execbuf, which means we could check whether the object belongs to
our batch (very easy by looking at dma_resv->lock.ctx, ttm does that
in a few places), and only do the check in that case. So could retain
full uapi semantics here without additional effort (we need to have
these locks anyway, at least in any kind of execbuf slowpath where the
bo aren't all mapped when we start out). So still not understanding
(even with the "it's other bo" oversight rectified) why we have to drop
this?
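
A minimal userspace sketch of the check described above, not the i915
code. All types and names here (acquire_ctx, resv, node,
evict_conflict_errno) are hypothetical stand-ins: lock_ctx models
dma_resv->lock.ctx, and softpinned models EXEC_OBJECT_PINNED in the
user's flags. The assumption is that the flags are only consulted when
the conflicting object is locked by our own execbuf's acquire context.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for ww_acquire_ctx: one per execbuf. */
struct acquire_ctx {
	int id;
};

/* Hypothetical stand-in for dma_resv; lock_ctx models dma_resv->lock.ctx. */
struct resv {
	const struct acquire_ctx *lock_ctx; /* NULL when unlocked */
};

/* Hypothetical stand-in for a vma found at the requested location. */
struct node {
	const struct resv *resv;
	bool pinned;     /* location is in current use */
	bool softpinned; /* models EXEC_OBJECT_PINNED in the user's flags */
};

/* True only if the reservation is held under the given acquire context. */
static bool locked_by_ctx(const struct resv *resv, const struct acquire_ctx *ctx)
{
	return resv && ctx && resv->lock_ctx == ctx;
}

/*
 * Pick the error for a conflicting node: -EINVAL only when the node
 * belongs to our own execbuf and was softpinned there, -ENOSPC for any
 * other pinned node, 0 when the node is not pinned at all.
 */
static int evict_conflict_errno(const struct node *node, const struct acquire_ctx *ours)
{
	assert(node != NULL);

	if (!node->pinned)
		return 0;

	/* Flags are inspected only for objects locked by our own context. */
	if (locked_by_ctx(node->resv, ours) && node->softpinned)
		return -EINVAL;

	return -ENOSPC;
}
```

In this model a neighbouring object that belongs to another execbuf
never has its flags read, so it always reports -ENOSPC.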

> > Aside: Yeah I think we need to re-add struct_mutex to the gtt fault
> > path, the temporary pinning in there could easily starve execbuf on
> > platforms where batches run in ggtt. Maybe also some other areas where
> > we lost struct_mutex around temporary vma->pin_count elevations.
>
> That's where we are going next; not with struct_mutex but fenced access
> to reservations to replace the temporary (not HW access) pinning.

fenced as in dma_fence or dma_resv_lock?

Also, if we indeed have an issue with lost elevated pin_counts now, I
think we shouldn't ship 5.5 with that, and should reapply the duct tape
until it's fixed for good.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

