[Intel-gfx] [RFC PATCH 00/97] Basic GuC submission support in the i915

Wed Jun 2 18:57:02 UTC 2021

On Wed, Jun 2, 2021 at 5:27 PM Tvrtko Ursulin
<tvrtko.ursulin at linux.intel.com> wrote:
> On 25/05/2021 17:45, Matthew Brost wrote:
> > On Tue, May 25, 2021 at 11:32:26AM +0100, Tvrtko Ursulin wrote:
> >>   * Context pinning code with it's magical two adds, subtract and cmpxchg is
> >> dodgy as well.
> >
> > Daniele tried to remove this and it proved quite difficult + created
> > even more races in the backend code. This was prior to the pre-pin and
> > post-unpin code which makes this even more difficult to fix as I believe
> > these functions would need to be removed first. Not saying we can't
> > revisit this someday but I personally really like it - it is a clever
> > way to avoid reentering the pin / unpin code while asynchronous things
> > are happening rather than some complex locking scheme. Lastly, this code
> > has proved incredibly stable as I don't think we've had to fix a single
> > thing in this area since we've been using this code internally.
>
> Pretty much same as above. The code like:
>
> static inline void __intel_context_unpin(struct intel_context *ce)
> {
>         if (!ce->ops->sched_disable) {
>                 __intel_context_do_unpin(ce, 1);
>         } else {
>                 while (!atomic_add_unless(&ce->pin_count, -1, 1)) {
>                         if (atomic_cmpxchg(&ce->pin_count, 1, 2) == 1) {
>                                 ce->ops->sched_disable(ce);
>                                 break;
>                         }
>                 }
>         }
> }
>
> That's pretty much impenetrable for me and the only thing I can think of
> here is **ALARM** must be broken! See what others think..

pin_count is a hand-rolled mutex, except not actually a real one, and
it's absolutely hiliarous in it's various incarnations (there's one
each on i915_vm, vma, obj and probably a few more).

Not the worst one I've seen by far in the code we've merged already.
Minimally this needs a comment here and in the struct next to
@pin_count to explain where all this is abused, which would already
make it better than most of the in-tree ones.

As part of the ttm conversion we have a plan to sunset the "pin_count
as a lock" stuff, depending how bad that goes we might need to split
up the task for each struct that has such a pin_count.

-Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch