[Intel-gfx] [RFC PATCH 00/97] Basic GuC submission support in the i915
Matthew Brost
matthew.brost at intel.com
Thu Jun 3 03:41:00 UTC 2021
On Wed, Jun 02, 2021 at 08:57:02PM +0200, Daniel Vetter wrote:
> On Wed, Jun 2, 2021 at 5:27 PM Tvrtko Ursulin
> <tvrtko.ursulin at linux.intel.com> wrote:
> > On 25/05/2021 17:45, Matthew Brost wrote:
> > > On Tue, May 25, 2021 at 11:32:26AM +0100, Tvrtko Ursulin wrote:
> > >> * Context pinning code with it's magical two adds, subtract and cmpxchg is
> > >> dodgy as well.
> > >
> > > Daniele tried to remove this and it proved quite difficult + created
> > > even more races in the backend code. This was prior to the pre-pin and
> > > post-unpin code which makes this even more difficult to fix as I believe
> > > these functions would need to be removed first. Not saying we can't
> > > revisit this someday but I personally really like it - it is a clever
> > > way to avoid reentering the pin / unpin code while asynchronous things
> > > are happening rather than some complex locking scheme. Lastly, this code
> > > has proved incredibly stable as I don't think we've had to fix a single
> > > thing in this area since we've been using this code internally.
> >
> > Pretty much same as above. The code like:
> >
> > static inline void __intel_context_unpin(struct intel_context *ce)
> > {
> > if (!ce->ops->sched_disable) {
> > __intel_context_do_unpin(ce, 1);
> > } else {
> > while (!atomic_add_unless(&ce->pin_count, -1, 1)) {
> > if (atomic_cmpxchg(&ce->pin_count, 1, 2) == 1) {
> > ce->ops->sched_disable(ce);
> > break;
> > }
> > }
> > }
> > }
> >
> > That's pretty much impenetrable for me and the only thing I can think of
> > here is **ALARM** must be broken! See what others think..
Yea, probably should add a comment:
/*
* If the context has the sched_disable function, it isn't safe to unpin
* until this function completes. This function is allowed to complete
* asynchronously too. To avoid this function from being entered twice
* and move ownership of the unpin to this function's completion, adjust
* the pin count to 2 before it is entered. When this function completes
* the context can call intel_context_sched_unpin which decrements the
* pin count by 2 potentially resulting in an unpin.
*
* A while loop is needed to ensure the atomicity of the pin count. e.g.
* The below if / else statement has a race:
*
* if (atomic_cmpxchg(&ce->pin_count, 1, 2) == 1)
* ce->ops->sched_disable(ce);
* else
* atomic_dec(ce, 1);
*
* Two threads could simultaneously fail the if clause resulting in the
* pin_count going to 0 with scheduling enabled + the context pinned.
*/
>
> pin_count is a hand-rolled mutex, except not actually a real one, and
> it's absolutely hiliarous in it's various incarnations (there's one
> each on i915_vm, vma, obj and probably a few more).
>
> Not the worst one I've seen by far in the code we've merged already.
> Minimally this needs a comment here and in the struct next to
> @pin_count to explain where all this is abused, which would already
> make it better than most of the in-tree ones.
>
> As part of the ttm conversion we have a plan to sunset the "pin_count
> as a lock" stuff, depending how bad that goes we might need to split
> up the task for each struct that has such a pin_count.
>
Didn't know that with the TTM rework this value might go away. If that
is truely the direction I don't see the point in reworking this now. It
100% works and with a comment I think it can be understood what it is
doing.
Matt
> -Daniel
> --
> Daniel Vetter
> Software Engineer, Intel Corporation
> http://blog.ffwll.ch
More information about the Intel-gfx
mailing list