[Intel-gfx] [PATCH 08/11] drm/i915: Keep timeline HWSP allocated until the system is idle

Wed Jan 30 18:25:22 UTC 2019

Quoting Tvrtko Ursulin (2019-01-30 17:54:51)
> 
> On 30/01/2019 02:19, Chris Wilson wrote:
> > In preparation for enabling HW semaphores, we need to keep in flight
> > timeline HWSP alive until the entire system is idle, as any other
> > timeline active on the GPU may still refer back to the already retired
> > timeline. We both have to delay recycling available cachelines and
> > unpinning old HWSP until the next idle point (i.e. on parking).
> > 
> > That we have to keep the HWSP alive for external references on HW raises
> > an interesting conundrum. On a busy system, we may never see a global
> > idle point, essentially meaning the resource will be leaking until we
> > are forced to sleep. What we need is a set of RCU primitives for the GPU!
> > This should also help mitigate the resource starvation issues
> > promulgating from keeping all logical state pinned until idle (instead
> > of as currently handled until the next context switch).
> > 
> > v2: Use idle barriers to free stale HWSP as soon as all current requests
> > are idle, rather than rely on the system reaching a global idle point.
> > (Tvrtko)
> > v3: Replace the idle barrier with read locks.
> 
> Time to change patch title and actually even rewrite the commit message 
> I think.

Why? We are keeping it until it is idle in the system, not just the
timeline. "Keep timeline HWSP allocated until idle in the system".

First paragraph is still true. Second paragraph can be more concise and
need not be a flight of fantasy.

> > diff --git a/drivers/gpu/drm/i915/i915_timeline.c b/drivers/gpu/drm/i915/i915_timeline.c
> > index b2202d2e58a2..fd1a92a3663d 100644
> > --- a/drivers/gpu/drm/i915/i915_timeline.c
> > +++ b/drivers/gpu/drm/i915/i915_timeline.c
> > @@ -6,19 +6,28 @@
> >   
> >   #include "i915_drv.h"
> >   
> > -#include "i915_timeline.h"
> > +#include "i915_active.h"
> >   #include "i915_syncmap.h"
> > +#include "i915_timeline.h"
> >   
> >   struct i915_timeline_hwsp {
> > -     struct i915_vma *vma;
> > +     struct i915_gt_timelines *gt;
> 
> Coming back to a comment from one of the previous reviews, this is also 
> called gt but is a different thing altogether. I would really like that 
> we afford ourselves a few more characters so it just easier to read the 
> code.

Then start suggesting names. A tiny bit late to object to the gt pattern
at this point! :)

But do need a better name for the graphic device "globals"

> > +static struct i915_timeline_cacheline *
> > +cacheline_alloc(struct i915_timeline_hwsp *hwsp, unsigned int cacheline)
> > +{
> > +     struct i915_timeline_cacheline *cl;
> > +
> > +     GEM_BUG_ON(cacheline >= 64);
> 
> Maybe pull out CACHELINES_PER_PAGE as HWSP_CACHELINES_PER_PAGE or something?

No, this the size of the bitfield and I don't know of any preprocessor
tricks to determine that.

> > +     /*
> > +      * Attach the old cacheline to the current request, so that we only
> > +      * free it after the current request is retired, which ensures that
> > +      * all writes into the cacheline from previous requests are complete.
> > +      */
> > +     err = i915_active_ref(&tl->hwsp_cacheline->active,
> > +                           tl->fence_context, rq);
> 
> Right, this is the rq + 1 magic akin to unpin previous context. Was 
> confusing me for a bit why we would be assigning the old cacheline to 
> the current rq.

It just means we don't have to track it only every request in the local
timeline, which is a bit easier and lot less work.

> > +     if (err)
> > +             goto err_cacheline;
> > +
> > +     tl->hwsp_ggtt = i915_vma_get(vma);
> > +     tl->hwsp_offset = cacheline * CACHELINE_BYTES;
> > +     __i915_vma_pin(tl->hwsp_ggtt);
> > +
> > +     cacheline_release(tl->hwsp_cacheline); /* ownership now xfered to rq */
> > +     cacheline_free(tl->hwsp_cacheline);
> > +
> > +     cacheline_acquire(cl);
> > +     tl->hwsp_cacheline = cl;
> > +
> > +     *seqno = timeline_advance(tl);
> > +     return 0;
> > +
> > +err_cacheline:
> 
> i915_vma_put looks to be missing here to fully unwind cacheline_alloc.

Now there's a cacheline_free() can be used here.

> > +     kfree(cl);
> > +err_hwsp:
> > +     __idle_hwsp_free(vma->private, cacheline);
> > +err_rollback:
> > +     timeline_rollback(tl);
> > +     return err;
> > +}