[Intel-gfx] [RFC 5/7] drm/i915: Wait for GPU to finish before event stop in Gen Perf PMU

Wed Jun 24 23:02:35 PDT 2015

On Mon, 2015-06-22 at 16:09 +0000, Daniel Vetter wrote:
> On Mon, Jun 22, 2015 at 02:22:54PM +0100, Chris Wilson wrote:
> > On Mon, Jun 22, 2015 at 03:25:07PM +0530, sourab.gupta at intel.com wrote:
> > > From: Sourab Gupta <sourab.gupta at intel.com>
> > > 
> > > To collect timestamps around any GPU workload, we need to insert
> > > commands to capture them into the ringbuffer. Therefore, during the stop event
> > > call, we need to wait for GPU to complete processing the last request for
> > > which these commands were inserted.
> > > We need to ensure this processing is done before event_destroy callback which
> > > deallocates the buffer for holding the data.
> > 
> > There's no reason for this to be synchronous. Just that you need an
> > active reference on the output buffer to be only released after the
> > final request.
> 
> Yeah I think the interaction between OA sampling and GEM is the critical
> piece here for both patch series. Step one is to have a per-pmu lock to
> keep track of data private to OA and mmio based sampling as a starting
> point. Then we need to figure out how to structure the connection without
> OA/PMU and gem trampling over each another's feet.
> 
> Wrt requests those are refcounted already and just waiting on them doesn't
> need dev->struct_mutex. That should be all you need, assuming you do
> correctly refcount them ...
> -Daniel

Hi Daniel/Chris,

Thanks for your feedback. I acknowledge the issue with increasing
coverage scope of i915 dev mutex. I'll have a per-pmu lock to keep track
of data private to OA in my next version of patches.

W.r.t. having the synchronous wait during event stop, I thought through
the method of having active reference on output buffer. This will let us
have a delayed buffer destroy (controlled by decrement of output buffer
refcount as and when gpu requests complete). This will be decoupled from
the event stop here. But that is not the only thing at play here.

The event stop is expected to disable the OA unit (which it is doing
right now). In my usecase, I can't disable OA unit, till the time GPU
processes the MI_RPC requests scheduled already, which otherwise may
lead to hangs.
So, I'm waiting synchronously for requests to complete before disabling
OA unit.

Also, the event_destroy callback should be destroying the buffers
associated with event. (Right now, in current design, event_destroy
callback waits for the above operations to be completed before
proceeding to destroy the buffer).

If I try to create a function which destroys the output buffer according
to all references being released, all these operations will have to be
performed together in that function, in this serial order (so that when
refcount drops to 0, i.e. requests have completed, OA unit will be
disabled, and then the buffer destroyed). This would lead to these
operations being decoupled from the event_stop and event_destroy
functions. This may be non-intentional behaviour w.r.t. these callbacks
from userspace perspective.

Robert, any thoughts?