[Intel-gfx] [RFC 5/7] drm/i915: Wait for GPU to finish before event stop in Gen Perf PMU

Gupta, Sourab sourab.gupta at intel.com
Thu Jun 25 01:27:15 PDT 2015


On Thu, 2015-06-25 at 07:42 +0000, Daniel Vetter wrote:
> On Thu, Jun 25, 2015 at 06:02:35AM +0000, Gupta, Sourab wrote:
> > On Mon, 2015-06-22 at 16:09 +0000, Daniel Vetter wrote:
> > > On Mon, Jun 22, 2015 at 02:22:54PM +0100, Chris Wilson wrote:
> > > > On Mon, Jun 22, 2015 at 03:25:07PM +0530, sourab.gupta at intel.com wrote:
> > > > > From: Sourab Gupta <sourab.gupta at intel.com>
> > > > > 
> > > > > To collect timestamps around any GPU workload, we need to insert
> > > > > commands to capture them into the ringbuffer. Therefore, during the
> > > > > stop event call, we need to wait for the GPU to finish processing the
> > > > > last request into which these commands were inserted.
> > > > > We need to ensure this processing is done before the event_destroy
> > > > > callback, which deallocates the buffer holding the data.
> > > > 
> > > > There's no reason for this to be synchronous. You just need an
> > > > active reference on the output buffer that is only released after
> > > > the final request.
> > > 
> > > Yeah I think the interaction between OA sampling and GEM is the critical
> > > piece here for both patch series. Step one is to have a per-pmu lock to
> > > keep track of data private to OA and mmio based sampling as a starting
> > > point. Then we need to figure out how to structure the connection without
> > > OA/PMU and gem trampling over each other's feet.
> > > 
> > > Wrt requests those are refcounted already and just waiting on them doesn't
> > > need dev->struct_mutex. That should be all you need, assuming you do
> > > correctly refcount them ...
> > > -Daniel
> > 
> > Hi Daniel/Chris,
> > 
> > Thanks for your feedback. I acknowledge the issue with increasing the
> > coverage scope of the i915 dev mutex. I'll have a per-pmu lock to keep
> > track of data private to OA in my next version of the patches.
> 
> If you create new locking schemes please coordinate with Rob about them.
> Better if we come up with a consistent locking scheme for all of OA. That
> ofc doesn't exclude that we'll have per-pmu locking, just that the locking
> design should match if it makes sense. The example I'm thinking of is
> drrs&psr which both use the frontbuffer tracking code and both have their
> private mutex. But the overall approach to locking and cleaning up the
> async work is the same.
> 
Ok, I'll coordinate with Robert about a consistent locking scheme here.
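
For reference, the rough shape I have in mind for the OA-private state is
below. This is only a sketch; the struct and field names (i915_oa_pmu,
dest_buffer, stop_work, ...) are illustrative placeholders rather than what
the actual patch will use:

#include <linux/perf_event.h>
#include <linux/spinlock.h>
#include <linux/workqueue.h>

/* Sketch only: keep OA-private data under its own lock instead of
 * widening dev->struct_mutex coverage. All names are illustrative. */
struct i915_oa_pmu {
	struct pmu pmu;
	spinlock_t lock;		/* protects the OA-private fields below */
	u8 *dest_buffer;		/* destination buffer for MI_RPC reports */
	struct drm_i915_gem_request *last_request; /* last request with MI_RPC cmds */
	struct work_struct stop_work;	/* deferred wait + OA disable,
					 * INIT_WORK'd at event init time */
};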

> > W.r.t. having the synchronous wait during event stop, I thought through
> > the method of holding an active reference on the output buffer. This
> > would let us defer the buffer destruction (driven by the output buffer
> > refcount dropping as and when gpu requests complete) and decouple it
> > from the event stop here. But that is not the only thing at play here.
> > 
> > The event stop is expected to disable the OA unit (which it does right
> > now). In my use case, I can't disable the OA unit until the GPU has
> > processed the already scheduled MI_RPC requests, which otherwise may
> > lead to hangs.
> > So I'm waiting synchronously for those requests to complete before
> > disabling the OA unit.
> > 
> > Also, the event_destroy callback should destroy the buffers associated
> > with the event. (Right now, in the current design, the event_destroy
> > callback waits for the above operations to complete before proceeding
> > to destroy the buffer.)
> > 
> > If I instead create a function which destroys the output buffer once
> > all references have been released, all these operations will have to be
> > performed together in that function, in this serial order (so that when
> > the refcount drops to 0, i.e. the requests have completed, the OA unit
> > is disabled, and then the buffer destroyed). This would decouple these
> > operations from the event_stop and event_destroy functions, which may
> > be unintended behaviour w.r.t. these callbacks from the userspace
> > perspective.
> > 
> > Robert, any thoughts?
> 
> Yeah, no idea whether those perf callbacks are allowed to stall for a
> longer time. If not, we could simply push the cleanup/OA disabling into
> an async work.
> -Daniel

Hi Daniel,
Actually, right now the stop_event callback does not stall. Instead, I'm
scheduling an async work function to do the job. The event destroy then
waits for the work function to finish before proceeding to destroy the
buffer.
I'm also thinking through Chris' suggestions here and will try to come
up with a solution based on them.
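
Roughly, the flow I'm describing would look like the below, building on the
sketch above. Again this is only illustrative; i915_oa_disable(),
i915_oa_free_dest_buffer() and the exact request-wait call are placeholders,
not the actual patch:

static void i915_oa_stop_work_fn(struct work_struct *work)
{
	struct i915_oa_pmu *oa =
		container_of(work, struct i915_oa_pmu, stop_work);

	/* Wait for the last request carrying the MI_RPC commands; the
	 * request is refcounted, so no dev->struct_mutex is needed here. */
	i915_wait_request(oa->last_request);

	/* Only now is it safe to disable the OA unit. */
	i915_oa_disable(oa);
}

static void i915_oa_event_stop(struct perf_event *event, int flags)
{
	struct i915_oa_pmu *oa =
		container_of(event->pmu, struct i915_oa_pmu, pmu);

	/* Don't stall the perf callback; defer the wait + OA disable. */
	schedule_work(&oa->stop_work);
}

static void i915_oa_event_destroy(struct perf_event *event)
{
	struct i915_oa_pmu *oa =
		container_of(event->pmu, struct i915_oa_pmu, pmu);

	/* Make sure the deferred stop has finished before the destination
	 * buffer is torn down. */
	flush_work(&oa->stop_work);
	i915_oa_free_dest_buffer(oa);
}

If we move to Chris' suggestion of an active reference on the destination
buffer, the flush_work() above would effectively be replaced by dropping
the reference from the request retirement path instead.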

Thanks,
Sourab


