[Intel-gfx] [PATCH] dma-buf: Enhance dma-fence tracing

Tue Jan 22 09:57:28 UTC 2019

Quoting Daniel Vetter (2019-01-22 09:11:53)
> On Tue, Jan 22, 2019 at 10:06 AM Chris Wilson <chris at chris-wilson.co.uk> wrote:
> >
> > Quoting Koenig, Christian (2019-01-22 08:49:30)
> > > Am 22.01.19 um 00:20 schrieb Chris Wilson:
> > > > Rather than every backend and GPU driver reinventing the same wheel for
> > > > user level debugging of HW execution, the common dma-fence framework
> > > > should include the tracing infrastructure required for most client API
> > > > level flow visualisation.
> > > >
> > > > With these common dma-fence level tracepoints, the userspace tools can
> > > > establish a detailed view of the client <-> HW flow across different
> > > > kernels. There is a strong ask to have this available, so that the
> > > > userspace developer can effectively assess if they're doing a good job
> > > > about feeding the beast of a GPU hardware.
> > > >
> > > > In the case of needing to look into more fine-grained details of how
> > > > kernel internals work towards the goal of feeding the beast, the tools
> > > > may optionally amend the dma-fence tracing information with the driver
> > > > implementation specific. But for such cases, the tools should have a
> > > > graceful degradation in case the expected extra tracepoints have
> > > > changed or their format differs from the expected, as the kernel
> > > > implementation internals are not expected to stay the same.
> > > >
> > > > It is important to distinguish between tracing for the purpose of client
> > > > flow visualisation and tracing for the purpose of low-level kernel
> > > > debugging. The latter is highly implementation specific, tied to
> > > > a particular HW and driver, whereas the former addresses a common goal
> > > > of user level tracing and likely a common set of userspace tools.
> > > > Having made the distinction that these tracepoints will be consumed for
> > > > client API tooling, we raise the spectre of tracepoint ABI stability. It
> > > > is hoped that by defining a common set of dma-fence tracepoints, we avoid
> > > > the pitfall of exposing low level details and so restrict ourselves only
> > > > to the high level flow that is applicable to all drivers and hardware.
> > > > Thus the reserved guarantee that this set of tracepoints will be stable
> > > > (with the emphasis on depicting client <-> HW flow as opposed to
> > > > driver <-> HW).
> > > >
> > > > In terms of specific changes to the dma-fence tracing, we remove the
> > > > emission of the strings for every tracepoint (reserving them for
> > > > dma_fence_init for cases where they have unique dma_fence_ops, and
> > > > preferring to have descriptors for the whole fence context). strings do
> > > > not pack as well into the ftrace ringbuffer and we would prefer to
> > > > reduce the amount of indirect callbacks required for frequent tracepoint
> > > > emission.
> > > >
> > > > Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
> > > > Cc: Joonas Lahtinen <joonas.lahtinen at linux.intel.com>
> > > > Cc: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
> > > > Cc: Alex Deucher <alexdeucher at gmail.com>
> > > > Cc: "Christian König" <christian.koenig at amd.com>
> > > > Cc: Eric Anholt <eric at anholt.net>
> > > > Cc: Pierre-Loup Griffais <pgriffais at valvesoftware.com>
> > > > Cc: Michael Sartain <mikesart at fastmail.com>
> > > > Cc: Steven Rostedt <rostedt at goodmis.org>
> > >
> > > In general yes please! If possible please separate out the changes to
> > > the common dma_fence infrastructure from the i915 changes.
> >
> > Sure, I was just stressing the impact: remove some randomly placed
> > internal debugging tracepoints, try to define useful ones instead :)
> >
> > On the list of things to do was to convert at least 2 other drivers
> > (I was thinking nouveau/msm for simplicity, vc4 for a simpler
> > introduction to drm_sched than amdgpu) over to be sure we have the right
> > tracepoints.
> 
> I think sprinkling these over the scheduler (maybe just as an opt-in,
> for the case where the driver doesn't have some additional queueing
> somewhere) would be good. I haven't checked whether it fits, but would
> give you a bunch of drivers at once. It might also not cover all the
> cases (I guess the wait related ones would need to be somewhere else).

And the other thing (that got explicitly asked for!) was that we have
some igt to make sure we don't surreptitiously break the tracepoints
in future.

Another task would to devise the set of tracepoints to describe the
modesetting flow; that more or less is the flow of atomic helpers I
guess: prepare; wait-on-fences; commit; signal; cleanup. For system
snooping, knowing a target frame (msc or ust) and how late it was
delayed and the HW execution flow up to the frame and being able to tie
that back to the GL/VK client is the grand plan.
-Chris