[PATCH] dma-buf: Enhance dma-fence tracing

Tue Jan 22 10:09:18 UTC 2019

On Tue, Jan 22, 2019 at 10:58 AM Chris Wilson <chris at chris-wilson.co.uk> wrote:
>
> Quoting Daniel Vetter (2019-01-22 09:11:53)
> > On Tue, Jan 22, 2019 at 10:06 AM Chris Wilson <chris at chris-wilson.co.uk> wrote:
> > >
> > > Quoting Koenig, Christian (2019-01-22 08:49:30)
> > > > Am 22.01.19 um 00:20 schrieb Chris Wilson:
> > > > > Rather than every backend and GPU driver reinventing the same wheel for
> > > > > user level debugging of HW execution, the common dma-fence framework
> > > > > should include the tracing infrastructure required for most client API
> > > > > level flow visualisation.
> > > > >
> > > > > With these common dma-fence level tracepoints, the userspace tools can
> > > > > establish a detailed view of the client <-> HW flow across different
> > > > > kernels. There is a strong ask to have this available, so that the
> > > > > userspace developer can effectively assess if they're doing a good job
> > > > > about feeding the beast of a GPU hardware.
> > > > >
> > > > > In the case of needing to look into more fine-grained details of how
> > > > > kernel internals work towards the goal of feeding the beast, the tools
> > > > > > may optionally amend the dma-fence tracing information with driver
> > > > > > implementation specifics. But for such cases, the tools should have a
> > > > > graceful degradation in case the expected extra tracepoints have
> > > > > changed or their format differs from the expected, as the kernel
> > > > > implementation internals are not expected to stay the same.
> > > > >
> > > > > It is important to distinguish between tracing for the purpose of client
> > > > > flow visualisation and tracing for the purpose of low-level kernel
> > > > > debugging. The latter is highly implementation specific, tied to
> > > > > a particular HW and driver, whereas the former addresses a common goal
> > > > > of user level tracing and likely a common set of userspace tools.
> > > > > Having made the distinction that these tracepoints will be consumed for
> > > > > client API tooling, we raise the spectre of tracepoint ABI stability. It
> > > > > is hoped that by defining a common set of dma-fence tracepoints, we avoid
> > > > > the pitfall of exposing low level details and so restrict ourselves only
> > > > > to the high level flow that is applicable to all drivers and hardware.
> > > > > Thus the reserved guarantee that this set of tracepoints will be stable
> > > > > (with the emphasis on depicting client <-> HW flow as opposed to
> > > > > driver <-> HW).
> > > > >
> > > > > In terms of specific changes to the dma-fence tracing, we remove the
> > > > > emission of the strings for every tracepoint (reserving them for
> > > > > dma_fence_init for cases where they have unique dma_fence_ops, and
> > > > > > preferring to have descriptors for the whole fence context). Strings do
> > > > > not pack as well into the ftrace ringbuffer and we would prefer to
> > > > > reduce the amount of indirect callbacks required for frequent tracepoint
> > > > > emission.
> > > > >
> > > > > Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
> > > > > Cc: Joonas Lahtinen <joonas.lahtinen at linux.intel.com>
> > > > > Cc: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
> > > > > Cc: Alex Deucher <alexdeucher at gmail.com>
> > > > > Cc: "Christian König" <christian.koenig at amd.com>
> > > > > Cc: Eric Anholt <eric at anholt.net>
> > > > > Cc: Pierre-Loup Griffais <pgriffais at valvesoftware.com>
> > > > > Cc: Michael Sartain <mikesart at fastmail.com>
> > > > > Cc: Steven Rostedt <rostedt at goodmis.org>
> > > >
> > > > In general yes please! If possible please separate out the changes to
> > > > the common dma_fence infrastructure from the i915 changes.
> > >
> > > Sure, I was just stressing the impact: remove some randomly placed
> > > internal debugging tracepoints, try to define useful ones instead :)
> > >
> > > On the list of things to do was to convert at least 2 other drivers
> > > (I was thinking nouveau/msm for simplicity, vc4 for a simpler
> > > introduction to drm_sched than amdgpu) over to be sure we have the right
> > > tracepoints.
> >
> > I think sprinkling these over the scheduler (maybe just as an opt-in,
> > for the case where the driver doesn't have some additional queueing
> > somewhere) would be good. I haven't checked whether it fits, but would
> > give you a bunch of drivers at once. It might also not cover all the
> > cases (I guess the wait related ones would need to be somewhere else).
>
> And the other thing (that got explicitly asked for!) was that we have
> some igt to make sure we don't surreptitiously break the tracepoints
> in future.
>
> Another task would be to devise the set of tracepoints to describe the
> modesetting flow; that more or less is the flow of atomic helpers I
> guess: prepare; wait-on-fences; commit; signal; cleanup. For system
> snooping, knowing a target frame (msc or ust) and how late it was
> delayed and the HW execution flow up to the frame and being able to tie
> that back to the GL/VK client is the grand plan.

Yeah with atomic helpers this should be doable, as long as the driver
uses the commit tracking part of the helpers. That's the stuff done by
drm_atomic_helper_setup_commit(). I think by now that means all atomic
drivers. I think the following would be good enough to track atomic
updates:
- atomic_check
- atomic_commit_begin/end (for the synchronous portion; for
nonblocking commits that would be mostly lock contention/cache
flushing/pte wrangling in prepare_fb for fb pinning and stuff like
that).
- a flip complete tracepoint stuffed into drm_crtc_send_vblank_event;
we'll probably want the lower level drm_send_event_locked() with some
additional metadata stuffed into drm_pending_event for the
tracepoint's consumption.
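
To make the list above a bit more concrete, here is a rough sketch of
what a userspace consumer could do with such events. Everything in it
is an assumption for illustration: the event names are just the labels
from the list (no such tracepoints exist under these names), and
commit_id stands in for whatever metadata would end up tying the events
of one commit together. Missing events simply leave a field empty, in
the spirit of the graceful degradation asked for in the patch
description.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional

# Hypothetical event names, copied from the list above. They are not
# existing kernel tracepoints.
CHECK = "atomic_check"
COMMIT_BEGIN = "atomic_commit_begin"
COMMIT_END = "atomic_commit_end"
FLIP_COMPLETE = "flip_complete"


@dataclass
class Event:
    name: str
    commit_id: int  # assumed per-commit identifier carried by each event
    ts_ns: int      # timestamp in nanoseconds


@dataclass
class CommitTimeline:
    commit_id: int
    sync_ns: Optional[int] = None     # commit_begin -> commit_end
    to_flip_ns: Optional[int] = None  # commit_begin -> flip_complete


def summarise(events: Iterable[Event]) -> List[CommitTimeline]:
    """Group events per commit and derive the two durations of interest."""
    stamps: Dict[int, Dict[str, int]] = {}
    for ev in sorted(events, key=lambda e: e.ts_ns):
        # Keep only the first timestamp seen for each (commit, event) pair.
        stamps.setdefault(ev.commit_id, {}).setdefault(ev.name, ev.ts_ns)

    timelines = []
    for commit_id in sorted(stamps):
        seen = stamps[commit_id]
        timeline = CommitTimeline(commit_id)
        begin = seen.get(COMMIT_BEGIN)
        # Degrade gracefully: an absent event leaves the field as None.
        if begin is not None:
            if COMMIT_END in seen:
                timeline.sync_ns = seen[COMMIT_END] - begin
            if FLIP_COMPLETE in seen:
                timeline.to_flip_ns = seen[FLIP_COMPLETE] - begin
        timelines.append(timeline)
    return timelines
```

Feeding this the events of one complete commit yields the length of the
synchronous portion and the begin-to-flip latency; a commit whose flip
event has not been seen yet just reports to_flip_ns as None.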

For a bonus we probably want to annotate vblank waits/events too, so
that all tracepoint hits in drm_send_event_locked have a source
somewhere (either atomic commit or vblank wait/event). The waits the
commit thread does should already be covered with your proposal here,
but maybe we need some additional metadata to tie it in with the other
atomic stuff in the UI. Otoh if the generic UI shows the name of the
worker (there's another set of tracepoints for that iirc) it should be
obvious what's going on.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch