[Intel-gfx] uABI / Removing DRM_I915_LOW_LEVEL_TRACEPOINTS Kconfig

Fri Dec 21 07:45:29 UTC 2018

Quoting Michael Sartain (2018-12-20 20:27:19)
> On Wed, Dec 19, 2018, at 12:22 PM, Steven Rostedt wrote:
> > On Wed, 19 Dec 2018 12:08:18 +0200
> > Joonas Lahtinen <joonas.lahtinen at linux.intel.com> wrote:
> >
> > > To me, it seems almost as if folks are too preoccupied with thinking if
> > > we somehow can do this through tracepoints, to stop and actually think
> > > if we should.
> >
> > Regardless of whether it should or shouldn't, one solution to this is
> > to make the trace event in question record basically nothing but a
> > pointer.
> 
> Right now, these are the events I'm capturing w/ an AMD gpu: 
> 
>   amdgpu_cs:0-1150  [002] 630662.649417: amdgpu_cs_ioctl:      sched_job=3490671, timeline=gfx, context=105, seqno=3081096, ring_name=ffff91cb1ab1bdd0, num_ibs=3 
>           gfx-190   [000] 630662.649451: amdgpu_sched_run_job: sched_job=3490671, timeline=gfx, context=105, seqno=3081096, ring_name=ffff91cb1ab1bdd0, num_ibs=3 
>           gfx-190   [000] 630662.649454: dma_fence_signaled:   driver=amd_sched timeline=gfx context=104 seqno=3081096
> 
> With Intel gpu (and rebuilt kernel w/ CONFIG_DRM_I915_LOW_LEVEL_TRACEPOINTS):
> 
>         <idle>-0     [002]   821.717208: intel_engine_notify:  dev=0, engine=0:0, seqno=38896, waiters=1
>   RenderThread-1024  [002]   825.722358: i915_request_queue:   dev=0, engine=0:0, hw_id=9, ctx=30, seqno=38896, flags=0x0
>   RenderThread-1024  [002]   825.722371: i915_request_add:     dev=0, engine=0:0, hw_id=9, ctx=30, seqno=38896, global=0
>   RenderThread-1024  [002]   825.722372: i915_request_submit:  dev=0, engine=0:0, hw_id=9, ctx=30, seqno=38896, global=0
>   RenderThread-1024  [002]   825.745964: i915_request_execute: dev=0, engine=0:0, hw_id=9, ctx=30, seqno=38896, global=42199
>   RenderThread-1024  [002]   825.745964: i915_request_in:      dev=0, engine=0:0, hw_id=9, ctx=30, seqno=38896, prio=0, global=42199, port=1
>         <idle>-0     [002]   825.755943: intel_engine_notify:  dev=0, engine=0:0, seqno=42199, waiters=1
> 
> It's quite obvious that just because gpuvis sees those amdgpu tracepoints when 
> running with the AMD card and parses and displays those, it does *not* get
> those same tracepoints when I run with an Intel gpu.
> 
> And with my Iris Pro Graphics 580 Gen9, I can reasonably expect to get the
> above i915 tracepoints.
> 
> But if I install a new Intel Xe Gen11, why should I expect to see those Gen9
> i915 tracepoint events? Is it because we are tying tracepoints and the
> created uABI to *kernel modules* and not the hardware?

You're on the correct track here. The issue is that even for Gen9, we
foresee an unnecessary maintenance burden in keeping the tracepoints as
we work on the scheduler, or them disappearing from kernel scope when
HW-assisted scheduling is enabled.

And the lack of versioning on tracepoints does not make it easy for
userspace to do graceful degradation or to detect the underlying
platform. Stuffing the versioning info into every tracepoint, to make sure
it's in the captured ring buffer being inspected, is not too elegant, so
some auxiliary interface would still have to be probed.
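
As a rough illustration of what graceful degradation could look like on
the userspace side, here is a minimal Python sketch that parses textual
event lines like the ones quoted above and reports which wanted fields an
event lacks. The function names (parse_event, missing_fields) and the
regular expressions are assumptions made up for this example, not
anything gpuvis or trace-cmd provides, and the line layout is inferred
only from the samples in this thread.

```python
import re

# Assumed layout of a text trace line, inferred from the quoted samples:
#   <task>-<pid>  [<cpu>]  <timestamp>: <event>:  key=value, key=value ...
LINE_RE = re.compile(
    r"^\s*(?P<task>.+)-(?P<pid>\d+)\s+\[(?P<cpu>\d+)\]\s+"
    r"(?P<ts>\d+\.\d+):\s+(?P<event>\w+):\s+(?P<args>.*)$"
)
# Fields are key=value pairs separated by commas and/or whitespace.
FIELD_RE = re.compile(r"(\w+)=([^,\s]+)")


def parse_event(line):
    """Return a dict describing one trace line, or None if it does not match."""
    m = LINE_RE.match(line)
    if m is None:
        return None
    return {
        "task": m.group("task").strip(),
        "pid": int(m.group("pid")),
        "cpu": int(m.group("cpu")),
        "ts": float(m.group("ts")),
        "event": m.group("event"),
        # Values are kept as strings; "engine=0:0" is not a plain number.
        "fields": dict(FIELD_RE.findall(m.group("args"))),
    }


def missing_fields(event, wanted):
    """List the wanted field names that this event does not carry."""
    return [name for name in wanted if name not in event["fields"]]
```

A tool could call missing_fields() per event type and hide the views that
depend on absent fields instead of failing outright. That still says
nothing about which platform produced the trace, which is the part an
auxiliary interface would have to answer.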

> I'm asking, because personally I would expect the hardware to drive these
> tracepoint events, much like I check cpu flags to see whether I can run AVX
> code, or perf has intel_pt recording on one machine, but not another.

If we wanted to make sure we can keep them stable within a gen, we would
have to move them closer to the point where we talk to hardware, and would
basically just emit information that a) came from userspace (which is
stable due to uABI) or b) is going to hardware (we don't expect the
underlying hardware to magically change).

> Right now gpuvis graphs the above events in an easy to understand view.
> Occasionally, it's really nice to use trace-cmd to get textual representation
> for grepping, etc. Storing pointers would obviously break that.

Steve's idea kind of solves that.

There would be an auxiliary module built out-of-tree (say, from gpuvis),
that would emit a new tracepoint "b" with more information whenever
tracepoint "a" triggers. So basically you would stop looking for
tracepoint "a", and load your module and just look for "b".

It's a bit of a grey area when it comes to breaking userspace, and the
philosophical question is whether it is us breaking userspace or userspace
setting itself a trap. But I guess it might be OK if the distros knowingly
bundle such an out-of-tree module (which is not subject to kernel
stability).

> I guess if it's
> what we need to do to avoid the uABI problem, then it's what we do - still
> better than using an entirely new tracing system if we can avoid that.

The bigger problem that I'd still like to hear some ideas for, before
drawing conclusions, is how to elegantly source the tracepoints from
hardware events. Trying to do live conversion from a hardware-generated
ring buffer during execution, just to make sure it interleaves with the
software-generated ring buffer and works under the same trigger, does not
sound very performant.

The usefulness of HW-related "special" tracepoints, without gpuvis doing
the time sorting based on parameters rather than timestamps, could be too
low for use with the general tooling like you mentioned. Then we have to
think about how much effort it is worth putting into solving the HW to SW
tracepoint injection if we could, with less total effort, have a secondary
interface for hardware events.
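
To make the "sorting based on parameters, not timestamp" point concrete,
here is a minimal Python sketch that groups already-parsed events by a
request key and orders each group by an assumed lifecycle stage, ignoring
timestamps entirely. The stage order, the (ctx, seqno) key and the dict
layout are assumptions for illustration only; this is not a description
of how gpuvis actually does it.

```python
from collections import defaultdict

# Assumed lifecycle order of the i915 request events quoted in this thread.
STAGE_ORDER = [
    "i915_request_queue",
    "i915_request_add",
    "i915_request_submit",
    "i915_request_execute",
    "i915_request_in",
]
STAGE_RANK = {name: rank for rank, name in enumerate(STAGE_ORDER)}


def group_by_request(events):
    """Group events by (ctx, seqno) and order each group by lifecycle stage.

    Each event is a dict with "event", "ts" and "fields" keys. Events with
    an unknown name, or without both ctx and seqno, cannot be keyed this
    way and are returned separately.
    """
    groups = defaultdict(list)
    unmatched = []
    for ev in events:
        fields = ev["fields"]
        if ev["event"] in STAGE_RANK and "ctx" in fields and "seqno" in fields:
            groups[(fields["ctx"], fields["seqno"])].append(ev)
        else:
            unmatched.append(ev)
    for evs in groups.values():
        # Order by stage rank, not by timestamp.
        evs.sort(key=lambda ev: STAGE_RANK[ev["event"]])
    return dict(groups), unmatched
```

Note that an event like intel_engine_notify carries no ctx, so it lands in
the unmatched list. Tying such events back to a request needs exactly the
kind of tool-specific knowledge that general tooling does not have.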

Anyways, Happy Holidays all! I'll be back after New Year.

Regards, Joonas