[Intel-gfx] [PATCH 2/2] drm/i915/tracepoints: Remove DRM_I915_LOW_LEVEL_TRACEPOINTS Kconfig option

Tvrtko Ursulin tvrtko.ursulin at linux.intel.com
Wed Aug 8 12:13:08 UTC 2018


On 26/06/2018 12:48, Chris Wilson wrote:
> Quoting Tvrtko Ursulin (2018-06-26 12:24:51)
>>
>> On 26/06/2018 11:55, Chris Wilson wrote:
>>> Quoting Tvrtko Ursulin (2018-06-26 11:46:51)
>>>>
>>>> On 25/06/2018 21:02, Chris Wilson wrote:
>>>>> If we know what is wanted, can we define that better in terms of
>>>>> dma_fence and leave lowlevel for debugging (or think of how we
>>>>> achieve the same with generic bpf/kprobes)? Hmm, I wonder how far
>>>>> we can push that.
>>>>
>>>> What is wanted is, for instance, to be able to take trace.pl on any
>>>> kernel anywhere and have it deduce/draw the exact metrics/timeline
>>>> of command submission for a workload.
>>>>
>>>> At the moment, without the low level tracepoints and without the
>>>> intel_engine_notify tweak, how close it can get is workload
>>>> dependent.
>>>
>>> Interjecting with what dma-fence already has (or we could use); not
>>> sure how well userspace can actually map it to their timelines.
>>>>
>>>> So a set of tracepoints to allow drawing the timeline:
>>>>
>>>> 1. request_queue (or _add)
>>> dma_fence_init
>>>
>>>> 2. request_submit
>>>
>>>> 3. intel_engine_notify
>>> For obvious reasons, no match in dma_fence.
>>>
>>>> 4. request_in
>>> dma_fence_emit
>>>
>>>> 5. request_out
>>> dma_fence_signal (similar, not quite, we would have to force irq
>>> signaling).
>>
>> Yes, not quite the same, due to a potential time shift between the
>> user interrupt and the dma_fence_signal call via different paths.
>>
>>>    
>>>> With this set the above is possible and we don't need a lot of work to
>>>> get there.
>>>
>>> From a brief glance we are missing a dma_fence_queue as a
>>> request_submit replacement.
>>>
>>> So the next question is: what information do we get from our
>>> tracepoints (or more precisely, what do you use) that we lack in
>>> dma_fence?
>>
>> Port=%u and preemption (completed=%u) come immediately to mind. A way
>> to tie them to engines would be nice, or it is all abstract timelines.
>>
>> Going this direction sounds like a long detour to get where we almost
>> are. I suspect you are valuing the benefit of it being generic, and
>> hence a parsing tool could be cross-driver. But you can also just punt
>> the "abstractising" into the parsing tool.
> 
> It's just that this is about the third time this has been raised in
> the last couple of weeks, with the other two requests being from a
> generic tooling pov (Eric Anholt for gnome-shell tweaking, and someone
> else looking for a gpuvis-like tool). So it seems like there is
> interest, even if I doubt that it'll help answer any questions beyond
> what you can just extract from looking at userspace. (Imo, the only
> people these tracepoints are useful for are people writing patches for
> the driver. For everyone else, you can just observe system behaviour
> and optimise your code for your workload. Otoh, can one trust a black
> box, argh.)

Some of these things might be obtainable purely from userspace via
heavily instrumented builds, which may be in the realm of the possible
during development, but I don't think it is feasible in general, both
because it is too involved and because it would preclude the existence
of tools which can trace any random client.
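
To make the mapping above concrete, the generic side can already be
captured today with perf, roughly like this (a sketch only; the event
names are as defined in include/trace/events/dma_fence.h, note the
signal event is spelled dma_fence_signaled, and "my-workload" is just
a placeholder):

  perf record \
      -e dma_fence:dma_fence_init \
      -e dma_fence:dma_fence_emit \
      -e dma_fence:dma_fence_signaled \
      -- ./my-workload
  perf script

Each of these events carries only the driver name, timeline name,
context and seqno, which is exactly why the port/completed fields
discussed above have no generic home.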

> To have a second set of nearly equivalent tracepoints, we need to have
> strong justification why we couldn't just use or extend the generic set.

I was hoping that the conversation so far had established that nearly
equivalent is not close enough for the intended use cases, and that it
is not possible to make the generic ones so.

> Plus I feel a lot more comfortable exporting a set of generic
> tracepoints, than those where we may be leaking more knowledge of the HW
> than we can reasonably expect to support for the indefinite future.

I think it is accepted that we cannot guarantee the low level
tracepoints will be supportable in the future world of GuC scheduling.
(How and what we will do there is as yet unresolved.) But at least we
get much better usability for the platforms up to that point, and for
very small effort. The idea is not to mark these as ABI but just to
improve the user experience.

You are, I suppose, worried that if these tracepoints disappeared due
to being un-implementable, someone would complain?

I just want anyone to be able to run trace.pl and see how the virtual
engine behaves, without having to recompile the kernel. And the VTune
people want the same for their enterprise-level customers. Both tools
are ready to adapt should it be required. It is, I repeat, just
usability and user experience out of the box.
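
For illustration, the out of the box flow would be roughly the
following (a sketch; "my-workload" is a placeholder and the exact
trace.pl option spelling may differ):

  # Record the relevant i915 events around a workload:
  trace.pl --trace ./my-workload
  # Dump the raw events, e.g. for attaching to a bug report:
  perf script > trace.txt

With the low level tracepoints (i915_request_submit, i915_request_in,
i915_request_out, intel_engine_notify) always compiled in, as this
patch proposes, that works on any kernel without a rebuild.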

> 
>>>> And with the Virtual Engine it will become more interesting to have
>>>> this. So if we had a bug report saying load balancing is not working
>>>> well, we could just say "please run it via trace.pl --trace and
>>>> attach perf script output". That way we could easily see whether or
>>>> not it is a problem in userspace behaviour or elsewhere.
>>>
>>> And there I was wanting a script to capture the workload so that we
>>> could replay it and dissect it. :-p
>>
>> Depends on what level you want that. Perf script output from the
>> above tracepoints would do on one level. If you wanted a higher level,
>> to re-exercise load balancing, then it wouldn't completely be enough,
>> or at least a lot of guesswork would be needed.
> 
> It all depends on what level you want to optimise, is the way I look at
> it. Userspace driver, you capture the client->driver userspace API (e.g.
> cairo-trace, apitrace). But for optimising scheduling layout, we just
> need a workload descriptor like wsim -- with perhaps the only tweak
> being able to define latency/throughput metrics relevant to that
> workload, and being able to integrate with a pseudo display server. The
> challenge as I see it is being able to convince the user that it is a
> useful diagnosis step and being able to generate a reasonable wsim
> automatically.

Deriving wsims from apitraces sounds much more challenging, but I also
think it is orthogonal. Tracing could always be there at the low level,
whether the client is real or simulated. For flavour, a wsim descriptor
is sketched below.
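
A wsim workload is just a list of submission steps of the form
ctx.engine.duration.dependency.wait (this example is from memory of
benchmarks/wsim/README in igt, so treat the exact syntax as
approximate):

  1.RCS.1000-2000.0.0
  1.BCS.1000.-1.0
  1.RCS.500.0.1

i.e. enough to replay a client's submission pattern against the
scheduler without any of its actual rendering.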

Regards,

Tvrtko

