[PATCH v2] drm/msm: add trace_dma_fence_emit to msm_gpu_submit

Wed Apr 27 06:19:58 UTC 2022

Am 26.04.22 um 20:50 schrieb Chia-I Wu:
> On Tue, Apr 26, 2022 at 11:02 AM Christian König
> <christian.koenig at amd.com> wrote:
>> Am 26.04.22 um 19:40 schrieb Chia-I Wu:
>>> [SNIP]
>>>>>> Well I just send a patch to completely remove the trace point.
>>>>>>
>>>>>> As I said it absolutely doesn't make sense to use this for
>>>>>> visualization, that's what the trace_dma_fence_init trace point is good for.
>>> I am a bit confused by this.  _emit and _signaled are a great way to
>>> see how many fences are pending from cpu's point of view.  How does
>>> _emit make no sense and _init is good instead?
>> We had exactly that confusion now multiple times and it's one of the
>> main reasons why I want to remove the _emit trace point.
>>
>> See the when you want to know how many fences are pending you need to
>> watch out for init/destroy and *NOT* emit.
>>
>> The reason is that in the special case where emit makes sense (e.g. the
>> GPU scheduler fences) emit comes later than init, but pending on the CPU
>> and taking up resources are all fences and not just the one emitted to
>> the hardware.
> I am more interested in pending on the GPU.
>
>> On the other hand when you want to measure how much time each operation
>> took on the hardware you need to take a look at the differences of the
>> signal events on each timeline.
> _signaled alone is not enough when the GPU is not always busy.  After
> the last pending fence signals but before the following _init/_emit,
> nothing is pending.

Yeah, I'm perfectly aware of that.

> For all drivers except virtio-gpu, _init and "ring head update" always
> happen close enough that I can see why _emit is redundant.  But I like
> having _emit as a generic tracepoint for timelines where _init and
> _emit can be apart, instead of requiring a special case tracepoint for
> each special case timeline.

And I'm certainly not going to add _emit to all drivers just because of 
that. As you said it is a special case for virtio-gpu and the GPU scheduler.

And as I explained before the difference between _init and _emit 
shouldn't matter to your visualization. The background is that as soon 
as an dma_fence is initialized with _init it is "live" regarding the 
dependency and memory management and exactly that's what matters for the 
visualization.

The latency between _init and _emit is just interesting for debugging 
the scheduler and surprisingly virtio-gpu as well, for all other use 
cases it is irrelevant.

Regards,
Christian.

>> So there isn't really any use case for the emit trace point, except when
>> you want to figure out how much latency the scheduler introduce. Then
>> you want to take a look at init and emit, but that isn't really that
>> interesting for performance analyses.
>>
>> Regards,
>> Christian.
>>