[PATCH 4/7] drm/msm/A6xx: Implement preemption for A7XX targets

Wed Aug 28 19:46:51 UTC 2024

On 8/28/24 9:23 PM, Akhil P Oommen wrote:
> On Wed, Aug 28, 2024 at 06:46:37AM -0700, Rob Clark wrote:
>> On Wed, Aug 28, 2024 at 6:42 AM Rob Clark <robdclark at gmail.com> wrote:
>>>
>>> On Tue, Aug 27, 2024 at 3:56 PM Antonino Maniscalco
>>> <antomani103 at gmail.com> wrote:
>>>>
>>>> On 8/27/24 11:07 PM, Rob Clark wrote:
>>>>> On Tue, Aug 27, 2024 at 1:25 PM Antonino Maniscalco
>>>>> <antomani103 at gmail.com> wrote:
>>>>>>
>>>>>> On 8/27/24 9:48 PM, Akhil P Oommen wrote:
>>>>>>> On Fri, Aug 23, 2024 at 10:23:48AM +0100, Connor Abbott wrote:
>>>>>>>> On Fri, Aug 23, 2024 at 10:21 AM Connor Abbott <cwabbott0 at gmail.com> wrote:
>>>>>>>>>
>>>>>>>>> On Thu, Aug 22, 2024 at 9:06 PM Akhil P Oommen <quic_akhilpo at quicinc.com> wrote:
>>>>>>>>>>
>>>>>>>>>> On Wed, Aug 21, 2024 at 05:02:56PM +0100, Connor Abbott wrote:
>>>>>>>>>>> On Mon, Aug 19, 2024 at 9:09 PM Akhil P Oommen <quic_akhilpo at quicinc.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> On Thu, Aug 15, 2024 at 08:26:14PM +0200, Antonino Maniscalco wrote:
>>>>>>>>>>>>> This patch implements preemption feature for A6xx targets, this allows
>>>>>>>>>>>>> the GPU to switch to a higher priority ringbuffer if one is ready. A6XX
>>>>>>>>>>>>> hardware as such supports multiple levels of preemption granularities,
>>>>>>>>>>>>> ranging from coarse grained(ringbuffer level) to a more fine grained
>>>>>>>>>>>>> such as draw-call level or a bin boundary level preemption. This patch
>>>>>>>>>>>>> enables the basic preemption level, with more fine grained preemption
>>>>>>>>>>>>> support to follow.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Signed-off-by: Sharat Masetty <smasetty at codeaurora.org>
>>>>>>>>>>>>> Signed-off-by: Antonino Maniscalco <antomani103 at gmail.com>
>>>>>>>>>>>>> ---
>>>>>>>>>>>>
>>>>>>>>>>>> No postamble packets which resets perfcounters? It is necessary. Also, I
>>>>>>>>>>>> think we should disable preemption during profiling like we disable slumber.
>>>>>>>>>>>>
>>>>>>>>>>>> -Akhil.
>>>>>>>>>>>
>>>>>>>>>>> I don't see anything in kgsl which disables preemption during
>>>>>>>>>>> profiling. It disables resetting perfcounters when doing system-wide
>>>>>>>>>>> profiling, like freedreno, and in that case I assume preempting is
>>>>>>>>>>> fine because the system profiler has a complete view of everything and
>>>>>>>>>>> should "see" preemptions through the traces. For something like
>>>>>>>>>>> VK_KHR_performance_query I suppose we'd want to disable preemption
>>>>>>>>>>> because we disable saving/restoring perf counters, but that has to
>>>>>>>>>>> happen in userspace because the kernel doesn't know what userspace
>>>>>>>>>>> does.
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> KGSL does some sort of arbitration of perfcounter configurations and
>>>>>>>>>> adds the select/enablement reg configuration as part of dynamic
>>>>>>>>>> power up register list which we are not doing here. Is this something
>>>>>>>>>> you are taking care of from userspace via preamble?
>>>>>>>>>>
>>>>>>>>>> -Akhil
>>>>>>>>>
>>>>>>>>> I don't think we have to take care of that in userspace, because Mesa
>>>>>>>>> will always configure the counter registers before reading them in the
>>>>>>>>> same submission, and if it gets preempted in the meantime then we're
>>>>>>>>> toast anyways (due to not saving/restoring perf counters). kgsl sets
>>>>>>>>> them from userspace, which is why it has to do something to set them
>>>>>>>>
>>>>>>>> Sorry, should be "kgsl sets them from the kernel".
>>>>>>>>
>>>>>>>>> after IFPC slumber or a context switch when the HW state is gone.
>>>>>>>>> Also, because the upstream approach doesn't play nicely with system
>>>>>>>>> profilers like perfetto, VK_KHR_performance_query is hidden by default
>>>>>>>>> behind a debug flag in turnip. So there's already an element of "this
>>>>>>>>> is unsupported, you have to know what you're doing to use it."
>>>>>>>
>>>>>>> But when you have composition on GPU enabled, there will be very frequent
>>>>>>> preemption. And I don't know how usable profiling tools will be in that
>>>>>>> case unless you disable preemption with a Mesa debug flag. But for that
>>>>>>> to work, all existing submitqueues should be destroyed and recreated.
>>>>>>>
>>>>>>> So I was thinking that we can use the sysprof propertry to force L0
>>>>>>> preemption from kernel.
>>>>>>>
>>>>>>> -Akhil.
>>>>>>>
>>>>>>
>>>>>> Right but when using a system profiler I imagined the expectation would
>>>>>> be to be able to understand how applications and compositor interact. An
>>>>>> use case could be measuring latency and understanding what contributes
>>>>>> to it. That is actually the main reason I added traces for preemption.
>>>>>> Disabling preemption would make it less useful for this type of
>>>>>> analysis. Did you have an use case in mind for a system profiler that
>>>>>> would benefit from disabling preemption and that is not covered by
>>>>>> VK_KHR_performance_query (or equivalent GL ext)?
> 
> Please consider this as a friendly suggestion based on Conner's clarification.
> Not a blocker. TBH, I don't have clairty on the profiling story in Mesa!
> 

Thanks, your input was appreciated :) I just wanted to make sure we 
where on the same page. So considering this, I will be able to send v2 soon.

>>>>>
>>>>> I would think that we want to generate an event, with GPU timestamp
>>>>> (ie. RB_DONE) and which ring we are switching to, so that perfetto/etc
>>>>> could display multiple GPU timelines and where the switch from one to
>>>>> the other happens.
>>>>>
>>>>> I'm a bit curious how this is handled on android, with AGI/etc.. I
>>>>> don't see any support in perfetto for this.
>>>>>
>>>>> BR,
>>>>> -R
>>>>>
>>>>>> Best regards,
>>>>>> --
>>>>>> Antonino Maniscalco <antomani103 at gmail.com>
>>>>>>
>>>>
>>>> Looking at KGSL they seem to use ftrace and I don't see it doing
>>>> anything to get a timestamp from some GPU timer, really not sure how
>>>> that would be put in a gpu timeline.
> 
> Yeah, we usually rely on ftraces which is good enough to measure preemption
> latency.
> 
> -Akhil.
> 

Thanks for confirming! The traces I added are pretty similar to KGSL's 
so it should be suitable for serving the same purpose.

>>>
>>> I suspect it would require some work on perfetto trace-processor.  It
>>> can ingest ftrace events (but those would end up being something
>>> driver specific).  Maybe with u_trace and some tracepoints in the
>>> 'ambles something could be done that would be more driver agnostic
>>> (but idk if that would work for gpu's where preemption happens more
>>> autonomously in the fw)
>>
>> btw how to handle tracing preemption probably shouldn't hold up
>> sending the next iteration of this series.  There isn't that much more
>> time to get this in v6.12, and I think better visualization of
>> preemption is going to take some work outside of the kernel.
>>
>> BR,
>> -R

Best regards,
-- 
Antonino Maniscalco <antomani103 at gmail.com>