[PATCH] drm/amdgpu: Enable tunneling on high-priority compute queues

Tue Dec 12 01:52:11 UTC 2023

On Fri, Dec 8, 2023 at 1:37 PM Alex Deucher <alexdeucher at gmail.com> wrote:

> On Fri, Dec 8, 2023 at 12:27 PM Joshua Ashton <joshua at froggi.es> wrote:
> >
> > FWIW, we are shipping this right now in the SteamOS Preview channel
> > (probably going to Stable soon) and it seems to be working as expected,
> > fixing issues in cases where we need to composite and the compositor
> > work we are forced to do would take longer than the compositor redzone
> > to vblank.
> >
> > Previously in high gfx workloads like Cyberpunk using 100% of the GPU,
> > we would consistently miss the deadline as composition could take
> > anywhere from 2-6ms fairly randomly.
> >
> > Now it seems the time for the compositor's work to complete is pretty
> > consistent and well in-time in gpuvis for every frame.
>
> I was mostly just trying to look up the information to verify that it
> was set up correctly, but I guess Marek already did and provided you
> with that info, so it's probably fine as is.
>
> >
> > The only times we are not meeting the deadline now are when an
> > application uses very little GPU and finishes incredibly quickly, and the
> > compositor is doing significantly more work (eg. FSR from 800p -> 4K or
> > whatever), but that's a separate problem that can likely be solved by
> > inlining some of the composition work with the client's dmabuf work if
> > it has focus to avoid those clock bubbles.
> >
> > I heard some musings about dmabuf deadline kernel work recently, but not
> > sure if any of that is applicable to AMD.
>
> I think something like a workload hint would be more useful.  We did a
> few patch sets to allow userspace to provide a hint to the kernel
> about the workload type so the kernel could adjust the power
> management heuristics accordingly, but there were concerns that the
> UMDs would have to maintain application lists to select which
> heuristic worked best for each application.  Maybe it would be better
> to provide a general classification?  E.g., if the GL or vulkan app
> uses these extensions, it's probably a compute type application vs
> something more graphics-y.  The usual trade-off between power and
> performance.  In general, just letting the firmware pick the clock
> based on perf counters seems to work the best.  Maybe a
> general workload hint set by the compositor based on the content type
> it's displaying would be a better option (video vs gaming vs desktop)?
>
> The deadline stuff doesn't really align well with what we can do with
> our firmware and seems ripe for abuse.  Apps can just ask for high
> clocks all the time which is great for performance, but not great for
> power.  Plus there is not much room for anything other than max clocks
> since you don't know how big the workload is or which clocks are the
> limiting factor.
>

Max clocks also decrease performance due to thermal and power limits.
You'll get more performance and less heat if you let the GPU turn off idle
blocks and boost clocks for busy blocks.

Marek