Add support for high priority scheduling in amdgpu

Christian König deathsimple at vodafone.de
Wed Mar 1 11:42:20 UTC 2017


Patches #1-#14 are Acked-by: Christian König <christian.koenig at amd.com>.

Patch #15:

I'm not sure if that is a good idea or not; I need to take a closer look 
after digging through the rest.

In general the HW IP type is only for the IOCTL API and not for internal 
use inside the driver.
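
For reference, these IP types come from the uapi header and are what 
userspace names per IB when submitting work. A minimal sketch of that 
selection (field and macro names are from include/uapi/drm/amdgpu_drm.h; 
the VA/size values are hypothetical):

#include <drm/amdgpu_drm.h>

__u64 ib_gpu_va;	/* hypothetical: GPU VA of the IB, set by the app */
__u32 ib_size;		/* hypothetical: IB size in bytes */

/* Userspace picks the HW block and user ring purely through the CS
 * ioctl chunk; inside the kernel this should be resolved to an
 * amdgpu_ring once and the ring pointer used from then on. */
struct drm_amdgpu_cs_chunk_ib ib = {
	.ip_type     = AMDGPU_HW_IP_COMPUTE, /* uapi IP id, not an internal index */
	.ip_instance = 0,
	.ring        = 0,                    /* user ring id within that IP */
	.va_start    = ib_gpu_va,
	.ib_bytes    = ib_size,
};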

Patch #16:

Really nice :) I don't have time to look into it in detail, but you have 
one misconception I'd like to point out:
> The queue manager maintains a per-file descriptor map of user ring ids
> to amdgpu_ring pointers. Once a map is created it is permanent (this is
> required to maintain FIFO execution guarantees for a ring).
Actually we don't have a FIFO execution guarantee per ring. We only have 
that per context.

E.g. commands from different contexts can execute at the same time and 
out of order.

Making this per file is ok for now, but you should keep in mind that we 
might want to change that sooner or later.
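
To illustrate the mapping being discussed, here is a rough sketch of how 
such a permanent per-file lookup could work (the struct layout and the 
pick_least_loaded_ring() helper are made up for illustration, not taken 
from the patch):

#include <linux/idr.h>
#include <linux/mutex.h>

#define NUM_HW_IPS 8			/* hypothetical bound on IP types */

struct amdgpu_ring;			/* provided by the driver */
static struct amdgpu_ring *pick_least_loaded_ring(int ip); /* hypothetical */

struct user_ring_map {			/* hypothetical per-fd state */
	struct mutex lock;
	struct idr ring_idr[NUM_HW_IPS];/* user ring id -> struct amdgpu_ring * */
};

static struct amdgpu_ring *
lookup_user_ring(struct user_ring_map *map, int ip, int user_ring)
{
	struct amdgpu_ring *ring;

	mutex_lock(&map->lock);
	ring = idr_find(&map->ring_idr[ip], user_ring);
	if (!ring) {
		/* First use of this id: pick a HW ring and pin the
		 * mapping, so later submissions with the same id
		 * always land on the same ring. */
		ring = pick_least_loaded_ring(ip);
		idr_alloc(&map->ring_idr[ip], ring, user_ring,
			  user_ring + 1, GFP_KERNEL);
	}
	mutex_unlock(&map->lock);
	return ring;
}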

Patch #17 & #18 need to take a closer look when I have more time, but 
the comments from others sounded valid to me as well.

Patch #19: Raising and lowering the priority of a ring during command 
submission doesn't sound like a good idea to me.

The way you currently have it implemented would also raise the priority 
of already running jobs on the same ring. Keep in mind that everything 
is pipelined here.

In addition to that, you can't have a fence callback in the job structure, 
because the job structure is freed by the same fence as well. So it can 
happen that you access freed memory (but only for a very short period 
of time).
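
To make the lifetime problem concrete, a sketch of the pattern that goes 
wrong (the struct and helper names are illustrative, not the patch's 
code):

#include <linux/dma-fence.h>

struct prio_job {			/* hypothetical stand-in for amdgpu_job */
	struct dma_fence_cb cb;		/* callback storage embedded in the job */
	int old_priority;
};

static void restore_prio_cb(struct dma_fence *f, struct dma_fence_cb *cb)
{
	struct prio_job *job = container_of(cb, struct prio_job, cb);

	/* BUG: the scheduler frees the job from a callback on this same
	 * fence, and callbacks on one fence have no ordering guarantee,
	 * so 'job' may already point at freed memory here. */
	restore_ring_priority(job->old_priority); /* hypothetical helper */
}

/* registered with: dma_fence_add_callback(fence, &job->cb, restore_prio_cb); */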

Patches #20-#22 are Acked-by: Christian König <christian.koenig at amd.com>.

Regards,
Christian.

On 28.02.2017 at 23:14, Andres Rodriguez wrote:
> This patch series introduces a mechanism that allows users with sufficient
> privileges to categorize their work as "high priority". A userspace app can
> create a high priority amdgpu context, where any work submitted to this context
> will receive preferential treatment over any other work.
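
As a hedged illustration of what such a context allocation could look 
like from userspace (the priority field and the AMDGPU_CTX_PRIORITY_HIGH 
value are assumptions based on this description; the final uapi may 
differ):

	union drm_amdgpu_ctx ctx = {
		.in = {
			.op       = AMDGPU_CTX_OP_ALLOC_CTX,
			.priority = AMDGPU_CTX_PRIORITY_HIGH, /* assumed uapi value */
		},
	};
	/* drmCommandWriteRead(fd, DRM_AMDGPU_CTX, &ctx, sizeof(ctx)); */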
>
> High priority contexts will be scheduled ahead of other contexts by the sw gpu
> scheduler. This functionality is generic for all HW blocks.
>
> Optionally, a ring can implement a set_priority() function that allows
> programming HW specific features to elevate a ring's priority.
>
> This patch series implements set_priority() for gfx8 compute rings. It takes
> advantage of SPI scheduling and CU reservation to provide improved frame
> latencies for high priority contexts.
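
As a hedged sketch of the shape such a hook could take (the enum values 
and the funcs-table placement follow the description above but are not 
verbatim from the patches):

	enum amd_sched_priority {	/* assumed priority levels */
		AMD_SCHED_PRIORITY_NORMAL,
		AMD_SCHED_PRIORITY_HIGH,
	};

	struct amdgpu_ring_funcs {
		/* ... existing emit/test callbacks ... */

		/* optional: program HW-specific features (e.g. SPI
		 * prioritization, CU reservation on gfx8 compute) to
		 * elevate this ring */
		void (*set_priority)(struct amdgpu_ring *ring,
				     enum amd_sched_priority priority);
	};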
>
> For compute + compute scenarios we get near perfect scheduling latency. E.g.
> one high priority ComputeParticles + one low priority ComputeParticles:
>      - High priority ComputeParticles: 2.0-2.6 ms/frame
>      - Regular ComputeParticles: 35.2-68.5 ms/frame
>
> For compute + gfx scenarios the high priority compute application does
> experience some latency variance. However, the variance has smaller bounds and
> a smaller deviation than without high priority scheduling.
>
> Following is a graph of the frame time experienced by a high priority compute
> app in 4 different scenarios to exemplify the compute + gfx latency variance:
>      - ComputeParticles: this scenario involves running the compute particles
>        sample on its own.
>      - +SSAO: Previous scenario with the addition of running the ssao sample
>        application that clogs the GFX ring with constant work.
>      - +SPI Priority: Previous scenario with the addition of SPI priority
>        programming for compute rings.
>      - +CU Reserve: Previous scenario with the addition of dynamic CU
>        reservation for compute rings.
>
> Graph link:
> https://plot.ly/~lostgoat/9/
>
> As seen above, high priority contexts for compute allow us to schedule work
> with enhanced confidence of completion latency under high GPU loads. This
> property will be important for VR reprojection workloads.
>
> Note: The first part of this series is a resend of "Change queue/pipe split
> between amdkfd and amdgpu" with the following changes:
>      - Fixed kfdtest on Kaveri due to shift overflow. Refer to: "drm/amdkfd:
>        allow split HQD on per-queue granularity v3"
>      - Used Felix's suggestions for a simplified HQD programming sequence
>      - Added a workaround for a Tonga HW bug during HQD programming
>
> This series is also available at:
> https://github.com/lostgoat/linux/tree/wip-high-priority
>