[PATCH RFC v4 14/16] drm, cgroup: Introduce lgpu as DRM cgroup resource

Felix Kuehling felix.kuehling at amd.com
Fri Nov 29 20:10:08 UTC 2019


On 2019-10-11 1:12 p.m., tj at kernel.org wrote:
> Hello, Daniel.
>
> On Wed, Oct 09, 2019 at 06:06:52PM +0200, Daniel Vetter wrote:
>> That's not the point I was making. For cpu cgroups there's a very well
>> defined connection between the cpu bitmasks/numbers in cgroups and the cpu
>> bitmasks you use in various system calls (they match). And that stuff
>> works across vendors.
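
A minimal sketch of that correspondence, assuming cgroup v2 is mounted at
/sys/fs/cgroup and a hypothetical group named "gfx": the CPU list in the
group's cpuset.cpus file is exactly what sched_getaffinity() reports for a
task placed in that group.

/* Sketch: compare a cgroup's cpuset.cpus with the affinity mask the
 * kernel reports for the calling task.  Assumes cgroup v2 mounted at
 * /sys/fs/cgroup and that this task was already moved into the
 * hypothetical group "gfx" (echo $$ > /sys/fs/cgroup/gfx/cgroup.procs).
 */
#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>

int main(void)
{
	cpu_set_t mask;
	char buf[256];
	FILE *f = fopen("/sys/fs/cgroup/gfx/cpuset.cpus", "r");

	if (f && fgets(buf, sizeof(buf), f))
		printf("cpuset.cpus:       %s", buf);	/* e.g. "0-3" */
	if (f)
		fclose(f);

	if (sched_getaffinity(0, sizeof(mask), &mask) == 0) {
		printf("sched_getaffinity: ");
		for (int cpu = 0; cpu < CPU_SETSIZE; cpu++)
			if (CPU_ISSET(cpu, &mask))
				printf("%d ", cpu);	/* same CPUs, e.g. 0 1 2 3 */
		printf("\n");
	}
	return 0;
}
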
> Please note that there are a lot of limitations even to cpuset.
> Affinity is easy to implement and seems attractive in terms of
> absolute isolation, but it's inherently cumbersome and limited in
> granularity, and it can lead to surprising failure modes where
> contention on one cpu can't be resolved by the load balancer and
> leads to system-wide slowdowns / stalls caused by the dependency
> chain anchored at the affinity-limited tasks.
>
> Maybe this is less of a problem for gpu workloads, but in general, the
> more constraints are put on scheduling, the more likely the system is
> to develop twisted dependency chains while other parts of the system
> sit idle.
>
> How does scheduling currently work when there are competing gpu
> workloads?  There's gotta be some fairness provision, whether that's unit
> allocation based or time slicing, right?

The scheduling of competing workloads on GPUs is handled in hardware and 
firmware. The Linux kernel and driver are not really involved. We have 
some knobs we can tweak in the driver (queue and pipe priorities, 
resource reservations for certain types of workloads), but they are 
pretty HW-specific and I wouldn't make any claims about fairness.

Regards,
   Felix

>    If that's the case, it might
> be best to implement proportional control on top of that.
> Work-conserving mechanisms are the most versatile, easiest to use and
> least likely to cause regressions.
>
> Thanks.
>
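
For reference, the proportional, work-conserving model referred to above is
the weight convention already used by the cgroup v2 cpu and io controllers:
each sibling group gets a weight between 1 and 10000 (default 100) and
receives a matching share of the resource only under contention, so an idle
group's share is redistributed rather than wasted. A sketch of that
convention from userspace, with a purely hypothetical drm.weight analogue:

/* Sketch of the cgroup v2 "weight" convention for proportional,
 * work-conserving control, using the existing cpu controller.
 * Sibling groups share CPU time in proportion to cpu.weight
 * (range 1-10000, default 100); an idle group's share is simply
 * redistributed.  A GPU controller following the same convention
 * would expose an analogous file; the "drm.weight" name below is
 * hypothetical, not part of this patch set.
 */
#include <stdio.h>

static void set_weight(const char *path, int weight)
{
	FILE *f = fopen(path, "w");

	if (f) {
		fprintf(f, "%d\n", weight);
		fclose(f);
	}
}

int main(void)
{
	/* Under contention, "batch" gets 1/3 of CPU time, "interactive" 2/3. */
	set_weight("/sys/fs/cgroup/batch/cpu.weight", 100);
	set_weight("/sys/fs/cgroup/interactive/cpu.weight", 200);

	/* Hypothetical GPU analogue, same semantics: */
	/* set_weight("/sys/fs/cgroup/batch/drm.weight", 100); */
	return 0;
}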

