[PATCH v2 00/11] new cgroup controller for gpu/drm subsystem

Tue Apr 14 15:01:12 UTC 2020

On Tue, Apr 14, 2020 at 4:29 PM Kenny Ho <y2kenny at gmail.com> wrote:
>
> On Tue, Apr 14, 2020 at 10:04 AM Daniel Vetter <daniel at ffwll.ch> wrote:
> >
> > This has _nothing_ to do with Intel (I think over the past 25 years or
> > so intel has implemented all 4 versions of gpu splitting that I
> > listed, but not entirely sure).
> >
> > So again pls less tribal fighting, more collaboration. If you can't do
> > that, let's pick nouveau/nvidia as arbitrary neutral ground.
>
> So are you saying Intel has implemented a form of masking before?  I
> don't think we need to just pick a vendor as a neutral ground.  The
> idea of spatial sharing vs time sharing is not vendor specific... it's
> not even GPU specific.  This is why I asked the two questions below.
>
> > > Perhaps the following questions can help keep the discussion technical:
> > > 1)  Is it possible to implement non-work-conserving distribution of
> > > GPU without spatial sharing?  (If yes, I'd love to hear a suggestion,
> > > if not...question 2.)
> > > 2)  If spatial sharing is required to support GPU HPC use cases, what
> > > would you implement if you have the hardware support today?
> >
> > The thing we can currently do in upstream (from how I'm understanding
> > hw) is assign entire PCI devices to containers, so essentially only
> > the entire /dev/dri/* cdev. That works, and it works across all
> > drivers we have in upstream right now.
> >
> > Anything more fine-grained I don't think is currently possible,
> > because everyone has a different idea of how to split up gpus. It
> > would be nice to have it, but in upstream, cross-vendor, I'm just not
> > seeing it happen right now.
>
> I understand the reality, but what would you implement to support the
> concept (GPU in HPC, which you said you are not against) if you have
> the hw support today?  How would you support low-jitter/low-latency
> sharing of a single GPU if you have whatever hardware support you need
> today?

Whatever works on my gpu.

But there's a huge difference between what I can do for Intel, with my
Intel hat on, shipping in some random Intel-only repo or DKMS package,
and what makes sense to push to upstream. In upstream it needs to be
cross-vendor and have reasonably clear semantics, so that admins
understand it no matter whether you plug in an AMD, Nvidia or whatever
other GPU.

Yes this sucks, but as long as all the hw vendors insist on
differentiating here there's not much we can do. Maybe in the future
the VF stuff might help, but I'm not super hopeful that's actually
going to happen all that well. And the VF stuff at least works the
same way as what we currently can do already, with assigning an entire
/dev/dri/render* node to a container.
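
To make the "assign an entire render node" granularity concrete, here
is a minimal sketch, not anything from the patch series. It assumes the
usual Linux numbering (DRM character devices on major 226, render nodes
starting at minor 128 as renderD128) and a cgroup-v1 style devices
controller rule string. The helper names are hypothetical.

```python
# Hypothetical helpers for illustration only; not a kernel or runtime API.
DRM_MAJOR = 226          # assumed: char-device major used by DRM on Linux
RENDER_MINOR_BASE = 128  # assumed: render nodes start at renderD128


def render_node_path(index: int) -> str:
    """Path of the index-th render node, e.g. index 0 -> renderD128."""
    if index < 0:
        raise ValueError("render node index must be non-negative")
    return f"/dev/dri/renderD{RENDER_MINOR_BASE + index}"


def devices_allow_rule(index: int, access: str = "rw") -> str:
    """Rule string in the cgroup-v1 devices.allow format:
    '<type> <major>:<minor> <access>', with type 'c' for char devices."""
    if index < 0:
        raise ValueError("render node index must be non-negative")
    if not access or set(access) - set("rwm"):
        raise ValueError("access must be a non-empty subset of 'rwm'")
    return f"c {DRM_MAJOR}:{RENDER_MINOR_BASE + index} {access}"


if __name__ == "__main__":
    # Whole-device granularity: one rule per render node, nothing finer.
    print(render_node_path(0))
    print(devices_allow_rule(0))
```

The point of the sketch is only that the unit being handed out is the
whole node; there is nothing in such a rule that could express a
fraction of a GPU.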

If you want more fine-grained then you (as a user) need to have
containers for amd, and different container isolation for nvidia, and
different container isolation for intel, and different container
isolation for $next_vendor, and so on. We can't just wish that there's
a standard way to manage this when there isn't. And merging
non-standard ways to split up gpus with cgroups, one for each gpu
vendor (generation maybe even?) just isn't going to work in upstream.

And really that's not a huge deal, because on the userspace side for
HPC it's the exact same sorry state of affairs, with CUDA, ROCm and
the oneAPI effort from Intel (not counting a bunch of things various
vendors tried to pull off on the SoC side of things, there's even more
fun there). Standardizing the kernel management while you still need
to have different container images (these userspace stacks generally
have a really hard time co-existing) isn't solving any real-world user
problems.

So yeah it sucks if you're a gpu compute user in some kind of server
setting :-/ And there's not really much I can do to fix this, except
tell vendors that everyone doing their own thing won't work (in
upstream, it'll work totally in all the vendor driver trees and
stacks, can't stop that).
-Daniel

> Regards,
> Kenny
>
>
> > > On Tue, Apr 14, 2020 at 9:26 AM Daniel Vetter <daniel at ffwll.ch> wrote:
> > > >
> > > > On Tue, Apr 14, 2020 at 3:14 PM Kenny Ho <y2kenny at gmail.com> wrote:
> > > > >
> > > > > Ok.  I was hoping you can clarify the contradiction between the
> > > > > > existence of the spec below and your "not something any other gpu can
> > > > > reasonably support" statement.  I mean, OneAPI is Intel's spec and
> > > > > doesn't that at least make SubDevice support "reasonable" for one more
> > > > > vendor?
> > > > >
> > > > > Partisanship aside, as a drm co-maintainer, do you really not see the
> > > > > need for non-work-conserving way of distributing GPU as a resource?
> > > > > You recognized the latencies involved (although that's really just
> > > > > part of the story... time sharing is never going to be good enough
> > > > > even if your switching cost is zero.)  As a drm co-maintainer, are you
> > > > > suggesting GPU has no place in the HPC use case?
> > > >
> > > >  So I did chat with people and my understanding for how this subdevice
> > > > stuff works is roughly, from least to most fine grained support:
> > > > - Not possible at all, hw doesn't have any such support
> > > > - The hw is actually not a single gpu, but a bunch of chips behind a
> > > > magic bridge/interconnect, and there's a scheduler load-balancing
> > > > stuff and you can't actually run on all "cores" in parallel with one
> > > > compute/3d job. So subdevices just give you some of these cores, but
> > > > from client api pov they're exactly as powerful as the full device. So
> > > > this kinda works like assigning an entire NUMA node, including all the
> > > > cpu cores and memory bandwidth and everything.
> > > > - Hw has multiple "engines" which share resources (like compute cores
> > > > or whatever) behind the scenes. There's no control over how this
> > > > sharing works really, and whether you have guarantees about minimal
> > > > execution resources or not. This kinda works like hyperthreading.
> > > > - Then finally we have the CU mask thing amdgpu has. Which works like
> > > > what you're proposing, works on amd.
> > > >
> > > > So this isn't something that I think we should standardize in a
> > > > resource management framework like cgroups. Because it's a complete
> > > > mess. Note that _all_ the above things (including the "no subdevices"
> > > > one) are valid implementations of "subdevices" in the various specs.
> > > >
> > > > Now on your question on "why was this added to various standards?"
> > > > because opencl has that too (and the rocm thing, and everything else
> > > > it seems). What I heard is that a few people pushed really hard, and
> > > > no one objected hard enough (because not having subdevices is a
> > > > standards compliant implementation), so that's why it happened. Just
> > > > because it's in various standards doesn't mean that a) it's actually
> > > > standardized in a useful fashion and b) something we should just
> > > > blindly adopt.
> > > >
> > > > > Also, where exactly did you get the idea that I'm against GPUs in
> > > > > HPC use cases? Approaching this in a slightly less tribal way would
> > > > really, really help to get something landed (which I'd like to see
> > > > happen, personally). Always spinning this as an Intel vs AMD thing
> > > > like you do here with every reply really doesn't help moving this in.
> > > >
> > > > So yeah stricter isolation is something customers want, it's just not
> > > > something we can really give out right now at a level below the
> > > > device.
> > > > -Daniel
> > > >
> > > > >
> > > > > Regards,
> > > > > Kenny
> > > > >
> > > > > On Tue, Apr 14, 2020 at 8:52 AM Daniel Vetter <daniel at ffwll.ch> wrote:
> > > > > >
> > > > > > On Tue, Apr 14, 2020 at 2:47 PM Kenny Ho <y2kenny at gmail.com> wrote:
> > > > > > > On Tue, Apr 14, 2020 at 8:20 AM Daniel Vetter <daniel at ffwll.ch> wrote:
> > > > > > > > My understanding from talking with a few other folks is that
> > > > > > > > the cpumask-style CU-weight thing is not something any other gpu can
> > > > > > > > reasonably support (and we have about 6+ of those in-tree)
> > > > > > >
> > > > > > > How does Intel plan to support the SubDevice API as described in your
> > > > > > > own spec here:
> > > > > > > https://spec.oneapi.com/versions/0.7/oneL0/core/INTRO.html#subdevice-support
> > > > > >
> > > > > > I can't talk about whether future products might or might not support
> > > > > > stuff and in what form exactly they might support stuff or not support
> > > > > > stuff. Or why exactly that's even in the spec there or not.
> > > > > >
> > > > > > Geez
> > > > > > -Daniel
> > > > > > --
> > > > > > Daniel Vetter
> > > > > > Software Engineer, Intel Corporation
> > > > > > +41 (0) 79 365 57 48 - http://blog.ffwll.ch

-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch