[PATCH 01/13] drm/amdgpu: introduce and honour DRM_FORCE_AUTH workaround

Tue May 28 07:38:28 UTC 2019

On Tue, May 28, 2019 at 8:58 AM Koenig, Christian
<Christian.Koenig at amd.com> wrote:
>
> Am 27.05.19 um 17:26 schrieb Daniel Vetter:
> > On Mon, May 27, 2019 at 3:42 PM Koenig, Christian
> > <Christian.Koenig at amd.com> wrote:
> >> Am 27.05.19 um 15:26 schrieb Emil Velikov:
> >>> On 2019/05/27, Daniel Vetter wrote:
> >>>> On Mon, May 27, 2019 at 10:47:39AM +0000, Koenig, Christian wrote:
> >>>>> Am 27.05.19 um 10:17 schrieb Emil Velikov:
> >>>>>> From: Emil Velikov <emil.velikov at collabora.com>
> >>>>>>
> >>>>>> Currently one can circumvent DRM_AUTH, when the ioctl is exposed via the
> >>>>>> render node. A seemingly deliberate design decision.
> >>>>>>
> >>>>>> Hence we can drop the DRM_AUTH all together (details in follow-up patch)
> >>>>>> yet not all userspace checks if it's authenticated, but instead uses
> >>>>>> uncommon assumptions.
> >>>>>>
> >>>>>> After days of digging through git log and testing, only a single (ab)use
> >>>>>> was spotted - the Mesa RADV driver, using the AMDGPU_INFO ioctl and
> >>>>>> assuming that failure implies lack of authentication.
> >>>>>>
> >>>>>> Affected versions are:
> >>>>>>     - the whole 18.2.x series, which is EOL
> >>>>>>     - the whole 18.3.x series, which is EOL
> >>>>>>     - the 19.0.x series, prior to 19.0.4
> >>>>> Well you could have saved your time, cause this is still a NAK.
> >>>>>
> >>>>> For the record: I strongly think that we don't want to expose any render
> >>>>> functionality on the primary node.
> >>>>>
> >>>>> To even go a step further I would say that at least on AMD hardware we
> >>>>> should completely disable DRI2 for one of the next generations.
> >>>>>
> >>>>> As a follow up I would then completely disallow the DRM authentication
> >>>>> for amdgpu, so that the command submission interface on the primary node
> >>>>> can only be used by the display server.
> >>>> So amdgpu is running in one direction, while everyone else is running in
> >>>> the other direction? Please note that your patch to change i915 landed
> >>>> already, so that ship is sailing (but we could ofc revert that back
> >>>> again).
> >>>>
> >>>> Imo really not a great idea if we do a amdgpu vs. everyone else split
> >>>> here. If we want to deprecate dri2/flink/rendering on primary nodes across
> >>>> the stack, then that should be a stack wide decision, not an amdgpu one.
> >>>>
> >>> Cannot agree more - I would love to see drivers stay consistent.
> >> Yeah, completely agree to that. That's why I think we should not do this
> >> at all and just let Intel fix it's userspace bugs :P
> > So you're planning to submit that revert? You did jump the gun with
> > sending out that patch after all too ... (aside from it got merged
> > because of some other mixup with r-b tags and what not).
>
> Well already regretting submitting that. On the other hand what is the
> minimum IOCTLs we need to get working to fix the vainfo issue?

We have a bit more time, it's only going to be in 5.3. But yeah if we
don't bottom out on any of the options here I think revert it has to
be.

> Might be a good idea looking into reverting it partially, so that at
> least command submission and buffer allocation is still blocked.

I thought the issue is a lot more than vainfo, it's pretty much every
hacked up compositor under the sun getting this wrong one way or
another. Thinking about this some more, I also have no idea how you'd
want to deprecate rendering on primary nodes in general. Apparently
that breaks -modesetting already, and probably lots more compositors.
And it looks like we're finally achieve the goal kms set out to 10
years ago, and new compositors are sprouting up all the time. I guess
we could just break them all (on new hardware) and tell them to all
suck it up. But I don't think that's a great option. And just
deprecating this on amdgpu is going to be even harder, since then
everywhere else it'll keep working, and it's just amdgpu.ko that looks
broken.

Aside: I'm not supporting Emil's idea here because it fixes any issues
Intel has - Intel doesn't care. I support it because reality sucks,
people get this render vs. primary vs. multi-gpu prime wrong all the
time (that's also why we have hardcoded display+gpu pairs in mesa for
the various soc combinations out there), and this looks like a
pragmatic solution. It'd be nice if every compositor and everything
else would perfectly support multi gpu and only use render nodes for
rendering, and only primary nodes for display. But reality is that
people hack on stuff until gears on screen and then move on to more
interesting things (to them). So I don't think we'll ever win this :-/
-Daniel

> Christian.
>
> > -Daniel
> >
> >> Anyway my concern is really that we should stop extending functionality
> >> on the primary node.
> >>
> >> E.g. the render node is for use by the clients and the primary node for
> >> mode setting and use by the display server only.
> >>
> >> Regards,
> >> Christian.
> >>
> >>> Fwiw, this series consistently paves the way toward nuking DRM_AUTH ;-)
> >>>
> >>> Thanks
> >>> Emil
> >
>

-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch