[PATCH 01/13] drm/amdgpu: introduce and honour DRM_FORCE_AUTH workaround

Emil Velikov emil.l.velikov at gmail.com
Tue May 28 16:46:10 UTC 2019


On 2019/05/28, Koenig, Christian wrote:
> Am 28.05.19 um 18:10 schrieb Emil Velikov:
> > On 2019/05/28, Daniel Vetter wrote:
> >> On Tue, May 28, 2019 at 10:03 AM Koenig, Christian
> >> <Christian.Koenig at amd.com> wrote:
> >>> Am 28.05.19 um 09:38 schrieb Daniel Vetter:
> >>>> [SNIP]
> >>>>> Might be a good idea looking into reverting it partially, so that at
> >>>>> least command submission and buffer allocation is still blocked.
> >>>> I thought the issue is a lot more than vainfo, it's pretty much every
> >>>> hacked up compositor under the sun getting this wrong one way or
> >>>> another. Thinking about this some more, I also have no idea how you'd
> >>>> want to deprecate rendering on primary nodes in general. Apparently
> >>>> that breaks -modesetting already, and probably lots more compositors.
> >>>> And it looks like we're finally achieve the goal kms set out to 10
> >>>> years ago, and new compositors are sprouting up all the time. I guess
> >>>> we could just break them all (on new hardware) and tell them to all
> >>>> suck it up. But I don't think that's a great option. And just
> >>>> deprecating this on amdgpu is going to be even harder, since then
> >>>> everywhere else it'll keep working, and it's just amdgpu.ko that looks
> >>>> broken.
> >>>>
> >>>> Aside: I'm not supporting Emil's idea here because it fixes any issues
> >>>> Intel has - Intel doesn't care. I support it because reality sucks,
> >>>> people get this render vs. primary vs. multi-gpu prime wrong all the
> >>>> time (that's also why we have hardcoded display+gpu pairs in mesa for
> >>>> the various soc combinations out there), and this looks like a
> >>>> pragmatic solution. It'd be nice if every compositor and everything
> >>>> else would perfectly support multi gpu and only use render nodes for
> >>>> rendering, and only primary nodes for display. But reality is that
> >>>> people hack on stuff until gears on screen and then move on to more
> >>>> interesting things (to them). So I don't think we'll ever win this :-/
> >>> Yeah, but this is a classic case of working around user space issues by
> >>> making kernel changes instead of fixing user space.
> >>>
> >>> Having privileged (output control) and unprivileged (rendering control)
> >>> functionality behind the same node is a mistake we have made a long time
> >>> ago and render nodes finally seemed to be a way to fix that.
> >>>
> >>> I mean why are compositors using the primary node in the first place?
> >>> Because they want to have access to privileged resources I think and in
> >>> this case it is perfectly ok to do so.
> >>>
> >>> Now extending unprivileged access to the primary node actually sounds
> >>> like a step into the wrong direction to me.
> >>>
> >>> I rather think that we should go down the route of completely dropping
> >>> command submission and buffer allocation through the primary node for
> >>> non master clients. And then as next step at some point drop support for
> >>> authentication/flink.
> >>>
> >>> I mean we have done this with UMS as well and I don't see much other way
> >>> to move forward and get rid of those ancient interface in the long term.
> >> Well kms had some really good benefits that drove quick adoption, like
> >> "suspend/resume actually has a chance of working" or "comes with
> >> buffer management so you can run multiple gears".
> >>
> >> The render node thing is a lot more niche use case (prime, better priv
> >> separation), plus "it's cleaner design". And the "cleaner design" part
> >> is something that empirically doesn't seem to matter :-/ Just two
> >> examples:
> >> - KHR_display/leases just iterated display resources on the fd needed
> >> for rendering (and iirc there was even a patch to expose that for
> >> render nodes too so it works with DRI3), because implementing
> >> protocols is too hard. Barely managed to stop that one before it
> >> happened.
> >> - Various video players use the vblank ioctl on directly to schedule
> >> frames, without telling the compositor. I discovered that when I
> >> wanted to limite the vblank ioctl to master clients only. Again,
> >> apparently too hard to use the existing extensions, or fix the bugs in
> >> there, or whatever. One userspace got fixed last year, but it'll
> >> probably get copypasted around forever :-/
> >>
> >> So I don't think we'll ever manage to roll a clean split out, and best
> >> we can do is give in and just hand userspace what it wants. As much as
> >> that's misguided and unclean and all that. Maybe it'll result in a
> >> least fewer stuff getting run as root to hack around this, because
> >> fixing properly seems not to be on the table.
> >>
> >> The beauty of kms is that we've achieved the mission, everyone's
> >> writing their own thing. Which is also terrible, and I don't think
> >> it'll get better.
> >
> > With the risk of coming rude I will repeat my earlier comment:
> >
> > The problem is _neither_ Intel nor libva specific.
> >
> >
> >
> > That said, let's step back for a moment and consider:
> >
> >   - the "block everything but KMS via the primary node" idea is great but
> > orthogonal
> >
> >   - the series does address issues that are vendor-agnostic
> >
> >   - by default this series does _not_ cause any regression be that for
> > new or old userspace
> >
> >   - there are two trivial solutions, if the AMD team has concerns about
> > closed-source/private stack depending on the old behaviour
> > If they want I can even write the patches ;-)
> >
> >
> > That said, the notable comments received so far are:
> >   - rework patch 13/13 to remove the DRM_AUTH from prime fd to/from
> > handle. I'm OK but this will change the return code - from EACCES to
> > ENOSYS
> >
> >   - vmwgfx will need a check on the reference ioctl(s) - IIRC Thomas is
> > planning to drop nearly all DRM_AUTH instances in their driver.
> >
> >
> > Christian, as mentioned before - this series does _not_ add
> > functionality to render nodes. It effectively paves a way towards
> > removing DRM_AUTH.
> 
> But it adds functionality to the primary node.
> 
Behaviour is adjusted - functionality was there since day 1.

> > I understand the series may feel a bit dirty. Yet I would gladly address
> > any technical concerns you have.
> 
> Well putting compatibility issues aside my concern is that this is 
> simply a bad design decision which we can't revert later on.
> 
As sad above - any concerns (theoretical or actual regressions) can be
trivially fixed _without_ reverting any of this.

I am more than happy to step up and address any regressions in timely
manner.


As a reminder without this series, some of your customers are forced to
run their applications as root.


Thanks
Emil


More information about the amd-gfx mailing list