[PATCH 01/13] drm/amdgpu: introduce and honour DRM_FORCE_AUTH workaround

Tue May 28 16:10:51 UTC 2019

On 2019/05/28, Daniel Vetter wrote:
> On Tue, May 28, 2019 at 10:03 AM Koenig, Christian
> <Christian.Koenig at amd.com> wrote:
> >
> > Am 28.05.19 um 09:38 schrieb Daniel Vetter:
> > > [SNIP]
> > >> Might be a good idea looking into reverting it partially, so that at
> > >> least command submission and buffer allocation is still blocked.
> > > I thought the issue is a lot more than vainfo, it's pretty much every
> > > hacked up compositor under the sun getting this wrong one way or
> > > another. Thinking about this some more, I also have no idea how you'd
> > > want to deprecate rendering on primary nodes in general. Apparently
> > > that breaks -modesetting already, and probably lots more compositors.
> > > And it looks like we're finally achieve the goal kms set out to 10
> > > years ago, and new compositors are sprouting up all the time. I guess
> > > we could just break them all (on new hardware) and tell them to all
> > > suck it up. But I don't think that's a great option. And just
> > > deprecating this on amdgpu is going to be even harder, since then
> > > everywhere else it'll keep working, and it's just amdgpu.ko that looks
> > > broken.
> > >
> > > Aside: I'm not supporting Emil's idea here because it fixes any issues
> > > Intel has - Intel doesn't care. I support it because reality sucks,
> > > people get this render vs. primary vs. multi-gpu prime wrong all the
> > > time (that's also why we have hardcoded display+gpu pairs in mesa for
> > > the various soc combinations out there), and this looks like a
> > > pragmatic solution. It'd be nice if every compositor and everything
> > > else would perfectly support multi gpu and only use render nodes for
> > > rendering, and only primary nodes for display. But reality is that
> > > people hack on stuff until gears on screen and then move on to more
> > > interesting things (to them). So I don't think we'll ever win this :-/
> >
> > Yeah, but this is a classic case of working around user space issues by
> > making kernel changes instead of fixing user space.
> >
> > Having privileged (output control) and unprivileged (rendering control)
> > functionality behind the same node is a mistake we have made a long time
> > ago and render nodes finally seemed to be a way to fix that.
> >
> > I mean why are compositors using the primary node in the first place?
> > Because they want to have access to privileged resources I think and in
> > this case it is perfectly ok to do so.
> >
> > Now extending unprivileged access to the primary node actually sounds
> > like a step into the wrong direction to me.
> >
> > I rather think that we should go down the route of completely dropping
> > command submission and buffer allocation through the primary node for
> > non master clients. And then as next step at some point drop support for
> > authentication/flink.
> >
> > I mean we have done this with UMS as well and I don't see much other way
> > to move forward and get rid of those ancient interface in the long term.
> 
> Well kms had some really good benefits that drove quick adoption, like
> "suspend/resume actually has a chance of working" or "comes with
> buffer management so you can run multiple gears".
> 
> The render node thing is a lot more niche use case (prime, better priv
> separation), plus "it's cleaner design". And the "cleaner design" part
> is something that empirically doesn't seem to matter :-/ Just two
> examples:
> - KHR_display/leases just iterated display resources on the fd needed
> for rendering (and iirc there was even a patch to expose that for
> render nodes too so it works with DRI3), because implementing
> protocols is too hard. Barely managed to stop that one before it
> happened.
> - Various video players use the vblank ioctl on directly to schedule
> frames, without telling the compositor. I discovered that when I
> wanted to limite the vblank ioctl to master clients only. Again,
> apparently too hard to use the existing extensions, or fix the bugs in
> there, or whatever. One userspace got fixed last year, but it'll
> probably get copypasted around forever :-/
> 
> So I don't think we'll ever manage to roll a clean split out, and best
> we can do is give in and just hand userspace what it wants. As much as
> that's misguided and unclean and all that. Maybe it'll result in a
> least fewer stuff getting run as root to hack around this, because
> fixing properly seems not to be on the table.
> 
> The beauty of kms is that we've achieved the mission, everyone's
> writing their own thing. Which is also terrible, and I don't think
> it'll get better.

With the risk of coming rude I will repeat my earlier comment:

The problem is _neither_ Intel nor libva specific.

That said, let's step back for a moment and consider:

 - the "block everything but KMS via the primary node" idea is great but
orthogonal

 - the series does address issues that are vendor-agnostic

 - by default this series does _not_ cause any regression be that for
new or old userspace

 - there are two trivial solutions, if the AMD team has concerns about
closed-source/private stack depending on the old behaviour
If they want I can even write the patches ;-)

That said, the notable comments received so far are:
 - rework patch 13/13 to remove the DRM_AUTH from prime fd to/from
handle. I'm OK but this will change the return code - from EACCES to
ENOSYS

 - vmwgfx will need a check on the reference ioctl(s) - IIRC Thomas is
planning to drop nearly all DRM_AUTH instances in their driver.

Christian, as mentioned before - this series does _not_ add
functionality to render nodes. It effectively paves a way towards
removing DRM_AUTH.

I understand the series may feel a bit dirty. Yet I would gladly address
any technical concerns you have.

Thanks
Emil