[PATCH v2] drm: enable render-nodes by default

Rob Clark robdclark at gmail.com
Thu Mar 20 14:13:40 PDT 2014


On Thu, Mar 20, 2014 at 4:54 PM, Thomas Hellstrom <thellstrom at vmware.com> wrote:
> On 03/20/2014 06:34 PM, Rob Clark wrote:
>> On Thu, Mar 20, 2014 at 6:28 AM, Thomas Hellstrom <thellstrom at vmware.com> wrote:
>>> On 03/20/2014 10:43 AM, David Herrmann wrote:
>>>> Hi
>>>>
>>>> On Thu, Mar 20, 2014 at 10:27 AM, Thomas Hellstrom
>>>> <thellstrom at vmware.com> wrote:
>>>>> A user logs in to a system where DRI clients use render nodes. The
>>>>> system grants rw permission on the render nodes for the console user.
>>>>> User starts editing a secret document, starts some GPGPU structural FEM
>>>>> computations of the Pentagon building. Locks the screen and goes for lunch.
>>>>>
>>>>> A malicious user logs in using fast user switching and becomes the owner
>>>>> of the render node. Tries to map a couple of random offsets, but that
>>>>> fails, due to security checks. Now crafts a malicious command stream to
>>>>> dump all GPU memory to a file. Steals the first user's secret document
>>>>> and the intermediate Pentagon data. Logs out and starts data mining.
>>>>>
>>>>> Now if we require drivers to block these malicious command streams this
>>>>> can never happen, and distros can reliably grant rw access to the render
>>>>> nodes to the user currently logged into the console.
>>>>>
>>>>> I guess basically what I'm trying to say is that with the legacy concept it
>>>>> was OK to access all GPU memory, because an authenticated X user
>>>>> basically had the same permissions.
>>>>>
>>>>> With render nodes we're allowing multiple users into the GPU at the same
>>>>> time, and it's not OK anymore for a client to access another client's GPU
>>>>> buffers through a malicious command stream.
>>>> Yes, I understand the attack scenario, but that's not related to
>>>> render-nodes at all. The exact same races exist on the legacy node:
>>> I was under the impression that render nodes were designed to fix these
>>> issues?
>>>
>>>> 1) If you can do fast-user switching, you can spawn your own X-server,
>>>> get authenticated on your own server and you are allowed into the GPU.
>>>> You cannot map other user's buffers because they're on a different
>>>> master-object, but you _can_ craft malicious GPU streams and access
>>>> the other user's buffer.
>>> But with legacy nodes, drivers can (and should, IMO) throw out all data
>>> from GPU memory on master drop, and then block clients authenticated
>>> against the dropped master from the GPU until their master becomes
>>> active again or dies (in which case they are killed), in line with a
>>> previous discussion we had. You can't do this with render nodes, so yes,
>>> they do open up a new race that requires command stream validation.
>>>
>>>> 2) If you can do fast-user switching, switch to an empty VT, open the
>>>> legacy node and you automatically become DRM-Master because there is
>>>> no active master. Now you can do anything on the DRM node, including
>>>> crafting malicious GPU streams.
>>> I believe the above solution should work for this case as well.
>>>
>>>> Given that the legacy node is always around and _always_ has these
>>>> races, why should we prevent render-nodes from appearing just because
>>>> the _driver_ is racy? I mean, there is no gain in that.. if it opens a
>>>> new race, as you assumed, then yes, we should avoid it. But at least
>>>> for all drivers supporting render-nodes so far, they either are
>>>> entirely safe, or the just-described races exist on both nodes.
>>> My suggestion is actually not to prevent render nodes from appearing,
>>> but rather that we should restrict them to drivers with command stream
>>> verification and / or per process virtual memory, and I also think we
>>> should plug the above races on legacy nodes. That way legacy nodes would
>>> use the old "master owns it all" model, while render nodes could allow
>>> multiple users at the same time.
>>>
>> hmm, if you only have global gpu virtual memory (rather than
>> per-process), it would still be kinda nice to support render nodes if
>> the app had some way to figure out whether or not its gpu buffers were
>> secure.
>
> If there is only global GPU memory there's of course also the option of
> providing a command stream verifier.

well, that wouldn't really help separate buffers from other contexts
(since a3xx and later have various load/store instructions)..

At the moment, I have two cases:

1) MMU... the gpu has no direct access to system memory, so other than
access to other contexts' buffers, the system is secure
2) no-MMU... vram carveout, plus some registers set to limit access to
addresses within that range

In case #1 we could implement per-context vm.. just a matter of
writing some code.  Doing it the naive way requires draining the
command queue on context switches and getting the CPU involved, which
isn't so terribly awesome.
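
Roughly, the naive version would look something like this (the function
and field names here are made up for illustration, not the actual msm
API):

static int switch_context_naive(struct msm_gpu *gpu, struct msm_ctx *ctx)
{
	int ret;

	if (gpu->cur_ctx == ctx)
		return 0;

	/* drain the command queue: nothing may be in flight while the
	 * page tables are swapped from the CPU side */
	ret = wait_for_gpu_idle(gpu);		/* hypothetical helper */
	if (ret)
		return ret;

	/* repoint the MMU at the new context and drop stale translations */
	set_pagetable(gpu->mmu, ctx->pgtable);	/* hypothetical */
	tlb_invalidate(gpu->mmu);		/* hypothetical */

	gpu->cur_ctx = ctx;
	return 0;
}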

The downstream android driver does context switches on the CP itself
(ie. bang some registers to point the MMU at a new set of tables and
flush the TLB)..  but it needs to be confirmed that this can be done
securely (ie. restricted to the ringbuffer controlled by the kernel and
not an IB buffer from userspace).  If it isn't restricted to the kernel
ringbuffer, then that sort of opens up an even bigger hole than it
closes ;-)
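
From the kernel side the CP-based version would look roughly like the
below — again the emit_* helpers are made-up names, and it is only safe
if the MMU-banging registers are accepted from the kernel ring but
rejected when written from a userspace IB:

static void submit_with_pt_switch(struct msm_ringbuffer *ring,
				  struct msm_ctx *ctx,
				  uint32_t ib_iova, uint32_t ib_dwords)
{
	/* emitted from the kernel-owned ring, never from userspace: */
	emit_set_pagetable(ring, ctx->pgtable_iova);	/* hypothetical */
	emit_tlb_flush(ring);				/* hypothetical */

	/* then hand control to the (untrusted) userspace command stream */
	emit_indirect_buffer(ring, ib_iova, ib_dwords);	/* hypothetical */
}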

>> Ie. an app that was using the gpu for something secure could
>> simply choose not to if the hw/driver could not guarantee that another
>> process using the gpu could not get access to the buffers..
>
> IMO that should work fine, but we need to provide a way for user-space to
> determine whether the render node is secure or not. Perhaps a sysfs
> attribute and / or a drm getparam() parameter?

I'd *assume* that that sort of thing would just be some sort of CL extension?

But no objection to exposing it in a more common way.
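
Fwiw, if it did end up as a getparam/capability, userspace could check it
with something like the below — drmGetCap() is the existing libdrm call,
but DRM_CAP_SECURE_RENDER is just a made-up name/value for illustration:

#include <stdint.h>
#include <xf86drm.h>

#define DRM_CAP_SECURE_RENDER	0x1000	/* hypothetical capability id */

static int render_node_is_secure(int fd)
{
	uint64_t val = 0;

	/* if the kernel doesn't know about the cap, assume no isolation */
	if (drmGetCap(fd, DRM_CAP_SECURE_RENDER, &val))
		return 0;
	return val != 0;
}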

I guess it is also an option to keep the bootarg to override the default
(with the default being 'enabled' for hw w/ per-context/process vm and
'disabled' otherwise).
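
Ie. something along these lines in the drm core (just a sketch — the
DRIVER_PER_PROCESS_VM flag is made up, the point is only the -1/0/1
override semantics):

static int drm_rnodes = -1;	/* -1 = auto, 0 = off, 1 = on */
module_param_named(rnodes, drm_rnodes, int, 0600);

static bool drm_want_render_node(struct drm_device *dev)
{
	if (drm_rnodes >= 0)
		return drm_rnodes;	/* explicit bootarg override */

	/* default: only enable for hw with per-context/process vm */
	return drm_core_check_feature(dev, DRIVER_PER_PROCESS_VM);
}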

BR,
-R

> /Thomas
>
>
>>
>> BR,
>> -R
>>
>>> /Thomas
>>> _______________________________________________
>>> dri-devel mailing list
>>> dri-devel at lists.freedesktop.org
>>> http://lists.freedesktop.org/mailman/listinfo/dri-devel

