[RFC 8/9] drm: track pending user-space actions

Wed Jul 24 07:24:05 PDT 2013

Hi

On Wed, Jul 24, 2013 at 4:03 PM, Maarten Lankhorst
<maarten.lankhorst at canonical.com> wrote:
> Op 24-07-13 15:43, David Herrmann schreef:
>> If we want to support real hotplugging, we need to block new syscalls from
>> entering any drm_* fops. We also need to wait for these to finish before
>> unregistering a device.
>>
>> This patch introduces drm_dev_get/put_active() which mark a device as
>> active during file-ops. If a device is unplugged, drm_dev_get_active()
>> will fail and prevent users from using this device.
>>
>> Internally, we use rwsems to implement this. It allows simultaneous users
>> (down_read) and we can block on them (down_write) to wait until they are
>> done. This way, a drm_dev_unregister() can be called at any time and does
>> not have to wait for the last drm_release() call.
>>
>> Note that the current "atomic_t unplugged" approach is not safe. Imagine
>> an unplugged device but a user-space context which already is beyong the
>> "drm_device_is_unplugged()" check. We have no way to prevent any following
>> mmap operation or buffer access. The percpu-rwsem avoids this by
>> protecting a whole file-op call and waiting with unplugging a device until
>> all pending calls are finished.
>>
>> FIXME: We still need to update all the driver's fops in case they don't
>> use the DRM core stubs. A quick look showed only custom mmap callbacks.
> Meh, I don't think it's possible to make the mmaps completely race free.

It is. See this scenario:

drm_dev_unregister() sets "plugged = true;" in write_lock(). This
guarantees, there is currently no other pending user in any file-ops
or mmap-fault handler (assuming the mmap fault handler is wrapped in
drm_dev_get_active(), drm_dev_put_active()).

After that, I call unmap_mapping_range() which resets all DRM-mapped
PTEs of user-space processes. So if a mmap is accessed afterwards, it
will trap and the fault() handler will fail with SIGBUS because
drm_dev_get_active() will return false. New mmaps cannot be created,
because drm_mmap() and alike will also be protected by
drm_dev_get_active().

I don't see any race here, do you?

> I 'solved' this by taking mapping->i_mmap_mutex, it reduces the window,
> because all the mmap callbacks are called while holding that lock, so at least
> new mappings from splitting mappings up cannot happen.

Is this somehow faster/better than wrapping fault callbacks in
get_active/put_active?

> Why did you make the percpu rwsem local to the device?
> It's only going to be taking in write mode during device destruction, which
> isn't exactly fast anyway. A single global rwsem wouldn't have any negative effect.

I have 4 UDL cards at home and I dislike that all cards are blocked
for a quite long time if I unplug just one card. As long as we have
drm_global_mutex with this broad use, we cannot fix it. But I thought,
I'd avoid introducing more global locks so maybe I can some day
improve that, too.

So I disagree, once we reduce drm_global_mutex, a single global rwsem
_will_ have a negative effect. Besides, if you have only 1 GPU, it
doesn't matter as you still only get a single rwsem.

>
> My original patch was locking mapping->i_mmap_mutex, then the rwsem in write mode, then the
> global mutex lock during unplug, to ensure that all code saw the unplugged correctly.
>
> That way using any of those 3 locks would be sufficient to check dev->unplugged safely.

If we want to fix the races completely, I don't think using
i_mmap_mutex helps us.

Regarding drm_global_mutex, as I said, my intention is to reduce this
lock to a minimum. The only place where we need it is minor-allocation
and driver-device-lists (ignoring the legacy users like drm-lock). For
both, we could easily use spinlocks. So it's the same reason as above:
I'd like to avoid contention on drm_global_mutex.
Besides, I dislike broad locks. It's not like we safe a lot of memory
here if we have one spinlock per driver instead of a global drm mutex.
And if we have small-ranging locks, it is much easier to review them
(at least to me).

Cheers
David