[Intel-gfx] [PATCH] drm/i915: Convert WARNs during userptr revoke to SIGBUS
Tvrtko Ursulin
tvrtko.ursulin at linux.intel.com
Thu Oct 8 02:45:47 PDT 2015
On 28/09/15 15:14, Daniel Vetter wrote:
> On Mon, Sep 28, 2015 at 02:52:30PM +0100, Chris Wilson wrote:
>> On Mon, Sep 28, 2015 at 03:42:22PM +0200, Daniel Vetter wrote:
>>> On Wed, Sep 23, 2015 at 09:07:24PM +0100, Chris Wilson wrote:
>>>> If the client revokes the virtual address it asked to be mapped into GPU
>>>> space via userptr (by using anything along the lines of mmap, mprotect,
>>>> madvise, munmap, ftruncate etc) the mmu notifier sends a range
>>>> invalidate command to userptr. Upon receiving the invalidation signal
>>>> for the revoked range, we try to release the struct pages we pinned into
>>>> the GTT. However, this can fail if any of the GPU's VMAs are pinned for
>>>> use by the hardware (i.e. despite the user's intention we cannot
>>>> relinquish the client's address range and keep up to date with whatever is
>>>> placed in there). Currently we emit a few WARNs so that we would notice
>>>> if this ever occurred in the wild; it has. Sadly this means we need to
>>>> replace those WARNs with the proper SIGBUS to the offending clients
>>>> instead.
>>>>
>>>> Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
>>>> Cc: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
>>>> Cc: Michał Winiarski <michal.winiarski at intel.com>
>>>> ---
>>>> drivers/gpu/drm/i915/i915_gem_userptr.c | 41 +++++++++++++++++++++++++++++----
>>>> 1 file changed, 37 insertions(+), 4 deletions(-)
>>>>
>>>> diff --git a/drivers/gpu/drm/i915/i915_gem_userptr.c b/drivers/gpu/drm/i915/i915_gem_userptr.c
>>>> index f75d90118888..efb404b9fe69 100644
>>>> --- a/drivers/gpu/drm/i915/i915_gem_userptr.c
>>>> +++ b/drivers/gpu/drm/i915/i915_gem_userptr.c
>>>> @@ -81,11 +81,44 @@ static void __cancel_userptr__worker(struct work_struct *work)
>>>> was_interruptible = dev_priv->mm.interruptible;
>>>> dev_priv->mm.interruptible = false;
>>>>
>>>> - list_for_each_entry_safe(vma, tmp, &obj->vma_list, obj_link) {
>>>> - int ret = i915_vma_unbind(vma);
>>>> - WARN_ON(ret && ret != -EIO);
>>>> + list_for_each_entry_safe(vma, tmp, &obj->vma_list, obj_link)
>>>> + i915_vma_unbind(vma);
>>>> + if (i915_gem_object_put_pages(obj)) {
>>>> + struct task_struct *p;
>>>> +
>>>> + DRM_ERROR("Unable to revoke ownership by userptr of"
>>>> + " invalidated address range, sending SIGBUS"
>>>> + " to attached clients.\n");
>>>> +
>>>> + rcu_read_lock();
>>>> + for_each_process(p) {
>>>> + siginfo_t info;
>>>> +
>>>> + /* This doesn't capture everyone who has
>>>> + * the pages pinned behind a VMA as we
>>>> + * do not have that tracking information
>>>> + * available, it does however kill the
>>>> + * original process (and siblings) who
>>>> + * created the userptr and presumably tried
>>>> + * to reuse the address space despite having
>>>> + * pinned it (possibly indirectly) to the hw.
>>>> + * Arguably, we don't even want to kill the
>>>> + * other processes as they are not at fault,
>>>> + * likely to be a display server, and hopefully
>>>> + * will release the pages in due course once
>>>> + * the client is dead.
>>>> + */
>>>> + if (p->mm != obj->userptr.mm->mm)
>>>> + continue;
>>>> +
>>>> + info.si_signo = SIGBUS;
>>>> + info.si_errno = 0;
>>>> + info.si_code = BUS_ADRERR;
>>>> + info.si_addr = (void __user *)obj->userptr.ptr;
>>>> + force_sig_info(SIGBUS, &info, p);
>>>> + }
>>>> + rcu_read_unlock();
>>>
>>> Why do we need to send a SIGBUS? It won't tear down the offending GEM bo,
>>> any new users will hopefully get it, and abusing SIGBUS without the thread
>>> actually doing a memory access is a bit surprising. DRM_DEBUG seems to be
>>> the most we can do here; I think userspace being able to do this is just
>>> a fundamental property of userptr.
>>
>> It is not the bo that is at fault but the *client's* *address* *space*
>> that is being changed. It is equivalent to mmap on a truncated file, i.e.
>> if the client tries to use its mmapping after it has truncated the file,
>> it is scolded via SIGBUS.
>
> But existing SIGBUS is thread-bound and comes with the fault address
> attached. This is just the gpu being a bit unhappy, so the SIGBUS comes
> out of complete nowhere to smack the userspace thread. Any kind of SIGBUS
> catcher userspace has for other reasons might be supremely surprised by
> this and do stupid things. Hence I don't think throwing SIGBUS here is
> correct behaviour. And there doesn't seem to be anything else suitable
> really.
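Chris's truncated-file analogy and Daniel's point can both be observed from userspace. A genuine SIGBUS is synchronous, thread-bound and carries the fault address. The sketch below is a userspace demo, not i915 code. The function name touch_after_truncate() and the temporary file under /tmp are illustrative assumptions. It maps one page of a file, truncates the file underneath the mapping, touches the page again and records the siginfo the kernel delivers. On Linux this is expected to be si_code == BUS_ADRERR, with si_addr equal to the exact address that was touched.

```c
#define _GNU_SOURCE /* expose mkstemp, siginfo_t fields and BUS_ADRERR */
#include <setjmp.h>
#include <signal.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

static sigjmp_buf fault_env;
static volatile sig_atomic_t fault_code;
static void *volatile fault_addr;

/* SA_SIGINFO handler: record what the kernel reported, then jump out,
 * because returning would simply re-execute the faulting access. */
static void record_sigbus(int sig, siginfo_t *info, void *uctx)
{
    (void)sig;
    (void)uctx;
    fault_code = info->si_code;
    fault_addr = info->si_addr;
    siglongjmp(fault_env, 1);
}

/* Hypothetical helper: returns 1 if SIGBUS was delivered (filling *code,
 * *addr and *mapped), 0 if no fault happened, -1 on setup failure. */
int touch_after_truncate(int *code, void **addr, void **mapped)
{
    long page = sysconf(_SC_PAGESIZE);
    char path[] = "/tmp/sigbus-demo-XXXXXX";
    struct sigaction sa, old;
    volatile char *p;
    int fd, ret = -1;

    fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path); /* the file now lives only as long as fd and the mapping */

    if (ftruncate(fd, page) != 0)
        goto out_close;
    p = mmap(NULL, page, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (p == MAP_FAILED)
        goto out_close;
    p[0] = 'x'; /* page is backed by the file: no fault */

    if (ftruncate(fd, 0) != 0) /* pull the backing store out from under us */
        goto out_unmap;

    memset(&sa, 0, sizeof(sa));
    sa.sa_sigaction = record_sigbus;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGBUS, &sa, &old) != 0)
        goto out_unmap;

    if (sigsetjmp(fault_env, 1) == 0) {
        p[0] = 'y'; /* page now lies beyond EOF: synchronous SIGBUS */
        ret = 0;
    } else {
        *code = fault_code;
        *addr = fault_addr;
        ret = 1;
    }
    *mapped = (void *)p;
    sigaction(SIGBUS, &old, NULL); /* restore the previous disposition */
out_unmap:
    munmap((void *)p, page);
out_close:
    close(fd);
    return ret;
}
```

By contrast, the patch builds the siginfo itself. It sends that siginfo with force_sig_info() to each process whose mm matches the userptr's, with si_addr set to the start of the userptr range rather than to any address the receiving thread actually accessed.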

The offending address is provided with the signal, as far as I can see.
I think it is fine to do this, even required, since the alternative is
for the GPU to keep using random memory indefinitely without userspace
ever finding out.

And I don't see any reason to keep running a process that made such an
elementary and serious mistake.

Is the only concern that the process can catch the signal and not exit?

I am just not sure about the locking requirements for for_each_process,
since existing call sites give conflicting examples. I don't see how
turning preemption off can be safe without holding the tasklist lock, but
perhaps I am wrong; I don't know.
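On the question of catching the signal: a process that installs a SIGBUS handler can take the signal and carry on. About all such a handler can use to tell a real fault from a sent signal is si_code. The minimal userspace sketch below is illustrative only. The name catch_sent_sigbus() is hypothetical. It sends SIGBUS to itself with kill() and reports the si_code the handler saw, which should be SI_USER.

```c
#define _POSIX_C_SOURCE 200809L /* kill(), siginfo_t, SA_SIGINFO */
#include <signal.h>
#include <string.h>
#include <unistd.h>

static volatile sig_atomic_t sigbus_seen;
static volatile sig_atomic_t sigbus_code;

/* Nothing faulted, so returning normally is fine: the interrupted code
 * simply continues after the handler runs. */
static void note_sigbus(int sig, siginfo_t *info, void *uctx)
{
    (void)sig;
    (void)uctx;
    sigbus_code = info->si_code;
    sigbus_seen = 1;
}

/* Returns the si_code observed for a SIGBUS sent with kill(), or -1. */
int catch_sent_sigbus(void)
{
    struct sigaction sa, old;
    int ret = -1;

    memset(&sa, 0, sizeof(sa));
    sa.sa_sigaction = note_sigbus;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGBUS, &sa, &old) != 0)
        return -1;

    sigbus_seen = 0;
    /* Single-threaded, SIGBUS unblocked: POSIX requires the signal to be
     * delivered to us before kill() returns. */
    if (kill(getpid(), SIGBUS) == 0 && sigbus_seen)
        ret = sigbus_code;

    sigaction(SIGBUS, &old, NULL); /* restore the previous disposition */
    return ret;
}
```

A handler that checks si_code this way would treat the signal built by the patch (si_code = BUS_ADRERR, si_addr = start of the userptr range) as a genuine memory fault. That is the kind of surprise Daniel describes for existing SIGBUS catchers.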
Regards,
Tvrtko