[Intel-gfx] [PATCH] drm/i915: Convert WARNs during userptr revoke to SIGBUS

Thu Sep 24 03:55:23 PDT 2015

On 09/24/2015 11:31 AM, Chris Wilson wrote:
> On Thu, Sep 24, 2015 at 11:23:48AM +0100, Tvrtko Ursulin wrote:
>>
>> On 09/23/2015 09:07 PM, Chris Wilson wrote:
>>> If the client revokes the virtual address range it asked to be mapped
>>> into GPU space via userptr (using anything along the lines of mmap,
>>> mprotect, madvise, munmap, ftruncate etc.), the MMU notifier sends a
>>> range invalidate command to userptr. Upon receiving the invalidation
>>> signal for the revoked range, we try to release the struct pages we
>>> pinned into the GTT. However, this can fail if any of the GPU's VMAs are
>>> pinned for use by the hardware (i.e. despite the user's intention, we
>>> cannot relinquish the client's address range and keep up to date with
>>> whatever is placed there). Currently we emit a few WARNs so that we
>>> would notice if this ever occurred in the wild; it has. Sadly, this means
>>> we need to replace those WARNs with a proper SIGBUS to the offending
>>> clients instead.
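To make the failure mode concrete, here is a hypothetical userspace model of the revoke path described above. It assumes that unbinding a VMA fails while the hardware holds a pin on it (as with a scanout) and that the object's pages can only be released once no VMA remains bound. None of the `model_*` names exist in i915; they are invented for illustration, and the sketch ignores the struct pages, GTT state and locking that the real code has to deal with.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model only: this is not i915 code. */
struct model_vma {
	bool bound;  /* VMA currently has a GTT binding */
	bool pinned; /* hardware is using it, e.g. as a scanout */
};

struct model_obj {
	struct model_vma *vmas;
	size_t nr_vmas;
	bool has_pages; /* backing pages still pinned by the driver */
};

/* Unbinding fails while the hardware holds a pin on the VMA. */
static int model_vma_unbind(struct model_vma *vma)
{
	if (vma->pinned)
		return -EBUSY;
	vma->bound = false;
	return 0;
}

/* Pages can only be released once no VMA is bound any more. */
static int model_put_pages(struct model_obj *obj)
{
	for (size_t i = 0; i < obj->nr_vmas; i++)
		if (obj->vmas[i].bound)
			return -EBUSY;
	obj->has_pages = false;
	return 0;
}

/*
 * Returns true when the address range could not be relinquished,
 * i.e. when the offending clients would have to be sent SIGBUS.
 */
static bool model_cancel_userptr(struct model_obj *obj)
{
	assert(obj != NULL);

	/* Unbind what we can; individual failures are not checked here. */
	for (size_t i = 0; i < obj->nr_vmas; i++)
		model_vma_unbind(&obj->vmas[i]);

	/* Only a failure to drop the pages is treated as fatal. */
	return model_put_pages(obj) != 0;
}
```

A caller would treat a `true` return as the point where the patch prints its DRM_ERROR and signals the attached clients. Leaving the individual unbind results unchecked mirrors the patch, which relies on the put_pages step to report whether the range could actually be released.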
>>
>> How does it happen? Frame buffer?
>
> Ignoring the -EIO case, since the patches fixing that path haven't
> landed either, the primary cause is binding the userptr to a scanout
> (framebuffer). This is not a recommended use of userptr, since the CPU
> view is then incoherent, but it is not impossible.
>
>>> Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
>>> Cc: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
>>> Cc: Michał Winiarski <michal.winiarski at intel.com>
>>> ---
>>>   drivers/gpu/drm/i915/i915_gem_userptr.c | 41 +++++++++++++++++++++++++++++----
>>>   1 file changed, 37 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/i915/i915_gem_userptr.c b/drivers/gpu/drm/i915/i915_gem_userptr.c
>>> index f75d90118888..efb404b9fe69 100644
>>> --- a/drivers/gpu/drm/i915/i915_gem_userptr.c
>>> +++ b/drivers/gpu/drm/i915/i915_gem_userptr.c
>>> @@ -81,11 +81,44 @@ static void __cancel_userptr__worker(struct work_struct *work)
>>
>> This line is a reminder that the previous series still hasn't landed. I
>> think it was all r-b-ed, apart from my request not to rely on
>> release_pages (or something similar) to handle negative and zero page counts.
>>
>>>   		was_interruptible = dev_priv->mm.interruptible;
>>>   		dev_priv->mm.interruptible = false;
>>>
>>> -		list_for_each_entry_safe(vma, tmp, &obj->vma_list, obj_link) {
>>> -			int ret = i915_vma_unbind(vma);
>>> -			WARN_ON(ret && ret != -EIO);
>>> +		list_for_each_entry_safe(vma, tmp, &obj->vma_list, obj_link)
>>> +			i915_vma_unbind(vma);
>>> +		if (i915_gem_object_put_pages(obj)) {
>>> +			struct task_struct *p;
>>> +
>>> +			DRM_ERROR("Unable to revoke ownership by userptr of"
>>> +				  " invalidated address range, sending SIGBUS"
>>> +				  " to attached clients.\n");
>>> +
>>> +			rcu_read_lock();
>>> +			for_each_process(p) {
>>
>> I don't think this is safe without holding the tasklist_lock.
>
> Hmm, it's the only lock the OOM killer takes when sending the signal.
> The list will not change, nor will tasks disappear, whilst we hold the
> read lock, so it seems sane.

Then I'll say hmm as well, since I've now seen both in use: with and
without holding the tasklist_lock.

I thought that with just rcu_read_lock held, nothing prevents another CPU
from taking the tasklist_lock for writing and modifying the list. But
maybe there is a more complex locking scheme at play here? I don't know;
I did not find any documentation on the tasklist_lock.

Regards,

Tvrtko