[Intel-gfx] [PATCH] drm/i915: Convert WARNs during userptr revoke to SIGBUS

Mon Oct 12 02:06:23 PDT 2015

On 09/10/15 18:26, Chris Wilson wrote:
> On Fri, Oct 09, 2015 at 07:14:02PM +0200, Daniel Vetter wrote:
>> On Fri, Oct 09, 2015 at 10:03:14AM +0100, Tvrtko Ursulin wrote:
>>>
>>> On 09/10/15 09:55, Daniel Vetter wrote:
>>>> On Fri, Oct 09, 2015 at 09:40:53AM +0100, Chris Wilson wrote:
>>>>> On Fri, Oct 09, 2015 at 09:48:01AM +0200, Daniel Vetter wrote:
>>>>>> On Thu, Oct 08, 2015 at 10:45:47AM +0100, Tvrtko Ursulin wrote:
>>>>>> The concern is that this isn't how SIG_SEGV works, it's a signal the
>>>>>> thread who made the invalid access gets directly. You never get a SIG_SEGV
>>>>>> for bad access someone else has made. So essentially it's new ABI.
>>>>>
>>>>> SIGBUS. For which the answer is yes, you can and do get SIGBUS for
>>>>> actions taken by other processes.
>>>>
>>>> Oh right I always forget that SIGBUS aliases with SIGIO. Anyway if
>>>> userspace wants SIGIO we just need to provide it with a pollable fd and
>>>> then it can use fcntl to make that happen. That's imo a much better api
>>>> than unconditionally throwing around signals. Also we already have the
>>>> reset stats ioctl to tell userspace that its gpu context is toast. If
>>>> anyone wants that to be pollable (or even send SIGIO) I think we should
>>>> extend that, with all the usual "needs userspace&igt" stuff on top.
>>>
>>> I don't see that this notification can be optional. Process is confused
>>> about its memory map use so should die. :)
>>>
>>> This is not a GPU error/hang - this is the process doing stupid things.
>>>
>>> MMU notifiers do not support decision making otherwise we could say
>>> -ETXTBSY or something on munmap, but we can't. Not even sure that it would
>>> help in all cases, would have to fail clone as well and who knows what.
>>
>> So what happens if the gpu just keeps using the memory? It'll all be
>> horribly undefined behaviour and eventually it'll die on an -EFAULT in
>> execbuf, but does anything else bad happen?
>
> We don't see an EFAULT unless a miracle occurs, and the stale pages
> continue to be read/written by other processes (as well as the client).
> Horribly undefined behaviour with a misinformation leak.

What other processes? The pages will still be referenced, so they won't be
reused, and so there is no information leak across unrelated processes.
Unless you meant the ones involved in object sharing?

But I suppose we could improve the revoke mechanism by marking the
object and then revoking it at the next opportunity?
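
Roughly what I mean by mark-then-revoke, as a plain C sketch rather than driver code. Every name here (struct userptr_obj, userptr_mark_revoked() and so on) is made up for illustration and is not taken from i915. The assumption is that the notifier path can only set a flag, and the actual revoke happens at the next point where no GPU use is outstanding:

```c
#include <stdbool.h>

/* Hypothetical stand-in for a userptr-backed object; not the i915 struct. */
struct userptr_obj {
	bool revoke_pending;	/* set from the notifier path, which cannot fail */
	bool revoked;		/* set once the backing pages have been dropped */
	int active_users;	/* outstanding GPU uses of the pages */
};

/* Notifier side: only record that the range went away. */
static void userptr_mark_revoked(struct userptr_obj *obj)
{
	obj->revoke_pending = true;
}

/*
 * Next opportunity: complete the revoke once nothing is using the pages.
 * Returns true if the object is revoked after the call.
 */
static bool userptr_try_revoke(struct userptr_obj *obj)
{
	if (obj->revoke_pending && obj->active_users == 0) {
		obj->revoke_pending = false;
		obj->revoked = true;
	}
	return obj->revoked;
}

/* Submission side: refuse new work once a revoke is pending or done. */
static int userptr_begin_use(struct userptr_obj *obj)
{
	if (userptr_try_revoke(obj) || obj->revoke_pending)
		return -1;	/* a real caller would pick a proper errno */
	obj->active_users++;
	return 0;
}

/* Retire side: drop the use and revoke if that was the last one. */
static void userptr_end_use(struct userptr_obj *obj)
{
	if (obj->active_users > 0)
		obj->active_users--;
	userptr_try_revoke(obj);
}
```

The point of the split is that marking never has to wait or fail, while new submissions are rejected as soon as the mark is set.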

Regards,

Tvrtko