[Intel-gfx] [PATCH v3 3/3] drm/i915: Use a task to cancel the userptr on invalidate_range

Wed Sep 9 07:45:40 PDT 2015

On 08/10/2015 09:51 AM, Chris Wilson wrote:
> Whilst discussing possible ways to trigger an invalidate_range on a
> userptr with an aliased GGTT mmapping (and so cause a struct_mutex
> deadlock), the conclusion is that we can, and we must, prevent any
> possible deadlock by avoiding taking the mutex at all during
> invalidate_range. This has numerous advantages all of which stem from
> avoid the sleeping function from inside the unknown context. In
> particular, it simplifies the invalidate_range because we no longer
> have to juggle the spinlock/mutex and can just hold the spinlock
> for the entire walk. To compensate, we have to make get_pages a bit more
> complicated in order to serialise with a pending cancel_userptr worker.
> As we hold the struct_mutex, we have no choice but to return EAGAIN and
> hope that the worker is then flushed before we retry after reacquiring
> the struct_mutex.
>
> The important caveat is that the invalidate_range itself is no longer
> synchronous. There exists a small but definite period in time in which
> the old PTE's page remain accessible via the GPU. Note however that the
> physical pages themselves are not invalidated by the mmu_notifier, just
> the CPU view of the address space. The impact should be limited to a
> delay in pages being flushed, rather than a possibility of writing to
> the wrong pages. The only race condition that this worsens is remapping
> an userptr active on the GPU where fresh work may still reference the
> old pages due to struct_mutex contention. Given that userspace is racing
> with the GPU, it is fair to say that the results are undefined.
>
> v2: Only queue (and importantly only take one refcnt) the worker once.

This one I looked at at the time of previous posting and it looked fine, 
minus one wrong line of thinking of mine. On a brief look it still looks 
good, so:

Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin at intel.com>

I assume Michał has run all these through the relevant test cases?

Slightly related, I now worry about the WARN_ONs in 
__cancel_userptr__worker since they look to be triggerable by malicious 
userspace which is not good.

Also my proposed error handling for the previous patch is slightly wrong 
because I misremebered what mo->active stands for.

Regards,

Tvrtko