[Intel-gfx] [PATCH] drm/i915: Avoid dereferencing a dead context

Wed Apr 29 14:15:15 UTC 2020

Quoting Tvrtko Ursulin (2020-04-29 14:42:44)
> 
> On 28/04/2020 10:02, Chris Wilson wrote:
> > Once the intel_context is closed, the GEM context may be freed and so
> > the link from intel_context.gem_context is invalid.
> > 
> > <3>[  219.782944] BUG: KASAN: use-after-free in intel_engine_coredump_alloc+0x1bc3/0x2250 [i915]
> > <3>[  219.782996] Read of size 8 at addr ffff8881d7dff0b8 by task kworker/0:1/12
> > 
> > <4>[  219.783052] CPU: 0 PID: 12 Comm: kworker/0:1 Tainted: G     U            5.7.0-rc2-g1f3ffd7683d54-kasan_118+ #1
> > <4>[  219.783055] Hardware name: System manufacturer System Product Name/Z170 PRO GAMING, BIOS 3402 04/26/2017
> > <4>[  219.783105] Workqueue: events heartbeat [i915]
> > <4>[  219.783109] Call Trace:
> > <4>[  219.783113]  <IRQ>
> > <4>[  219.783119]  dump_stack+0x96/0xdb
> > <4>[  219.783177]  ? intel_engine_coredump_alloc+0x1bc3/0x2250 [i915]
> > <4>[  219.783182]  print_address_description.constprop.6+0x16/0x310
> > <4>[  219.783239]  ? intel_engine_coredump_alloc+0x1bc3/0x2250 [i915]
> > <4>[  219.783295]  ? intel_engine_coredump_alloc+0x1bc3/0x2250 [i915]
> > <4>[  219.783300]  __kasan_report+0x137/0x190
> > <4>[  219.783359]  ? intel_engine_coredump_alloc+0x1bc3/0x2250 [i915]
> > <4>[  219.783366]  kasan_report+0x32/0x50
> > <4>[  219.783426]  intel_engine_coredump_alloc+0x1bc3/0x2250 [i915]
> > <4>[  219.783481]  execlists_reset+0x39c/0x13d0 [i915]
> > <4>[  219.783494]  ? mark_held_locks+0x9e/0xe0
> > <4>[  219.783546]  ? execlists_hold+0xfc0/0xfc0 [i915]
> > <4>[  219.783551]  ? lockdep_hardirqs_on+0x348/0x5f0
> > <4>[  219.783557]  ? _raw_spin_unlock_irqrestore+0x34/0x60
> > <4>[  219.783606]  ? execlists_submission_tasklet+0x118/0x3a0 [i915]
> > <4>[  219.783615]  tasklet_action_common.isra.14+0x13b/0x410
> > <4>[  219.783623]  ? __do_softirq+0x1e4/0x9a7
> > <4>[  219.783630]  __do_softirq+0x226/0x9a7
> > <4>[  219.783643]  do_softirq_own_stack+0x2a/0x40
> > <4>[  219.783647]  </IRQ>
> > <4>[  219.783692]  ? heartbeat+0x3e2/0x10f0 [i915]
> > <4>[  219.783696]  do_softirq.part.13+0x49/0x50
> > <4>[  219.783700]  __local_bh_enable_ip+0x1a2/0x1e0
> > <4>[  219.783748]  heartbeat+0x409/0x10f0 [i915]
> > <4>[  219.783801]  ? __live_idle_pulse+0x9f0/0x9f0 [i915]
> > <4>[  219.783806]  ? lock_acquire+0x1ac/0x8a0
> > <4>[  219.783811]  ? process_one_work+0x811/0x1870
> > <4>[  219.783827]  ? rcu_read_lock_sched_held+0x9c/0xd0
> > <4>[  219.783832]  ? rcu_read_lock_bh_held+0xb0/0xb0
> > <4>[  219.783836]  ? _raw_spin_unlock_irq+0x1f/0x40
> > <4>[  219.783845]  process_one_work+0x8ca/0x1870
> > <4>[  219.783848]  ? lock_acquire+0x1ac/0x8a0
> > <4>[  219.783852]  ? worker_thread+0x1d0/0xb80
> > <4>[  219.783864]  ? pwq_dec_nr_in_flight+0x2c0/0x2c0
> > <4>[  219.783870]  ? do_raw_spin_lock+0x129/0x290
> > <4>[  219.783886]  worker_thread+0x82/0xb80
> > <4>[  219.783895]  ? __kthread_parkme+0xaf/0x1b0
> > <4>[  219.783902]  ? process_one_work+0x1870/0x1870
> > <4>[  219.783906]  kthread+0x34e/0x420
> > <4>[  219.783911]  ? kthread_create_on_node+0xc0/0xc0
> > <4>[  219.783918]  ret_from_fork+0x3a/0x50
> > 
> > <3>[  219.783950] Allocated by task 1264:
> > <4>[  219.783975]  save_stack+0x19/0x40
> > <4>[  219.783978]  __kasan_kmalloc.constprop.3+0xa0/0xd0
> > <4>[  219.784029]  i915_gem_create_context+0xa2/0xab8 [i915]
> > <4>[  219.784081]  i915_gem_context_create_ioctl+0x1fa/0x450 [i915]
> > <4>[  219.784085]  drm_ioctl_kernel+0x1d8/0x270
> > <4>[  219.784088]  drm_ioctl+0x676/0x930
> > <4>[  219.784092]  ksys_ioctl+0xb7/0xe0
> > <4>[  219.784096]  __x64_sys_ioctl+0x6a/0xb0
> > <4>[  219.784100]  do_syscall_64+0x94/0x530
> > <4>[  219.784103]  entry_SYSCALL_64_after_hwframe+0x49/0xb3
> > 
> > <3>[  219.784120] Freed by task 12:
> > <4>[  219.784141]  save_stack+0x19/0x40
> > <4>[  219.784145]  __kasan_slab_free+0x130/0x180
> > <4>[  219.784148]  kmem_cache_free_bulk+0x1bd/0x500
> > <4>[  219.784152]  kfree_rcu_work+0x1d8/0x890
> > <4>[  219.784155]  process_one_work+0x8ca/0x1870
> > <4>[  219.784158]  worker_thread+0x82/0xb80
> > <4>[  219.784162]  kthread+0x34e/0x420
> > <4>[  219.784165]  ret_from_fork+0x3a/0x50
> > 
> > Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
> > ---
> >   drivers/gpu/drm/i915/i915_gpu_error.c | 12 +++++++-----
> >   1 file changed, 7 insertions(+), 5 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c
> > index 4d54dba35302..a976cd67b3b3 100644
> > --- a/drivers/gpu/drm/i915/i915_gpu_error.c
> > +++ b/drivers/gpu/drm/i915/i915_gpu_error.c
> > @@ -1207,8 +1207,6 @@ static void engine_record_registers(struct intel_engine_coredump *ee)
> >   static void record_request(const struct i915_request *request,
> >                          struct i915_request_coredump *erq)
> >   {
> > -     const struct i915_gem_context *ctx;
> > -
> >       erq->flags = request->fence.flags;
> >       erq->context = request->fence.context;
> >       erq->seqno = request->fence.seqno;
> > @@ -1218,9 +1216,13 @@ static void record_request(const struct i915_request *request,
> >   
> >       erq->pid = 0;
> >       rcu_read_lock();
> > -     ctx = rcu_dereference(request->context->gem_context);
> > -     if (ctx)
> > -             erq->pid = pid_nr(ctx->pid);
> > +     if (!intel_context_is_closed(request->context)) {
> > +             const struct i915_gem_context *ctx;
> > +
> > +             ctx = rcu_dereference(request->context->gem_context);
> > +             if (ctx)
> > +                     erq->pid = pid_nr(ctx->pid);
> > +     }
> >       rcu_read_unlock();
> >   }
> >   
> > 
> 
> In the client busyness series I move the GEM ctx put to free_engines_rcu 
> - at which point the closed check here is not needed any more. Should we 
> delay this put right now to simplify? Maybe not.. I'll remember to tweak 
> it in my series.

Yeah, it's not the right answer and I was hoping that you would have a
better plan :)

But it fixes a uaf we see atm, so a temporary bandaid.

> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
> 
> Regards,
> 
> Tvrtko
> 
> P.S. Fixes: 2e46a2a0b0149f951b63be1b5df6514676fed213 ?

Looks to be the culprit, yes.
-Chris