[Intel-gfx] [PATCH] dma-buf: Precheck for a valid dma_fence before acquiring the reference
Daniel Vetter
daniel.vetter at ffwll.ch
Fri Feb 21 16:34:35 UTC 2020
On Fri, Feb 21, 2020 at 4:26 PM Chris Wilson <chris at chris-wilson.co.uk> wrote:
>
> Quoting Chris Wilson (2020-02-21 15:23:38)
> > Quoting Daniel Vetter (2020-02-21 15:17:24)
> > > On Fri, Feb 21, 2020 at 3:38 PM Chris Wilson <chris at chris-wilson.co.uk> wrote:
> > > > dma_fence_get_rcu() is used to acquire a reference to under a dma-fence
> > > > under racey conditions -- a perfect recipe for a disaster. As we know
> > > > the caller may be handling stale memory, use kasan to confirm the
> > > > dma-fence, or rather its memory block, is valid before attempting to
> > > > acquire a reference. This should help us to more quickly and clearly
> > > > identify lost races.
> > >
> > > Hm ... I'm a bit lost on the purpose, and what this does. Fences need
> > > to be rcu-freed, and I have honestly no idea how kasan treats those.
> > > Are we throwing false positives, because kasan thinks the stuff is
> > > freed, but we're still accessing it (while the grace period hasn't
> > > passed, so anything freed is still guaranteed to be at least in the
> > > slab cache somewhere).
> > >
> > > I'm not seeing how this catches lost races quicker, since the refcount
> > > should get to 0 way before we get to the kfree. So the refcount check
> > > on the next line should catch strictly more races than the kasan
> > > check.
> >
> > It's not about the fence itself, but those pointing to the fence. That's
> > where we may find garbage, and by returning NULL the kernel keeps
> > working for a bit longer as you try to piece together the race.
>
> Plus given all the inlining, the kasan warning for the refcount is not
> that clear about where it is called from, which is a bit of a nuisance,
> so an explicit warning was much easier for finding the culprit.
Hm ok, that makes a lot more sense. Can you pls amend the commit
message (paste the above if you feel uninspired, this discussion
explains pretty good what's the point). And maybe a link to bug report
or include an example splat and how it's improved.
With that stuff added Acked-by: Daniel Vetter <daniel.vetter at ffwll.ch>
But maybe get another co-conspirator/bug-hunter to ack this too.
-Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
More information about the Intel-gfx
mailing list