[Intel-gfx] Possible use_mm() mis-uses

Oded Gabbay oded.gabbay at gmail.com
Wed Aug 22 20:01:24 UTC 2018


On Wed, Aug 22, 2018 at 10:58 PM Linus Torvalds
<torvalds at linux-foundation.org> wrote:
>
> On Wed, Aug 22, 2018 at 12:37 PM Oded Gabbay <oded.gabbay at gmail.com> wrote:
> >
> > Having said that, I think we *are* protected by the mmu_notifier
> > release because if the process suddenly dies, we will gracefully clean
> > the process's data in our driver and on the H/W before returning to
> > the mm core code. And before we return to the mm core code, we set the
> > mm pointer to NULL. And the graceful cleaning should be serialized
> > with the load_hqd uses.
>
> So I'm a bit nervous about the mmu_notifier model (and the largely
> equivalent exit_aio() model for the USB gadget AIO uses).
>
> The reason I'm nervous about it is that the mmu_notifier() gets called
> only after the mm_users count has already been decremented to zero
> (and the exact same thing goes for exit_aio()).
>
> Now that's fine if you actually get rid of all accesses in
> mmu_notifier_release() or in exit_aio(), because the page tables still
> exist at that point - they are in the process of being torn down, but
> they haven't been torn down yet.
>
> But for something like a kernel thread doing use_mm(), the thing that
> worries me is a pattern something like this:
>
>   kwork thread          exit thread
>   --------              --------
>
>                         mmput() ->
>                           mm_users goes to zero
>
>   use_mm(mmptr);
>   ..
>
>                           mmu_notifier_release();
>                           exit_mm() ->
>                             exit_aio()
>
> and the pattern is basically the same regardless of whether you use
> mmu_notifier_release() or depend on some exit_aio() flushing your aio
> work: the use_mm() can be called with a mm that has already had its
> mm_users count decremented to zero, and that is now scheduled to be
> free'd.
>
> Does it "work"? Yes. Kind of. At least if the mmu notifier and/or
> exit_aio() actually makes sure to wait for any kwork thread thing. But
> it's a bit of a worrisome pattern.
>
>            Linus

Yes, agreed, and that's why we will be on the safe side and eliminate
this pattern from our code and make sure we won't add this pattern in
the future.
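
To make the window in the diagram above concrete, here is a rough
userspace model of one possible way to close it: the worker takes its
own mm_users reference before touching the mm, and backs off if the
count has already hit zero. This is only a sketch, not driver code.
struct mm_model and the model_* helpers are made up for illustration;
they mimic what the kernel's mmget_not_zero() and mmput() do to
mm_users, and the real use_mm()/unuse_mm() calls appear only as a
comment.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace stand-in for the mm_users count in struct mm_struct. */
struct mm_model {
    atomic_int mm_users;
};

/* Models mmget_not_zero(): take a reference only if the count is non-zero. */
static bool model_mmget_not_zero(struct mm_model *mm)
{
    int old = atomic_load(&mm->mm_users);

    while (old != 0) {
        /* On failure 'old' is refreshed with the current value; retry. */
        if (atomic_compare_exchange_weak(&mm->mm_users, &old, old + 1))
            return true;
    }
    return false; /* count already zero: the mm is being torn down */
}

/* Models mmput(): drop a reference; true if this was the last user. */
static bool model_mmput(struct mm_model *mm)
{
    return atomic_fetch_sub(&mm->mm_users, 1) == 1;
}

/* Hypothetical worker: returns true only if it was safe to use the mm. */
static bool model_worker(struct mm_model *mm)
{
    if (!model_mmget_not_zero(mm))
        return false; /* exit path already ran mmput(); skip the work */

    /* In the kernel: use_mm(mm); ...access user memory...; unuse_mm(mm); */

    model_mmput(mm);
    return true;
}
```

With the count at 1 the worker runs and leaves the count unchanged;
once the exit side has dropped the last reference, the worker refuses
to proceed instead of calling use_mm() on an mm that is scheduled to
be freed.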

Oded

