Question about page table updates at BO destroy
Nicolai Hähnle
nhaehnle at gmail.com
Wed Mar 22 15:06:47 UTC 2017
Hi all,
I have a puzzle: I'm wondering whether there's a subtle bug in the
amdgpu kernel module.
Basically, the concern is that a buggy user space driver might trigger a
sequence like this:
1. Submit a CS that accesses some BO _without_ adding that BO to the
buffer list.
2. Free that BO.
3. Some other task re-uses the memory underlying the BO.
4. The CS is submitted to the hardware and accesses memory that is now
already in use by somebody else, since there has been no update to the
page tables to reflect the freed BO.
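
To make the sequence above concrete, here is a minimal toy model in Python. It is not kernel code: the class ToyGpu, its methods, the address 0x1000 and page number 7 are all made up for illustration, and it assumes the suspected behaviour (the page is released for reuse while the page table entry stays in place).

```python
class ToyGpu:
    """Toy model: one VM page table plus a record of who owns each page."""

    def __init__(self):
        self.page_table = {}  # virtual address -> physical page
        self.owner = {}       # physical page -> current owner

    def map_bo(self, va, page, owner):
        self.page_table[va] = page
        self.owner[page] = owner

    def free_bo_without_unmap(self, va):
        # Models the suspected gap: the page is released for reuse,
        # but the stale page table entry is left behind.
        page = self.page_table[va]
        del self.owner[page]
        return page

    def allocate(self, page, owner):
        if page in self.owner:
            raise RuntimeError("page still in use")
        self.owner[page] = owner

    def execute_cs(self, va):
        # The "hardware" walks the page table and reports whose memory it hit.
        return self.owner.get(self.page_table[va])


def replay_sequence():
    gpu = ToyGpu()
    gpu.map_bo(va=0x1000, page=7, owner="task A")

    # Step 1: a CS touching 0x1000 is queued; the BO is not in the
    # buffer list, so nothing ties the BO's lifetime to this CS.
    queued_cs_va = 0x1000

    # Step 2: the BO is freed.
    page = gpu.free_bo_without_unmap(0x1000)

    # Step 3: some other task re-uses the underlying memory.
    gpu.allocate(page, owner="task B")

    # Step 4: the CS finally runs and follows the stale mapping.
    return gpu.execute_cs(queued_cs_va)
```

In this model replay_sequence() returns "task B", i.e. the command submission from task A ends up touching memory that now belongs to somebody else.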
Obviously there's a user space bug in step 1, but the kernel must still
prevent the conflicting memory accesses, and I don't see where it does.
amdgpu_gem_object_close takes a reservation of the BO and the page
directory, but then simply backs off that reservation without adding
a fence. I suspect that adding a fence there is necessary.
I believe that whenever we remove a BO from a VM, we must
unconditionally add the most recent page directory fence(?) to the BO.
Does that sound right?
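
As a rough sketch of what I mean, the following toy model in Python compares closing a BO with and without attaching a fence. Again, this is not the amdgpu implementation: Fence, Bo, ToyVm and the add_fence switch are hypothetical names, and the "page directory fence" is modelled simply as the fence of the most recent submission to the VM, which is an assumption on my part.

```python
class Fence:
    """Signals once the associated command submission has completed."""

    def __init__(self):
        self.signaled = False


class Bo:
    def __init__(self, page):
        self.page = page
        self.fences = []  # fences that must signal before the memory is reused


class ToyVm:
    def __init__(self):
        self.last_pd_fence = None  # assumed: fence of the latest submission
        self.deferred = []         # closed BOs whose memory is not yet released
        self.free_pages = []       # pages available to other tasks

    def submit_cs(self):
        fence = Fence()
        self.last_pd_fence = fence
        return fence

    def close_bo(self, bo, add_fence):
        # Proposed behaviour: unconditionally attach the most recent
        # page directory fence when the BO is removed from the VM.
        if add_fence and self.last_pd_fence is not None:
            bo.fences.append(self.last_pd_fence)
        self.deferred.append(bo)
        self.reap()

    def reap(self):
        # Release only those BOs whose fences have all signaled.
        still_busy = []
        for bo in self.deferred:
            if all(f.signaled for f in bo.fences):
                self.free_pages.append(bo.page)
            else:
                still_busy.append(bo)
        self.deferred = still_busy


def pages_reusable_after_close(add_fence):
    vm = ToyVm()
    bo = Bo(page=7)
    vm.submit_cs()              # CS uses the BO but does not list it
    vm.close_bo(bo, add_fence)  # BO is closed while the CS is in flight
    return list(vm.free_pages)
```

Without the fence, the page is reusable while the submission is still in flight; with the fence, it stays off the free list until the fence signals and reap() runs again.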
Cheers,
Nicolai