Raciness with page table shadows being swapped out
Nicolai Hähnle
nhaehnle at gmail.com
Mon Dec 12 15:20:05 UTC 2016
Hi all,
I just sent out two patches that hopefully make the kernel module more
robust in the face of page table shadows being swapped out.
However, even with those patches, I can still fairly reliably reproduce
crashes with a backtrace of the shape
    amdgpu_cs_ioctl
    -> amdgpu_vm_update_page_directory
    -> amdgpu_ttm_bind
    -> amdgpu_gtt_mgr_alloc
A plausible reason for these crashes is that nothing seems to prevent
the shadow BOs (buffer objects) from being moved between the calls to
amdgpu_cs_validate in amdgpu_cs_parser_bos and the calls to amdgpu_ttm_bind.
The attached patch has fixed these crashes for me so far, but it's very
heavy-handed: it collects all page table shadows and the page directory
shadow and adds them all to the reservations for the callers of
amdgpu_vm_update_page_directory.
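To make the race and the heavy-handed fix concrete, here is a minimal userspace model in C. It is only a sketch under simplifying assumptions: "validation" just marks a BO resident, "eviction" clears that flag for any BO that is not reserved, and binding can only succeed if the BO is still resident. All names (struct bo, struct vm, struct resv_list, resv_add, collect_shadows, reserve_all, evict_unreserved, run_scenario) are hypothetical; they are not the amdgpu or TTM kernel API, and the attached patch itself is not reproduced here.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified buffer object (BO); not the amdgpu/TTM structs. */
struct bo {
    bool resident;  /* placed where it can be bound */
    bool reserved;  /* held by the submitter; eviction must skip it */
};

/* Hypothetical VM: page directory, page tables, and their shadows. */
struct vm {
    struct bo *pd;
    struct bo *pd_shadow;
    struct bo **pt;
    struct bo **pt_shadow;  /* entries may be NULL */
    size_t num_pt;
};

#define MAX_RESV 64

/* Hypothetical reservation list for one command submission. */
struct resv_list {
    struct bo *bos[MAX_RESV];
    size_t count;
};

static bool resv_add(struct resv_list *l, struct bo *bo)
{
    if (!bo)
        return true;            /* nothing to reserve */
    if (l->count == MAX_RESV)
        return false;           /* list full */
    l->bos[l->count++] = bo;
    return true;
}

/* The heavy-handed approach: add the PD shadow and every PT shadow. */
static bool collect_shadows(const struct vm *vm, struct resv_list *l)
{
    if (!resv_add(l, vm->pd_shadow))
        return false;
    for (size_t i = 0; i < vm->num_pt; i++)
        if (!resv_add(l, vm->pt_shadow[i]))
            return false;
    return true;
}

static void reserve_all(const struct resv_list *l)
{
    for (size_t i = 0; i < l->count; i++)
        l->bos[i]->reserved = true;
}

/* Memory pressure: anything not reserved may be moved out. */
static void evict_unreserved(struct bo **all, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (!all[i]->reserved)
            all[i]->resident = false;
}

/* Returns true if every shadow is still bindable after the eviction window. */
static bool run_scenario(bool reserve_shadows)
{
    struct bo pd = {0}, pd_shadow = {0}, pt0 = {0}, pt1 = {0}, s0 = {0}, s1 = {0};
    struct bo *pts[] = {&pt0, &pt1};
    struct bo *shadows[] = {&s0, &s1};
    struct vm vm = {&pd, &pd_shadow, pts, shadows, 2};
    struct bo *all[] = {&pd, &pd_shadow, &pt0, &pt1, &s0, &s1};
    size_t n_all = sizeof all / sizeof all[0];
    struct resv_list resv = {.count = 0};

    /* The PD and PTs are always reserved; shadows only if requested. */
    resv_add(&resv, vm.pd);
    for (size_t i = 0; i < vm.num_pt; i++)
        resv_add(&resv, vm.pt[i]);
    if (reserve_shadows && !collect_shadows(&vm, &resv))
        return false;
    reserve_all(&resv);

    /* Validation places every BO, reserved or not. */
    for (size_t i = 0; i < n_all; i++)
        all[i]->resident = true;

    /* The window between validation and binding. */
    evict_unreserved(all, n_all);

    bool ok = vm.pd_shadow->resident;
    for (size_t i = 0; i < vm.num_pt; i++)
        if (vm.pt_shadow[i] && !vm.pt_shadow[i]->resident)
            ok = false;
    return ok;
}
```

In this model, run_scenario(false) corresponds to shadows that are validated but not reserved, so they get evicted before binding and the call returns false; run_scenario(true) corresponds to reserving every shadow up front and returns true. The cost is visible too: every submission reserves all shadows, whether or not their page tables are touched.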
I feel like there should be a better way. In part, I wonder why the
shadows are needed in the first place. I vaguely recall the discussions
about GPU reset and such, but I don't remember why the page tables can't
just be rebuilt in some other way.
Cheers,
Nicolai
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0001-drm-amd-amdgpu-reserve-shadows-of-page-directory-and.patch
Type: text/x-patch
Size: 9341 bytes
Desc: not available
URL: <https://lists.freedesktop.org/archives/amd-gfx/attachments/20161212/c830099c/attachment.bin>
More information about the amd-gfx mailing list