Raciness with page table shadows being swapped out

Nicolai Hähnle nhaehnle at gmail.com
Mon Dec 12 15:20:05 UTC 2016


Hi all,

I just sent out two patches that hopefully make the kernel module more 
robust in the face of page table shadows being swapped out.

However, even with those patches, I can still fairly reliably reproduce
crashes with a backtrace of the form

amdgpu_cs_ioctl
  -> amdgpu_vm_update_page_directory
  -> amdgpu_ttm_bind
  -> amdgpu_gtt_mgr_alloc

A plausible reason for these crashes is that nothing seems to prevent
the shadow BOs from being moved (e.g. evicted) between the calls to
amdgpu_cs_validate in amdgpu_cs_parser_bos and the calls to
amdgpu_ttm_bind.

The attached patch has fixed these crashes for me so far, but it's very 
heavy-handed: it collects all page table shadows and the page directory 
shadow and adds them all to the reservations for the callers of 
amdgpu_vm_update_page_directory.
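The general idea of "collect every shadow and reserve it" can be
sketched as below. This is a hypothetical illustration, not the code of
the attached patch: fake_bo, fake_vm and collect_shadows are invented
names, and the real driver keeps this state in different structures.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified structures; not the real amdgpu_vm layout. */
struct fake_bo {
	struct fake_bo *shadow;	/* NULL if the BO has no shadow */
};

struct fake_vm {
	struct fake_bo *page_directory;
	struct fake_bo **page_tables;	/* entries may be NULL */
	size_t num_page_tables;
};

/* Append the page directory shadow and every page table shadow to
 * @list, which the caller would then reserve along with its other BOs.
 * Returns the number of entries written, never more than @max. */
static size_t collect_shadows(const struct fake_vm *vm,
			      struct fake_bo **list, size_t max)
{
	size_t n = 0;

	assert(vm != NULL && (list != NULL || max == 0));

	if (vm->page_directory && vm->page_directory->shadow && n < max)
		list[n++] = vm->page_directory->shadow;

	for (size_t i = 0; i < vm->num_page_tables; i++) {
		struct fake_bo *pt = vm->page_tables[i];

		/* Page tables that are not allocated are skipped. */
		if (pt && pt->shadow && n < max)
			list[n++] = pt->shadow;
	}
	return n;
}
```

The cost of this approach grows with the number of page tables, since
every submission reserves all of their shadows, which is why it is
heavy-handed.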

I feel like there should be a better way. Partly, I wonder why the
shadows are needed in the first place. I vaguely recall the discussions
about GPU reset and such, but I don't remember why the page tables
can't simply be rebuilt some other way.

Cheers,
Nicolai
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0001-drm-amd-amdgpu-reserve-shadows-of-page-directory-and.patch
Type: text/x-patch
Size: 9341 bytes
Desc: not available
URL: <https://lists.freedesktop.org/archives/amd-gfx/attachments/20161212/c830099c/attachment.bin>

