Raciness with page table shadows being swapped out

Christian König deathsimple at vodafone.de
Mon Dec 12 18:22:48 UTC 2016


Am 12.12.2016 um 16:20 schrieb Nicolai Hähnle:
> Hi all,
>
> I just sent out two patches that hopefully make the kernel module more 
> robust in the face of page table shadows being swapped out.
>
> However, even with those patches, I can still fairly reliably 
> reproduce crashes with a backtrace of the shape
>
> amdgpu_cs_ioctl
>  -> amdgpu_vm_update_page_directory
>  -> amdgpu_ttm_bind
>  -> amdgpu_gtt_mgr_alloc
>
> The plausible reason for these crashes is that nothing seems to 
> prevent the shadow BOs from being moved between the calls to 
> amdgpu_cs_validate in amdgpu_cs_parser_bos and the calls to 
> amdgpu_ttm_bind.

The shadow BOs use the same reservation object as the real BOs. So as 
long as the real BOs can't be evicted, the shadows can't be evicted either.
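To illustrate why a shared reservation object closes the window, here is a simplified C model. All names (`struct resv`, `struct bo`, `attach_shadow`, `try_evict`) are hypothetical stand-ins, not the real TTM/amdgpu structures; the model only assumes, as described above, that the shadow points at its parent's reservation object and that eviction has to take the reservation first.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model; not the real TTM/amdgpu structures. */
struct resv {
    bool locked;          /* stand-in for the lock inside a reservation object */
};

struct bo {
    struct resv *resv;    /* reservation object guarding this BO */
    struct bo *shadow;    /* optional shadow copy (e.g. of a page table) */
    bool evicted;
};

/* The shadow shares its parent's reservation object instead of owning one. */
static void attach_shadow(struct bo *parent, struct bo *shadow)
{
    shadow->resv = parent->resv;
    parent->shadow = shadow;
}

static void reserve(struct bo *bo)   { bo->resv->locked = true; }
static void unreserve(struct bo *bo) { bo->resv->locked = false; }

/* Simplified eviction: it must grab the reservation, so a held one blocks it. */
static bool try_evict(struct bo *bo)
{
    if (bo->resv->locked)
        return false;
    bo->evicted = true;
    return true;
}
```

Reserving the page table therefore also pins its shadow: `try_evict(&shadow)` fails until the parent is unreserved, because both BOs test the same lock.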

>
> The attached patch has fixed these crashes for me so far, but it's 
> very heavy-handed: it collects all page table shadows and the page 
> directory shadow and adds them all to the reservations for the callers 
> of amdgpu_vm_update_page_directory.

That is most likely just a timing change, because the shadows should end 
up in the duplicates list anyway, so the patch shouldn't have any effect.
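A rough sketch of why adding the shadows to the reservation list is a no-op: when a list of BOs is reserved under one acquire ticket, an entry whose reservation object is already held by that same ticket is reported as "already locked" and set aside as a duplicate rather than locked again. This is loosely modeled on how TTM's execbuf helpers treat `-EALREADY`; the types and function names below are hypothetical, and error unwinding is omitted.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical model: owner == 0 means unlocked, otherwise the holding ticket. */
struct resv { int owner; };

enum entry_state { ENTRY_RESERVED, ENTRY_DUPLICATE };

static int resv_lock(struct resv *r, int ticket)
{
    if (r->owner == ticket)
        return -EALREADY;     /* we already hold it through another BO */
    if (r->owner != 0)
        return -EBUSY;        /* held by someone else */
    r->owner = ticket;
    return 0;
}

/* Reserve every entry; entries sharing an already-held resv become duplicates. */
static int reserve_list(struct resv *const *list, size_t n, int ticket,
                        enum entry_state *state)
{
    for (size_t i = 0; i < n; i++) {
        int ret = resv_lock(list[i], ticket);
        if (ret == -EALREADY) {
            state[i] = ENTRY_DUPLICATE;
            continue;
        }
        if (ret)
            return ret;       /* real code would back off and unwind here */
        state[i] = ENTRY_RESERVED;
    }
    return 0;
}
```

With a page table and its shadow both pointing at the same `struct resv`, the first entry takes the lock and the second is merely classified as a duplicate; no additional protection is gained.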

>
> I feel like there should be a better way. In part, I wonder why the 
> shadows are needed in the first place. I vaguely recall the 
> discussions about GPU reset and such, but I don't remember why the 
> page tables can't just be rebuilt in some other way.

It's just the simplest and fastest way to keep a copy of the page tables 
around.

The problem with rebuilding the page tables from the mappings is that 
the housekeeping structures already reflect the future state when a reset 
happens, not the state we need to rebuild the tables.
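The following C sketch models the ordering problem with hypothetical names and a single-slot job queue. Mapping a BO updates the housekeeping immediately, while the page table itself is only written when a queued job runs, and that job also writes the shadow. After a reset that wipes VRAM, copying the shadow back yields the page table as it was at reset time; rebuilding from `mapping[]` would instead install an entry whose update had not executed yet.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define NUM_PTES 4

/* Hypothetical model of one VM; not the real amdgpu structures. */
struct vm {
    uint64_t mapping[NUM_PTES]; /* housekeeping: updated at map time */
    uint64_t pt[NUM_PTES];      /* page table in VRAM: updated when the job runs */
    uint64_t shadow[NUM_PTES];  /* shadow copy, written by the same job */
    int pending_idx;            /* single queued update, -1 if none */
    uint64_t pending_val;
};

static void vm_init(struct vm *vm)
{
    memset(vm, 0, sizeof(*vm));
    vm->pending_idx = -1;
}

static void vm_map(struct vm *vm, int idx, uint64_t addr)
{
    assert(vm->pending_idx < 0);  /* one-slot queue keeps the sketch short */
    vm->mapping[idx] = addr;      /* bookkeeping now holds the future state */
    vm->pending_idx = idx;        /* ...while the hardware update is only queued */
    vm->pending_val = addr;
}

static void vm_run_job(struct vm *vm)
{
    if (vm->pending_idx < 0)
        return;
    vm->pt[vm->pending_idx] = vm->pending_val;
    vm->shadow[vm->pending_idx] = vm->pending_val;
    vm->pending_idx = -1;
}

/* A reset that loses VRAM leaves garbage in the page table. */
static void vm_reset(struct vm *vm)
{
    memset(vm->pt, 0xff, sizeof(vm->pt));
}

/* Restore the page table as of reset time; queued jobs can run afterwards. */
static void vm_recover(struct vm *vm)
{
    memcpy(vm->pt, vm->shadow, sizeof(vm->pt));
}
```

Keeping both states in the housekeeping instead (the alternative mentioned below) would mean tracking, per mapping, both the executed and the queued value, which is the extra complexity the shadow avoids.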

We could obviously change the housekeeping a bit to keep both states, 
but that would complicate mapping and unmapping of BOs significantly.

Regards,
Christian.

>
> Cheers,
> Nicolai
>
>
> _______________________________________________
> amd-gfx mailing list
> amd-gfx at lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/amd-gfx



