Raciness with page table shadows being swapped out

Nicolai Hähnle nhaehnle at gmail.com
Wed Dec 14 14:22:18 UTC 2016


On 13.12.2016 10:48, Christian König wrote:
>>>> The attached patch has fixed these crashes for me so far, but it's
>>>> very heavy-handed: it collects all page table shadows and the page
>>>> directory shadow and adds them all to the reservations for the callers
>>>> of amdgpu_vm_update_page_directory.
>>>
>>> That is most likely just a timing change, because the shadows should end
>>> up in the duplicates list anyway. So the patch shouldn't have any
>>> effect.
>>
>> Okay, so the reason for the remaining crash is still unclear at least
>> for me.
>
> Yeah, that's a really good question. Can you share the call stack of the
> problem once more?

Pretty sure I found the root cause now. amdgpu_vm_validate_pt_bos relies 
on the eviction counter to be able to skip the validation of the page 
tables.

However, moving the shadow page tables out from mem_type TT to SYSTEM 
doesn't count as an eviction (it just unbinds the mapping in the GTT).

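That's a problem: the eviction counter stays unchanged, so validation of
the page tables is skipped even though the shadows are no longer bound in
the GTT.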

The quick fix is to skip the num_evictions check in 
amdgpu_vm_validate_pt_bos. That has worked for me so far.

The next best thing is to add an unbind counter in addition to the 
eviction counter that gets incremented whenever a BO is unbound (so it 
counts a superset of what the eviction counter counts), and then check 
that instead of the eviction counter.
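
To make the counter idea concrete, here is a rough standalone model of it. This is not the actual amdgpu code: all struct and function names below (model_dev, model_vm_stale_by_unbinds, and so on) are made up for illustration, and the real counters and locking would look different. It only shows why a TT -> SYSTEM move slips past an eviction-based check but is caught by an unbind-based one.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the per-device counters (not the real amdgpu structs). */
struct model_dev {
    uint64_t num_evictions; /* incremented when a BO is evicted */
    uint64_t num_unbinds;   /* incremented whenever a BO is unbound (superset) */
};

/* An eviction also unbinds the BO, so both counters advance. */
void model_evict(struct model_dev *dev)
{
    dev->num_evictions++;
    dev->num_unbinds++;
}

/* Moving a shadow from TT to SYSTEM only unbinds the GTT mapping. */
void model_move_tt_to_system(struct model_dev *dev)
{
    dev->num_unbinds++;
}

/* Current style of check: validation is skipped unless an eviction happened. */
bool model_vm_stale_by_evictions(const struct model_dev *dev, uint64_t seen)
{
    return dev->num_evictions != seen;
}

/* Proposed style of check: any unbind since the last validation forces it. */
bool model_vm_stale_by_unbinds(const struct model_dev *dev, uint64_t seen)
{
    return dev->num_unbinds != seen;
}
```

In this model, a VM that remembered both counters at its last validation would see no change in num_evictions after model_move_tt_to_system(), but would see num_unbinds advance and therefore revalidate.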

Cheers,
Nicolai

