Raciness with page table shadows being swapped out
Nicolai Hähnle
nhaehnle at gmail.com
Wed Dec 14 14:22:18 UTC 2016
On 13.12.2016 10:48, Christian König wrote:
>>>> The attached patch has fixed these crashes for me so far, but it's
>>>> very heavy-handed: it collects all page table shadows and the page
>>>> directory shadow and adds them all to the reservations for the callers
>>>> of amdgpu_vm_update_page_directory.
>>>
>>> That is most likely just a timing change, because the shadows should end
>>> up in the duplicates list anyway. So the patch shouldn't have any
>>> effect.
>>
>> Okay, so the reason for the remaining crash is still unclear at least
>> for me.
>
> Yeah, that's a really good question. Can you share the call stack of the
> problem once more?

Pretty sure I found the root cause now. amdgpu_vm_validate_pt_bos relies
on the eviction counter to be able to skip the validation of the page
tables.

However, moving the shadow page tables from mem_type TT to SYSTEM
doesn't count as an eviction (it just unbinds the mapping in the GTT).
The counter therefore doesn't change, and amdgpu_vm_validate_pt_bos can
skip the validation even though the shadows are no longer bound. Clearly,
that's a problem.
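
To make the failure mode concrete, here is a small userspace model of the counter-based shortcut. It is a sketch under stated assumptions, not the real kernel code: the model_* types and functions, the PL_* placements and the per-VM last_eviction_counter snapshot are all hypothetical stand-ins. Only the idea of skipping validation when num_evictions hasn't changed comes from the description above. demo_stale_shadow() unbinds a shadow (TT to SYSTEM) and shows that the next validation is skipped, leaving the shadow unbound; demo_evicted_pt() shows that a counted eviction does trigger revalidation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified model: NOT the real amdgpu/TTM structures. */
enum placement { PL_VRAM, PL_TT, PL_SYSTEM };

struct model_bo {
	enum placement mem_type;  /* where the BO currently lives */
	enum placement preferred; /* where validation moves it back to */
};

struct model_device {
	uint64_t num_evictions; /* bumped only on real evictions */
};

#define MODEL_MAX_PT_BOS 4

struct model_vm {
	struct model_bo *pt_bos[MODEL_MAX_PT_BOS]; /* page tables and shadows */
	size_t num_pt_bos;
	uint64_t last_eviction_counter; /* assumed snapshot from last validation */
};

/* A real eviction moves the BO away and is counted. */
static void model_evict(struct model_device *dev, struct model_bo *bo)
{
	bo->mem_type = PL_SYSTEM;
	dev->num_evictions++;
}

/* Unbinding a GTT BO (TT -> SYSTEM) is not counted as an eviction. */
static void model_unbind(struct model_bo *bo)
{
	bo->mem_type = PL_SYSTEM;
}

/* Validation with the eviction-counter shortcut. */
static void model_validate_pt_bos(struct model_device *dev,
				  struct model_vm *vm)
{
	assert(vm->num_pt_bos <= MODEL_MAX_PT_BOS);

	/* No evictions since last time: assume every BO is still in place. */
	if (dev->num_evictions == vm->last_eviction_counter)
		return;

	for (size_t i = 0; i < vm->num_pt_bos; i++)
		vm->pt_bos[i]->mem_type = vm->pt_bos[i]->preferred;
	vm->last_eviction_counter = dev->num_evictions;
}

/* The problem case: a shadow is unbound, validation is skipped. */
static enum placement demo_stale_shadow(void)
{
	struct model_device dev = { 0 };
	struct model_bo pt = { PL_VRAM, PL_VRAM };
	struct model_bo shadow = { PL_TT, PL_TT };
	struct model_vm vm = { .pt_bos = { &pt, &shadow }, .num_pt_bos = 2,
			       .last_eviction_counter = 0 };

	model_unbind(&shadow);            /* TT -> SYSTEM, counter unchanged */
	model_validate_pt_bos(&dev, &vm); /* skipped: counters still match */
	return shadow.mem_type;           /* still PL_SYSTEM */
}

/* For contrast: a counted eviction does trigger revalidation. */
static enum placement demo_evicted_pt(void)
{
	struct model_device dev = { 0 };
	struct model_bo pt = { PL_VRAM, PL_VRAM };
	struct model_bo shadow = { PL_TT, PL_TT };
	struct model_vm vm = { .pt_bos = { &pt, &shadow }, .num_pt_bos = 2,
			       .last_eviction_counter = 0 };

	model_evict(&dev, &pt);           /* counter 0 -> 1 */
	model_validate_pt_bos(&dev, &vm); /* not skipped */
	return pt.mem_type;               /* back in PL_VRAM */
}
```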

The quick fix is to skip the num_evictions check in
amdgpu_vm_validate_pt_bos. That has worked for me so far.

The next best thing is to add, in addition to the eviction counter, an
unbind counter that gets incremented whenever a BO is unbound (so it
counts a superset of what the eviction counter counts), and then check
that instead of the eviction counter.

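The unbind-counter idea can be modelled the same way. Again, this is a hypothetical userspace sketch rather than a patch: model_*, PL_*, num_unbinds and last_unbind_counter are illustrative names, not amdgpu identifiers. model_unbind() bumps only num_unbinds, while model_evict() bumps both counters, so num_unbinds stays a superset of num_evictions. The validation shortcut then compares against num_unbinds, and an unbound shadow is rebound on the next validation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified model: NOT the real amdgpu/TTM structures. */
enum placement { PL_VRAM, PL_TT, PL_SYSTEM };

struct model_bo {
	enum placement mem_type;  /* where the BO currently lives */
	enum placement preferred; /* where validation moves it back to */
};

struct model_device {
	uint64_t num_evictions; /* real evictions only */
	uint64_t num_unbinds;   /* every unbind, evictions included */
};

#define MODEL_MAX_PT_BOS 4

struct model_vm {
	struct model_bo *pt_bos[MODEL_MAX_PT_BOS]; /* page tables and shadows */
	size_t num_pt_bos;
	uint64_t last_unbind_counter; /* snapshot from last full validation */
};

/* An eviction is also an unbind, so it bumps both counters. */
static void model_evict(struct model_device *dev, struct model_bo *bo)
{
	bo->mem_type = PL_SYSTEM;
	dev->num_evictions++;
	dev->num_unbinds++;
}

/* A plain TT -> SYSTEM unbind bumps only the unbind counter. */
static void model_unbind(struct model_device *dev, struct model_bo *bo)
{
	bo->mem_type = PL_SYSTEM;
	dev->num_unbinds++;
}

static void model_validate_pt_bos(struct model_device *dev,
				  struct model_vm *vm)
{
	assert(vm->num_pt_bos <= MODEL_MAX_PT_BOS);

	/* Skip only if nothing at all was unbound since the last run. */
	if (dev->num_unbinds == vm->last_unbind_counter)
		return;

	for (size_t i = 0; i < vm->num_pt_bos; i++)
		vm->pt_bos[i]->mem_type = vm->pt_bos[i]->preferred;
	vm->last_unbind_counter = dev->num_unbinds;
}

/* The shadow that was previously missed now gets rebound. */
static enum placement demo_unbound_shadow(void)
{
	struct model_device dev = { 0, 0 };
	struct model_bo pt = { PL_VRAM, PL_VRAM };
	struct model_bo shadow = { PL_TT, PL_TT };
	struct model_vm vm = { .pt_bos = { &pt, &shadow }, .num_pt_bos = 2,
			       .last_unbind_counter = 0 };

	model_unbind(&dev, &shadow);      /* unbind counter 0 -> 1 */
	model_validate_pt_bos(&dev, &vm); /* not skipped */
	return shadow.mem_type;           /* back in PL_TT */
}

/* Evictions still trigger revalidation through the unbind counter. */
static enum placement demo_evicted_pt(void)
{
	struct model_device dev = { 0, 0 };
	struct model_bo pt = { PL_VRAM, PL_VRAM };
	struct model_bo shadow = { PL_TT, PL_TT };
	struct model_vm vm = { .pt_bos = { &pt, &shadow }, .num_pt_bos = 2,
			       .last_unbind_counter = 0 };

	model_evict(&dev, &pt);           /* both counters 0 -> 1 */
	model_validate_pt_bos(&dev, &vm); /* not skipped */
	return pt.mem_type;               /* back in PL_VRAM */
}
```

Compared with the quick fix, which amounts to dropping the early return entirely, this keeps the cheap skip for the case where nothing was moved, at the cost of one extra counter update per unbind.
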
Cheers,
Nicolai