Raciness with page table shadows being swapped out

Christian König deathsimple at vodafone.de
Wed Dec 14 14:56:56 UTC 2016


Am 14.12.2016 um 15:22 schrieb Nicolai Hähnle:
> On 13.12.2016 10:48, Christian König wrote:
>>>>> The attached patch has fixed these crashes for me so far, but it's
>>>>> very heavy-handed: it collects all page table shadows and the page
>>>>> directory shadow and adds them all to the reservations for the callers
>>>>> of amdgpu_vm_update_page_directory.
>>>>
>>>> That is most likely just a timing change, because the shadows should end
>>>> up in the duplicates list anyway. So the patch shouldn't have any
>>>> effect.
>>>
>>> Okay, so the reason for the remaining crash is still unclear at least
>>> for me.
>>
>> Yeah, that's a really good question. Can you share the call stack of the
>> problem once more?
>
> Pretty sure I found the root cause now. amdgpu_vm_validate_pt_bos 
> relies on the eviction counter to be able to skip the validation of 
> the page tables.
>
> However, moving the shadow page tables out from mem_type TT to SYSTEM 
> doesn't count as an eviction (it just unbinds the mapping in the GTT).
>
> Clearly, that's a problem.

Nice catch!

> The quick fix is to skip the num_evictions check in 
> amdgpu_vm_validate_pt_bos. That has worked for me so far.
>
> The next best thing is to add an unbind counter in addition to the 
> eviction counter that gets incremented whenever a BO is unbound (so it 
> counts a superset of what the eviction counter counts), and then check 
> that instead of the eviction counter.

Well, that's too complicated; we should just make the eviction counter
handle both events.

That's also its original meaning: in this case, unbinding pages from the
GART is a form of eviction as well.
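
A simplified model of the problem and of the fix proposed above may help. This is a sketch only: every name in it (toy_device, toy_bo, toy_vm, toy_bo_move, toy_validate_pt_bos, toy_scenario and the TOY_* placements) is hypothetical and does not reproduce the actual amdgpu or TTM code. It assumes a single device-wide eviction counter that a VM compares against a cached copy to decide whether page-table validation can be skipped, as described for amdgpu_vm_validate_pt_bos, and it treats TT as the home placement of a shadow page table.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical placements, loosely modelled on TTM's SYSTEM, TT and VRAM. */
enum toy_mem_type { TOY_SYSTEM, TOY_TT, TOY_VRAM };

struct toy_bo {
	enum toy_mem_type mem_type;
};

/* Hypothetical device-wide eviction counter shared by all VMs. */
struct toy_device {
	uint64_t num_evictions;
};

#define TOY_MAX_PT 4

struct toy_vm {
	uint64_t last_eviction_counter; /* counter value at last validation */
	struct toy_bo *pt_bos[TOY_MAX_PT];
	int num_pt_bos;
};

/*
 * Hypothetical move notification. Leaving VRAM always counts as an
 * eviction; a TT -> SYSTEM unbind is only counted when count_unbind is
 * true, which models the proposed fix.
 */
static void toy_bo_move(struct toy_device *dev, struct toy_bo *bo,
			enum toy_mem_type new_type, bool count_unbind)
{
	bool left_vram = bo->mem_type == TOY_VRAM && new_type != TOY_VRAM;
	bool unbound = bo->mem_type == TOY_TT && new_type == TOY_SYSTEM;

	if (left_vram || (count_unbind && unbound))
		dev->num_evictions++;
	bo->mem_type = new_type;
}

/* Rebinds every page table that is not in TT, unless the counter shows
 * that nothing was evicted since the last validation. */
static void toy_validate_pt_bos(struct toy_device *dev, struct toy_vm *vm)
{
	assert(vm->num_pt_bos <= TOY_MAX_PT);
	if (vm->last_eviction_counter == dev->num_evictions)
		return; /* fast path: validation skipped entirely */

	for (int i = 0; i < vm->num_pt_bos; i++)
		if (vm->pt_bos[i]->mem_type != TOY_TT)
			vm->pt_bos[i]->mem_type = TOY_TT;
	vm->last_eviction_counter = dev->num_evictions;
}

/* Unbinds one shadow page table, validates, and reports where it ended up. */
static enum toy_mem_type toy_scenario(bool count_unbind)
{
	struct toy_device dev = { .num_evictions = 0 };
	struct toy_bo shadow = { .mem_type = TOY_TT };
	struct toy_vm vm = { .last_eviction_counter = 0,
			     .pt_bos = { &shadow }, .num_pt_bos = 1 };

	toy_bo_move(&dev, &shadow, TOY_SYSTEM, count_unbind);
	toy_validate_pt_bos(&dev, &vm);
	return shadow.mem_type;
}
```

With count_unbind false, toy_scenario leaves the shadow in TOY_SYSTEM: the unbind never touches the counter, so validation takes the fast path and the stale shadow is used. With count_unbind true, the unbind bumps the counter and the shadow is rebound to TOY_TT. Folding both events into the one counter, rather than adding a separate unbind counter, keeps a single comparison on the fast path.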

Regards,
Christian.

>
> Cheers,
> Nicolai
> _______________________________________________
> amd-gfx mailing list
> amd-gfx at lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
