Graceful page fault handling for Vega/Navi
ckoenig.leichtzumerken at gmail.com
Wed Sep 4 15:02:21 UTC 2019
This series is the next puzzle piece for recoverable page fault handling on Vega and Navi.
It adds a new direct scheduler entity for VM updates which is then used to update page tables during a fault.
In other words, an application doing an invalid memory access would previously just hang and/or repeat the invalid access over and over again. Now the handling is modified so that the invalid memory access is redirected to the dummy page.
This needs the following prerequisites:
a) The firmware must be new enough to allow re-routing of page faults.
b) Fault retry must be enabled using the amdgpu.noretry=0 parameter.
c) Enough free VRAM to allocate page tables to point to the dummy page.
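
Prerequisite b) can be checked on a running system. The following is only a minimal sketch, assuming the amdgpu module exposes the parameter read-only-or-better under /sys/module/amdgpu/parameters/noretry; the helper name fault_retry_enabled is made up for this illustration and is not part of the series.

```python
from pathlib import Path

# Assumed sysfs location of the module parameter (not taken from the series).
NORETRY_PARAM = Path("/sys/module/amdgpu/parameters/noretry")


def fault_retry_enabled(param_path=NORETRY_PARAM):
    """Return True if noretry reads 0, False for any other integer,
    and None if the file is missing or does not contain an integer."""
    try:
        raw = Path(param_path).read_text().strip()
        return int(raw) == 0
    except (OSError, ValueError):
        return None


if __name__ == "__main__":
    state = fault_retry_enabled()
    if state is None:
        print("amdgpu.noretry could not be read (module not loaded?)")
    elif state:
        print("amdgpu.noretry=0: fault retry is enabled")
    else:
        print("fault retry is not explicitly enabled; boot with amdgpu.noretry=0")
```

This only covers prerequisite b); firmware version and free VRAM are not checked here.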
The re-routing of page faults currently only works on Vega10, so Vega20 and Navi will still need some more time.
Please review and/or comment,