Graceful page fault handling for Vega/Navi

Christian König ckoenig.leichtzumerken at gmail.com
Wed Sep 4 15:02:21 UTC 2019


Hi everyone,

this series is the next puzzle piece for recoverable page fault handling on Vega and Navi.

It adds a new direct scheduler entity for VM updates which is then used to update page tables during a fault.

In other words, previously an application doing an invalid memory access would just hang and/or repeat the invalid access over and over again. With this series, the invalid memory access is redirected to the dummy page instead.
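
As a rough illustration of the behaviour described above, here is a toy model in Python. It is not the amdgpu code: the class, the method names and the dictionary used as a page table are all made up for this sketch. It only shows the idea that a fault on an unmapped page is resolved by pointing that page table entry at a shared dummy page, so the retried access completes instead of faulting forever.

```python
# Toy model only: every name here is hypothetical, none of it is amdgpu code.
DUMMY_PAGE = "dummy"


class ToyVM:
    def __init__(self):
        self.page_table = {}  # virtual page number -> backing page
        self.faults = 0       # number of faults handled so far

    def handle_fault(self, vpn):
        # Recoverable fault: map the faulting entry to the dummy page.
        self.faults += 1
        self.page_table[vpn] = DUMMY_PAGE

    def access(self, vpn):
        # Retry the access until the page table has an entry for it.
        while vpn not in self.page_table:
            self.handle_fault(vpn)
        return self.page_table[vpn]
```

In this model a second access to the same invalid page no longer faults, because the entry now points to the dummy page.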

This needs the following prerequisites:
a) The firmware must be new enough to allow re-routing of page faults.
b) Fault retry must be enabled using the amdgpu.noretry=0 parameter.
c) Enough free VRAM to allocate page tables to point to the dummy page.
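
Prerequisite b) could be checked on a running system with something like the sketch below. It assumes that the amdgpu module exposes the parameter through sysfs at /sys/module/amdgpu/parameters/noretry; that path, and the helper name, are assumptions of this example and not something stated in this mail.

```python
from pathlib import Path

# Assumed sysfs location of the module parameter; may not exist on every kernel.
NORETRY_PATH = Path("/sys/module/amdgpu/parameters/noretry")


def fault_retry_enabled(path=NORETRY_PATH):
    """Return True if noretry reads as 0, False for any other integer,
    and None if the file is missing or does not contain an integer."""
    try:
        value = int(Path(path).read_text().strip())
    except (OSError, ValueError):
        return None
    return value == 0


if __name__ == "__main__":
    print("fault retry enabled:", fault_retry_enabled())
```

A result of None only means that the parameter could not be read from the assumed path, not that fault retry is disabled.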

The re-routing of page faults currently only works on Vega10, so Vega20 and Navi will still need some more time.

Please review and/or comment,
Christian.
