[igt-dev] [RFC] IGT GPU watchdog

Carlos Santa carlos.santa at intel.com
Mon Apr 15 18:22:50 UTC 2019


Sharing this as an RFC at this point to help expand the test coverage
for this feature and to get help debugging some of the issues I am seeing.

The latest kernel patch series: https://patchwork.kernel.org/patch/10866659/

Test Coverage:

1. A gem context created with a long batch that runs to completion.
2. A gem context created with a long batch that is canceled after some
time using the gpu watchdog timeout.
3. Two gem contexts created: ctx2 runs to completion and ctx1 is canceled
after some time using the gpu watchdog timeout.
4. The inverse of #3 above: ctx2 is canceled after some time using the gpu
watchdog timeout and ctx1 runs to completion.

Preemption handling:

1. Submit a long batch and, halfway through its run time, submit a
higher priority batch with half the duration. Verify that the latter
was executed.
2. Submit a low priority long batch without gpu watchdog, then a higher
priority batch with gpu watchdog, and verify whether the higher
priority batch was canceled before the low priority one completed.

Known issues:

1. The EIO fence status is not propagated by the kernel after an engine
reset triggered by the gpu watchdog: after each reset the fence status
is still -1.

2. Creating a gem context with a low or high priority value doesn't
seem to work correctly. I need help with this to test preemption; see
the code below for reference.

3. TODO: the subtest "gpu-watchdog-long-batch-2-contexts" uses a dummy
sleep(6) for now, which needs to be replaced. The contexts also can't be
destroyed until both threads are done executing, so the destroy calls
are commented out for now.

Carlos Santa (1):
  tests/gem_watchdog: Initial set of tests for GPU watchdog

 tests/Makefile.sources    |   3 +
 tests/i915/gem_watchdog.c | 439 ++++++++++++++++++++++++++++++++++++++++++++++
 tests/meson.build         |   1 +
 3 files changed, 443 insertions(+)
 create mode 100644 tests/i915/gem_watchdog.c
