[Intel-gfx] [RFC] How to assign blame when multiple rings are hung
Mika Kuoppala
mika.kuoppala at linux.intel.com
Tue Jan 28 12:16:34 CET 2014
Hi,
I am working on a patch set [1] which originally aimed to fix
how we find the guilty batches with ppgtt.
During review it became clear that I don't have a clear
idea of how we should behave when multiple rings encounter
a problematic batch at the same time.
The following i-g-t patch adds a subtest which asserts that
both contexts get blamed for having a (problematic) batch active
during the hang.
The patch set [1] will fail with this test case as it will
blame only the first context that injected the hang.
We would need to change the test for it to pass:
- assert_reset_status(fd[1], 0, RS_BATCH_ACTIVE);
+ assert_reset_status(fd[1], 0, RS_BATCH_PENDING);
I lean towards both contexts getting their batch_active count
increased, as other rings might gain contexts and we could
already reset individual rings instead of the whole GPU.
But we need to pick one, so that's why the RFC.
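
To make the two options concrete, here is a minimal standalone
sketch. It is not i915 code: struct ctx_stats, struct ring_state,
enum blame_policy and assign_blame() are hypothetical names made
up for illustration, and the array order merely stands in for
"the first context that injected the hang". Only the idea of
per-context batch_active and batch_pending counters comes from
the reset stats discussed above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-context counters, modelled on the reset stats. */
struct ctx_stats {
	unsigned int batch_active;  /* had a batch executing during a hang */
	unsigned int batch_pending; /* had a batch affected, but not blamed */
};

/* Hypothetical per-ring snapshot taken at hang detection time. */
struct ring_state {
	struct ctx_stats *ctx; /* context of the current batch, NULL if idle */
	bool hung;             /* ring was found hung */
};

enum blame_policy {
	BLAME_FIRST_ONLY, /* only the first hung ring's context is guilty */
	BLAME_ALL_HUNG,   /* every hung ring's context is guilty */
};

/* Walk the rings in order and update the counters per policy. */
static void assign_blame(struct ring_state *rings, size_t n,
			 enum blame_policy policy)
{
	bool blamed = false;
	size_t i;

	assert(rings != NULL || n == 0);

	for (i = 0; i < n; i++) {
		struct ring_state *r = &rings[i];

		if (!r->ctx)
			continue; /* idle ring, nobody to account */

		if (r->hung && (policy == BLAME_ALL_HUNG || !blamed)) {
			r->ctx->batch_active++;
			blamed = true;
		} else {
			r->ctx->batch_pending++;
		}
	}
}
```

With two hung rings running batches from two different contexts,
BLAME_FIRST_ONLY gives the second context batch_pending (what the
patch set [1] does today, as far as this model captures it), while
BLAME_ALL_HUNG gives both contexts batch_active (what the new
subtest asserts).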
Thoughts?
--
[1]: https://github.com/mkuoppal/linux/commits/one_guilty
Mika Kuoppala (1):
tests/gem_reset_stats: add subtest hang-render-and-<ring>
tests/gem_reset_stats.c | 34 ++++++++++++++++++++++++++++++++++
1 file changed, 34 insertions(+)
--
1.7.9.5