[Intel-gfx] batchbuffers failing to execute

Eric Anholt eric at anholt.net
Mon Feb 20 21:10:49 CET 2012


I think I've got a concrete test here for something I've been trying
to track down for a while now: it appears that when the GPU is too
busy, batches get dropped and sometimes the GPU hangs.

This really became clear when I was testing a patch for citybench to
"improve" the swap throttling -- a broken version that I submitted
caused no throttling to occur at all.  The app, which renders a
predefined set of frames as fast as possible, would now frequently
stutter in the middle.  The results were insanely fast (40-60%
better!).  But it didn't make sense for there to be a stutter if the
set of frames rendered is predefined.

So, I wrote this little patch series to test my guess that my
batches were getting dropped.  The plan is: emit a dword write in every
(render) batch, and after a while go look and see if they all landed.
Corresponding to the stutters, I get:

Mesa: Initializing x86-64 optimizations
batch 42 didn't report: 0xd0d0d0d0 instead of 0x0000002a
batch 43 didn't report: 0xd0d0d0d0 instead of 0x0000002b
batch 44 didn't report: 0xd0d0d0d0 instead of 0x0000002c
batch 45 didn't report: 0xd0d0d0d0 instead of 0x0000002d
batch 46 didn't report: 0xd0d0d0d0 instead of 0x0000002e
batch 47 didn't report: 0xd0d0d0d0 instead of 0x0000002f
batch 48 didn't report: 0xd0d0d0d0 instead of 0x00000030
batch 49 didn't report: 0xd0d0d0d0 instead of 0x00000031
batch 50 didn't report: 0xd0d0d0d0 instead of 0x00000032
batch 51 didn't report: 0xd0d0d0d0 instead of 0x00000033
batch 52 didn't report: 0xd0d0d0d0 instead of 0x00000034
batch 53 didn't report: 0xd0d0d0d0 instead of 0x00000035
shutting up
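
For reference, the check can be sketched as a small CPU-only
simulation.  This is not the actual patch series: the names
init_slots, batch_write and check_slots and the MAX_REPORTS cap are
made up for illustration, and in the real test the store is done by
the GPU from the batchbuffer rather than by the CPU.  Only the
0xd0d0d0d0 fill value and the message format come from the output
above.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdio.h>

#define POISON      0xd0d0d0d0u /* fill value seen in the output above */
#define MAX_REPORTS 12          /* hypothetical cap before "shutting up" */

/* Poison every slot before any batch is submitted. */
void init_slots(uint32_t *slots, uint32_t count)
{
    for (uint32_t i = 0; i < count; i++)
        slots[i] = POISON;
}

/*
 * Stand-in for the dword write emitted from batch number 'batch'.
 * Assumption: each batch stores its own number into its own slot.
 */
void batch_write(uint32_t *slots, uint32_t count, uint32_t batch)
{
    assert(batch < count);
    slots[batch] = batch;
}

/*
 * Look at every slot and report the ones that never got written.
 * Prints at most MAX_REPORTS complaints, then goes quiet.
 * Returns the total number of batches that did not report.
 */
uint32_t check_slots(const uint32_t *slots, uint32_t count)
{
    uint32_t missing = 0;

    for (uint32_t i = 0; i < count; i++) {
        if (slots[i] == i)
            continue;
        if (missing < MAX_REPORTS)
            printf("batch %" PRIu32 " didn't report: 0x%08" PRIx32
                   " instead of 0x%08" PRIx32 "\n", i, slots[i], i);
        else if (missing == MAX_REPORTS)
            printf("shutting up\n");
        missing++;
    }
    return missing;
}
```

Calling init_slots once, batch_write for each batch that is submitted,
and check_slots some time later gives a count of dropped batches; a
slot that still holds the poison value means that batch's write never
landed.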

The cool thing is that apparently I don't need citybench, either.  This
paste came from vblank_mode=0 glxgears fullscreened on my 1600x900
panel (the default window size doesn't reproduce the failure).

This is gen7, danvet's kernel de67cba65944f26c0f147035bd62e30c5f456b96
rebased on top of cherry-picks of the 4 ivb workaround patches.  A
revert of a71d8d94525e8fd855c0466fb586ae1cb008f3a2 doesn't help.

I've also run this on a test libdrm that returns NULL pointers on map
failure and asserts that exec and pwrite don't return errors.



