[Intel-gfx] GPU hang with high media workload on BSW

Fri Jul 1 05:14:28 UTC 2016

Hi Guys,

Thanks in advance for the help!

I'm seeing a GPU hang on BSW while running multi-channel H.264 video decoding with VPP composition and display, plus one channel of H.264 encoding.
The render ring gets stuck, as shown below:
[58503.223700] [drm] stuck on render ring
[58503.246340] [drm] GPU HANG: ecode 8:0:0x7f1d7e3d, in Challenge [3259], reason: Ring hung, action: reset

Below is part of /sys/class/drm/card0/error. I suspect the hang is caused by incorrect render ring buffer content:
In the lines marked 'where I suspect', the ring buffer holds 18800001 (MI_BATCH_BUFFER_START_GEN8), but the next DWORD is 00100002.
Since MI_BATCH_BUFFER_START_GEN8 should be followed by the batch buffer address, I think the ring buffer content is incorrect.

==========part of the /sys/class/drm/card0/error=========
render ring --- 3 requests
  seqno 0x020dc83a, emitted 4353167966, tail 0x00000070
  seqno 0x020dc83b, emitted 4353167969, tail 0x000000f0
  seqno 0x020dc83e, emitted 4353167982, tail 0x00000170
render ring --- ringbuffer = 0x00015000
00000000 :  18800001 // where I suspect
00000004 :  00100002 // where I suspect
00000008 :  00000000
0000000c :  00000000
00000010 :  00000000
00000014 :  00000000
00000018 :  7a000004
0000001c :  01144c1c
00000020 :  00036080
00000024 :  00000000
00000028 :  00000000
0000002c :  00000000
00000030 :  04000000
00000034 :  00000000
00000038 :  0c000000
0000003c :  1382c10c
==========part of the /sys/class/drm/card0/error=========
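
As a cross-check of the decoding above, here is a minimal user-space sketch in C that classifies the two header DWORDs seen in this report. The macro definitions are reproduced from memory of i915_reg.h of that period and should be treated as assumptions; cmd_client, mi_opcode and describe_header are hypothetical helper names, not kernel functions.

```c
#include <stdint.h>

/* Assumed to match i915_reg.h; verify against your kernel tree. */
#define MI_INSTR(opcode, flags)    (((uint32_t)(opcode) << 23) | (flags))
#define MI_BATCH_BUFFER_START_GEN8 MI_INSTR(0x31, 1)
#define GFX_OP_PIPE_CONTROL(len) \
	((0x3u << 29) | (0x3u << 27) | (0x2u << 24) | ((uint32_t)(len) - 2))

/* Bits 31:29 select the command client: 0 is MI, 3 is the render pipeline. */
static unsigned int cmd_client(uint32_t header)
{
	return header >> 29;
}

/* For MI commands, bits 28:23 hold the opcode. */
static unsigned int mi_opcode(uint32_t header)
{
	return (header >> 23) & 0x3f;
}

/* Hypothetical helper: name the two headers that appear in this report. */
static const char *describe_header(uint32_t header)
{
	if (header == MI_BATCH_BUFFER_START_GEN8)
		return "MI_BATCH_BUFFER_START_GEN8";
	if (header == GFX_OP_PIPE_CONTROL(6))
		return "GFX_OP_PIPE_CONTROL(6)";
	return "unknown";
}
```

Under these assumptions, 0x18800001 at offset 0 decodes as an MI command with opcode 0x31, while 0x7a000004 at offset 0x18 matches a 6-DWORD PIPE_CONTROL header.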

To find out when the ring buffer gets programmed incorrectly, I added code to gen8_emit_pipe_control that reads back the first DWORD of the ring buffer after intel_ring_emit whenever the tail of the ring buffer is zero.
The result: when tail is 0, the first DWORD read back is sometimes different from the data intel_ring_emit has just written. A GPU hang may follow shortly afterwards.

Here is the output of my print:
[ 3409.067402] rcs b:0x18800001 d:0x7a000004 t:0

'b' - the value read back with ioread32(ringbuf->virtual_start)
'd' - the value intel_ring_emit wants to write
't' - the value of tail
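
To make the check concrete, here is a rough user-space model in C of the read-back logic described above. It is not the actual debug patch: struct ring_model, ring_emit, emit_checked and RING_DWORDS are hypothetical names, and a plain array stands in for the write-combined mapping at ringbuf->virtual_start, so this model can never show the mismatch itself.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#define RING_DWORDS 1024 /* hypothetical ring size, in DWORDs */

struct ring_model {
	uint32_t buf[RING_DWORDS]; /* stands in for ringbuf->virtual_start */
	uint32_t tail;             /* byte offset of the next write */
};

/* Write one DWORD at the tail and advance it, wrapping at the end. */
static void ring_emit(struct ring_model *r, uint32_t data)
{
	r->buf[r->tail / 4] = data;
	r->tail = (r->tail + 4) % (RING_DWORDS * 4);
}

/*
 * Emit one DWORD. If it was written at tail 0, read the first DWORD
 * back and format a line in the same shape as the print in this report.
 * Returns 1 if the read-back value differs from what was written.
 */
static int emit_checked(struct ring_model *r, uint32_t data,
			char *line, size_t len)
{
	uint32_t tail = r->tail;
	uint32_t back;

	ring_emit(r, data);
	if (tail != 0)
		return 0;

	back = r->buf[0];
	snprintf(line, len, "rcs b:0x%08x d:0x%08x t:%u",
		 (unsigned int)back, (unsigned int)data, (unsigned int)tail);
	return back != data;
}
```

In the driver the read would go through ioread32 on the WC mapping rather than a plain array load, which is where 'b' and 'd' were observed to differ.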

I'm aware that ringbuf->virtual_start is mapped write-combining, so the read may lead to a write-combine buffer flush and slow read performance.
But I don't know why the value read back differs from the one intel_ring_emit has just written.

I also have another question: after the CPU writes to the WC ring buffer, how is the WC buffer flushed before the GPU starts to read the ring buffer?

Thanks a lot!
-James