[mipsel+rs780e]Occasionally "GPU lockup" after resuming from suspend.

Chen Jie chenj at lemote.com
Fri Feb 17 02:42:47 PST 2012


>> On February 15, 2012 at 11:53 PM, Jerome Glisse <j.glisse at gmail.com> wrote:
>>> To me it looks like the CP is trying to fetch memory but the
>>> GPU memory controller fails to fulfill the CP request. Did you
>>> check the PCI configuration before & after (when things don't
>>> work)? My best guess is that PCI bus mastering is not working
>>> properly or the PCIE GPU GART table has wrong data.
>>>
>>> Maybe one needs to drop bus master and re-enable bus master to
>>> work around some bug...
>> Thanks for your suggestion. We've tried the 'drop and re-enable master'
>> trick; unfortunately it doesn't work.
>> The PCI configuration comparison will be done later.
> Update: We've checked the first 64 bytes of PCI configuration space
> before & after, and didn't find any difference.
Hi,

Status update:
Today we tried to analyze the GPU instruction stream at the time of the
lockup. The lockup always occurs after tasks are restarted, so the related
instructions should reside in an IB, as indicated by dmesg:
[ 2456.585937] GPU lockup (waiting for 0x0002F98B last fence id 0x0002F98A)
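To make the link between this message and the IB explicit, here is a minimal sketch that pulls the two fence sequence numbers out of such a line. The function names are hypothetical, and it assumes that fences signal in order and that sequence numbers increase by one, so the first unsignaled fence is simply the last signaled one plus one.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Extract the two fence sequence numbers from a "GPU lockup" dmesg line.
 * Returns 0 on success, -1 if the line does not have the expected shape.
 * (Hypothetical helper, not part of the radeon driver.)
 */
static int parse_lockup_line(const char *line, uint32_t *waiting, uint32_t *last)
{
    unsigned int w, l;
    const char *p;

    assert(line != NULL && waiting != NULL && last != NULL);

    /* Skip the "[ timestamp]" prefix by searching for the message text. */
    p = strstr(line, "GPU lockup");
    if (p == NULL)
        return -1;
    if (sscanf(p, "GPU lockup (waiting for 0x%x last fence id 0x%x)", &w, &l) != 2)
        return -1;

    *waiting = (uint32_t)w;
    *last = (uint32_t)l;
    return 0;
}

/*
 * Assumption: fences complete in order and sequence numbers step by one,
 * so the first fence that never signaled is last + 1 (wraps on 32 bits).
 */
static uint32_t first_unsignaled_fence(uint32_t last)
{
    return last + 1u;
}
```

For the line above this gives waiting = 0x2F98B and last = 0x2F98A, so the first unsignaled fence is 0x2F98B, i.e. the IB carrying that fence seq is the one to inspect.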

Printing the instructions in the related IB:
[ 2462.492187] PM4 block 10 has 115 instructions, with fence seq 2f98b
....
[ 2462.976562] Type3:PACKET3_SET_CONTEXT_REG ref_addr  <not interpreted>
[ 2462.984375] Type3:PACKET3_SET_CONTEXT_REG ref_addr  <not interpreted>
[ 2462.988281] Type3:PACKET3_SET_CONTEXT_REG ref_addr  <not interpreted>
[ 2462.992187] Type3:PACKET3_SET_ALU_CONST ref_addr  <not interpreted>
[ 2462.996093] Type3:PACKET3_SURFACE_SYNC ref_addr 18c880
[ 2463.003906] Type3:PACKET3_SET_RESOURCE ref_addr  <not interpreted>
[ 2463.007812] Type3:PACKET3_SET_CONFIG_REG ref_addr  <not interpreted>
[ 2463.011718] Type3:PACKET3_INDEX_TYPE ref_addr  <not interpreted>
[ 2463.015625] Type3:PACKET3_NUM_INSTANCES ref_addr  <not interpreted>
[ 2463.019531] Type3:PACKET3_DRAW_INDEX_AUTO ref_addr  <not interpreted>
[ 2463.027343] Type3:PACKET3_EVENT_WRITE ref_addr  <not interpreted>
[ 2463.031250] Type3:PACKET3_SET_CONFIG_REG ref_addr  <not interpreted>
[ 2463.035156] Type3:PACKET3_SURFACE_SYNC ref_addr 10f680
[ 2463.039062] Type3:PACKET3_SET_CONTEXT_REG ref_addr  <not interpreted>
[ 2463.046875] Type3:PACKET3_SET_CONTEXT_REG ref_addr  <not interpreted>
[ 2463.050781] Type3:PACKET3_SET_CONTEXT_REG ref_addr  <not interpreted>
[ 2463.054687] Type3:PACKET3_SET_BOOL_CONST ref_addr  <not interpreted>
[ 2463.062500] Type3:PACKET3_SURFACE_SYNC ref_addr 10668e
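For anyone who wants to reproduce a dump like this, here is a rough sketch of how the PM4 headers can be decoded. It is not the debug code we actually used. The header layout (type in bits 31:30, count in bits 29:16, type-3 opcode in bits 15:8) follows the PACKET3() macro in the kernel's r600d.h; the opcode values in the small table are quoted from memory of that header and should be checked against it. All names here are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* PM4 header fields: [31:30] type, [29:16] count, [15:8] opcode (type 3). */
#define PM4_TYPE(h)       (((uint32_t)(h) >> 30) & 0x3u)
#define PM4_COUNT(h)      (((uint32_t)(h) >> 16) & 0x3FFFu)
#define PM4_T3_OPCODE(h)  (((uint32_t)(h) >> 8) & 0xFFu)
#define PM4_PACKET3(op, n) \
    ((3u << 30) | (((uint32_t)(op) & 0xFFu) << 8) | (((uint32_t)(n) & 0x3FFFu) << 16))

struct pm4_op_name {
    uint8_t opcode;
    const char *name;
};

/* Small subset only; values assumed from r600d.h, please verify. */
static const struct pm4_op_name pm4_names[] = {
    { 0x2D, "PACKET3_DRAW_INDEX_AUTO" },
    { 0x43, "PACKET3_SURFACE_SYNC" },
    { 0x46, "PACKET3_EVENT_WRITE" },
    { 0x69, "PACKET3_SET_CONTEXT_REG" },
    { 0x6D, "PACKET3_SET_RESOURCE" },
};

/* Map a type-3 header to a name from the table above. */
static const char *pm4_type3_name(uint32_t header)
{
    size_t i;

    if (PM4_TYPE(header) != 3u)
        return "not-type3";
    for (i = 0; i < sizeof(pm4_names) / sizeof(pm4_names[0]); i++)
        if (pm4_names[i].opcode == PM4_T3_OPCODE(header))
            return pm4_names[i].name;
    return "unknown";
}

/*
 * Walk an IB of ndw dwords and count the type-3 packets in it.
 * Type 0 and type 3 packets occupy count + 2 dwords (header plus
 * count + 1 payload dwords); type 2 is a one-dword filler.
 * Returns -1 on a truncated packet or on a type this sketch does not handle.
 */
static int pm4_count_type3(const uint32_t *ib, size_t ndw)
{
    size_t i = 0;
    int found = 0;

    assert(ib != NULL || ndw == 0);
    while (i < ndw) {
        uint32_t h = ib[i];
        size_t len;

        switch (PM4_TYPE(h)) {
        case 0u:
        case 3u:
            len = (size_t)PM4_COUNT(h) + 2u;
            break;
        case 2u:
            len = 1u;
            break;
        default:
            return -1; /* type 1 is not handled here */
        }
        if (len > ndw - i)
            return -1; /* packet runs past the end of the buffer */
        if (PM4_TYPE(h) == 3u)
            found++;
        i += len;
    }
    return found;
}
```

A SURFACE_SYNC header with four payload dwords is PM4_PACKET3(0x43, 3); the walker steps over it as a five-dword packet.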

CP_COHER_BASE was 0x0018C880, so the instruction that caused the lockup
should lie between:
[ 2462.996093] Type3:PACKET3_SURFACE_SYNC ref_addr 18c880
...
[ 2463.035156] Type3:PACKET3_SURFACE_SYNC ref_addr 10f680

Here, only SURFACE_SYNC, SET_RESOURCE and EVENT_WRITE access GPU memory.
Our guess is that the culprit may be SURFACE_SYNC?
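The narrowing step above can be written down as a small sketch. It assumes that the ref_addr printed for a SURFACE_SYNC is the value that packet writes to CP_COHER_BASE, and that the register still holds the value from the last SURFACE_SYNC the CP processed. The table is transcribed from the tail of the dump shown above (the elided part is not included); the struct and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct ib_packet {
    const char *name;
    uint32_t ref_addr; /* 0 where the dump printed "<not interpreted>" */
};

/* Tail of the dumped IB, in the order it was printed. */
static const struct ib_packet dump_tail[] = {
    { "SET_CONTEXT_REG", 0 }, { "SET_CONTEXT_REG", 0 }, { "SET_CONTEXT_REG", 0 },
    { "SET_ALU_CONST", 0 },   { "SURFACE_SYNC", 0x18c880 },
    { "SET_RESOURCE", 0 },    { "SET_CONFIG_REG", 0 },  { "INDEX_TYPE", 0 },
    { "NUM_INSTANCES", 0 },   { "DRAW_INDEX_AUTO", 0 }, { "EVENT_WRITE", 0 },
    { "SET_CONFIG_REG", 0 },  { "SURFACE_SYNC", 0x10f680 },
    { "SET_CONTEXT_REG", 0 }, { "SET_CONTEXT_REG", 0 }, { "SET_CONTEXT_REG", 0 },
    { "SET_BOOL_CONST", 0 },  { "SURFACE_SYNC", 0x10668e },
};

static int is_surface_sync(const struct ib_packet *p)
{
    return strcmp(p->name, "SURFACE_SYNC") == 0;
}

/*
 * Find the SURFACE_SYNC whose ref_addr equals the observed CP_COHER_BASE
 * and the next SURFACE_SYNC after it. On success *first is the index of
 * the matching packet and *next the index of the following SURFACE_SYNC
 * (or n if there is none); the suspect packets are first .. next.
 * Returns -1 if no SURFACE_SYNC matches coher_base.
 */
static int suspect_window(const struct ib_packet *pkts, size_t n,
                          uint32_t coher_base, size_t *first, size_t *next)
{
    size_t i, j;

    assert(pkts != NULL && first != NULL && next != NULL);
    for (i = 0; i < n; i++) {
        if (!is_surface_sync(&pkts[i]) || pkts[i].ref_addr != coher_base)
            continue;
        for (j = i + 1; j < n && !is_surface_sync(&pkts[j]); j++)
            ;
        *first = i;
        *next = j;
        return 0;
    }
    return -1;
}
```

Called with coher_base = 0x18C880 on this table, it reports indices 4 and 12, i.e. the two SURFACE_SYNC lines quoted above.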

BTW, when the lockup happens, if we place the CP ring in VRAM, ring_test
will pass but ib_test fails -- which suggests the ME fails to feed the CP
during the lockup? Might an earlier SURFACE_SYNC be blocking the MC?

P.S. For today's debugging we hacked the driver to place the CP ring, IB
and IH in VRAM, and disabled writeback (radeon_no_wb=1).

Any ideas?



Regards,
-- Chen Jie

