CIK hangs with kernel 3.15, bisected

Christian König deathsimple at vodafone.de
Thu May 29 09:30:51 PDT 2014


Hi Marek & Alex,

I've found the reason why forcefully evicting page tables sometimes
crashes the box.

This is a hexdump of a typical page table before it is moved to GART:
000117f000  02914061 00000000
000117f008  02915061 00000000
000117f010  02916061 00000000
000117f018  02917061 00000000
000117f020  02918061 00000000

And it looks like this when it comes back:
0006102000  00000000 00000000
*

Ideas? I don't really have an explanation for this. Moving buffers 
around otherwise seems to work perfectly fine.
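
For reference, the two dumps above can be decoded with a short script. This
is only a sketch: it assumes 64-bit entries shown as two 32-bit words with
the low word first, 4 KiB pages, and flag bit positions matching the radeon
driver's R600_PTE_* defines. The names decode_line, PTE_FLAGS and PAGE_MASK
are hypothetical and not part of any driver or tool.

```python
# Flag bit positions are an assumption taken from the radeon driver's
# R600_PTE_* defines; check radeon.h before relying on them.
PTE_FLAGS = {0: "VALID", 1: "SYSTEM", 2: "SNOOPED", 5: "READABLE", 6: "WRITEABLE"}

PAGE_MASK = 0xFFF  # assumes 4 KiB pages, so the low 12 bits hold flags


def decode_line(line):
    """Split one 'offset  low32 high32' dump line into (offset, address, flags)."""
    offset_hex, low_hex, high_hex = line.split()
    # The dump shows 32-bit words; the first word is taken as the low half.
    entry = (int(high_hex, 16) << 32) | int(low_hex, 16)
    address = entry & ~PAGE_MASK
    flags = [name for bit, name in sorted(PTE_FLAGS.items()) if entry & (1 << bit)]
    return int(offset_hex, 16), address, flags


before = [
    "000117f000  02914061 00000000",
    "000117f008  02915061 00000000",
]
after = ["0006102000  00000000 00000000"]

for label, lines in (("before", before), ("after", after)):
    for line in lines:
        offset, address, flags = decode_line(line)
        print(f"{label} {offset:#012x}: page {address:#010x} flags {flags}")
```

Under these assumptions, each entry before the move is a valid, readable and
writeable mapping, with consecutive entries pointing at consecutive 4 KiB
pages starting at 0x02914000. The entries that come back decode to address 0
with no flags set, so they are invalid.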

Thanks,
Christian.

On 28.05.2014 12:38, Christian König wrote:
> I already tried a similar patch as well, without noticing any additional
> crashes. But I'm going to give this another round with your patch and
> openarena.
>
> Thanks,
> Christian.
>
> On 27.05.2014 23:55, Marek Olšák wrote:
>> Hi Christian,
>>
>> I test on Bonaire (ChipID = 0x665c). Unfortunately, the hangs are not
>> fixed yet. They are very rare and very random. Therefore, I have come
>> up with a patch which evicts page tables between IBs. See the
>> attachment. With that patch applied, the system starts fine, compiz
>> and glxgears work, but once I start playing openarena, it locks up
>> pretty quickly.
>>
>> The patch shouldn't do anything in theory, because pages are moved
>> back to VRAM immediately after that. However, the VRAM address of page
>> tables may end up being different from before, which might be the root
>> cause.
>>
>> Marek
>>
>> On Wed, May 14, 2014 at 2:11 PM, Christian König
>> <deathsimple at vodafone.de> wrote:
>>> Crap, any chance you can narrow it down a bit more?
>>>
>>> I've just tried a piglit quick test on my Bonaire and it seems to work
>>> perfectly fine.
>>>
>>> What hw do you test on?
>>>
>>> Regards,
>>> Christian.
>>>
>>> On 13.05.2014 23:21, Marek Olšák wrote:
>>>
>>>> Hi Christian,
>>>>
>>>> Even though some regressions are fixed by these patches:
>>>>
>>>> drm/radeon: fix page directory update size estimation
>>>> drm/radeon: fix buffer placement under memory pressure v2
>>>>
>>>> and indeed, the texelFetch tests no longer hang, there is one more
>>>> hang which needs to be fixed. :( All I know is that the exact same commit
>>>> causes it and it can only be reproduced by running the whole piglit suite
>>>> with concurrency enabled.
>>>>
>>>> My kernel git log:
>>>>
>>>> * 2ba22c8 - drm/radeon: fix buffer placement under memory pressure v2
>>>> (10 hours ago) <Christian König>
>>>> * 3af91e5 - drm/radeon: fix page directory update size estimation (21
>>>> hours ago) <Christian König>
>>>> * 6d2f294 - drm/radeon: use normal BOs for the page tables v4 (2
>>>> months ago) <Christian König>
>>>> * fa68834 - drm/radeon: further cleanup vm flushing & fencing (2
>>>> months ago) <Christian König>
>>>>
>>>> fa68834 doesn't hang, but 2ba22c8 hangs, which means 6d2f294 or either
>>>> of the two fixes is the first bad commit.
>>>>
>>>> Marek
>>>>
>>>> On Fri, May 9, 2014 at 8:03 PM, Marek Olšák <maraeo at gmail.com> wrote:
>>>>> Hi Christian,
>>>>>
>>>>> This commit which first appeared in 3.15-rc1 causes hangs on Bonaire:
>>>>>
>>>>> commit 6d2f2944e95e504a7d33385eeeb9bb7fcca72592
>>>>> Author: Christian König <christian.koenig at amd.com>
>>>>> Date:   Thu Feb 20 13:42:17 2014 +0100
>>>>>
>>>>>       drm/radeon: use normal BOs for the page tables v4
>>>>>
>>>>>       No need to make it more complicated than necessary,
>>>>>       just allocate the page tables as normal BO and
>>>>>       flush whenever the address change.
>>>>>
>>>>>       v2: update comments and function name
>>>>>       v3: squash bug fixes, page directory and tables patch
>>>>>       v4: rebased on Mareks changes
>>>>>
>>>>>       Signed-off-by: Christian König <christian.koenig at amd.com>
>>>>>
>>>>>
>>>>> Reverting the commit gives me a lot of merge conflicts.
>>>>>
>>>>> The simplest way to reproduce the hangs is to run piglit with these
>>>>> parameters:
>>>>> -t texelFetch.fs
>>>>>
>>>>> Some of the tests allocate a lot of MSAA textures and the tests also
>>>>> run in parallel, which creates a lot of memory pressure and probably
>>>>> causes buffer evictions.
>>>>>
>>>>> Any idea what is wrong with it?
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Marek
>>>
>