CIK hangs with kernel 3.15, bisected

Alex Deucher alexdeucher at gmail.com
Thu May 29 09:52:22 PDT 2014


On Thu, May 29, 2014 at 12:30 PM, Christian König
<deathsimple at vodafone.de> wrote:
> Hi Marek & Alex,
>
> I've found out why forcefully evicting page tables sometimes crashes
> the box.
>
> This is a hexdump of a typical page table before it is moved to GART:
> 000117f000  02914061 00000000
> 000117f008  02915061 00000000
> 000117f010  02916061 00000000
> 000117f018  02917061 00000000
> 000117f020  02918061 00000000
>
> And it looks like this when it comes back:
> 0006102000  00000000 00000000
> *
>
> Ideas? I don't really have an explanation for this. Moving buffers around
> otherwise seems to work perfectly fine.

Nothing I can think of offhand.  It might be worth trying CP DMA rather
than SDMA for BO moves to see if we can narrow it down a bit more.
You might also try the other SDMA ring.
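
For anyone reading along, here is a rough sketch of how the dump above
decodes.  It assumes the two columns are the low and high dwords of a
64-bit PTE and that the flag bits match the R600_PTE_* defines in
radeon.h; decode_pte and decode_dump are illustrative names only, not
driver functions.

```python
# Flag bits assumed to match the R600_PTE_* defines in radeon.h;
# verify against your tree before relying on them.
PTE_FLAGS = {
    0x01: "VALID",
    0x02: "SYSTEM",
    0x04: "SNOOPED",
    0x20: "READABLE",
    0x40: "WRITEABLE",
}
PAGE_MASK = 0xFFF  # assumes 4 KiB pages: the low 12 bits carry flags


def decode_pte(lo, hi=0):
    """Split a 64-bit PTE, given as low and high dwords, into (address, flags)."""
    pte = (hi << 32) | lo
    addr = pte & ~PAGE_MASK
    flags = [name for bit, name in sorted(PTE_FLAGS.items()) if pte & bit]
    return addr, flags


def decode_dump(text):
    """Parse lines of the form 'offset  lo hi' as printed in the dump."""
    entries = []
    for line in text.strip().splitlines():
        fields = line.split()
        if len(fields) != 3:
            continue  # skip '*' repeat markers and anything malformed
        offset, lo, hi = (int(f, 16) for f in fields)
        entries.append((offset, *decode_pte(lo, hi)))
    return entries


DUMP = """
000117f000  02914061 00000000
000117f008  02915061 00000000
"""

for offset, addr, flags in decode_dump(DUMP):
    print(f"{offset:#012x}: addr={addr:#x} flags={'|'.join(flags) or 'none'}")
```

Under those assumptions the entries before the move have VALID, READABLE
and WRITEABLE set and point at consecutive 4 KiB pages, while the
all-zero entries that come back have no VALID bit at all.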

Alex

>
> Thanks,
> Christian.
>
> On 28.05.2014 12:38, Christian König wrote:
>
>> I already tried a similar patch as well, without noticeably more
>> crashes. But I'm going to give this another round with your patch and
>> openarena.
>>
>> Thanks,
>> Christian.
>>
>> On 27.05.2014 23:55, Marek Olšák wrote:
>>>
>>> Hi Christian,
>>>
>>> I am testing on Bonaire (ChipID = 0x665c). Unfortunately, the hangs are not
>>> fixed yet. They are very rare and very random. Therefore, I have come
>>> up with a patch which evicts page tables between IBs. See the
>>> attachment. With that patch applied, the system starts fine, compiz
>>> and glxgears work, but once I start playing openarena, it locks up
>>> pretty quickly.
>>>
>>> The patch shouldn't do anything in theory, because pages are moved
>>> back to VRAM immediately after that. However, the VRAM address of page
>>> tables may end up being different from before, which might be the root
>>> cause.
>>>
>>> Marek
>>>
>>> On Wed, May 14, 2014 at 2:11 PM, Christian König
>>> <deathsimple at vodafone.de> wrote:
>>>>
>>>> Crap, any chance you can narrow it down a bit more?
>>>>
>>>> I've just tried a piglit quick test on my Bonaire and it seems to work
>>>> perfectly fine.
>>>>
>>>> What hw do you test on?
>>>>
>>>> Regards,
>>>> Christian.
>>>>
>>>> On 13.05.2014 23:21, Marek Olšák wrote:
>>>>
>>>>> Hi Christian,
>>>>>
>>>>> Even though some regressions are fixed by these patches:
>>>>>
>>>>> drm/radeon: fix page directory update size estimation
>>>>> drm/radeon: fix buffer placement under memory pressure v2
>>>>>
>>>>> and indeed, the texelFetch tests no longer hang, there is one more
>>>>> hang which needs to be fixed. :( All I know is that the exact same
>>>>> commit causes it and that it can only be reproduced by running the
>>>>> whole piglit suite with concurrency enabled.
>>>>>
>>>>> My kernel git log:
>>>>>
>>>>> * 2ba22c8 - drm/radeon: fix buffer placement under memory pressure v2
>>>>> (10 hours ago) <Christian König>
>>>>> * 3af91e5 - drm/radeon: fix page directory update size estimation (21
>>>>> hours ago) <Christian König>
>>>>> * 6d2f294 - drm/radeon: use normal BOs for the page tables v4 (2
>>>>> months ago) <Christian König>
>>>>> * fa68834 - drm/radeon: further cleanup vm flushing & fencing (2
>>>>> months ago) <Christian König>
>>>>>
>>>>> fa68834 doesn't hang, but 2ba22c8 does, which means the first bad
>>>>> commit is either 6d2f294 or one of the two fixes.
>>>>>
>>>>> Marek
>>>>>
>>>>> On Fri, May 9, 2014 at 8:03 PM, Marek Olšák <maraeo at gmail.com> wrote:
>>>>>>
>>>>>> Hi Christian,
>>>>>>
>>>>>> This commit, which first appeared in 3.15-rc1, causes hangs on Bonaire:
>>>>>>
>>>>>> commit 6d2f2944e95e504a7d33385eeeb9bb7fcca72592
>>>>>> Author: Christian König <christian.koenig at amd.com>
>>>>>> Date:   Thu Feb 20 13:42:17 2014 +0100
>>>>>>
>>>>>>       drm/radeon: use normal BOs for the page tables v4
>>>>>>
>>>>>>       No need to make it more complicated than necessary,
>>>>>>       just allocate the page tables as normal BO and
>>>>>>       flush whenever the address change.
>>>>>>
>>>>>>       v2: update comments and function name
>>>>>>       v3: squash bug fixes, page directory and tables patch
>>>>>>       v4: rebased on Mareks changes
>>>>>>
>>>>>>       Signed-off-by: Christian König <christian.koenig at amd.com>
>>>>>>
>>>>>>
>>>>>> Reverting the commit gives me a lot of merge conflicts.
>>>>>>
>>>>>> The simplest way to reproduce the hangs is to run piglit with these
>>>>>> parameters:
>>>>>> -t texelFetch.fs
>>>>>>
>>>>>> Some of the tests allocate a lot of MSAA textures and the tests also
>>>>>> run in parallel, which creates a lot of memory pressure and probably
>>>>>> causes buffer evictions.
>>>>>>
>>>>>> Any idea what is wrong with it?
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> Marek
>>>>
>>>>
>>
>

