CIK hangs with kernel 3.15, bisected

Marek Olšák maraeo at gmail.com
Tue May 13 13:27:30 PDT 2014


I applied these two patches Christian sent to dri-devel:

drm/radeon: fix page directory update size estimation
drm/radeon: fix buffer placement under memory pressure v2

on top of torvalds's master branch.

Marek

On Tue, May 13, 2014 at 10:19 PM, Grigori Goronzy <greg at chown.ath.cx> wrote:
> On 13.05.2014 21:50, Marek Olšák wrote:
>>
>> Hi Christian,
>>
>> The performance regression I saw with piglit seems to be fixed with
>> the latest kernel git. It's difficult to bisect the kernel, because
>> there are only merges between 3.14 and 3.15 and the merged commits are
>> actually based on 3.14-rc1 and 3.14-rc4.
>>
>> All seems to be fine with your fixes.
>>
>
> Which fixes have you applied? There are quite a few pending patches on
> dri-devel that aren't yet part of drm-fixes-3.15.
>
> Grigori
>
>
>> Marek
>>
>> On Tue, May 13, 2014 at 5:31 PM, Christian König
>> <deathsimple at vodafone.de> wrote:
>>>
>>> Is the performance regression caused by the page table changes or
>>> something else?
>>>
>>> I ran some tests with xonotic while developing it and they didn't show
>>> anything obvious, but I didn't test on different systems.
>>>
>>> Christian.
>>>
>>> Am 13.05.2014 17:19, schrieb Marek Olšák:
>>>
>>>> Your latest patches fix the regression.
>>>>
>>>> The performance regression can also be reproduced with piglit
>>>> "-t texelFetch.fs".
>>>>
>>>> Kernel 3.14:
>>>>      real    0m17.724s
>>>>      user    0m41.905s
>>>>      sys    0m11.299s
>>>>
>>>> The problematic commit checked out + your fixes (without the PTE patch,
>>>> I think):
>>>>      real    0m23.474s
>>>>      user    1m1.008s
>>>>      sys    0m13.812s
>>>>
>>>> Marek
>>>>
>>>>
>>>> On Tue, May 13, 2014 at 3:57 PM, Christian König
>>>> <deathsimple at vodafone.de> wrote:
>>>>>
>>>>>
>>>>> Am 13.05.2014 15:22, schrieb Alex Deucher:
>>>>>
>>>>>> On Mon, May 12, 2014 at 7:38 PM, Grigori Goronzy <greg at chown.ath.cx>
>>>>>> wrote:
>>>>>>>
>>>>>>>
>>>>>>> I can confirm this fixes it for me, too.
>>>>>>>
>>>>>>> 3.15 with these fixes and the large PTE patches actually ends up
>>>>>>> being noticeably slower than earlier kernels with Xonotic, though.
>>>>>>> I wonder what's going on.
>>>>>>
>>>>>>
>>>>>> Allocation overhead?
>>>>>
>>>>>
>>>>>
>>>>> Unlikely. Xonotic just allocates a single page table at start, which
>>>>> then gets extended at a certain rate until they no longer need more
>>>>> address space and are done with it.
>>>>>
>>>>> Grigori, can you bisect and/or try to figure out what's wrong here?
>>>>>
>>>>> Christian.
>>>>>
>>>>>
>>>>>>
>>>>>>> Grigori
>>>>>>>
>>>>>>>
>>>>>>> On 12.05.2014 14:50, Christian König wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>> I could reproduce the problem with xonotic and I think I've found
>>>>>>>> the issue.
>>>>>>>>
>>>>>>>> Please test the attached patch.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Christian.
>>>>>>>>
>>>>>>>> Am 11.05.2014 11:06, schrieb Christian König:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> I have tested it and it doesn't fix the hangs.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Yeah, thought so. Well it was just a guess.
>>>>>>>>>
>>>>>>>>>> (Also, I don't like the patch, because it reverts the behavior
>>>>>>>>>> I added for userspace buffers.)
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Actually it shouldn't affect that. The alternative domain always
>>>>>>>>> contains GART even when userspace only specified VRAM as placement
>>>>>>>>> (as long as it is technically possible to do so).
>>>>>>>>>
>>>>>>>>> So what should happen is that TTM sees the current placement,
>>>>>>>>> matches that with the desired placement and should find that it
>>>>>>>>> doesn't need to move the buffer (we should just test if this
>>>>>>>>> behavior really works as expected).
>>>>>>>>>
>>>>>>>>> Christian.
>>>>>>>>>
>>>>>>>>> Am 10.05.2014 23:38, schrieb Marek Olšák:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Hi Christian,
>>>>>>>>>>
>>>>>>>>>> I have tested it and it doesn't fix the hangs.
>>>>>>>>>>
>>>>>>>>>> (Also, I don't like the patch, because it reverts the behavior
>>>>>>>>>> I added for userspace buffers.)
>>>>>>>>>>
>>>>>>>>>> Marek
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Sat, May 10, 2014 at 6:34 PM, Christian König
>>>>>>>>>> <deathsimple at vodafone.de> wrote:
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Couldn't reproduce the issue so far. So the attached patch is
>>>>>>>>>>> just a complete shot in the dark, found by rereading the code,
>>>>>>>>>>> but it might actually address the problem.
>>>>>>>>>>>
>>>>>>>>>>> Please give it a try.
>>>>>>>>>>>
>>>>>>>>>>> Going to keep testing in the meantime,
>>>>>>>>>>> Christian.
>>>>>>>>>>>
>>>>>>>>>>> Am 10.05.2014 10:23, schrieb Christian König:
>>>>>>>>>>>
>>>>>>>>>>>>> I see hangs with kernel 3.15 and SI under memory pressure, e.g.
>>>>>>>>>>>>> if I boot with radeon.vramlimit=256 and then run Xonotic
>>>>>>>>>>>>> timedemo with high settings.
>>>>>>>>>>>>> I haven't had a chance to bisect it yet, but it might be a
>>>>>>>>>>>>> similar problem.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Sounds like the same issue to me. Thx for the good test case.
>>>>>>>>>>>>
>>>>>>>>>>>>> Any idea what is wrong with it?
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Actually, I was already surprised that it went so smoothly
>>>>>>>>>>>> without any regressions so far; I hadn't noticed the bug on
>>>>>>>>>>>> bugzilla.kernel.org yet.
>>>>>>>>>>>>
>>>>>>>>>>>>> Some of the tests allocate a lot of MSAA textures and the tests
>>>>>>>>>>>>> also run in parallel, which creates a lot of memory pressure
>>>>>>>>>>>>> and probably causes buffer evictions.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Sounds like the underlying problem to me. We probably evict some
>>>>>>>>>>>> part of a page table without updating the page directory. Going
>>>>>>>>>>>> to dig into it today; it's probably just a one-liner missing
>>>>>>>>>>>> somewhere in the VM code.
>>>>>>>>>>>>
>>>>>>>>>>>> Christian.
>>>>>>>>>>>>
>>>>>>>>>>>> Am 09.05.2014 23:39, schrieb Grigori Goronzy:
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On 09.05.2014 20:03, Marek Olšák wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> This commit which first appeared in 3.15-rc1 causes hangs on
>>>>>>>>>>>>>> Bonaire:
>>>>>>>>>>>>>> [...]
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> The simplest way to reproduce the hangs is to run piglit with
>>>>>>>>>>>>>> these parameters:
>>>>>>>>>>>>>> -t texelFetch.fs
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Some of the tests allocate a lot of MSAA textures and the
>>>>>>>>>>>>>> tests also run in parallel, which creates a lot of memory
>>>>>>>>>>>>>> pressure and probably causes buffer evictions.
>>>>>>>>>>>>>>
>>>>>>>>>>>>> I see hangs with kernel 3.15 and SI under memory pressure, e.g.
>>>>>>>>>>>>> if I boot with radeon.vramlimit=256 and then run Xonotic
>>>>>>>>>>>>> timedemo with high settings.
>>>>>>>>>>>>> I haven't had a chance to bisect it yet, but it might be a
>>>>>>>>>>>>> similar problem.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Grigori
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>

