CIK hangs with kernel 3.15, bisected

Marek Olšák maraeo at gmail.com
Fri May 30 04:30:45 PDT 2014


Grigori,

you can git-checkout the commits immediately before and after the memory
management changes, compile both kernels and test them.
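
For example, roughly like this (<commit> below is just a placeholder for
the memory management merge; adjust the build steps to your setup):

    git checkout <commit>^    # the state just before the MM changes
    make oldconfig && make -j8
    sudo make modules_install install
    # reboot and benchmark, then repeat with "git checkout <commit>"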

Marek

On Fri, May 30, 2014 at 2:30 AM, Grigori Goronzy <greg at chown.ath.cx> wrote:
> On 13.05.2014 22:27, Marek Olšák wrote:
>>
>> I applied these two patches Christian sent to dri-devel:
>>
>> drm/radeon: fix page directory update size estimation
>> drm/radeon: fix buffer placement under memory pressure v2
>>
>> on top of torvalds's master branch.
>>
>
> With the latest kernel master (a991639c) I still see a regression compared to
> 3.13 or 3.14, which have similar performance to each other. Xonotic is about
> 7% slower.
> OpenArena and Unigine Tropics are also noticeably slower, but I didn't
> record accurate numbers.
>
> Maybe the improved memory management has some overhead, but this is not
> acceptable IMHO. I'll try to investigate further.
>
> Best regards
>
> Grigori
>
>> Marek
>>
>> On Tue, May 13, 2014 at 10:19 PM, Grigori Goronzy <greg at chown.ath.cx>
>> wrote:
>>>
>>> On 13.05.2014 21:50, Marek Olšák wrote:
>>>>
>>>>
>>>> Hi Christian,
>>>>
>>>> The performance regression I saw with piglit seems to be fixed with the
>>>> latest kernel git. It's difficult to bisect the kernel, because there are
>>>> only merges between 3.14 and 3.15 and the merged commits are actually
>>>> based on 3.14-rc1 and 3.14-rc4.
>>>>
>>>> All seems to be fine with your fixes.
>>>>
>>>
>>> Which fixes have you applied? There are quite a few pending patches on
>>> dri-devel that aren't yet part of drm-fixes-3.15.
>>>
>>> Grigori
>>>
>>>
>>>> Marek
>>>>
>>>> On Tue, May 13, 2014 at 5:31 PM, Christian König
>>>> <deathsimple at vodafone.de> wrote:
>>>>>
>>>>>
>>>>> Is the performance regression caused by the page table changes or by
>>>>> something else?
>>>>>
>>>>> I did some tests with Xonotic while developing it and they didn't show
>>>>> anything obvious, but I didn't test on different systems.
>>>>>
>>>>> Christian.
>>>>>
>>>>> On 13.05.2014 17:19, Marek Olšák wrote:
>>>>>
>>>>>> Your latest patches fix the regression.
>>>>>>
>>>>>> The performance regression can also be reproduced with piglit "-t
>>>>>> texelFetch.fs".
>>>>>>
>>>>>> Kernel 3.14:
>>>>>>       real    0m17.724s
>>>>>>       user    0m41.905s
>>>>>>       sys    0m11.299s
>>>>>>
>>>>>> The problematic commit checked out + your fixes (without the PTE patch,
>>>>>> I think):
>>>>>>       real    0m23.474s
>>>>>>       user    1m1.008s
>>>>>>       sys    0m13.812s
>>>>>>
>>>>>> Marek
>>>>>>
>>>>>>
>>>>>> On Tue, May 13, 2014 at 3:57 PM, Christian König
>>>>>> <deathsimple at vodafone.de> wrote:
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On 13.05.2014 15:22, Alex Deucher wrote:
>>>>>>>
>>>>>>>> On Mon, May 12, 2014 at 7:38 PM, Grigori Goronzy <greg at chown.ath.cx>
>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> I can confirm this fixes it for me, too.
>>>>>>>>>
>>>>>>>>> 3.15 with these fixes and the large PTE patches actually ends up being
>>>>>>>>> noticeably slower than earlier kernels with Xonotic, though. I wonder
>>>>>>>>> what's going on.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> Allocation overhead?
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Unlikely. Xonotic just allocates a single page table at start, which
>>>>>>> then gets extended at a certain rate until it no longer needs more
>>>>>>> address space and is done with it.
>>>>>>>
>>>>>>> Grigori, can you bisect and/or try to figure out what's wrong here?
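>>>>>>>
>>>>>>> Something along these lines should narrow it down (the good/bad
>>>>>>> revisions below are only examples, adjust them to what you observe):
>>>>>>>
>>>>>>>     git bisect start
>>>>>>>     git bisect good v3.14
>>>>>>>     git bisect bad v3.15-rc1
>>>>>>>     # build and boot each kernel git picks, run the Xonotic timedemo,
>>>>>>>     # then mark it with "git bisect good" or "git bisect bad"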
>>>>>>>
>>>>>>> Christian.
>>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>>> Grigori
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 12.05.2014 14:50, Christian König wrote:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> I could reproduce the problem with Xonotic and I think I've found
>>>>>>>>>> the issue.
>>>>>>>>>>
>>>>>>>>>> Please test the attached patch.
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> Christian.
>>>>>>>>>>
>>>>>>>>>> On 11.05.2014 11:06, Christian König wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> I have tested it and it doesn't fix the hangs.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Yeah, thought so. Well, it was just a guess.
>>>>>>>>>>>
>>>>>>>>>>>> (Also, I don't like the patch, because it reverts the behavior I
>>>>>>>>>>>> added
>>>>>>>>>>>> for userspace buffers.)
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Actually it shouldn't affect that. The alternative domain always
>>>>>>>>>>> contains GART even when userspace only specified VRAM as the
>>>>>>>>>>> placement (as long as it is technically possible to do so).
>>>>>>>>>>>
>>>>>>>>>>> So what should happen is that TTM sees the current placement, matches
>>>>>>>>>>> it against the desired placement and finds that it doesn't need to
>>>>>>>>>>> move the buffer (we should just test whether this behavior really
>>>>>>>>>>> works as expected).
>>>>>>>>>>>
>>>>>>>>>>> Christian.
>>>>>>>>>>>
>>>>>>>>>>> On 10.05.2014 23:38, Marek Olšák wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Hi Christian,
>>>>>>>>>>>>
>>>>>>>>>>>> I have tested it and it doesn't fix the hangs.
>>>>>>>>>>>>
>>>>>>>>>>>> (Also, I don't like the patch, because it reverts the behavior I
>>>>>>>>>>>> added
>>>>>>>>>>>> for userspace buffers.)
>>>>>>>>>>>>
>>>>>>>>>>>> Marek
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On Sat, May 10, 2014 at 6:34 PM, Christian König
>>>>>>>>>>>> <deathsimple at vodafone.de> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Couldn't reproduce the issue so far. So the attached patch is just
>>>>>>>>>>>>> a complete shot in the dark found by rereading the code, but it
>>>>>>>>>>>>> might actually be the problem.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Please give it a try.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Going to keep testing in the meantime,
>>>>>>>>>>>>> Christian.
>>>>>>>>>>>>>
>>>>>>>>>>>>> On 10.05.2014 10:23, Christian König wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I see hangs with kernel 3.15 and SI under memory pressure, e.g. if
>>>>>>>>>>>>>>> I boot with radeon.vramlimit=256 and then run Xonotic timedemo with
>>>>>>>>>>>>>>> high settings. I haven't had a chance to bisect it yet, but it
>>>>>>>>>>>>>>> might be a similar problem.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Sounds like the same issue to me. Thx for the good test case.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Any idea what is wrong with it?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Actually I already wondered why it went so smoothly without any
>>>>>>>>>>>>>> regression so far; I hadn't noticed the bug on bugzilla.kernel.org
>>>>>>>>>>>>>> yet.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Some of the tests allocate a lot of MSAA textures and the
>>>>>>>>>>>>>>> tests
>>>>>>>>>>>>>>> also
>>>>>>>>>>>>>>> run in parallel, which creates a lot of memory pressure and
>>>>>>>>>>>>>>> probably
>>>>>>>>>>>>>>> causes buffer evictions.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Sounds like the underlying problem to me. We probably evict some
>>>>>>>>>>>>>> part of a page table without updating the page directory. Going to
>>>>>>>>>>>>>> dig into it today; it's probably just a one-liner missing somewhere
>>>>>>>>>>>>>> in the VM code.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Christian.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On 09.05.2014 23:39, Grigori Goronzy wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On 09.05.2014 20:03, Marek Olšák wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> This commit, which first appeared in 3.15-rc1, causes hangs on
>>>>>>>>>>>>>>>> Bonaire:
>>>>>>>>>>>>>>>> [...]
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> The simplest way to reproduce the hangs is to run piglit
>>>>>>>>>>>>>>>> with
>>>>>>>>>>>>>>>> these
>>>>>>>>>>>>>>>> parameters:
>>>>>>>>>>>>>>>> -t texelFetch.fs
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Some of the tests allocate a lot of MSAA textures and the
>>>>>>>>>>>>>>>> tests
>>>>>>>>>>>>>>>> also
>>>>>>>>>>>>>>>> run in parallel, which creates a lot of memory pressure and
>>>>>>>>>>>>>>>> probably
>>>>>>>>>>>>>>>> causes buffer evictions.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I see hangs with kernel 3.15 and SI under memory pressure, e.g. if
>>>>>>>>>>>>>>> I boot with radeon.vramlimit=256 and then run Xonotic timedemo with
>>>>>>>>>>>>>>> high settings. I haven't had a chance to bisect it yet, but it
>>>>>>>>>>>>>>> might be a similar problem.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Grigori
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>> _______________________________________________
>>>>>>>>> dri-devel mailing list
>>>>>>>>> dri-devel at lists.freedesktop.org
>>>>>>>>> http://lists.freedesktop.org/mailman/listinfo/dri-devel
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>
>>>
>

