CIK hangs with kernel 3.15, bisected
Grigori Goronzy
greg at chown.ath.cx
Fri May 30 11:01:13 PDT 2014
On 30.05.2014 13:46, Grigori Goronzy wrote:
> On 30.05.2014 13:30, Marek Olšák wrote:
>> Grigori,
>>
>> you can git-checkout the commit before and after the memory management
>> changes, compile both and test them.
>>
>
> I was trying to revert the changes, but it looks like too much changed
> in the meantime. The suitable commits to check out should be 0bc490a8
> (before) and 19dff56a (after), right?
>
It turns out these changes weren't the problem. The culprit is the page
table rework, commit 6d2f2944, which also seems to cause a bunch of other
issues. The latest drm-fixes code doesn't change that, either.
According to my (not very scientific) testing with radeontop and the
"time" utility, this appears to be a CPU overhead problem. The "sys"
duration reported by time for a Xonotic benchmark run is over 3x as long
after the regression, and radeontop seems to report about 10% reduced
GPU load on average.
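
For anyone who wants to repeat the comparison, here is a minimal sketch in
Python of the same kind of measurement that "time" does. It is only an
illustration under stated assumptions: cpu_times and sys_ratio are
hypothetical helper names, the command you pass in is whatever benchmark
invocation you use, and the numbers in the final comment are made up to
show the arithmetic, not measurements from this thread.

```python
import resource
import subprocess
import sys


def cpu_times(cmd):
    """Run cmd and return (user, sys) CPU seconds used by the child.

    This mirrors the "user" and "sys" lines printed by time(1).
    Unix only, since it relies on the resource module.
    """
    before = resource.getrusage(resource.RUSAGE_CHILDREN)
    subprocess.run(cmd, check=True,
                   stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    after = resource.getrusage(resource.RUSAGE_CHILDREN)
    return (after.ru_utime - before.ru_utime,
            after.ru_stime - before.ru_stime)


def sys_ratio(sys_before, sys_after):
    """Return how many times more kernel time the second run used."""
    return sys_after / sys_before


# Hypothetical usage: run once per kernel, note the "sys" value each time,
# then compare the two recorded values by hand, for example:
#   user_s, sys_s = cpu_times(["./my-benchmark-command"])
#   sys_ratio(2.0, 6.5) gives 3.25, i.e. "over 3x as long"
```

Because the two kernels cannot be booted in one session, the ratio has to
be computed from values recorded across reboots, which is why the two
steps are separate functions.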
Best regards
Grigori
> Best regards
> Grigori
>
>> Marek
>>
>> On Fri, May 30, 2014 at 2:30 AM, Grigori Goronzy <greg at chown.ath.cx>
>> wrote:
>>> On 13.05.2014 22:27, Marek Olšák wrote:
>>>>
>>>> I applied these two patches Christian sent to dri-devel:
>>>>
>>>> drm/radeon: fix page directory update size estimation
>>>> drm/radeon: fix buffer placement under memory pressure v2
>>>>
>>>> on top of torvalds's master branch.
>>>>
>>>
>>> With latest kernel master (a991639c) I still see a regression compared
>>> to 3.13 or 3.14, which have similar performance. Xonotic is about 7%
>>> slower.
>>> OpenArena and Unigine Tropics are also noticeably slower, but I didn't
>>> record accurate numbers.
>>>
>>> Maybe the improved memory management has some overhead, but this is not
>>> acceptable IMHO. I'll try to investigate further.
>>>
>>> Best regards
>>>
>>> Grigori
>>>
>>>> Marek
>>>>
>>>> On Tue, May 13, 2014 at 10:19 PM, Grigori Goronzy <greg at chown.ath.cx>
>>>> wrote:
>>>>>
>>>>> On 13.05.2014 21:50, Marek Olšák wrote:
>>>>>>
>>>>>>
>>>>>> Hi Christian,
>>>>>>
>>>>>> The performance regression I saw with piglit seems to be fixed with
>>>>>> latest kernel git. It's difficult to bisect the kernel, because there
>>>>>> are only merges between 3.14 and 3.15 and the merged commits are
>>>>>> actually based on 3.14-rc1 and 3.14-rc4.
>>>>>>
>>>>>> All seems to be fine with your fixes.
>>>>>>
>>>>>
>>>>> Which fixes have you applied? There are quite a few pending patches on
>>>>> dri-devel, that aren't yet part of drm-fixes-3.15.
>>>>>
>>>>> Grigori
>>>>>
>>>>>
>>>>>> Marek
>>>>>>
>>>>>> On Tue, May 13, 2014 at 5:31 PM, Christian König
>>>>>> <deathsimple at vodafone.de> wrote:
>>>>>>>
>>>>>>>
>>>>>>> Is the performance regression caused by the page table changes or
>>>>>>> something else?
>>>>>>>
>>>>>>> I did run some tests with xonotic while developing it and they didn't
>>>>>>> show anything obvious, but I didn't test on different systems.
>>>>>>>
>>>>>>> Christian.
>>>>>>>
>>>>>>> Am 13.05.2014 17:19, schrieb Marek Olšák:
>>>>>>>
>>>>>>>> Your latest patches fix the regression.
>>>>>>>>
>>>>>>>> The performance regression can also be reproduced with piglit "-t
>>>>>>>> texelFetch.fs".
>>>>>>>>
>>>>>>>> Kernel 3.14:
>>>>>>>> real 0m17.724s
>>>>>>>> user 0m41.905s
>>>>>>>> sys 0m11.299s
>>>>>>>>
>>>>>>>> The problematic commit checked out + your fixes (without the PTE
>>>>>>>> patch, I think):
>>>>>>>> real 0m23.474s
>>>>>>>> user 1m1.008s
>>>>>>>> sys 0m13.812s
>>>>>>>>
>>>>>>>> Marek
>>>>>>>>
>>>>>>>>
>>>>>>>> On Tue, May 13, 2014 at 3:57 PM, Christian König
>>>>>>>> <deathsimple at vodafone.de> wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Am 13.05.2014 15:22, schrieb Alex Deucher:
>>>>>>>>>
>>>>>>>>>> On Mon, May 12, 2014 at 7:38 PM, Grigori Goronzy
>>>>>>>>>> <greg at chown.ath.cx>
>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> I can confirm this fixes it for me, too.
>>>>>>>>>>>
>>>>>>>>>>> 3.15 with these fixes and the large PTE patches actually ends up
>>>>>>>>>>> being noticeably slower than earlier kernels with Xonotic, though.
>>>>>>>>>>> I wonder what's going on.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Allocation overhead?
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Unlikely. Xonotic just allocates a single page table at start, which
>>>>>>>>> then gets extended at a certain rate until Xonotic no longer needs
>>>>>>>>> more address space, and then it is done.
>>>>>>>>>
>>>>>>>>> Grigori, can you bisect and/or try to figure out what's wrong here?
>>>>>>>>>
>>>>>>>>> Christian.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>> Grigori
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On 12.05.2014 14:50, Christian König wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> I could reproduce the problem with xonotic and I think I've found
>>>>>>>>>>>> the issue.
>>>>>>>>>>>>
>>>>>>>>>>>> Please test the attached patch.
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks,
>>>>>>>>>>>> Christian.
>>>>>>>>>>>>
>>>>>>>>>>>> Am 11.05.2014 11:06, schrieb Christian König:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I have tested it and it doesn't fix the hangs.
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Yeah, thought so. Well it was just a guess.
>>>>>>>>>>>>>
>>>>>>>>>>>>>> (Also, I don't like the patch, because it reverts the behavior
>>>>>>>>>>>>>> I added for userspace buffers.)
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Actually it shouldn't affect that. The alternative domain always
>>>>>>>>>>>>> contains GART even when userspace only specified VRAM as
>>>>>>>>>>>>> placement (as long as it is technically possible to do so).
>>>>>>>>>>>>>
>>>>>>>>>>>>> So what should happen is that TTM sees the current placement,
>>>>>>>>>>>>> matches that with the desired placement and should find that it
>>>>>>>>>>>>> doesn't need to move the buffer (we should just test if this
>>>>>>>>>>>>> behavior really works as expected).
>>>>>>>>>>>>>
>>>>>>>>>>>>> Christian.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Am 10.05.2014 23:38, schrieb Marek Olšák:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hi Christian,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I have tested it and it doesn't fix the hangs.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> (Also, I don't like the patch, because it reverts the behavior
>>>>>>>>>>>>>> I added for userspace buffers.)
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Marek
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Sat, May 10, 2014 at 6:34 PM, Christian König
>>>>>>>>>>>>>> <deathsimple at vodafone.de> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Couldn't reproduce the issue so far. So the attached patch is
>>>>>>>>>>>>>>> just a complete shot in the dark found by rereading the code,
>>>>>>>>>>>>>>> but it might actually be the problem.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Please give it a try.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Going to keep testing in the meantime,
>>>>>>>>>>>>>>> Christian.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Am 10.05.2014 10:23, schrieb Christian König:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I see hangs with kernel 3.15 and SI under memory pressure,
>>>>>>>>>>>>>>>>> e.g. if I boot with radeon.vramlimit=256 and then run Xonotic
>>>>>>>>>>>>>>>>> timedemo with high settings.
>>>>>>>>>>>>>>>>> I haven't had a chance to bisect it yet, but it might be a
>>>>>>>>>>>>>>>>> similar problem.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Sounds like the same issue to me. Thx for the good test case.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Any idea what is wrong with it?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Actually I was already wondering why it had gone so smoothly
>>>>>>>>>>>>>>>> without any regression so far. I hadn't noticed the bug in
>>>>>>>>>>>>>>>> bugzilla.kernel.org yet.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Some of the tests allocate a lot of MSAA textures and the
>>>>>>>>>>>>>>>>> tests also run in parallel, which creates a lot of memory
>>>>>>>>>>>>>>>>> pressure and probably causes buffer evictions.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Sounds like the underlying problem to me. We probably evict
>>>>>>>>>>>>>>>> some part of a page table without updating the page directory.
>>>>>>>>>>>>>>>> Going to dig into it today, it's probably just a one liner
>>>>>>>>>>>>>>>> missing somewhere in the VM code.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Christian.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Am 09.05.2014 23:39, schrieb Grigori Goronzy:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On 09.05.2014 20:03, Marek Olšák wrote:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> This commit which first appeared in 3.15-rc1 causes hangs on
>>>>>>>>>>>>>>>>>> Bonaire:
>>>>>>>>>>>>>>>>>> [...]
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> The simplest way to reproduce the hangs is to run piglit
>>>>>>>>>>>>>>>>>> with these parameters:
>>>>>>>>>>>>>>>>>> -t texelFetch.fs
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Some of the tests allocate a lot of MSAA textures and the
>>>>>>>>>>>>>>>>>> tests also run in parallel, which creates a lot of memory
>>>>>>>>>>>>>>>>>> pressure and probably causes buffer evictions.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I see hangs with kernel 3.15 and SI under memory pressure,
>>>>>>>>>>>>>>>>> e.g. if I boot with radeon.vramlimit=256 and then run Xonotic
>>>>>>>>>>>>>>>>> timedemo with high settings.
>>>>>>>>>>>>>>>>> I haven't had a chance to bisect it yet, but it might be a
>>>>>>>>>>>>>>>>> similar problem.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Grigori
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>> _______________________________________________
>>>>>>>>>>> dri-devel mailing list
>>>>>>>>>>> dri-devel at lists.freedesktop.org
>>>>>>>>>>> http://lists.freedesktop.org/mailman/listinfo/dri-devel
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>
>>>>>
>>>
>