AMD graphics performance regression in 4.15 and later

Michel Dänzer michel at daenzer.net
Fri Apr 20 14:47:57 UTC 2018


On 2018-04-11 11:37 AM, Christian König wrote:
> Am 11.04.2018 um 06:00 schrieb Gabriel C:
>> 2018-04-09 11:42 GMT+02:00 Christian König
>> <ckoenig.leichtzumerken at gmail.com>:
>>> Am 07.04.2018 um 00:00 schrieb Jean-Marc Valin:
>>>> Hi Christian,
>>>>
>>>> Thanks for the info. FYI, I've also opened a Firefox bug for that at:
>>>> https://bugzilla.mozilla.org/show_bug.cgi?id=1448778
>>>> Feel free to comment since you have a better understanding of what's
>>>> going on.
>>>>
>>>> One last question: right now I'm running 4.15.0 with the "offending"
>>>> patch reverted. Is that safe to run or are there possible bad
>>>> interactions with other changes.
>>>
>>> That should work without problems.
>>>
>>> But I just had another idea as well, if you want you could still test
>>> the
>>> new code path which will be using in 4.17.
>>>
>> While Firefox may do some strange things is not about only Firefox.
>>
>> With your patches my EPYC box is unusable with  4.15++ kernels.
>> The whole Desktop is acting weird.  This one is using
>> an Cape Verde PRO [Radeon HD 7750/8740 / R7 250E] GPU.
>>
>> Box is  2 * EPYC 7281 with 128 GB ECC RAM
>>
>> Also a 14C Xeon box with a HD7700 is broken same way.
> 
> The hardware is irrelevant for this. We need to know what software stack
> you use on top of it.
> 
> E.g. desktop environment/Mesa and DDX version etc...
> 
>>
>> Everything breaks in X .. scrolling , moving windows , flickering etc.
>>
>>
>> reverting f4c809914a7c3e4a59cf543da6c2a15d0f75ee38 and
>> 648bc3574716400acc06f99915815f80d9563783
>> from an 4.15 kernel makes things work again.
>>
>>
>>> Backporting all the detection logic is to invasive, but you could
>>> just go
>>> into drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c and forcefull use the other
>>> code path.
>>>
>>> Just look out for "#ifdef CONFIG_SWIOTLB" checks and disable those.
>>>
>> Well you really can't be serious about these suggestions ? Are you ?
>>
>> Telling peoples to #if 0 random code is not a solution.
> 
> That is for testing and not a permanent solution.
> 
>> You broke existsing working userland with your patches and at least
>> please fix that for 4.16.
>>
>> I can help testing code for 4.17/++ if you wish but that is
>> *different* storry.
> 
> Please test Alex's amd-staging-drm-next branch from
> git://people.freedesktop.org/~agd5f/linux.

I think we're still missing something here.

I'm currently running 4.16.2 + the DRM subsystem changes which are going
into 4.17 (so I have the changes Christian is referring to) with a
Kaveri APU, and I'm seeing similar symptoms as described by Jean-Marc.
Some observations:

Firefox, Thunderbird, or worst, gnome-shell, can freeze for up to on the
order of a minute, during which the kernel is spending most of one
core's cycles inside alloc_pages (__alloc_pages_nodemask to be more
precise), called from ttm_alloc_new_pages.

At least in the case of Firefox, this happens due to Mesa internal BO
allocations for glTex(Sub)Image, so it's not obvious that Firefox is
doing something wrong.

I never noticed this before this week. Before, I was running 4.15.y +
DRM subsystem changes from 4.16. Maybe something has changed in core
code, trying harder to allocate huge pages.


Maybe TTM should only try to use any huge pages that happen to be
available, not spend any (/ "too much"?) additional effort trying to
free up huge pages?


-- 
Earthling Michel Dänzer               |               http://www.amd.com
Libre software enthusiast             |             Mesa and X developer


More information about the dri-devel mailing list