Kernel and AMD hardware roulette (was: AMD graphics performance regression in 4.15 and later)

Gabriel C nix.or.die at gmail.com
Wed Jun 6 15:44:59 UTC 2018


2018-06-06 17:03 GMT+02:00 Michel Dänzer <michel at daenzer.net>:
> On 2018-06-06 04:44 PM, Christian König wrote:
>> Am 06.06.2018 um 16:12 schrieb Michel Dänzer:
>>> On 2018-06-06 03:33 PM, Gabriel C wrote:
>>>> 2018-06-06 14:19 GMT+02:00 Christian König <christian.koenig at amd.com>:
>>>>> Am 06.06.2018 um 14:08 schrieb Gabriel C:
>>>>>> 2018-06-06 13:33 GMT+02:00 Christian König <christian.koenig at amd.com>:
>>>>>>> Am 06.06.2018 um 13:28 schrieb Gabriel C:
>>>
>>>>>> http://ftp.frugalware.org/pub/other/people/crazy/radeon/dmesg-iommu-sr-iov-off.txt
>>>>>>
>>>>>>
>>>>>> http://ftp.frugalware.org/pub/other/people/crazy/radeon/dmesg-iommu-sr-iov-on.txt
>>>>>>
>>>>>>
>>>>>> Also nothing else changed in that setup just testing kernel 4.17.
>>>>>
>>>>>
>>>>> That has nothing to do with the driver or the original bug you
>>>>> reported. The problem is that SME is active, and that is currently
>>>>> not supported at all with that hardware.
>>>>
>>>> Ok .. so are we now playing kernel and AMD hardware roulette on each
>>>> release?
>>>>
>>>> SME was like this in kernel 4.16.x here and all worked.
>>>
>>> If that is true, again please bisect which commit broke it.
>>>
>>> All the reports I've seen before this indicated that at least amdgpu
>>> has never worked with SME (which BTW doesn't mean it's never going to
>>> work or that we don't want to support it, just that as far as we know
>>> it's currently not working).
>>
>> At least in theory it should work when we use the coherent DMA allocator.
>>
>> If that really worked before, the most likely commit that broke
>> this is:
>>
>> commit fd5fd480dd8fe4910546e7b080b3ae345e57fe9f
>> Author: Chunming Zhou <david1.zhou at amd.com>
>> Date:   Fri Feb 9 10:44:09 2018 +0800
>>
>>     drm/amdgpu: only enable swiotlb alloc when need v2
>>
>>     get the max io mapping address of system memory to see if it is over
>>     our card accessing range.
>>     v2: move checking later
>>
>>     Signed-off-by: Chunming Zhou <david1.zhou at amd.com>
>>     Reviewed-by: Monk Liu <monk.liu at amd.com>
>>     Reviewed-by: Christian König <christian.koenig at amd.com>
>>     Signed-off-by: Alex Deucher <alexander.deucher at amd.com>
>>
>> Currently looking into how we could somehow improve this detection.
>
> I guess this could fit for Gabriel, but e.g.
> https://bugs.freedesktop.org/104437 says amdgpu was already broken with
> SME in 4.15, if not 4.14 (I suspect there was simply no SME support
> earlier).

I got strange performance issues with 4.15 and 4.16, but SME was ON
on that setup (even before it hit mainline) and never broke the GPU like this.

Here is a 4.16.13 boot dmesg which has no such issue:

http://ftp.frugalware.org/pub/other/people/crazy/radeon/dmesg-radeon-SME-ON-kernel-4.16.txt

With the setup as is, booting 4.16.x works, while 4.17 throws the errors.

>
>
> --
> Earthling Michel Dänzer               |               http://www.amd.com
> Libre software enthusiast             |             Mesa and X developer

