[PATCH] drm/radeon: deprecate and remove KFD interface

Michel Dänzer michel at daenzer.net
Wed Nov 29 14:54:25 UTC 2017


On 2017-11-29 03:40 PM, Oded Gabbay wrote:
> On Wed, Nov 29, 2017 at 2:31 PM, Oded Gabbay <oded.gabbay at gmail.com> wrote:
>> On Wed, Nov 29, 2017 at 1:16 PM, Michel Dänzer <michel at daenzer.net> wrote:
>>> On 2017-11-01 09:31 AM, Oded Gabbay wrote:
>>>> ok, taken to -next.
>>>
>>> This change broke the radeon driver on my Kaveri laptop. The gdm login
>>> screen works, but logging into the GNOME on Xorg session quickly results
>>> in a GPU hang and associated badness, see the attached dmesg.
>>>
>>> Reverting this change on top of drm-next makes it work again.
>>>
>>> On a hunch, I've tried reverting commits 62a7b7fbd08e ("drm/radeon:
>>> reduce number of free VMIDs and pipes in KV") and 28b57b856b63
>>> ("drm/radeon/cik: Don't touch int of pipes 1-7"), but no luck.
>>>
>>> Any ideas for what else is missing?
>>>
>>> Note that the amdkfd driver isn't actually active anyway, because I'm
>>> disabling the IOMMU. Is it possible that it's still doing or triggering
>>> some needed HW setup before it bails in that case?
>>>
>>>
>>> P.S. Assuming we can fix this without reverting, maybe we could also
>>> remove rdev->grbm_idx_mutex again?
>>>
>>> --
>>> Earthling Michel Dänzer               |               http://www.amd.com
>>> Libre software enthusiast             |             Mesa and X developer
>>
>> Hi Michel,
>> Even without the IOMMU, amdkfd will initialize the module and internal
>> structures per device, up to the point where it tries to register a
>> callback with the IOMMU driver.
>> If the IOMMU is disabled, it will then fail with the following error
>> message (in dmesg): "error getting iommu info. is the iommu enabled?"
>>
>> Having said that, it doesn't initialize anything in the device H/W
>> itself, so I find this very weird.
>>
>> I looked at the patch itself again and I don't see anything suspicious.
>>
>> I'll try to resurrect my Kaveri machine to check this, but it will
>> take some time.
>>
>> Oded
> 
> Any chance that the increase of VMIDs from 8 to 16 somehow (although I
> don't know how) caused this problem?
> The desktop GUI also didn't work for me, but when I changed the VMID
> number back to 8 (in cik.c) the GUI worked again.
> 
> Michel, could you try this as well?

Yeah, that also occurred to me in the meantime, and I can confirm your
findings.

My guess right now is that it's related to cik_pcie_init_compute_vmid.


-- 
Earthling Michel Dänzer               |               http://www.amd.com
Libre software enthusiast             |             Mesa and X developer

