[PATCH 1/3] amdkfd: Don't clear *kfd2kgd on kfd_module_init

Oded Gabbay oded.gabbay at amd.com
Sun Dec 21 05:24:31 PST 2014



On 12/21/2014 03:06 PM, Oded Gabbay wrote:
>
>
> On 12/21/2014 02:19 PM, Christian König wrote:
>> Am 21.12.2014 um 12:34 schrieb Oded Gabbay:
>>>
>>>
>>> On 12/21/2014 01:27 PM, Christian König wrote:
>>>> Am 20.12.2014 um 21:46 schrieb Oded Gabbay:
>>>>> When amdkfd and radeon are compiled inside the kernel image (not as modules),
>>>>> radeon will load before amdkfd and will set *kfd2kgd to its interface
>>>>> structure. Therefore, we must not set *kfd2kgd to NULL when amdkfd is loaded
>>>>> because it will override radeon's initialization and cause kernel BUG.
>>>>>
>>>>> Signed-off-by: Oded Gabbay <oded.gabbay at amd.com>
>>>>
>>>> You should probably rather fix the dependency between the two modules to get an
>>>> determined load order instead of doing such nasty workarounds.
>>>>
>>>> Christian.
>>>
>>> The problem is that when modules are compiled inside the kernel, there is NO
>>> determined load order and there is no mechanism to enforce that. If there
>>> is/was such a mechanism, I would of course prefer to use it.
>>
>> There should be an determined order based on the symbol use, otherwise
>> initializing most of the kernel modules wouldn't work as expected. For example
>> radeon depends on the drm module must be loaded before radeon is loaded.
> There should be, but when the modules are compiled in, they are loaded based on
> link order only, if they are in the same group, and the groups are loaded by a
> pre-defined order.
> The groups are: pure, core, postcore, arch, subsys, fs, device (which represents
> all the modules) and late. See init.h
>
> So radeon, amdkfd and amd_iommu_v2 are all in device group, and in the group
> they are ordered by their link order.
>
> Yes, radeon loads after drm, because drm*.o are before radeon*.o in the
> Makefile. See
> http://stackoverflow.com/questions/5669647/linux-order-of-statically-linked-module-loading
>

So I tried moving amdkfd inside the Makefile before radeon, and that made amdkfd 
load before radeon did. This solves my first problem - order between amdkfd and 
radeon. However, amd_iommu_v2 doesn't belong to the drm Makefile, and I don't 
want to move iommu before gpu, so I don't have a solution for the order between 
amdkfd and amd_iommu_v2.

	Oded
>
>
>>
>>>
>>> Actually, I don't understand why the kernel doesn't enforce the order
>>> according to the use of exported symbols, like it does with modules.
>>
>> Yeah, that's indeed rather strange. There must be something in the amdkfd code
>> which broke that somehow.
> IMO, that's a far-fetched guess. Could you point to something more specific ?
>
>>
>> As far as I understand you the desired init order is radeon and amd_iommu_v2
>> first and then amdkfd, right?
> Actually no. The preferred order is amd_iommu_v2, amdkfd and radeon last. This
> is the order that happens when all three are built as modules. More accurately,
> radeon inits, but its init triggers amdkfd init, which triggers amd_iommu_v2
> init. So before radeon reaches its probe stage, all the modules were initialized.
>
> So what happens when you boot with radeon,
>> amd_iommu_v2 and amdkfd blacklisted for automatically load and only load amdkfd
>> manually?
> As said above, that's ok.
>>
>>> There will always be dependencies between kgd (radeon) and amdkfd and between
>>> amdkfd and amd_iommu_v2. I don't think I can eliminate those dependencies, not
>>> without a very complex solution. And the fact that this complex solution
>>> occurs only in a very specific use case (all modules compiled in), makes me
>>> less inclined to do that.
>>>
>>> So I don't see it as a "nasty workaround". I would call it just a "workaround"
>>> for a specific use case, which should be solved by a generic solution to the
>>> kernel enforcing load orders.
>>
>> The normal kernel module handling already should provide the correct init order,
>> so I would still call this a rather nasty workaround because we couldn't find
>> the underlying problem.
> Well, the normal kernel module handling doesn't work when all modules are
> compiled in. I'm not a huge expert on this issue so I had Joerg Roedel help me
> analyze this (thanks Joerg) and he agreed that there is no enforcement of order
> in this case.
>
>>
>> Christian.
>>
>>>
>>>     Oded
>>>>
>>>>> ---
>>>>>   drivers/gpu/drm/amd/amdkfd/kfd_module.c | 5 ++---
>>>>>   1 file changed, 2 insertions(+), 3 deletions(-)
>>>>>
>>>>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_module.c
>>>>> b/drivers/gpu/drm/amd/amdkfd/kfd_module.c
>>>>> index 95d5af1..236562f 100644
>>>>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_module.c
>>>>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_module.c
>>>>> @@ -34,7 +34,7 @@
>>>>>   #define KFD_DRIVER_MINOR    7
>>>>>   #define KFD_DRIVER_PATCHLEVEL    0
>>>>> -const struct kfd2kgd_calls *kfd2kgd;
>>>>> +const struct kfd2kgd_calls *kfd2kgd = NULL;
>>>>>   static const struct kgd2kfd_calls kgd2kfd = {
>>>>>       .exit        = kgd2kfd_exit,
>>>>>       .probe        = kgd2kfd_probe,
>>>>> @@ -84,14 +84,13 @@ EXPORT_SYMBOL(kgd2kfd_init);
>>>>>   void kgd2kfd_exit(void)
>>>>>   {
>>>>> +    kfd2kgd = NULL;
>>>>>   }
>>>>>   static int __init kfd_module_init(void)
>>>>>   {
>>>>>       int err;
>>>>> -    kfd2kgd = NULL;
>>>>> -
>>>>>       /* Verify module parameters */
>>>>>       if ((sched_policy < KFD_SCHED_POLICY_HWS) ||
>>>>>           (sched_policy > KFD_SCHED_POLICY_NO_HWS)) {
>>>>
>>
> _______________________________________________
> dri-devel mailing list
> dri-devel at lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/dri-devel


More information about the dri-devel mailing list