[PATCH] drm/amdkfd: Initialize dqm earlier

Kuehling, Felix Felix.Kuehling at amd.com
Thu Jun 6 22:08:46 UTC 2019


On 2019-06-06 5:51 p.m., Zeng, Oak wrote:
> dqm is referenced in function kfd_topology_add_device.
> Move dqm initialization up to avoid a NULL pointer dereference.

This addresses a pretty unlikely race condition where someone reads 
/sys/kernel/debug/kfd/hqds while the device is still being initialized.

We add devices to the topology before their initialization has 
successfully completed. If it fails, we remove the device again. Having 
devices in the topology that are not completely initialized yet seems to 
be the real issue. A cleaner solution would move 
kfd_topology_add_device to the end of kgd2kfd_device_init, so that we 
only add a device to the topology after it has been successfully and 
completely initialized. Not sure if there are any dependencies in the 
init sequence that would be broken by this, though.

Regards,
   Felix


>
> Change-Id: Id6cb2541af129826b7621ceaa8e06e638c7bb122
> Signed-off-by: Oak Zeng <Oak.Zeng at amd.com>
> ---
>   drivers/gpu/drm/amd/amdkfd/kfd_device.c | 16 ++++++++--------
>   1 file changed, 8 insertions(+), 8 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device.c b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
> index 9d1b026..e7e24fe 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_device.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
> @@ -603,6 +603,12 @@ bool kgd2kfd_device_init(struct kfd_dev *kfd,
>   	if (kfd->kfd2kgd->get_hive_id)
>   		kfd->hive_id = kfd->kfd2kgd->get_hive_id(kfd->kgd);
>   
> +	kfd->dqm = device_queue_manager_init(kfd);
> +	if (!kfd->dqm) {
> +		dev_err(kfd_device, "Error initializing queue manager\n");
> +		goto device_queue_manager_error;
> +	}
> +
>   	if (kfd_topology_add_device(kfd)) {
>   		dev_err(kfd_device, "Error adding device to topology\n");
>   		goto kfd_topology_add_device_error;
> @@ -613,12 +619,6 @@ bool kgd2kfd_device_init(struct kfd_dev *kfd,
>   		goto kfd_interrupt_error;
>   	}
>   
> -	kfd->dqm = device_queue_manager_init(kfd);
> -	if (!kfd->dqm) {
> -		dev_err(kfd_device, "Error initializing queue manager\n");
> -		goto device_queue_manager_error;
> -	}
> -
>   	if (kfd_iommu_device_init(kfd)) {
>   		dev_err(kfd_device, "Error initializing iommuv2\n");
>   		goto device_iommu_error;
> @@ -642,12 +642,12 @@ bool kgd2kfd_device_init(struct kfd_dev *kfd,
>   
>   kfd_resume_error:
>   device_iommu_error:
> -	device_queue_manager_uninit(kfd->dqm);
> -device_queue_manager_error:
>   	kfd_interrupt_exit(kfd);
>   kfd_interrupt_error:
>   	kfd_topology_remove_device(kfd);
>   kfd_topology_add_device_error:
> +	device_queue_manager_uninit(kfd->dqm);
> +device_queue_manager_error:
>   	kfd_doorbell_fini(kfd);
>   kfd_doorbell_error:
>   	kfd_gtt_sa_fini(kfd);


More information about the amd-gfx mailing list