[PATCH v2] drm/amdkfd: disable kfd debugfs node of hang_hws on vf mode
Wang, Kevin(Yang)
Kevin1.Wang at amd.com
Thu May 13 16:23:10 UTC 2021
[AMD Official Use Only - Internal Distribution Only]
thanks @Kuehling, Felix<mailto:Felix.Kuehling at amd.com>,
I have also noticed this problem, in the multi-GPU environment, there is no working well.
Best Regards,
Kevin
________________________________
From: Kuehling, Felix <Felix.Kuehling at amd.com>
Sent: Friday, May 14, 2021 12:01 AM
To: Wang, Kevin(Yang) <Kevin1.Wang at amd.com>; amd-gfx at lists.freedesktop.org <amd-gfx at lists.freedesktop.org>
Cc: Zhang, Hawking <Hawking.Zhang at amd.com>; Min, Frank <Frank.Min at amd.com>
Subject: Re: [PATCH v2] drm/amdkfd: disable kfd debugfs node of hang_hws on vf mode
This won't work. the kfd_debugfs directory is system-wide. So you cannot
have a per-GPU criteria for creating it. You may have one GPU that
probes successfully, another that fails. You still need the debugfs. If
you have multiple GPUs probing successfully, you only want to create the
debugfs node once.
The hang_hws file requires writing a GPU-ID to it. So if a card doesn't
probe it won't have a GPU ID, so you won't be able to hang that card
through the hang_hws interface. So there is no need to hide the file
altogether.
Can you explain why hang_hws should be disabled for VFs?
Thanks,
Felix
Am 2021-05-13 um 4:08 a.m. schrieb Kevin Wang:
> v1:
> the kfd debugfs node is rely on kgd2kfd probe success,
> if not, the kfd_debugfs should not be created,
> and the node of "hang_hws" should be disabled on vf mode.
>
> v2:
> also move kfd_debugfs_fini() into kgd2kfd_device_exit() function.
>
> 1. move kfd_debugfs_init() function into kgd2kfd_probe() function.
> 2. disable "hang_hws" debugfs node on vf mode.
>
> Signed-off-by: Kevin Wang <kevin1.wang at amd.com>
> ---
> drivers/gpu/drm/amd/amdkfd/kfd_debugfs.c | 7 ++++---
> drivers/gpu/drm/amd/amdkfd/kfd_device.c | 3 +++
> drivers/gpu/drm/amd/amdkfd/kfd_module.c | 3 ---
> drivers/gpu/drm/amd/amdkfd/kfd_priv.h | 4 ++--
> 4 files changed, 9 insertions(+), 8 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_debugfs.c b/drivers/gpu/drm/amd/amdkfd/kfd_debugfs.c
> index 673d5e34f213..f9a81f34d09e 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_debugfs.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_debugfs.c
> @@ -88,7 +88,7 @@ static const struct file_operations kfd_debugfs_hang_hws_fops = {
> .release = single_release,
> };
>
> -void kfd_debugfs_init(void)
> +void kfd_debugfs_init(bool is_vf)
> {
> debugfs_root = debugfs_create_dir("kfd", NULL);
>
> @@ -98,8 +98,9 @@ void kfd_debugfs_init(void)
> kfd_debugfs_hqds_by_device, &kfd_debugfs_fops);
> debugfs_create_file("rls", S_IFREG | 0444, debugfs_root,
> kfd_debugfs_rls_by_device, &kfd_debugfs_fops);
> - debugfs_create_file("hang_hws", S_IFREG | 0200, debugfs_root,
> - kfd_debugfs_hang_hws_read, &kfd_debugfs_hang_hws_fops);
> + if (!is_vf)
> + debugfs_create_file("hang_hws", S_IFREG | 0200, debugfs_root,
> + kfd_debugfs_hang_hws_read, &kfd_debugfs_hang_hws_fops);
> }
>
> void kfd_debugfs_fini(void)
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device.c b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
> index dedb8e33b953..aa9154a8410f 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_device.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
> @@ -649,6 +649,8 @@ struct kfd_dev *kgd2kfd_probe(struct kgd_dev *kgd,
>
> ida_init(&kfd->doorbell_ida);
>
> + kfd_debugfs_init(vf);
> +
> return kfd;
> }
>
> @@ -884,6 +886,7 @@ void kgd2kfd_device_exit(struct kfd_dev *kfd)
> amdgpu_amdkfd_free_gws(kfd->kgd, kfd->gws);
> }
>
> + kfd_debugfs_fini();
> kfree(kfd);
> }
>
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_module.c b/drivers/gpu/drm/amd/amdkfd/kfd_module.c
> index 5e90fe642192..6b9f735c55ea 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_module.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_module.c
> @@ -61,8 +61,6 @@ static int kfd_init(void)
> */
> kfd_procfs_init();
>
> - kfd_debugfs_init();
> -
> return 0;
>
> err_create_wq:
> @@ -76,7 +74,6 @@ static int kfd_init(void)
>
> static void kfd_exit(void)
> {
> - kfd_debugfs_fini();
> kfd_process_destroy_wq();
> kfd_procfs_shutdown();
> kfd_topology_shutdown();
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
> index daa9d47514c6..f3ddd8c5b11e 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
> @@ -1174,7 +1174,7 @@ static inline int kfd_devcgroup_check_permission(struct kfd_dev *kfd)
> /* Debugfs */
> #if defined(CONFIG_DEBUG_FS)
>
> -void kfd_debugfs_init(void);
> +void kfd_debugfs_init(bool is_vf);
> void kfd_debugfs_fini(void);
> int kfd_debugfs_mqds_by_process(struct seq_file *m, void *data);
> int pqm_debugfs_mqds(struct seq_file *m, void *data);
> @@ -1189,7 +1189,7 @@ int dqm_debugfs_execute_queues(struct device_queue_manager *dqm);
>
> #else
>
> -static inline void kfd_debugfs_init(void) {}
> +static inline void kfd_debugfs_init(bool is_vf) {}
> static inline void kfd_debugfs_fini(void) {}
>
> #endif
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.freedesktop.org/archives/amd-gfx/attachments/20210513/e4e52196/attachment-0001.htm>
More information about the amd-gfx
mailing list