[PATCH] drm/amdgpu: Init zone device and drm client after mode-1 reset on reload

Felix Kuehling felix.kuehling at amd.com
Tue Mar 5 16:07:03 UTC 2024


On 2024-03-04 19:20, Rehman, Ahmad wrote:
>
> [AMD Official Use Only - General]
>
>
> Hey,
>
>
> Due to mode-1 reset (pending_reset), the amdgpu_amdkfd_device_init 
> will not be called and hence adev->kfd.init_complete will not be set. 
> The function amdgpu_amdkfd_drm_client_create has condition:
> if (!adev->kfd.init_complete)
>               return 0;
> So, in probe function, when we return from device_init the KFD is not 
> initialized and amdgpu_amdkfd_drm_client_create returns without doing 
> anything.

I think your change could result in calling 
amdgpu_amdkfd_drm_client_create multiple times. IIRC, one purpose of 
moving the call to amdgpu_pci_probe was to ensure that it is only called 
once, because it only gets unregistered once when the driver is 
unloaded. Maybe it would be better to remove the if 
(!adev->kfd.init_complete) condition from 
amdgpu_amdkfd_drm_client_create. That way we would always create the 
client at probe and it would be ready when it's needed after the GPU reset.


There is a chance that the client would get created unnecessarily if KFD 
init never succeeds. But that should be rare, and it's not a big 
resource waste.


There were some comments on a previous code review, that creating the 
DRM client too early could cause problems. But I don't understand what 
that problem could be. As I understand it, the adev->kfd.client is just 
a place to put GEM handles for KFD BOs that we don't want to expose to 
user mode. I see no harm in creating this client too early or when it's 
not needed.


Regards,
   Felix


>
> Thanks,
> Ahmad
>
> ------------------------------------------------------------------------
> *From:* Kuehling, Felix <Felix.Kuehling at amd.com>
> *Sent:* Monday, March 4, 2024 6:39 PM
> *To:* Rehman, Ahmad <Ahmad.Rehman at amd.com>; 
> amd-gfx at lists.freedesktop.org <amd-gfx at lists.freedesktop.org>
> *Cc:* Wan, Gavin <Gavin.Wan at amd.com>
> *Subject:* Re: [PATCH] drm/amdgpu: Init zone device and drm client 
> after mode-1 reset on reload
>
> On 2024-03-04 17:05, Ahmad Rehman wrote:
> > In passthrough environment, when amdgpu is reloaded after unload, mode-1
> > is triggered after initializing the necessary IPs, That init does not
> > include KFD, and KFD init waits until the reset is completed. KFD init
> > is called in the reset handler, but in this case, the zone device and
> > drm client is not initialized, causing app to create kernel panic.
> >
> > Signed-off-by: Ahmad Rehman <Ahmad.Rehman at amd.com>
> > ---
> >   drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 5 ++++-
> >   1 file changed, 4 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> > index 15b188aaf681..80b9642f2bc4 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> > @@ -2479,8 +2479,11 @@ static void 
> amdgpu_drv_delayed_reset_work_handler(struct work_struct *work)
> >        }
> >        for (i = 0; i < mgpu_info.num_dgpu; i++) {
> >                adev = mgpu_info.gpu_ins[i].adev;
> > -             if (!adev->kfd.init_complete)
> > +             if (!adev->kfd.init_complete) {
> > +                     kgd2kfd_init_zone_device(adev);
> >                        amdgpu_amdkfd_device_init(adev);
> > + amdgpu_amdkfd_drm_client_create(adev);
> I don't see what's preventing the DRM client initialization in the
> reset-on-driver-load case. It only needs to be created once and that
> happens in amdgpu_pci_probe. Am I missing anything?
>
> Regards,
>    Felix
>
>
> > +             }
> >                amdgpu_ttm_set_buffer_funcs_status(adev, true);
> >        }
> >   }
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.freedesktop.org/archives/amd-gfx/attachments/20240305/be945cbd/attachment.htm>


More information about the amd-gfx mailing list