[PATCH] drm/amdgpu: make IB test synchronize with init for SRIOV

Christian König ckoenig.leichtzumerken at gmail.com
Mon Jun 29 08:18:24 UTC 2020


On 29.06.20 at 09:11, Monk Liu wrote:
> From: pengzhou <PengJu.Zhou at amd.com>
>
> issue:
> Originally we kicked off the IB test asynchronously with the driver's init, so
> the IB test may still be running when driver loading is done (modprobe amdgpu done).
> If we shut down the VM immediately after the amdgpu driver is loaded, the GPU may
> hang because the IB test is still running.
>
> fix:
> Make the IB test synchronous with driver init so it is no longer running
> when we shut down the VM.

We explicitly added the asynchronous IB test for SRIOV to make the driver
load faster. Why is that now a problem?

And why would it help when the VM shuts down? We cancel/flush the test 
during driver unload/suspend as well.

>
> Signed-off-by: Monk Liu <Monk.Liu at amd.com>
> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 29 ++++++++++++++++++++++++-----
>   1 file changed, 24 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> index 457f5d2..4f54660 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> @@ -3292,8 +3292,16 @@ int amdgpu_device_init(struct amdgpu_device *adev,
>   	/* must succeed. */
>   	amdgpu_ras_resume(adev);
>   
> -	queue_delayed_work(system_wq, &adev->delayed_init_work,
> +	if (amdgpu_sriov_vf(adev)) {
> +		r = amdgpu_ib_ring_tests(adev);
> +		if (r) {
> +			DRM_ERROR("ib ring test failed (%d).\n", r);
> +			return r;
> +		}
> +	} else {
> +		queue_delayed_work(system_wq, &adev->delayed_init_work,
>   			   msecs_to_jiffies(AMDGPU_RESUME_MS));
> +	}
>   
>   	r = sysfs_create_files(&adev->dev->kobj, amdgpu_dev_attributes);
>   	if (r) {
> @@ -3329,7 +3337,8 @@ void amdgpu_device_fini(struct amdgpu_device *adev)
>   	int r;
>   
>   	DRM_INFO("amdgpu: finishing device.\n");
> -	flush_delayed_work(&adev->delayed_init_work);
> +	if (!amdgpu_sriov_vf(adev))
> +		flush_delayed_work(&adev->delayed_init_work);

You can drop this change; flushing a work item that was never scheduled is
harmless.
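
To illustrate the point, here is a toy userspace model of the behavior. It
is a sketch only, not the kernel workqueue implementation: the model_* names
and the struct are made up for this example. It assumes the work item was
initialized (as INIT_DELAYED_WORK does for delayed_init_work), in which case
a flush on an idle item has nothing to wait for and simply returns false.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a work item; NOT struct delayed_work. */
struct model_work {
	bool pending;	/* true between queueing and execution */
	int runs;	/* number of times the handler has run */
};

/* Put the item into a known idle state. */
static void model_init_work(struct model_work *w)
{
	w->pending = false;
	w->runs = 0;
}

/* Queue the item; returns false if it was already pending. */
static bool model_queue_work(struct model_work *w)
{
	if (w->pending)
		return false;
	w->pending = true;
	return true;
}

/*
 * Flush the item: if it is pending, run it to completion and return true.
 * If it was never queued (idle), there is nothing to do: return false.
 */
static bool model_flush_work(struct model_work *w)
{
	if (!w->pending)
		return false;
	w->pending = false;
	w->runs++;
	return true;
}

/*
 * Teardown path in the model: flush unconditionally, with no check for
 * whether the item was ever queued. Returns what the flush returned.
 */
static bool model_device_fini(struct model_work *w)
{
	return model_flush_work(w);
}
```

In this model, calling model_device_fini() on a freshly initialized item
returns false and leaves runs at 0, which is why the unconditional flush
needs no amdgpu_sriov_vf() guard around it.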

>   	adev->shutdown = true;
>   
>   	/* make sure IB test finished before entering exclusive mode
> @@ -3425,7 +3434,8 @@ int amdgpu_device_suspend(struct drm_device *dev, bool fbcon)
>   	if (fbcon)
>   		amdgpu_fbdev_set_suspend(adev, 1);
>   
> -	cancel_delayed_work_sync(&adev->delayed_init_work);
> +	if (!amdgpu_sriov_vf(adev))
> +		cancel_delayed_work_sync(&adev->delayed_init_work);
>   
>   	if (!amdgpu_device_has_dc_support(adev)) {
>   		/* turn off display hw */
> @@ -3528,8 +3538,16 @@ int amdgpu_device_resume(struct drm_device *dev, bool fbcon)
>   	if (r)
>   		return r;
>   
> -	queue_delayed_work(system_wq, &adev->delayed_init_work,
> +	if (amdgpu_sriov_vf(adev)) {
> +		r = amdgpu_ib_ring_tests(adev);
> +		if (r) {
> +			DRM_ERROR("ib ring test failed (%d).\n", r);
> +			return r;
> +		}
> +	} else {
> +		queue_delayed_work(system_wq, &adev->delayed_init_work,
>   			   msecs_to_jiffies(AMDGPU_RESUME_MS));
> +	}
>   
>   	if (!amdgpu_device_has_dc_support(adev)) {
>   		/* pin cursors */
> @@ -3554,7 +3572,8 @@ int amdgpu_device_resume(struct drm_device *dev, bool fbcon)
>   		return r;
>   
>   	/* Make sure IB tests flushed */
> -	flush_delayed_work(&adev->delayed_init_work);
> +	if (!amdgpu_sriov_vf(adev))
> +		flush_delayed_work(&adev->delayed_init_work);
>   
>   	/* blat the mode back in */
>   	if (fbcon) {


