[PATCH] drm/amdgpu: make IB test synchronize with init for SRIOV
Christian König
ckoenig.leichtzumerken at gmail.com
Mon Jun 29 08:18:24 UTC 2020
Am 29.06.20 um 09:11 schrieb Monk Liu:
> From: pengzhou <PengJu.Zhou at amd.com>
>
> issue:
> originally we kickoff IB test asynchronously with driver's init, thus
> the IB test may still running when the driver loading done (modprobe amdgpu done).
> if we shutdown VM immediately after amdgpu driver loaded then GPU may
> hang because the IB test is still running
>
> fix:
> make IB test synchronize with driver init thus it won't still running
> when we shutdown the VM.
We explicitly added the asynchronously IB test for SRIOV to make driver
load faster. Why is that now a problem?
And why would it help when the VM shuts down? We cancel/flush the test
during driver unload/suspend as well.
>
> Signed-off-by: Monk Liu <Monk.Liu at amd.com>
> ---
> drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 29 ++++++++++++++++++++++++-----
> 1 file changed, 24 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> index 457f5d2..4f54660 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> @@ -3292,8 +3292,16 @@ int amdgpu_device_init(struct amdgpu_device *adev,
> /* must succeed. */
> amdgpu_ras_resume(adev);
>
> - queue_delayed_work(system_wq, &adev->delayed_init_work,
> + if (amdgpu_sriov_vf(adev)) {
> + r = amdgpu_ib_ring_tests(adev);
> + if (r) {
> + DRM_ERROR("ib ring test failed (%d).\n", r);
> + return r;
> + }
> + } else {
> + queue_delayed_work(system_wq, &adev->delayed_init_work,
> msecs_to_jiffies(AMDGPU_RESUME_MS));
> + }
>
> r = sysfs_create_files(&adev->dev->kobj, amdgpu_dev_attributes);
> if (r) {
> @@ -3329,7 +3337,8 @@ void amdgpu_device_fini(struct amdgpu_device *adev)
> int r;
>
> DRM_INFO("amdgpu: finishing device.\n");
> - flush_delayed_work(&adev->delayed_init_work);
> + if (!amdgpu_sriov_vf(adev))
> + flush_delayed_work(&adev->delayed_init_work);
You can drop this change, flushing a work which was never scheduled is
harmless.
> adev->shutdown = true;
>
> /* make sure IB test finished before entering exclusive mode
> @@ -3425,7 +3434,8 @@ int amdgpu_device_suspend(struct drm_device *dev, bool fbcon)
> if (fbcon)
> amdgpu_fbdev_set_suspend(adev, 1);
>
> - cancel_delayed_work_sync(&adev->delayed_init_work);
> + if (!amdgpu_sriov_vf(adev))
> + cancel_delayed_work_sync(&adev->delayed_init_work);
>
> if (!amdgpu_device_has_dc_support(adev)) {
> /* turn off display hw */
> @@ -3528,8 +3538,16 @@ int amdgpu_device_resume(struct drm_device *dev, bool fbcon)
> if (r)
> return r;
>
> - queue_delayed_work(system_wq, &adev->delayed_init_work,
> + if (amdgpu_sriov_vf(adev)) {
> + r = amdgpu_ib_ring_tests(adev);
> + if (r) {
> + DRM_ERROR("ib ring test failed (%d).\n", r);
> + return r;
> + }
> + } else {
> + queue_delayed_work(system_wq, &adev->delayed_init_work,
> msecs_to_jiffies(AMDGPU_RESUME_MS));
> + }
>
> if (!amdgpu_device_has_dc_support(adev)) {
> /* pin cursors */
> @@ -3554,7 +3572,8 @@ int amdgpu_device_resume(struct drm_device *dev, bool fbcon)
> return r;
>
> /* Make sure IB tests flushed */
> - flush_delayed_work(&adev->delayed_init_work);
> + if (!amdgpu_sriov_vf(adev))
> + flush_delayed_work(&adev->delayed_init_work);
>
> /* blat the mode back in */
> if (fbcon) {
More information about the amd-gfx
mailing list