[bug report] drm/amdgpu: Workaround to skip kiq ring test during ras gpu recovery

Dan Carpenter dan.carpenter at linaro.org
Mon Dec 4 12:43:02 UTC 2023


Hello Stanley.Yang,

The patch b1338a8e71ac: "drm/amdgpu: Workaround to skip kiq ring test
during ras gpu recovery" from Oct 17, 2023 (linux-next), leads to the
following Smatch static checker warning:

	drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c:604 amdgpu_get_xgmi_hive()
	warn: sleeping in atomic context

drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c
    591 struct amdgpu_hive_info *amdgpu_get_xgmi_hive(struct amdgpu_device *adev)
    592 {
    593         struct amdgpu_hive_info *hive = NULL;
    594         int ret;
    595 
    596         if (!adev->gmc.xgmi.hive_id)
    597                 return NULL;
    598 
    599         if (adev->hive) {
    600                 kobject_get(&adev->hive->kobj);
    601                 return adev->hive;
    602         }
    603 
--> 604         mutex_lock(&xgmi_mutex);
                ^^^^^^^^^^^^^^^^^^^^^^^
Shhh....  The mutexes are sleeping.

    605 
    606         list_for_each_entry(hive, &xgmi_hive_list, node)  {

The caller is amdgpu_gfx_disable_kcq():

drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
   516          spin_lock(&kiq->ring_lock);
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^
Holding a spin lock.

   517          if (amdgpu_ring_alloc(kiq_ring, kiq->pmf->unmap_queues_size *
   518                                          adev->gfx.num_compute_rings)) {
   519                  spin_unlock(&kiq->ring_lock);
   520                  return -ENOMEM;
   521          }
   522  
   523          for (i = 0; i < adev->gfx.num_compute_rings; i++) {
   524                  j = i + xcc_id * adev->gfx.num_compute_rings;
   525                  kiq->pmf->kiq_unmap_queues(kiq_ring,
   526                                             &adev->gfx.compute_ring[j],
   527                                             RESET_QUEUES, 0, 0);
   528          }
   529  
   530          /**
   531           * This is workaround: only skip kiq_ring test
   532           * during ras recovery in suspend stage for gfx9.4.3
   533           */
   534          hive = amdgpu_get_xgmi_hive(adev);
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^
Can't call a sleeping function when holding a spin_lock.

   535          if (hive) {
   536                  hive_ras_recovery = atomic_read(&hive->ras_recovery);
   537                  amdgpu_put_xgmi_hive(hive);
   538          }
   539  
   540          ras = amdgpu_ras_get_context(adev);
   541          if ((amdgpu_ip_version(adev, GC_HWIP, 0) == IP_VERSION(9, 4, 3)) &&
   542                  ras && (atomic_read(&ras->in_recovery) || hive_ras_recovery)) {
   543                  spin_unlock(&kiq->ring_lock);
   544                  return 0;
   545          }
   546  
   547          if (kiq_ring->sched.ready && !adev->job_hang)
   548                  r = amdgpu_ring_test_helper(kiq_ring);
   549          spin_unlock(&kiq->ring_lock);

regards,
dan carpenter

