[bug report] drm/amdgpu: Workaround to skip kiq ring test during ras gpu recovery
Dan Carpenter
dan.carpenter at linaro.org
Mon Dec 4 12:43:02 UTC 2023
Hello Stanley.Yang,
The patch b1338a8e71ac: "drm/amdgpu: Workaround to skip kiq ring test
during ras gpu recovery" from Oct 17, 2023 (linux-next), leads to the
following Smatch static checker warning:
drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c:604 amdgpu_get_xgmi_hive()
warn: sleeping in atomic context
drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c
    591 struct amdgpu_hive_info *amdgpu_get_xgmi_hive(struct amdgpu_device *adev)
    592 {
    593         struct amdgpu_hive_info *hive = NULL;
    594         int ret;
    595
    596         if (!adev->gmc.xgmi.hive_id)
    597                 return NULL;
    598
    599         if (adev->hive) {
    600                 kobject_get(&adev->hive->kobj);
    601                 return adev->hive;
    602         }
    603
--> 604         mutex_lock(&xgmi_mutex);
                ^^^^^^^^^^^^^^^^^^^^^^^
Shhh.... The mutexes are sleeping.

    605
    606         list_for_each_entry(hive, &xgmi_hive_list, node) {

The caller is amdgpu_gfx_disable_kcq():
drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
    516         spin_lock(&kiq->ring_lock);
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^
Holding a spin lock.

    517         if (amdgpu_ring_alloc(kiq_ring, kiq->pmf->unmap_queues_size *
    518                                         adev->gfx.num_compute_rings)) {
    519                 spin_unlock(&kiq->ring_lock);
    520                 return -ENOMEM;
    521         }
    522
    523         for (i = 0; i < adev->gfx.num_compute_rings; i++) {
    524                 j = i + xcc_id * adev->gfx.num_compute_rings;
    525                 kiq->pmf->kiq_unmap_queues(kiq_ring,
    526                                            &adev->gfx.compute_ring[j],
    527                                            RESET_QUEUES, 0, 0);
    528         }
    529
    530         /**
    531          * This is workaround: only skip kiq_ring test
    532          * during ras recovery in suspend stage for gfx9.4.3
    533          */
    534         hive = amdgpu_get_xgmi_hive(adev);
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^
Can't call a sleeping function when holding a spin_lock.

    535         if (hive) {
    536                 hive_ras_recovery = atomic_read(&hive->ras_recovery);
    537                 amdgpu_put_xgmi_hive(hive);
    538         }
    539
    540         ras = amdgpu_ras_get_context(adev);
    541         if ((amdgpu_ip_version(adev, GC_HWIP, 0) == IP_VERSION(9, 4, 3)) &&
    542             ras && (atomic_read(&ras->in_recovery) || hive_ras_recovery)) {
    543                 spin_unlock(&kiq->ring_lock);
    544                 return 0;
    545         }
    546
    547         if (kiq_ring->sched.ready && !adev->job_hang)
    548                 r = amdgpu_ring_test_helper(kiq_ring);
    549         spin_unlock(&kiq->ring_lock);
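
For illustration only, the ordering problem flagged above can be modelled in
plain userspace C. This is a sketch under stated assumptions, not a proposed
patch: every model_* name below is hypothetical and does not exist in the
kernel, the "spin lock" is just a depth counter, and the "mutex" only records
whether it was taken while that counter was non-zero. It contrasts the
reported ordering with one possible reordering, in which the hive state is
read before the spin lock is taken. Whether such a reordering is acceptable
for the real driver (the value read may be stale by the time the ring is
programmed) is for the amdgpu maintainers to judge.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model state; none of these names exist in the kernel. */
static int atomic_depth;           /* > 0 while a modelled spin lock is held */
static int sleep_violations;       /* sleeping calls made in atomic context */
static int hive_ras_recovery_flag; /* stands in for hive->ras_recovery */

static void model_spin_lock(void)
{
	atomic_depth++;
}

static void model_spin_unlock(void)
{
	assert(atomic_depth > 0); /* catch an unbalanced unlock */
	atomic_depth--;
}

/* Stands in for mutex_lock(): count a violation if called while atomic. */
static void model_mutex_lock(void)
{
	if (atomic_depth > 0)
		sleep_violations++;
}

static void model_mutex_unlock(void)
{
}

/* Models the get-hive / atomic_read / put-hive sequence (lines 534-538). */
static int model_read_hive_recovery(void)
{
	int v;

	model_mutex_lock();
	v = hive_ras_recovery_flag;
	model_mutex_unlock();
	return v;
}

/* Ordering from the report: the lookup runs under the spin lock. */
static bool model_disable_kcq_reported(void)
{
	bool skip;

	model_spin_lock();
	skip = model_read_hive_recovery() != 0;
	model_spin_unlock();
	return skip;
}

/* One possible reordering: do the sleeping lookup first, then lock. */
static bool model_disable_kcq_reordered(void)
{
	bool skip = model_read_hive_recovery() != 0;

	model_spin_lock();
	/* ring programming would happen here */
	model_spin_unlock();
	return skip;
}
```

In this model, model_disable_kcq_reported() increments sleep_violations once
per call, while model_disable_kcq_reordered() leaves it untouched. Both
return the same skip decision for a given hive_ras_recovery_flag.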
regards,
dan carpenter