Regression on drm-tip

Baolu Lu baolu.lu at linux.intel.com
Thu Mar 13 14:23:23 UTC 2025


On 2025/3/13 16:51, Borah, Chaitanya Kumar wrote:
> Hello Lu,
> 
> Hope you are doing well. I am Chaitanya from the Linux graphics team at Intel.
> 
> This mail is regarding a regression we are seeing in our CI runs[1] on the drm-tip repository.
> 
> `````````````````````````````````````````````````````````````````````````````````
> <4>[    2.856622] WARNING: possible circular locking dependency detected
> <4>[    2.856631] 6.14.0-rc5-CI_DRM_16217-gc55ef90b69d3+ #1 Tainted: G          I
> <4>[    2.856642] ------------------------------------------------------
> <4>[    2.856650] swapper/0/1 is trying to acquire lock:
> <4>[    2.856657] ffffffff8360ecc8 (iommu_probe_device_lock){+.+.}-{3:3}, at: iommu_probe_device+0x1d/0x70
> <4>[    2.856679]
>                    but task is already holding lock:
> <4>[    2.856686] ffff888102ab6fa8 (&device->physical_node_lock){+.+.}-{3:3}, at: intel_iommu_init+0xea1/0x1220
> `````````````````````````````````````````````````````````````````````````````````
> Detailed logs can be found in [2].
> 
> After bisecting the tree, the following patch [3] seems to be the first "bad" commit:
> 
> `````````````````````````````````````````````````````````````````````````````````````````````````````````
> commit b150654f74bf0df8e6a7936d5ec51400d9ec06d8
> Author: Lu Baolu <baolu.lu at linux.intel.com>
> Date:   Fri Feb 28 18:27:26 2025 +0800
> 
>      iommu/vt-d: Fix suspicious RCU usage
> 
> `````````````````````````````````````````````````````````````````````````````````````````````````````````
> 
> We also verified that if we revert the patch the issue is not seen.
> 
> Could you please check why the patch causes this regression and provide a fix if necessary?

Could you please run a quick test to check whether the following fix works?

diff --git a/drivers/iommu/intel/dmar.c b/drivers/iommu/intel/dmar.c
index e540092d664d..06debeaec643 100644
--- a/drivers/iommu/intel/dmar.c
+++ b/drivers/iommu/intel/dmar.c
@@ -2051,8 +2051,13 @@ int enable_drhd_fault_handling(unsigned int cpu)
                 if (iommu->irq || iommu->node != cpu_to_node(cpu))
                         continue;

+               /*
+                * Call dmar_alloc_hwirq() with dmar_global_lock held,
+                * could cause possible lock race condition.
+                */
+               up_read(&dmar_global_lock);
                 ret = dmar_set_interrupt(iommu);
-
+               down_read(&dmar_global_lock);
                 if (ret) {
                         pr_err("DRHD %Lx: failed to enable fault, interrupt, ret %d\n",
                                (unsigned long long)drhd->reg_base_addr, ret);

Thanks,
baolu

