Regression on drm-tip

Borah, Chaitanya Kumar chaitanya.kumar.borah at intel.com
Fri Mar 14 09:04:15 UTC 2025



> -----Original Message-----
> From: Baolu Lu <baolu.lu at linux.intel.com>
> Sent: Thursday, March 13, 2025 7:53 PM
> To: Borah, Chaitanya Kumar <chaitanya.kumar.borah at intel.com>
> Cc: baolu.lu at linux.intel.com; intel-gfx at lists.freedesktop.org;
> intel-xe at lists.freedesktop.org; iommu at lists.linux.dev
> Subject: Re: Regression on drm-tip
> 
> On 2025/3/13 16:51, Borah, Chaitanya Kumar wrote:
> > Hello Lu,
> >
> > Hope you are doing well. I am Chaitanya from the Linux graphics team at
> > Intel.
> >
> > This mail is regarding a regression we are seeing in our CI runs [1] on the
> > drm-tip repository.
> >
> > ``````````````````````````````````````````````````````````````````````
> > <4>[    2.856622] WARNING: possible circular locking dependency detected
> > <4>[    2.856631] 6.14.0-rc5-CI_DRM_16217-gc55ef90b69d3+ #1 Tainted: G          I
> > <4>[    2.856642] ------------------------------------------------------
> > <4>[    2.856650] swapper/0/1 is trying to acquire lock:
> > <4>[    2.856657] ffffffff8360ecc8 (iommu_probe_device_lock){+.+.}-{3:3}, at: iommu_probe_device+0x1d/0x70
> > <4>[    2.856679]
> >                    but task is already holding lock:
> > <4>[    2.856686] ffff888102ab6fa8 (&device->physical_node_lock){+.+.}-{3:3}, at: intel_iommu_init+0xea1/0x1220
> > ``````````````````````````````````````````````````````````````````````
> > The detailed log can be found in [2].
> >
> > After bisecting the tree, the following patch [3] seems to be the
> > first "bad" commit
> >
> > ``````````````````````````````````````````````````````````````````````
> > commit b150654f74bf0df8e6a7936d5ec51400d9ec06d8
> > Author: Lu Baolu <baolu.lu at linux.intel.com>
> > Date:   Fri Feb 28 18:27:26 2025 +0800
> >
> >      iommu/vt-d: Fix suspicious RCU usage
> >
> > ``````````````````````````````````````````````````````````````````````
> >
> > We also verified that if we revert the patch the issue is not seen.
> >
> > Could you please check why the patch causes this regression and provide a
> > fix if necessary?
> 
> Can you please run a quick test to check whether the following fix works?
> 
> diff --git a/drivers/iommu/intel/dmar.c b/drivers/iommu/intel/dmar.c
> index e540092d664d..06debeaec643 100644
> --- a/drivers/iommu/intel/dmar.c
> +++ b/drivers/iommu/intel/dmar.c
> @@ -2051,8 +2051,13 @@ int enable_drhd_fault_handling(unsigned int cpu)
>                  if (iommu->irq || iommu->node != cpu_to_node(cpu))
>                          continue;
> 
> +               /*
> +                * Call dmar_alloc_hwirq() with dmar_global_lock held,
> +                * could cause possible lock race condition.
> +                */
> +               up_read(&dmar_global_lock);
>                  ret = dmar_set_interrupt(iommu);
> -
> +               down_read(&dmar_global_lock);
>                  if (ret) {
>                          pr_err("DRHD %Lx: failed to enable fault, interrupt, ret %d\n",
>                                 (unsigned long long)drhd->reg_base_addr, ret);
> 
> Thanks,
> baolu

We still see the issue with this change.

Regards

Chaitanya



