Regression on drm-tip
Lucas De Marchi
lucas.demarchi at intel.com
Sat Mar 22 20:59:44 UTC 2025
On Mon, Mar 17, 2025 at 12:04:40PM +0800, Baolu Lu wrote:
>On 3/16/25 18:01, Borah, Chaitanya Kumar wrote:
>>
>>>-----Original Message-----
>>>From: Baolu Lu<baolu.lu at linux.intel.com>
>>>Sent: Sunday, March 16, 2025 1:33 PM
>>>To: Borah, Chaitanya Kumar<chaitanya.kumar.borah at intel.com>
>>>Cc:intel-gfx at lists.freedesktop.org;intel-xe at lists.freedesktop.org;
>>>iommu at lists.linux.dev; Kurmi, Suresh Kumar
>>><suresh.kumar.kurmi at intel.com>; Saarinen, Jani<jani.saarinen at intel.com>;
>>>De Marchi, Lucas<lucas.demarchi at intel.com>
>>>Subject: Re: Regression on drm-tip
>>>
>>>On 3/16/25 15:27, Borah, Chaitanya Kumar wrote:
>>>>>-----Original Message-----
>>>>>From: Baolu Lu<baolu.lu at linux.intel.com>
>>>>>Sent: Sunday, March 16, 2025 8:04 AM
>>>>>To: Borah, Chaitanya Kumar<chaitanya.kumar.borah at intel.com>
>>>>>Cc:intel-gfx at lists.freedesktop.org;intel-xe at lists.freedesktop.org;
>>>>>iommu at lists.linux.dev
>>>>>Subject: Re: Regression on drm-tip
>>>>>
>>>>>On 3/14/25 17:04, Borah, Chaitanya Kumar wrote:
>>>>>>>-----Original Message-----
>>>>>>>From: Baolu Lu<baolu.lu at linux.intel.com>
>>>>>>>Sent: Thursday, March 13, 2025 7:53 PM
>>>>>>>To: Borah, Chaitanya Kumar<chaitanya.kumar.borah at intel.com>
>>>>>>>Cc: baolu.lu at linux.intel.com; intel-gfx at lists.freedesktop.org;
>>>>>>>intel-xe at lists.freedesktop.org; iommu at lists.linux.dev
>>>>>>>Subject: Re: Regression on drm-tip
>>>>>>>
>>>>>>>On 2025/3/13 16:51, Borah, Chaitanya Kumar wrote:
>>>>>>>>Hello Lu,
>>>>>>>>
>>>>>>>>Hope you are doing well. I am Chaitanya from the linux graphics
>>>>>>>>team in Intel.
>>>>>>>>This mail is regarding a regression we are seeing in our CI
>>>>>>>>runs[1] on drm-tip repository.
>>>>>>>>``````````````````````````````````````````````````````````````````
>>>>>>>><4>[ 2.856622] WARNING: possible circular locking dependency detected
>>>>>>>><4>[ 2.856631] 6.14.0-rc5-CI_DRM_16217-gc55ef90b69d3+ #1 Tainted: G I
>>>>>>>><4>[ 2.856642] ------------------------------------------------------
>>>>>>>><4>[ 2.856650] swapper/0/1 is trying to acquire lock:
>>>>>>>><4>[ 2.856657] ffffffff8360ecc8 (iommu_probe_device_lock){+.+.}-{3:3}, at: iommu_probe_device+0x1d/0x70
>>>>>>>><4>[ 2.856679]
>>>>>>>> but task is already holding lock:
>>>>>>>><4>[ 2.856686] ffff888102ab6fa8 (&device->physical_node_lock){+.+.}-{3:3}, at: intel_iommu_init+0xea1/0x1220
>>>>>>>>``````````````````````````````````````````````````````````````````
>>>>>>>>Detailed logs can be found in [2].
>>>>>>>>
>>>>>>>>After bisecting the tree, the following patch [3] seems to be the
>>>>>>>>first "bad" commit
>>>>>>>>
>>>>>>>>``````````````````````````````````````````````````````````````````
>>>>>>>>commit b150654f74bf0df8e6a7936d5ec51400d9ec06d8
>>>>>>>>Author: Lu Baolu <baolu.lu at linux.intel.com>
>>>>>>>>Date: Fri Feb 28 18:27:26 2025 +0800
>>>>>>>>
>>>>>>>> iommu/vt-d: Fix suspicious RCU usage
>>>>>>>>
>>>>>>>>``````````````````````````````````````````````````````````````````
>>>>>>>>
>>>>>>>>We also verified that if we revert the patch the issue is not seen.
>>>>>>>>
>>>>>>>>Could you please check why the patch causes this regression and
>>>>>>>>provide a fix if necessary?
>>>>>>>
>>>>>>>Can you please take a quick test to check if the following fix works?
>>>>>>>
>>>>>>>diff --git a/drivers/iommu/intel/dmar.c b/drivers/iommu/intel/dmar.c
>>>>>>>index e540092d664d..06debeaec643 100644
>>>>>>>--- a/drivers/iommu/intel/dmar.c
>>>>>>>+++ b/drivers/iommu/intel/dmar.c
>>>>>>>@@ -2051,8 +2051,13 @@ int enable_drhd_fault_handling(unsigned int cpu)
>>>>>>> 		if (iommu->irq || iommu->node != cpu_to_node(cpu))
>>>>>>> 			continue;
>>>>>>>
>>>>>>>+		/*
>>>>>>>+		 * Call dmar_alloc_hwirq() with dmar_global_lock held,
>>>>>>>+		 * could cause possible lock race condition.
>>>>>>>+		 */
>>>>>>>+		up_read(&dmar_global_lock);
>>>>>>> 		ret = dmar_set_interrupt(iommu);
>>>>>>>-
>>>>>>>+		down_read(&dmar_global_lock);
>>>>>>> 		if (ret) {
>>>>>>> 			pr_err("DRHD %Lx: failed to enable fault, interrupt, ret %d\n",
>>>>>>> 			       (unsigned long long)drhd->reg_base_addr, ret);
>>>>>>>
>>>>>>>Thanks,
>>>>>>>baolu
>>>>>>We still see the issue with this change.
>>>>>I am attempting to reproduce this issue with my MTL machine. I pulled
>>>>>the test branch from:
>>>>>
>>>>>https://anongit.freedesktop.org/git/drm-tip.git
>>>>>
>>>>>and built the test kernel image using the configuration file from:
>>>>>
>>>>>https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_16217/kconfig.txt
>>>>>
>>>>>But I did not observe the lockdep splat mentioned above after booting.
>>>>>
>>>>>Is there anything I might have missed?
>>>>>
>>>>+Suresh, Jani, Lucas
>>>>
>>>>We are seeing this only on the Skylake and Kaby Lake machines in our CI runs.
>>>If so, will the change below make any difference?
>>>
>>>diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
>>>index 85aa66ef4d61..ec2f385ae25b 100644
>>>--- a/drivers/iommu/intel/iommu.c
>>>+++ b/drivers/iommu/intel/iommu.c
>>>@@ -3049,6 +3049,7 @@ static int __init probe_acpi_namespace_devices(void)
>>> 			if (dev->bus != &acpi_bus_type)
>>> 				continue;
>>>
>>>+			up_read(&dmar_global_lock);
>>> 			adev = to_acpi_device(dev);
>>> 			mutex_lock(&adev->physical_node_lock);
>>> 			list_for_each_entry(pn,
>>>@@ -3058,6 +3059,7 @@ static int __init probe_acpi_namespace_devices(void)
>>> 					break;
>>> 			}
>>> 			mutex_unlock(&adev->physical_node_lock);
>>>+			down_read(&dmar_global_lock);
>>>
>>> 			if (ret)
>>> 				return ret;
>>>
>>Thank you for the change. This seems to be working. Can we expect a fix patch soon?
>
>Sure. I have posted a fix patch here,
>
>https://lore.kernel.org/linux-iommu/20250317035714.1041549-1-baolu.lu@linux.intel.com/
Thanks. FWIW I added this patch to our test branch in CI and the issue
is indeed not reproducing anymore.
Lucas De Marchi