Regression on drm-tip
Baolu Lu
baolu.lu at linux.intel.com
Sun Mar 16 02:33:31 UTC 2025
On 3/14/25 17:04, Borah, Chaitanya Kumar wrote:
>
>
>> -----Original Message-----
>> From: Baolu Lu <baolu.lu at linux.intel.com>
>> Sent: Thursday, March 13, 2025 7:53 PM
>> To: Borah, Chaitanya Kumar <chaitanya.kumar.borah at intel.com>
>> Cc: baolu.lu at linux.intel.com; intel-gfx at lists.freedesktop.org; intel-xe at lists.freedesktop.org; iommu at lists.linux.dev
>> Subject: Re: Regression on drm-tip
>>
>> On 2025/3/13 16:51, Borah, Chaitanya Kumar wrote:
>>> Hello Lu,
>>>
>>> Hope you are doing well. I am Chaitanya from the Linux graphics team at Intel.
>>>
>>> This mail is regarding a regression we are seeing in our CI runs [1] on the drm-tip repository.
>>>
>>> <4>[ 2.856622] WARNING: possible circular locking dependency detected
>>> <4>[ 2.856631] 6.14.0-rc5-CI_DRM_16217-gc55ef90b69d3+ #1 Tainted: G I
>>> <4>[ 2.856642] ------------------------------------------------------
>>> <4>[ 2.856650] swapper/0/1 is trying to acquire lock:
>>> <4>[ 2.856657] ffffffff8360ecc8 (iommu_probe_device_lock){+.+.}-{3:3}, at: iommu_probe_device+0x1d/0x70
>>> <4>[ 2.856679]
>>> but task is already holding lock:
>>> <4>[ 2.856686] ffff888102ab6fa8 (&device->physical_node_lock){+.+.}-{3:3}, at: intel_iommu_init+0xea1/0x1220
>>> A detailed log can be found in [2].
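For readers less familiar with lockdep, the idea behind a "possible circular locking dependency" report can be sketched in userspace C: record an edge A -> B whenever lock B is taken while A is held, and complain when a new edge would close a cycle. This is a much simplified model, not lockdep's implementation (lockdep works on lock classes and also tracks read/write and IRQ context). Only the two locks named in the excerpt are modelled, and the earlier reverse-order path in the hypothetical demo_inversion() is an assumption: the excerpt does not show how the opposite dependency was recorded, and the full log [2] may show a longer chain through other locks. All identifiers here are illustrative.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Illustrative model: only the two locks named in the excerpt. */
enum { LOCK_PHYS_NODE, LOCK_PROBE_DEVICE, NR_LOCKS };

static const char *const lock_name[NR_LOCKS] = {
	[LOCK_PHYS_NODE]    = "&device->physical_node_lock",
	[LOCK_PROBE_DEVICE] = "iommu_probe_device_lock",
};

/* order[a][b] becomes true once b has been taken while a was held. */
static bool order[NR_LOCKS][NR_LOCKS];
static int held[NR_LOCKS];
static int nr_held;

/* Depth-first search: can 'to' be reached from 'from' via recorded edges? */
static bool reachable(int from, int to, bool seen[NR_LOCKS])
{
	if (from == to)
		return true;
	seen[from] = true;
	for (int next = 0; next < NR_LOCKS; next++)
		if (order[from][next] && !seen[next] &&
		    reachable(next, to, seen))
			return true;
	return false;
}

/* Take 'lock'; return false if a new edge held[i] -> lock closes a cycle. */
static bool acquire(int lock)
{
	bool ok = true;

	assert(nr_held < NR_LOCKS);
	for (int i = 0; i < nr_held; i++) {
		bool seen[NR_LOCKS] = { false };

		/* Circular if 'lock' can already reach the lock we hold. */
		if (reachable(lock, held[i], seen)) {
			printf("possible circular locking: holding %s, acquiring %s\n",
			       lock_name[held[i]], lock_name[lock]);
			ok = false;
		}
		order[held[i]][lock] = true;
	}
	held[nr_held++] = lock;
	return ok;
}

/* Drop 'lock' from the held set; order within the set does not matter. */
static void release(int lock)
{
	for (int i = 0; i < nr_held; i++) {
		if (held[i] == lock) {
			held[i] = held[--nr_held];
			return;
		}
	}
}

/* Hypothetical sequence: one path nests the locks one way, a later path
 * nests them the other way, which is the shape of the splat above. */
static bool demo_inversion(void)
{
	acquire(LOCK_PROBE_DEVICE);
	acquire(LOCK_PHYS_NODE);
	release(LOCK_PHYS_NODE);
	release(LOCK_PROBE_DEVICE);

	acquire(LOCK_PHYS_NODE);
	bool ok = acquire(LOCK_PROBE_DEVICE);	/* reported here */
	release(LOCK_PROBE_DEVICE);
	release(LOCK_PHYS_NODE);
	return ok;
}
```

Calling demo_inversion() prints one "possible circular locking" line and returns false. No deadlock actually happens in this single-threaded run; the recorded order graph only shows that two threads taking these paths concurrently could deadlock, which is also why lockdep says "possible".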
>>>
>>> After bisecting the tree, the following patch [3] appears to be the first "bad" commit:
>>>
>>> commit b150654f74bf0df8e6a7936d5ec51400d9ec06d8
>>> Author: Lu Baolu <baolu.lu at linux.intel.com>
>>> Date: Fri Feb 28 18:27:26 2025 +0800
>>>
>>>     iommu/vt-d: Fix suspicious RCU usage
>>>
>>> We also verified that the issue is not seen when the patch is reverted.
>>>
>>> Could you please check why the patch causes this regression and provide a fix if necessary?
>>
>> Can you please run a quick test to check whether the following fix works?
>>
>> diff --git a/drivers/iommu/intel/dmar.c b/drivers/iommu/intel/dmar.c
>> index e540092d664d..06debeaec643 100644
>> --- a/drivers/iommu/intel/dmar.c
>> +++ b/drivers/iommu/intel/dmar.c
>> @@ -2051,8 +2051,13 @@ int enable_drhd_fault_handling(unsigned int cpu)
>>  		if (iommu->irq || iommu->node != cpu_to_node(cpu))
>>  			continue;
>>
>> +		/*
>> +		 * Calling dmar_alloc_hwirq() with dmar_global_lock held
>> +		 * could cause a lock race condition.
>> +		 */
>> +		up_read(&dmar_global_lock);
>>  		ret = dmar_set_interrupt(iommu);
>> -
>> +		down_read(&dmar_global_lock);
>>  		if (ret) {
>>  			pr_err("DRHD %Lx: failed to enable fault, interrupt, ret %d\n",
>>  			       (unsigned long long)drhd->reg_base_addr, ret);
>>
>> Thanks,
>> baolu
>
> We still see the issue with this change.
I am attempting to reproduce this issue on my MTL machine. I pulled
the test branch from:
https://anongit.freedesktop.org/git/drm-tip.git
and built the test kernel image using the configuration file from:
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_16217/kconfig.txt
However, after booting it, I did not observe the lockdep splat reported above.
Is there anything I might have missed?
Thanks,
baolu