[Intel-gfx] ✗ Fi.CI.IGT: failure for drm/i915: stop using swiotlb (rev6)
Tvrtko Ursulin
tvrtko.ursulin at linux.intel.com
Thu Jul 28 16:07:08 UTC 2022
On 28/07/2022 16:54, Robert Beckett wrote:
> On 28/07/2022 15:03, Tvrtko Ursulin wrote:
>>
>> On 28/07/2022 09:01, Patchwork wrote:
>>
>> [snip]
>>
>>> Possible regressions
>>>
>>> * igt@gem_mmap_offset@clear:
>>> o shard-iclb: PASS
>>> <https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11946/shard-iclb6/igt@gem_mmap_offset@clear.html>
>>>
>>> -> INCOMPLETE
>>> <https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_106589v6/shard-iclb1/igt@gem_mmap_offset@clear.html>
>>>
>>
>> What was supposed to be a simple patch.. a storm of errors like:
>
> yeah, them's the breaks sometimes ....
>
>>
>> DMAR: ERROR: DMA PTE for vPFN 0x3d00000 already set (to 2fd7ff003
>> not 2fd7ff003)
>> ------------[ cut here ]------------
>> WARNING: CPU: 6 PID: 1254 at drivers/iommu/intel/iommu.c:2278
>> __domain_mapping.cold.93+0x32/0x39<>
>> Modules linked in: vgem drm_shmem_helper snd_hda_codec_hdmi
>> snd_hda_codec_realtek snd_hda_cod>
>> CPU: 6 PID: 1254 Comm: gem_mmap_offset Not tainted
>> 5.19.0-rc8-Patchwork_106589v6-g0e9c43d76a14+ #>
>> Hardware name: Intel Corporation Ice Lake Client Platform/IceLake U
>> DDR4 SODIMM PD RVP TLC, BIOS >
>> RIP: 0010:__domain_mapping.cold.93+0x32/0x39
>> Code: fe 48 c7 c7 28 32 37 82 4c 89 5c 24 08 e8 e4 61 fd ff 8b 05 bf
>> 8e c9 00 4c 8b 5c 24 08 85 c>
>> RSP: 0000:ffffc9000037f9c0 EFLAGS: 00010202
>> RAX: 0000000000000004 RBX: ffff8881117b4000 RCX: 0000000000000001
>> RDX: 0000000000000000 RSI: ffffffff82320b25 RDI: 00000000ffffffff
>> RBP: 0000000000000001 R08: 0000000000000000 R09: c0000000ffff7fff
>> R10: 0000000000000001 R11: 00000000002fd7ff R12: 00000002fd7ff003
>> R13: 0000000000076c01 R14: ffff8881039ee800 R15: 0000000003d00000
>> FS: 00007f2863c1d700(0000) GS:ffff88849fd00000(0000)
>> knlGS:0000000000000000
>> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> CR2: 00007f2692c53000 CR3: 000000011c440006 CR4: 0000000000770ee0
>> PKRU: 55555554
>> Call Trace:
>> <TASK>
>> intel_iommu_map_pages+0xb7/0xe0
>> __iommu_map+0xe0/0x310
>> __iommu_map_sg+0xa2/0x140
>> iommu_dma_map_sg+0x2ef/0x4e0
>> __dma_map_sg_attrs+0x64/0x70
>> dma_map_sg_attrs+0x5/0x20
>> i915_gem_gtt_prepare_pages+0x56/0x70 [i915]
>> shmem_get_pages+0xe3/0x360 [i915]
>> ____i915_gem_object_get_pages+0x32/0x100 [i915]
>> __i915_gem_object_get_pages+0x8d/0xa0 [i915]
>> vm_fault_gtt+0x3d0/0x940 [i915]
>> ? ptlock_alloc+0x15/0x40
>> ? rt_mutex_debug_task_free+0x91/0xa0
>> __do_fault+0x30/0x180
>> do_fault+0x1c4/0x4c0
>> __handle_mm_fault+0x615/0xbe0
>> handle_mm_fault+0x75/0x1c0
>> do_user_addr_fault+0x1e7/0x670
>> exc_page_fault+0x62/0x230
>> asm_exc_page_fault+0x22/0x30
>>
>> No idea. Maybe try CI kernel config on your Tigerlake?
>
> I have an idea of what could be happening:
>
> The warning is due to a pte already existing. We can see from the
> warning that it is the same value, which indicates that the same page
> has been mapped to the same iova before.
>
> This map shrink loop will keep mapping the same sg, shrinking if it
> fails to hopefully free up iova space.
>
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/gpu/drm/i915/i915_gem_gtt.c?h=v5.19-rc8#n32
>
>
> If we now look at the intel iommu driver's mapping function:
>
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/iommu/intel/iommu.c?h=v5.19-rc8#n2248
>
>
> If that -ENOMEM loop breaking return is hit (presumably running out of
> pte space, though I have not delved deeper), then it will return back up
> the stack, eventually returning 0 from dma_map_sg_attrs() indicating the
> error. This will cause a shrink and retry.
>
> The problem is that the iommu does not undo its partial mapping on
> error. So the next time round, it will map the same page to the same
> address giving the same pte encoding, which would give the warning
> observed.
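
To make the suspected sequence concrete, here is a user-space toy model of
it. This is only a sketch of the hypothesis above, which nobody has confirmed
yet. It is not kernel code: toy_domain, toy_map() and toy_map_retry() are
made-up names, and the failure is injected by hand rather than coming from a
real allocation failure. The PTE encoding (pfn << 12 | read/write bits) is
chosen so that it reproduces the value in the log.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define TOY_NR_PTES 16          /* size of the toy IOVA space, in pages */
#define TOY_PTE_RW  0x3ULL      /* read | write bits, as in the logged value */

/* Hypothetical stand-in for an IOMMU domain; not a kernel structure. */
struct toy_domain {
	uint64_t pte[TOY_NR_PTES];  /* 0 means "not present" */
	size_t fail_index;          /* page index at which an attempt fails */
	unsigned int failures_left; /* how many attempts will still fail */
	unsigned int warnings;      /* number of "already set" hits */
};

static void toy_domain_init(struct toy_domain *d, size_t fail_index,
			    unsigned int failures)
{
	memset(d, 0, sizeof(*d));
	d->fail_index = fail_index;
	d->failures_left = failures;
}

/*
 * Map nr pages at iova to consecutive frames starting at pfn.
 * Returns 0 on success, -1 on the injected failure. With unwind == 0 the
 * entries written before the failure are left behind, which is the
 * behaviour suspected in the thread.
 */
static int toy_map(struct toy_domain *d, size_t iova, uint64_t pfn,
		   size_t nr, int unwind)
{
	size_t i;

	assert(iova + nr <= TOY_NR_PTES);

	for (i = 0; i < nr; i++) {
		uint64_t val = ((pfn + i) << 12) | TOY_PTE_RW;

		if (d->failures_left && i == d->fail_index) {
			d->failures_left--;
			if (unwind)
				while (i--)
					d->pte[iova + i] = 0;
			return -1;
		}
		if (d->pte[iova + i]) {
			d->warnings++;
			printf("toy: PTE for vPFN 0x%zx already set (to %llx not %llx)\n",
			       iova + i,
			       (unsigned long long)d->pte[iova + i],
			       (unsigned long long)val);
		}
		d->pte[iova + i] = val;
	}
	return 0;
}

/* Retry loop in the spirit of the map/shrink loop; returns warning count. */
static int toy_map_retry(struct toy_domain *d, size_t iova, uint64_t pfn,
			 size_t nr, int unwind)
{
	int tries;

	for (tries = 0; tries < 4; tries++) {
		if (toy_map(d, iova, pfn, nr, unwind) == 0)
			return (int)d->warnings;
		/* a real caller would shrink here to free up resources */
	}
	return -1;
}
```

With a failure injected at page 2 of a 4 page mapping, the retry without
unwinding finds pages 0 and 1 already set to the same value it is about to
write, as in the "to 2fd7ff003 not 2fd7ff003" message. With unwinding the
retry runs clean.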
>
> I would need to get some time to try to repro and debug to confirm, but
> this looks like it might be exposing an iommu driver issue due to us
> changing our mapping patterns because the segment sizes are now different.
>
> I'll see if I can get some time allotted to debug it further, but for
> now, I don't have the bandwidth, so this may need to go on hold until I
> or someone else can get time to look into it.

Yeah, that's understandable. I also currently don't have any free
bandwidth, unfortunately.

+ Christoph FYI, as per above, swiotlb API usage removal is currently a
bit stuck until we find someone with some spare time to debug this further.

Regards,

Tvrtko