[Intel-xe] [PATCH 1/4] drm/xe: Fix MTL+ stolen memory mapping
Matthew Auld
matthew.auld at intel.com
Wed Jul 26 15:19:43 UTC 2023
On 26/07/2023 16:13, Lucas De Marchi wrote:
> On Tue, Jul 25, 2023 at 03:47:18PM -0700, Matt Roper wrote:
>> On Tue, Jul 25, 2023 at 01:28:33PM -0700, Lucas De Marchi wrote:
>>> Based on commit 8d8d062be6b9 ("drm/i915/mtl: Fix MTL stolen memory GGTT
>>> mapping"). For stolen on MTL and beyond, the address in the PTE is the
>>> offset from DSM base. While at it, update the comments explaining each
>>> part of the calculation.
>>>
>>> v2: Also change resource_is_stolen_vram() so the right PTE bit is set
>>
>> With a bit more testing locally I noticed that I'm seeing some warnings
>> and crashes on driver unload when display is enabled. I'm not 100% sure
>> they're tied to this, but it seems like they could be.
>>
>> The first warning is
>>
>> WARNING: CPU: 10 PID: 4710 at
>> drivers/gpu/drm/xe/xe_ttm_vram_mgr.c:324 ttm_vram_mgr_fini+0x15e/0x170
>> [xe]
>>
>> which is
>>
>> WARN_ON_ONCE(mgr->visible_avail != mgr->visible_size);
>>
>> From there things go downhill --- circular locking detected, slab cache
>> problems, and KASAN use-after-free reports.
>
> I can only reproduce the warnings above, not the crashes, circular
> lockings etc. They don't seem related to this patch. More below.
>
>>
>> Without digging through the code in depth, my worry is that by just
>> marking this stolen memory as vram everywhere in the driver, we try to
>> return the framebuffer to the otherwise empty/uninitialized VRAM pool,
>> even though it isn't truly VRAM.
>>
>> Maybe it would be better to make a change with less side effects
>> elsewhere in the code. E.g., add a "resource_is_stolen_devmem" that
>> only gets called from the pagetable encoding functions to set the DM=1
>> bit and not anywhere else?
>
> I think it would make it clearer about the intent since the stolen
> memory is not actually vram, it's just accessed through the PCI BAR. I
> did typed this all into new patches for this series, but it I still
> reproduce the warnings above. They seem related to the small BAR
> addition that added the tracking for the visible portion of the BAR.
>
> For stolen, it's always 100% visible. Loading the driver without
> display (no warning on module removal):
>
> root at mtl: ~# cat /sys/kernel/debug/dri/0/stolen_mm
> use_type: 1
> use_tt: 0
> size: 62914560
> usage: 0
> default_page_size: 4KiB
> visible_avail: 60MiB
> visible_size: 60MiB
> chunk_size: 4KiB, total: 60MiB, free: 60MiB
> ...
>
> vs loading it with display:
>
> root at mtl # cat /sys/kernel/debug/dri/0/stolen_mm
> use_type: 1
> use_tt: 0
> size: 62914560
> usage: 25165824
> default_page_size: 4KiB
> visible_avail: 36MiB
> visible_size: 60MiB
> chunk_size: 4KiB, total: 60MiB, free: 36MiB
>
> From dmesg:
> [ 3026.756075] xe 0000:00:02.0: [drm:xe_ttm_stolen_mgr_init [xe]]
> Initialized stolen memory support with 62914560 bytes
> [ 3029.670666] xe 0000:00:02.0: [drm] Allocated fbdev into stolen
> [ 3030.009260] xe 0000:00:02.0: [drm:intel_fbc_update [xe]] reserved
> 16777216 bytes of contiguous stolen space for FBC, limit: 1
>
> So the allocations that remain on stolen seem to be the fbdev + FBC.
> I wonder if the fini order is correct wrt display and we are returning
> those before destroying the ttm mananger. Cc'ing Matt Auld and Maarten
> to check if they have additional clues. I will send the next version
> since I believe the approach without overriding is_vram is clearer and I
> have several other fixes/cleanups piling there.
Sounds like maybe:
https://patchwork.freedesktop.org/series/120310/
I have been cherry picking that on my local branches, otherwise module
unload is quite broken.
>
> I will try it on TGL or ADL-P to see if I can reproduce there too.
>
> Lucas De Marchi
>
>>
>> Or maybe these warnings are unrelated; I haven't tried unloading with
>> display on any other platforms recently, so maybe it's a pre-existing
>> problem.
>>
>>
>> Matt
>>
>>>
>>> Signed-off-by: Lucas De Marchi <lucas.demarchi at intel.com>
>>> ---
>>> drivers/gpu/drm/xe/xe_bo.c | 3 ++-
>>> drivers/gpu/drm/xe/xe_ttm_stolen_mgr.c | 15 +++++++++++++--
>>> 2 files changed, 15 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/xe/xe_bo.c b/drivers/gpu/drm/xe/xe_bo.c
>>> index 10858fcf4f62..5379d8e5affb 100644
>>> --- a/drivers/gpu/drm/xe/xe_bo.c
>>> +++ b/drivers/gpu/drm/xe/xe_bo.c
>>> @@ -61,7 +61,8 @@ bool mem_type_is_vram(u32 mem_type)
>>>
>>> static bool resource_is_stolen_vram(struct xe_device *xe, struct
>>> ttm_resource *res)
>>> {
>>> - return res->mem_type == XE_PL_STOLEN && IS_DGFX(xe);
>>> + return res->mem_type == XE_PL_STOLEN &&
>>> + (IS_DGFX(xe) || GRAPHICS_VERx100(xe) >= 1270);
>>> }
>>>
>>> static bool resource_is_vram(struct ttm_resource *res)
>>> diff --git a/drivers/gpu/drm/xe/xe_ttm_stolen_mgr.c
>>> b/drivers/gpu/drm/xe/xe_ttm_stolen_mgr.c
>>> index 21ecc734f10a..271b3fba4129 100644
>>> --- a/drivers/gpu/drm/xe/xe_ttm_stolen_mgr.c
>>> +++ b/drivers/gpu/drm/xe/xe_ttm_stolen_mgr.c
>>> @@ -94,11 +94,22 @@ static u32 detect_bar2_integrated(struct
>>> xe_device *xe, struct xe_ttm_stolen_mgr
>>>
>>> ggc = xe_mmio_read32(xe_root_mmio_gt(xe), GGC);
>>>
>>> - /* check GGMS, should be fixed 0x3 (8MB) */
>>> + /*
>>> + * Check GGMS: it should be fixed 0x3 (8MB), which corresponds
>>> to the
>>> + * GTT size
>>> + */
>>> if (drm_WARN_ON(&xe->drm, (ggc & GGMS_MASK) != GGMS_MASK))
>>> return 0;
>>>
>>> - mgr->stolen_base = mgr->io_base = pci_resource_start(pdev, 2) +
>>> SZ_8M;
>>> + /*
>>> + * Graphics >= 1270 uses the offset to the GSMBASE as address in
>>> the
>>> + * PTEs, together with the DM flag being set. Previously there
>>> was no
>>> + * such flag so the address was the io_base.
>>> + *
>>> + * DSMBASE = GSMBASE + 8MB
>>> + */
>>> + mgr->stolen_base = SZ_8M;
>>> + mgr->io_base = pci_resource_start(pdev, 2) + mgr->stolen_base;
>>>
>>> /* return valid GMS value, -EIO if invalid */
>>> gms = REG_FIELD_GET(GMS_MASK, ggc);
>>> --
>>> 2.40.1
>>>
>>
>> --
>> Matt Roper
>> Graphics Software Engineer
>> Linux GPU Platform Enablement
>> Intel Corporation
More information about the Intel-xe
mailing list