[PATCH] drm/xe/bmg: fix compressed VRAM handling
Matthew Auld
matthew.auld at intel.com
Fri Jun 6 11:28:28 UTC 2025
On 04/06/2025 19:21, Cavitt, Jonathan wrote:
> -----Original Message-----
> From: Intel-xe <intel-xe-bounces at lists.freedesktop.org> On Behalf Of Matthew Auld
> Sent: Wednesday, June 4, 2025 11:15 AM
> To: intel-xe at lists.freedesktop.org
> Cc: Ghimiray, Himal Prasad <himal.prasad.ghimiray at intel.com>; Thomas Hellström <thomas.hellstrom at linux.intel.com>; Jahagirdar, Akshata <akshata.jahagirdar at intel.com>; stable at vger.kernel.org
> Subject: [PATCH] drm/xe/bmg: fix compressed VRAM handling
>>
>> There looks to be an issue in our compression handling when the BO pages
>> are very fragmented, where we choose to skip the identity map and
>> instead fall back to emitting the PTEs by hand when migrating memory,
>> such that we can hopefully do more work per blit operation. However in
>> such a case we need to ensure the src PTEs are correctly tagged with a
>> compression enabled PAT index on dgpu xe2+, otherwise the copy will
>> simply treat the src memory as uncompressed, leading to corruption if
>> the memory was compressed by the user.
>>
>> To fix this it looks like we can pass use_comp_pat into emit_pte() on
>> the src side.
>
> It would be better if we had more confidence here beyond "it looks like"
> (maybe just drop that part) and "There looks to be" (maybe "There is" instead),
> but if we're not comfortable making definitive statements about our compression
> handling, then I won't block this on some minor passive voice issues.
Yeah, this was based only on code inspection, so it is unclear whether
this is a real issue, or whether it is even related to the user report.
Once I am more certain of either, I will update the commit message.
> Reviewed-by: Jonathan Cavitt <jonathan.cavitt at intel.com>
Thanks.
> -Jonathan Cavitt
>
>>
>> There are reports of VRAM corruption in some heavy user workloads, which
>> might be related: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/4495
>>
>> Fixes: 523f191cc0c7 ("drm/xe/xe_migrate: Handle migration logic for xe2+ dgfx")
>> Signed-off-by: Matthew Auld <matthew.auld at intel.com>
>> Cc: Himal Prasad Ghimiray <himal.prasad.ghimiray at intel.com>
>> Cc: Thomas Hellström <thomas.hellstrom at linux.intel.com>
>> Cc: Akshata Jahagirdar <akshata.jahagirdar at intel.com>
>> Cc: <stable at vger.kernel.org> # v6.12+
>> ---
>> drivers/gpu/drm/xe/xe_migrate.c | 2 +-
>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/drivers/gpu/drm/xe/xe_migrate.c b/drivers/gpu/drm/xe/xe_migrate.c
>> index 8f8e9fdfb2a8..16788ecf924a 100644
>> --- a/drivers/gpu/drm/xe/xe_migrate.c
>> +++ b/drivers/gpu/drm/xe/xe_migrate.c
>> @@ -863,7 +863,7 @@ struct dma_fence *xe_migrate_copy(struct xe_migrate *m,
>>  		if (src_is_vram && xe_migrate_allow_identity(src_L0, &src_it))
>>  			xe_res_next(&src_it, src_L0);
>>  		else
>> -			emit_pte(m, bb, src_L0_pt, src_is_vram, copy_system_ccs,
>> +			emit_pte(m, bb, src_L0_pt, src_is_vram, copy_system_ccs || use_comp_pat,
>>  				 &src_it, src_L0, src);
>>
>>  		if (dst_is_vram && xe_migrate_allow_identity(src_L0, &dst_it))
>> --
>> 2.49.0
>>
>>
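For anyone following along, the src-side decision described in the commit
message can be modelled roughly as below. This is only a standalone sketch,
not driver code: the enum, pick_src_mapping() and the with_fix flag are
hypothetical names made up for illustration, and it assumes (going by the
commit message and the diff) that the fifth argument of emit_pte() is what
selects a compression enabled PAT index for the emitted PTEs.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of how the src side of a copy ends up mapped. */
enum src_mapping {
	SRC_IDENTITY_MAP,     /* identity map is used, no PTEs emitted by hand */
	SRC_PTE_UNCOMPRESSED, /* hand-emitted PTEs, src read as uncompressed */
	SRC_PTE_COMPRESSED,   /* hand-emitted PTEs, compression enabled PAT */
};

/*
 * Mirrors the shape of the branch touched by the patch. with_fix == false
 * models the old call, which only passed copy_system_ccs; with_fix == true
 * models the new call, which passes copy_system_ccs || use_comp_pat.
 */
enum src_mapping pick_src_mapping(bool src_is_vram, bool allow_identity,
				  bool copy_system_ccs, bool use_comp_pat,
				  bool with_fix)
{
	bool comp_pte;

	/* Unfragmented VRAM: the identity map is used and emit_pte() is skipped. */
	if (src_is_vram && allow_identity)
		return SRC_IDENTITY_MAP;

	/* Fallback path: this is the value handed to emit_pte(). */
	comp_pte = with_fix ? (copy_system_ccs || use_comp_pat)
			    : copy_system_ccs;

	return comp_pte ? SRC_PTE_COMPRESSED : SRC_PTE_UNCOMPRESSED;
}

/* Small self-check of the case the commit message is worried about. */
void pick_src_mapping_selfcheck(void)
{
	/* Fragmented, compressed VRAM source: old call loses the compression tag. */
	assert(pick_src_mapping(true, false, false, true, false) ==
	       SRC_PTE_UNCOMPRESSED);
	/* Same inputs with the patch applied. */
	assert(pick_src_mapping(true, false, false, true, true) ==
	       SRC_PTE_COMPRESSED);
}
```

The only case where the two variants differ is a source that skips the
identity map while use_comp_pat is set and copy_system_ccs is not, which
matches the fragmented BO scenario from the commit message.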