[PATCH 08/12] drm/xe: Flush GGTT writes after populating DPT

Tvrtko Ursulin tvrtko.ursulin at igalia.com
Tue Mar 4 14:23:01 UTC 2025


On 28/02/2025 19:27, Rodrigo Vivi wrote:
> On Fri, Feb 21, 2025 at 10:17:27AM +0000, Tvrtko Ursulin wrote:
>> When the DPT is placed in stolen memory it is populated using
>> ioremap_wc() via the GGTT.
>>
>> i915 has established that on modern platforms a small flush and delay
>> are required for those writes to reliably land, so let's add the same
>> logic (simplified by removing impossible platforms) to xe as well.
>>
>> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin at igalia.com>
> 
> Cc: Ville Syrjälä <ville.syrjala at linux.intel.com>
> 
> on this as well...
> 
>> ---
>>   drivers/gpu/drm/xe/display/xe_fb_pin.c | 45 ++++++++++++++++++++++++++
>>   1 file changed, 45 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/xe/display/xe_fb_pin.c b/drivers/gpu/drm/xe/display/xe_fb_pin.c
>> index 6c85e03dfd79..8f559b19e8ab 100644
>> --- a/drivers/gpu/drm/xe/display/xe_fb_pin.c
>> +++ b/drivers/gpu/drm/xe/display/xe_fb_pin.c
>> @@ -10,9 +10,11 @@
>>   #include "intel_fb.h"
>>   #include "intel_fb_pin.h"
>>   #include "intel_fbdev.h"
>> +#include "regs/xe_engine_regs.h"
>>   #include "xe_bo.h"
>>   #include "xe_device.h"
>>   #include "xe_ggtt.h"
>> +#include "xe_mmio.h"
>>   #include "xe_pm.h"
>>   
>>   static void
>> @@ -78,6 +80,46 @@ write_dpt_remapped(struct xe_bo *bo, struct iosys_map *map, u32 *dpt_ofs,
>>   	*dpt_ofs = ALIGN(*dpt_ofs, 4096);
>>   }
>>   
>> +static void gt_flush_ggtt_writes(struct xe_gt *gt)
>> +{
>> +	if (!gt)
>> +		return;
>> +
>> +	xe_mmio_read32(&gt->mmio, RING_TAIL(RENDER_RING_BASE));
>> +}
>> +
>> +static void ggtt_flush_writes(struct xe_ggtt *ggtt)
>> +{
>> +	struct xe_device *xe = tile_to_xe(ggtt->tile);
>> +
>> +	/*
>> +	 * No actual flushing is required for the GTT write domain for reads
>> +	 * from the GTT domain. Writes to it "immediately" go to main memory
>> +	 * as far as we know, so there's no chipset flush. It also doesn't
>> +	 * land in the GPU render cache.
>> +	 *
>> +	 * However, we do have to enforce the order so that all writes through
>> +	 * the GTT land before any writes to the device, such as updates to
>> +	 * the GATT itself.
>> +	 *
>> +	 * We also have to wait a bit for the writes to land from the GTT.
>> +	 * An uncached read (i.e. mmio) seems to be ideal for the round-trip
>> +	 * timing. This issue has only been observed when switching quickly
>> +	 * between GTT writes and CPU reads from inside the kernel on recent hw,
>> +	 * and it appears to only affect discrete GTT blocks (i.e. on LLC
>> +	 * system agents we cannot reproduce this behaviour, until Cannonlake
>> +	 * that was!).
>> +	 */
>> +
>> +	wmb();
>> +
>> +	if (xe_pm_runtime_get_if_active(xe)) {
>> +		gt_flush_ggtt_writes(ggtt->tile->primary_gt);
>> +		gt_flush_ggtt_writes(ggtt->tile->media_gt);
>> +		xe_pm_runtime_put(xe);
>> +	}
>> +}
>> +
>>   static int __xe_pin_fb_vma_dpt(const struct intel_framebuffer *fb,
>>   			       const struct i915_gtt_view *view,
>>   			       struct i915_vma *vma,
>> @@ -161,6 +203,9 @@ static int __xe_pin_fb_vma_dpt(const struct intel_framebuffer *fb,
>>   					  rot_info->plane[i].dst_stride);
>>   	}
>>   
>> +	if (dpt->vmap.is_iomem)
>> +		ggtt_flush_writes(tile0->mem.ggtt);

I think I should add && !xe_bo_is_vram() to this condition, since in 
that case the writes do not go via the GGTT. But I will wait for more 
comments on the patch first. Roughly as sketched below.
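
A minimal sketch of what that amended check could look like, assuming 
xe_bo_is_vram() from xe_bo.h is the right helper to use on the DPT 
buffer object here (illustrative only, not a final change):

	/*
	 * Only flush when the DPT was written through an iomem (GGTT)
	 * mapping and the BO does not live in VRAM; a VRAM-backed DPT
	 * is also iomem, but its writes go through a direct mapping
	 * rather than the GGTT aperture, so no GGTT flush is needed.
	 */
	if (dpt->vmap.is_iomem && !xe_bo_is_vram(dpt))
		ggtt_flush_writes(tile0->mem.ggtt);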

Regards,

Tvrtko

>> +
>>   	vma->dpt = dpt;
>>   	vma->node = dpt->ggtt_node[tile0->id];
>>   	return 0;
>> -- 
>> 2.48.0
>>