[Intel-xe] [PATCH] drm/xe/xe_migrate.c: Use DPA offset for page table entries.

Ruhl, Michael J michael.j.ruhl at intel.com
Fri Sep 29 15:01:47 UTC 2023


>-----Original Message-----
>From: Vishwanathapura, Niranjana <niranjana.vishwanathapura at intel.com>
>Sent: Thursday, September 28, 2023 9:49 PM
>To: Kershner, David <david.kershner at intel.com>
>Cc: intel-xe at lists.freedesktop.org; Ruhl, Michael J <michael.j.ruhl at intel.com>;
>Fleck, John <john.fleck at intel.com>; Roper, Matthew D
><matthew.d.roper at intel.com>; Brost, Matthew <matthew.brost at intel.com>;
>Summers, Stuart <stuart.summers at intel.com>
>Subject: Re: [PATCH] drm/xe/xe_migrate.c: Use DPA offset for page table
>entries.
>
>On Thu, Sep 28, 2023 at 09:48:16AM -0400, David Kershner wrote:
>>Device Physical Address (DPA) is the starting offset of device memory.
>>
>>Update xe_migrate identity map base PTE entries to start at dpa_base
>>instead of 0.
>>
>>The VM offset value should be 0 relative instead of DPA relative.
>>
>>Signed-off-by: David Kershner <david.kershner at intel.com>
>>---
>> drivers/gpu/drm/xe/xe_migrate.c | 19 +++++++++++--------
>> 1 file changed, 11 insertions(+), 8 deletions(-)
>>
>>diff --git a/drivers/gpu/drm/xe/xe_migrate.c b/drivers/gpu/drm/xe/xe_migrate.c
>>index 9438f609d18b..848cdb6af2e6 100644
>>--- a/drivers/gpu/drm/xe/xe_migrate.c
>>+++ b/drivers/gpu/drm/xe/xe_migrate.c
>>@@ -114,8 +114,10 @@ static u64 xe_migrate_vm_addr(u64 slot, u32 level)
>> 	return (slot + 1ULL) << xe_pt_shift(level + 1);
>> }
>>
>>-static u64 xe_migrate_vram_ofs(u64 addr)
>>+static u64 xe_migrate_vram_ofs(u64 addr, struct xe_tile *tile)
>> {
>>+	if (tile)
>>+		addr -= tile->mem.vram.dpa_base;
>
>See below...
>
>> 	return addr + (256ULL << xe_pt_shift(2));
>> }
>>
>>@@ -149,7 +151,7 @@ static int xe_migrate_create_cleared_bo(struct xe_migrate *m, struct xe_vm *vm)
>>
>> 	xe_map_memset(xe, &m->cleared_bo->vmap, 0, 0x00, cleared_size);
>> 	vram_addr = xe_bo_addr(m->cleared_bo, 0, XE_PAGE_SIZE);
>>-	m->cleared_vram_ofs = xe_migrate_vram_ofs(vram_addr);
>>+	m->cleared_vram_ofs = xe_migrate_vram_ofs(vram_addr, tile);
>>
>> 	return 0;
>> }
>>@@ -223,12 +225,12 @@ static int xe_migrate_prepare_vm(struct xe_tile *tile, struct xe_migrate *m,
>> 	} else {
>> 		u64 batch_addr = xe_bo_addr(batch, 0, XE_PAGE_SIZE);
>>
>>-		m->batch_base_ofs = xe_migrate_vram_ofs(batch_addr);
>>+		m->batch_base_ofs = xe_migrate_vram_ofs(batch_addr, tile);
>>
>> 		if (xe->info.supports_usm) {
>> 			batch = tile->primary_gt->usm.bb_pool->bo;
>> 			batch_addr = xe_bo_addr(batch, 0, XE_PAGE_SIZE);
>>-			m->usm_batch_base_ofs = xe_migrate_vram_ofs(batch_addr);
>>+			m->usm_batch_base_ofs = xe_migrate_vram_ofs(batch_addr, tile);
>> 		}
>> 	}
>>
>>@@ -266,7 +268,9 @@ static int xe_migrate_prepare_vm(struct xe_tile *tile, struct xe_migrate *m,
>> 		 * Use 1GB pages, it shouldn't matter the physical amount of
>> 		 * vram is less, when we don't access it.
>> 		 */
>>-		for (pos = 0; pos < xe->mem.vram.actual_physical_size; pos += SZ_1G, ofs += 8)
>>+		for (pos = tile->mem.vram.dpa_base;
>>+		     pos < tile->mem.vram.actual_physical_size + tile->mem.vram.dpa_base;
>>+		     pos += SZ_1G, ofs += 8)
>> 			xe_map_wr(xe, &bo->vmap, ofs, u64, pos | flags);
>
>I think this is incorrect.
>We need to map the whole device LMEM, not just the tile LMEM for each tile.
>This is because we should be able to move the bo from one tile to another.

My understanding is that the page table is created per tile (there is one per tile)
to migrate data to and from that tile.

How is this different from moving from sysmem to vram? Where is the PTE info for 
the sysmem side of this?

If this is meant to be a general-purpose page table, should it be created for the
device instead, rather than having two page tables with the same info?

I see that the only tests that use this are sysmem -> device and device -> sysmem.

Is there a use case or test that shows a migration from one tile to the other? I would
like to test this.

>And the identity map I think should start from the device DPA (xe->mem.vram.dpa_base)
>instead of being 0-based. The xe->mem.vram.dpa_base gets updated based on what is
>programmed into PKG_ADDR_BASE.

This makes sense if there is only one table. Since the table is created for each tile,
there will be some overhead.

For instance, the VM offsets currently need the tile offset included; the new code
does not. If we use the identity mapping over the full device space, we will have to
break the tile offset out as a separate value to keep track of (and it will need to be
added per tile).

Thoughts?

Mike

>Niranjana
>
>> 	}
>>
>>@@ -441,8 +445,7 @@ static u32 pte_update_size(struct xe_migrate *m,
>> 		cmds += cmd_size;
>> 	} else {
>> 		/* Offset into identity map. */
>>-		*L0_ofs = xe_migrate_vram_ofs(cur->start +
>>-					      vram_region_gpu_offset(res));
>>+		*L0_ofs = xe_migrate_vram_ofs(cur->start, NULL);
>> 		cmds += cmd_size;
>> 	}
>>
>>@@ -1027,7 +1030,7 @@ static void write_pgtable(struct xe_tile *tile, struct xe_bb *bb, u64 ppgtt_ofs,
>> 	xe_tile_assert(tile, update->qwords <= 0x1ff);
>> 	if (!ppgtt_ofs) {
>> 		ppgtt_ofs = xe_migrate_vram_ofs(xe_bo_addr(update->pt_bo, 0,
>>-							   XE_PAGE_SIZE));
>>+							   XE_PAGE_SIZE), tile);
>> 	}
>>
>> 	do {
>>--
>>2.35.1
>>
