[Intel-xe] [PATCH v2] drm/xe/mmio: update gt_count when probing multi-tile

Ofir Bitton obitton at habana.ai
Wed Jul 5 09:50:26 UTC 2023


On 26/06/2023 20:20, Matthew Auld wrote:
> It looks like the single-tile PVC in CI dies during module load when doing
> the pcode init. From the logs we try to access the address
> 0000000000138124 which doesn't map to anything, however 0x138124 also
> looks to be the PCODE_MAILBOX register. So looks like the per-tile
> mmio register mapping is NULL.
> 
> During probe the tile count is potentially trimmed, since we don't know
> the real count until we actually probe the device. This seems to be
> the case for single-tile PVC or similar devices.  However it looks like
> the gt_count is never adjusted to respect this updated tile count. As a
> result when later doing some for_each_gt() loop, like we do for the
> pcode, we can get back some GT that maps to some non-existent tile
> which hasn't been properly set up, leading to crashes.
> 
> Try to fix this by adjusting the gt_count after probing the tiles for
> real.
> 
> v2: Fix typo so it actually builds
> 
> Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/383
> Signed-off-by: Matthew Auld <matthew.auld at intel.com>
> Cc: Lucas De Marchi <lucas.demarchi at intel.com>
> Cc: Matt Roper <matthew.d.roper at intel.com>
> ---
>   drivers/gpu/drm/xe/xe_mmio.c | 6 ++++++
>   1 file changed, 6 insertions(+)
> 
> diff --git a/drivers/gpu/drm/xe/xe_mmio.c b/drivers/gpu/drm/xe/xe_mmio.c
> index f1336803b915..8150bf6f3578 100644
> --- a/drivers/gpu/drm/xe/xe_mmio.c
> +++ b/drivers/gpu/drm/xe/xe_mmio.c
> @@ -334,6 +334,12 @@ static void xe_mmio_probe_tiles(struct xe_device *xe)
>   	adj_tile_count = xe->info.tile_count =
>   		REG_FIELD_GET(TILE_COUNT, mtcfg) + 1;
>   
> +	/*
> +	 * FIXME: Needs some work for standalone media, but should be impossible
> +	 * with multi-tile for now.
> +	 */
> +	xe->info.gt_count = xe->info.tile_count;
> +
>   	drm_info(&xe->drm, "tile_count: %d, adj_tile_count %d\n",
>   		 xe->info.tile_count, adj_tile_count);
>   

'xe->info.gt_count' is getting incremented in 'xe_info_init' during 
probe, seems you must remove it from there, or else 'gt_count' will 
eventually be equal to x2 of the desired value.

--
Ofir


More information about the Intel-xe mailing list