[Intel-xe] [PATCH v2] drm/xe/mmio: update gt_count when probing multi-tile

Matthew Auld matthew.auld at intel.com
Wed Jul 5 10:19:45 UTC 2023


Hi,

On 05/07/2023 10:50, Ofir Bitton wrote:
> On 26/06/2023 20:20, Matthew Auld wrote:
>> It looks like the single-tile PVC in CI dies during module load when doing
>> the pcode init. From the logs we try to access the address
>> 0000000000138124 which doesn't map to anything, however 0x138124 also
>> looks to be the PCODE_MAILBOX register. So looks like the per-tile
>> mmio register mapping is NULL.
>>
>> During probe the tile count is potentially trimmed, since we don't know
>> the real count until we actually probe the device. This seems to be
>> the case for single-tile PVC or similar devices.  However it looks like
>> the gt_count is never adjusted to respect this updated tile count. As a
>> result when later doing some for_each_gt() loop, like we do for the
>> pcode, we can get back some GT that maps to some non-existent tile
>> which hasn't been properly set up, leading to crashes.
>>
>> Try to fix this by adjusting the gt_count after probing the tiles for
>> real.
>>
>> v2: Fix typo so it actually builds
>>
>> Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/383
>> Signed-off-by: Matthew Auld <matthew.auld at intel.com>
>> Cc: Lucas De Marchi <lucas.demarchi at intel.com>
>> Cc: Matt Roper <matthew.d.roper at intel.com>
>> ---
>>    drivers/gpu/drm/xe/xe_mmio.c | 6 ++++++
>>    1 file changed, 6 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/xe/xe_mmio.c b/drivers/gpu/drm/xe/xe_mmio.c
>> index f1336803b915..8150bf6f3578 100644
>> --- a/drivers/gpu/drm/xe/xe_mmio.c
>> +++ b/drivers/gpu/drm/xe/xe_mmio.c
>> @@ -334,6 +334,12 @@ static void xe_mmio_probe_tiles(struct xe_device *xe)
>>        adj_tile_count = xe->info.tile_count =
>>                REG_FIELD_GET(TILE_COUNT, mtcfg) + 1;
>>
>> +     /*
>> +      * FIXME: Needs some work for standalone media, but should be impossible
>> +      * with multi-tile for now.
>> +      */
>> +     xe->info.gt_count = xe->info.tile_count;
>> +
>>        drm_info(&xe->drm, "tile_count: %d, adj_tile_count %d\n",
>>                 xe->info.tile_count, adj_tile_count);
>>
> 
> 'xe->info.gt_count' is getting incremented in 'xe_info_init' during
> probe, seems you must remove it from there, or else 'gt_count' will
> eventually be equal to x2 of the desired value.

xe_info_init() looks to be called first. But this is very early during 
probe and might not accurately represent the actual HW, so some of it 
appears to be "placeholder" only, like with tile_count. Once we 
eventually initialise enough of the MMIO stuff, we can actually probe 
the HW to figure out the real number of tiles, as per 
xe_mmio_probe_tiles() here. At this stage we might need to update 
tile_count, but we missed also updating the gt_count, leading to 
explosions if the real vs placeholder tile_count is ever different.

> 
> --
> Ofir


More information about the Intel-xe mailing list