[Intel-xe] [PATCH v2] drm/xe/mmio: update gt_count when probing multi-tile

Matthew Auld matthew.auld at intel.com
Mon Jul 10 16:24:56 UTC 2023


On 05/07/2023 11:19, Matthew Auld wrote:
> Hi,
> 
> On 05/07/2023 10:50, Ofir Bitton wrote:
>> On 26/06/2023 20:20, Matthew Auld wrote:
>>> It looks like the single-tile PVC in CI dies during module load when 
>>> doing
>>> the pcode init. From the logs we try to access the address
>>> 0000000000138124 which doesn't map to anything, however 0x138124 also
>>> looks to be the PCODE_MAILBOX register. So looks like the per-tile
>>> mmio register mapping is NULL.
>>>
>>> During probe the tile count is potentially trimmed, since we don't know
>>> the real count until we actually probe the device. This seems to be
>>> the case for single-tile PVC or similar devices.  However it looks like
>>> the gt_count is never adjusted to respect this updated tile count. As a
>>> result when later doing some for_each_gt() loop, like we do for the
>>> pcode, we can get back some GT that maps to some non-existent tile
>>> which hasn't been properly set up, leading to crashes.
>>>
>>> Try to fix this by adjusting the gt_count after probing the tiles for
>>> real.
>>>
>>> v2: Fix typo so it actually builds
>>>
>>> Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/383
>>> Signed-off-by: Matthew Auld <matthew.auld at intel.com>
>>> Cc: Lucas De Marchi <lucas.demarchi at intel.com>
>>> Cc: Matt Roper <matthew.d.roper at intel.com>
>>> ---
>>>    drivers/gpu/drm/xe/xe_mmio.c | 6 ++++++
>>>    1 file changed, 6 insertions(+)
>>>
>>> diff --git a/drivers/gpu/drm/xe/xe_mmio.c b/drivers/gpu/drm/xe/xe_mmio.c
>>> index f1336803b915..8150bf6f3578 100644
>>> --- a/drivers/gpu/drm/xe/xe_mmio.c
>>> +++ b/drivers/gpu/drm/xe/xe_mmio.c
>>> @@ -334,6 +334,12 @@ static void xe_mmio_probe_tiles(struct xe_device 
>>> *xe)
>>>        adj_tile_count = xe->info.tile_count =
>>>                REG_FIELD_GET(TILE_COUNT, mtcfg) + 1;
>>>
>>> +     /*
>>> +      * FIXME: Needs some work for standalone media, but should be 
>>> impossible
>>> +      * with multi-tile for now.
>>> +      */
>>> +     xe->info.gt_count = xe->info.tile_count;
>>> +
>>>        drm_info(&xe->drm, "tile_count: %d, adj_tile_count %d\n",
>>>                 xe->info.tile_count, adj_tile_count);
>>>
>>
>> 'xe->info.gt_count' is getting incremented in 'xe_info_init' during
>> probe, seems you must remove it from there, or else 'gt_count' will
>> eventually be equal to x2 of the desired value.
> 
> xe_info_init() looks to be called first. But this is very early during 
> probe and might not accurately represent the actual HW, so some of it 
> appears to be "placeholder" only, like with tile_count. Once we 
> eventually initialise enough of the MMIO stuff, we can actually probe 
> the HW to figure out the real number of tiles, as per 
> xe_mmio_probe_tiles() here. At this stage we might need to update 
> tile_count, but we missed also updating the gt_count, leading to 
> explosions if the real vs placeholder tile_count is ever different.

Any takers to give an r-b here?

> 
>>
>> -- 
>> Ofir


More information about the Intel-xe mailing list