[Intel-xe] [PATCH v2] drm/xe/mmio: update gt_count when probing multi-tile

Matthew Auld matthew.auld at intel.com
Tue Jul 11 09:15:54 UTC 2023


On 11/07/2023 07:54, Ofir Bitton wrote:
> On 05/07/2023 13:19, Matthew Auld wrote:
>> Hi,
>>
>> On 05/07/2023 10:50, Ofir Bitton wrote:
>>> On 26/06/2023 20:20, Matthew Auld wrote:
>>>> It looks like the single-tile PVC in CI dies during module load when
>>>> doing
>>>> the pcode init. From the logs we try to access the address
>>>> 0000000000138124 which doesn't map to anything, however 0x138124 also
>>>> looks to be the PCODE_MAILBOX register. So looks like the per-tile
>>>> mmio register mapping is NULL.
>>>>
>>>> During probe the tile count is potentially trimmed, since we don't know
>>>> the real count until we actually probe the device. This seems to be
>>>> the case for single-tile PVC or similar devices.  However it looks like
>>>> the gt_count is never adjusted to respect this updated tile count. As a
>>>> result when later doing some for_each_gt() loop, like we do for the
>>>> pcode, we can get back some GT that maps to some non-existent tile
>>>> which hasn't been properly set up, leading to crashes.
>>>>
>>>> Try to fix this by adjusting the gt_count after probing the tiles for
>>>> real.
>>>>
>>>> v2: Fix typo so it actually builds
>>>>
>>>> Closes:
>>>> https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/383
>>>> Signed-off-by: Matthew Auld <matthew.auld at intel.com>
>>>> Cc: Lucas De Marchi <lucas.demarchi at intel.com>
>>>> Cc: Matt Roper <matthew.d.roper at intel.com>
>>>> ---
>>>>     drivers/gpu/drm/xe/xe_mmio.c | 6 ++++++
>>>>     1 file changed, 6 insertions(+)
>>>>
>>>> diff --git a/drivers/gpu/drm/xe/xe_mmio.c b/drivers/gpu/drm/xe/xe_mmio.c
>>>> index f1336803b915..8150bf6f3578 100644
>>>> --- a/drivers/gpu/drm/xe/xe_mmio.c
>>>> +++ b/drivers/gpu/drm/xe/xe_mmio.c
>>>> @@ -334,6 +334,12 @@ static void xe_mmio_probe_tiles(struct xe_device
>>>> *xe)
>>>>         adj_tile_count = xe->info.tile_count =
>>>>                 REG_FIELD_GET(TILE_COUNT, mtcfg) + 1;
>>>>
>>>> +     /*
>>>> +      * FIXME: Needs some work for standalone media, but should be
>>>> impossible
>>>> +      * with multi-tile for now.
>>>> +      */
>>>> +     xe->info.gt_count = xe->info.tile_count;
>>>> +
>>>>         drm_info(&xe->drm, "tile_count: %d, adj_tile_count %d\n",
>>>>                  xe->info.tile_count, adj_tile_count);
>>>>
>>>
>>> 'xe->info.gt_count' is getting incremented in 'xe_info_init' during
>>> probe, seems you must remove it from there, or else 'gt_count' will
>>> eventually be equal to x2 of the desired value.
>>
>> xe_info_init() looks to be called first. But this is very early during
>> probe and might not accurately represent the actual HW, so some of it
>> appears to be "placeholder" only, like with tile_count. Once we
>> eventually initialise enough of the MMIO stuff, we can actually probe
>> the HW to figure out the real number of tiles, as per
>> xe_mmio_probe_tiles() here. At this stage we might need to update
>> tile_count, but we missed also updating the gt_count, leading to
>> explosions if the real vs placeholder tile_count is ever different.
>>
> 
> I see, so do you see any reason to keep 'xe->info.gt_count'
> initialization in 'xe_info_init'? anyway you set it here and it is not
> used in between.

AFAICT xe_info_init() turns the static description (graphics_desc + 
media_desc) of the platform into an xe_info struct, which gets handed to 
the next phase of driver load, and in the case of tile_count it is just 
a placeholder since that information is not static. There looks to be 
various things like media_desc which can also influence the number of 
GTs. I'm not sure where that should instead be handled?

One idea is to very carefully probe the tile_count during 
xe_info_init(), sort of how handle_gmdid() does things. But I'm not 
certain if we are even allowed to do that this early.

Adjusting gt_count when we later adjust tile_count looked to be the 
minimal fix here, but happy to try to some other ideas.

> 
>>>
>>> --
>>> Ofir
> 


More information about the Intel-xe mailing list