[Intel-xe] [PATCH v2] drm/xe/mmio: update gt_count when probing multi-tile

Ofir Bitton obitton at habana.ai
Tue Jul 11 06:54:28 UTC 2023


On 05/07/2023 13:19, Matthew Auld wrote:
> Hi,
>
> On 05/07/2023 10:50, Ofir Bitton wrote:
>> On 26/06/2023 20:20, Matthew Auld wrote:
>>> It looks like the single-tile PVC in CI dies during module load when
>>> doing
>>> the pcode init. From the logs we try to access the address
>>> 0000000000138124 which doesn't map to anything, however 0x138124 also
>>> looks to be the PCODE_MAILBOX register. So looks like the per-tile
>>> mmio register mapping is NULL.
>>>
>>> During probe the tile count is potentially trimmed, since we don't know
>>> the real count until we actually probe the device. This seems to be
>>> the case for single-tile PVC or similar devices.  However it looks like
>>> the gt_count is never adjusted to respect this updated tile count. As a
>>> result when later doing some for_each_gt() loop, like we do for the
>>> pcode, we can get back some GT that maps to some non-existent tile
>>> which hasn't been properly set up, leading to crashes.
>>>
>>> Try to fix this by adjusting the gt_count after probing the tiles for
>>> real.
>>>
>>> v2: Fix typo so it actually builds
>>>
>>> Closes:
>>> https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/383
>>> Signed-off-by: Matthew Auld <matthew.auld at intel.com>
>>> Cc: Lucas De Marchi <lucas.demarchi at intel.com>
>>> Cc: Matt Roper <matthew.d.roper at intel.com>
>>> ---
>>>    drivers/gpu/drm/xe/xe_mmio.c | 6 ++++++
>>>    1 file changed, 6 insertions(+)
>>>
>>> diff --git a/drivers/gpu/drm/xe/xe_mmio.c b/drivers/gpu/drm/xe/xe_mmio.c
>>> index f1336803b915..8150bf6f3578 100644
>>> --- a/drivers/gpu/drm/xe/xe_mmio.c
>>> +++ b/drivers/gpu/drm/xe/xe_mmio.c
>>> @@ -334,6 +334,12 @@ static void xe_mmio_probe_tiles(struct xe_device
>>> *xe)
>>>        adj_tile_count = xe->info.tile_count =
>>>                REG_FIELD_GET(TILE_COUNT, mtcfg) + 1;
>>>
>>> +     /*
>>> +      * FIXME: Needs some work for standalone media, but should be
>>> impossible
>>> +      * with multi-tile for now.
>>> +      */
>>> +     xe->info.gt_count = xe->info.tile_count;
>>> +
>>>        drm_info(&xe->drm, "tile_count: %d, adj_tile_count %d\n",
>>>                 xe->info.tile_count, adj_tile_count);
>>>
>>
>> 'xe->info.gt_count' is getting incremented in 'xe_info_init' during
>> probe, seems you must remove it from there, or else 'gt_count' will
>> eventually be equal to x2 of the desired value.
>
> xe_info_init() looks to be called first. But this is very early during
> probe and might not accurately represent the actual HW, so some of it
> appears to be "placeholder" only, like with tile_count. Once we
> eventually initialise enough of the MMIO stuff, we can actually probe
> the HW to figure out the real number of tiles, as per
> xe_mmio_probe_tiles() here. At this stage we might need to update
> tile_count, but we missed also updating the gt_count, leading to
> explosions if the real vs placeholder tile_count is ever different.
>

I see, so do you see any reason to keep 'xe->info.gt_count'
initialization in 'xe_info_init'? anyway you set it here and it is not
used in between.

>>
>> --
>> Ofir



More information about the Intel-xe mailing list