[PATCH v2 2/2] drm/xe: Wait for migration job before unmapping pages
Lucas De Marchi
lucas.demarchi at intel.com
Tue Dec 10 15:38:54 UTC 2024
On Thu, Dec 05, 2024 at 03:33:46PM +0100, Nirmoy Das wrote:
>
>On 12/5/2024 1:40 PM, Matthew Auld wrote:
>> On 05/12/2024 12:02, Nirmoy Das wrote:
>>> There could be still migration job going on while doing
>>> xe_tt_unmap_sg() which could trigger GPU page faults. Fix this by
>>> waiting for the migration job to finish.
>>>
>>> v2: Use intr=false(Matt A)
>>>
>>> Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/3466
>>> Fixes: 75521e8b56e8 ("drm/xe: Perform dma_map when moving system buffer objects to TT")
>>> Cc: Thomas Hellström <thomas.hellstrom at linux.intel.com>
>>> Cc: Matthew Brost <matthew.brost at intel.com>
>>> Cc: Lucas De Marchi <lucas.demarchi at intel.com>
>>> Cc: <stable at vger.kernel.org> # v6.11+
>>> Cc: Matthew Auld <matthew.auld at intel.com>
>>> Signed-off-by: Nirmoy Das <nirmoy.das at intel.com>
>>
>> Ok, so this is something like ttm_bo_move_to_ghost() doing a pipeline move for tt -> system, but we then do xe_tt_unmap_sg() too early which tears down the IOMMU (if enabled) mappings whilst the job is in progress?
>Yes, this exactly what is happening for this issue.
>>
>> Maybe add some more info to the commit message?
>
>I will add more details.
Are you going to send a new version? Once this is fixed, please also
send a revert MR to the kconfig workaround
3940181b1bad @ gitlab.freedesktop.org/drm/xe/ci.git
Lucas De Marchi
>
>
>> I think this for sure fixes it. Just wondering if it's somehow possible to keep the mapping until the job is done, since all tt -> sys moves are now synced here?
>>
>> Unless Thomas has a better idea here,
>> Reviewed-by: Matthew Auld <matthew.auld at intel.com>
>
>
>Thanks,
>
>Nirmoy
>
>>
>>> ---
>>> drivers/gpu/drm/xe/xe_bo.c | 10 +++++++++-
>>> 1 file changed, 9 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/gpu/drm/xe/xe_bo.c b/drivers/gpu/drm/xe/xe_bo.c
>>> index b2aa368a23f8..c906a5529db0 100644
>>> --- a/drivers/gpu/drm/xe/xe_bo.c
>>> +++ b/drivers/gpu/drm/xe/xe_bo.c
>>> @@ -857,8 +857,16 @@ static int xe_bo_move(struct ttm_buffer_object *ttm_bo, bool evict,
>>> out:
>>> if ((!ttm_bo->resource || ttm_bo->resource->mem_type == XE_PL_SYSTEM) &&
>>> - ttm_bo->ttm)
>>> + ttm_bo->ttm) {
>>> + long timeout = dma_resv_wait_timeout(ttm_bo->base.resv,
>>> + DMA_RESV_USAGE_BOOKKEEP,
>>> + false,
>>> + MAX_SCHEDULE_TIMEOUT);
>>> + if (timeout < 0)
>>> + ret = timeout;
>>> +
>>> xe_tt_unmap_sg(ttm_bo->ttm);
>>> + }
>>> return ret;
>>> }
>>
More information about the Intel-xe
mailing list