[Intel-xe] [PATCH v2 3/3] drm/xe/bo: sync kernel fences for KMD buffers

Matthew Auld matthew.auld at intel.com
Mon Oct 30 09:01:01 UTC 2023


On 27/10/2023 15:17, Thomas Hellström wrote:
> 
> On 10/27/23 12:48, Matthew Auld wrote:
>> With things like pipelined evictions, VRAM pages can be marked as free
>> and yet still have some active kernel fences, with the idea that the
>> next caller to allocate the memory will respect them. However it looks
>> like we are missing synchronisation for KMD internal buffers, like
>> page-tables, lrc etc. For userspace objects we should already have the
>> required synchronisation for CPU access via the fault handler, and
>> likewise for GPU access when vm_binding them.
>>
>> To fix this synchronise against any kernel fences for all KMD objects at
>> creation. This should resolve some severe corruption seen during
>> evictions.
>>
>> v2 (Matt B):
>>    - Revamp the comment explaining this. Also mention why USAGE_KERNEL is
>>      correct here.
>>
>> Closes: ?
>> Testcase: igt at xe-evict-ccs
>> Reported-by: Zbigniew Kempczyński <zbigniew.kempczynski at intel.com>
>> Signed-off-by: Matthew Auld <matthew.auld at intel.com>
>> Cc: Thomas Hellström <thomas.hellstrom at linux.intel.com>
>> Cc: Matthew Brost <matthew.brost at intel.com>
>> ---
>>   drivers/gpu/drm/xe/xe_bo.c | 21 +++++++++++++++++++++
>>   1 file changed, 21 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/xe/xe_bo.c b/drivers/gpu/drm/xe/xe_bo.c
>> index 61789c0e88fb..d8afcae0780f 100644
>> --- a/drivers/gpu/drm/xe/xe_bo.c
>> +++ b/drivers/gpu/drm/xe/xe_bo.c
>> @@ -1272,6 +1272,27 @@ struct xe_bo *__xe_bo_create_locked(struct xe_device *xe, struct xe_bo *bo,
>>       else
>>           ttm_bo_move_to_lru_tail_unlocked(&bo->ttm);
>> +    /*
>> +     * The VRAM pages underneath are potentially still being accessed by the
>> +     * GPU, as per async GPU clearing and async evictions. However TTM makes
>> +     * sure to add any corresponding move/clear fences into the object's
>> +     * dma-resv using the DMA_RESV_USAGE_KERNEL slot.
>> +     *
>> +     * For KMD internal buffers we don't care about GPU clearing, however we
>> +     * still need to handle async evictions, where the VRAM is still being
>> +     * accessed by the GPU. Most internal callers are not expecting this,
>> +     * since they are missing the required synchronisation before accessing
>> +     * the memory. To keep things simple just sync wait any kernel fences
>> +     * here, if the buffer is designated KMD internal.
>> +     *
>> +     * For normal userspace objects we should already have the required
>> +     * pipelining or sync waiting elsewhere, since we already have to deal
>> +     * with things like async GPU clearing.
>> +     */
>> +    if (type == ttm_bo_type_kernel)
>> +        dma_resv_wait_timeout(bo->ttm.base.resv, DMA_RESV_USAGE_KERNEL,
>> +                      false, MAX_SCHEDULE_TIMEOUT);
>> +
> 
> Oh, BTW, we should probably use "ctx->interruptible" instead of "false". 
> That will add a late point-of-failure, but won't be hit very often.

Ah right. Will fix.
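
For reference, a respin along those lines would presumably just thread the
ttm_operation_ctx interruptible flag through and handle the signal case, roughly
like this (a sketch only, not the actual v3 hunk; the err variable and the
err_unwind label are placeholders for whatever unwind path the function uses):

```c
	if (type == ttm_bo_type_kernel) {
		long timeout;

		timeout = dma_resv_wait_timeout(bo->ttm.base.resv,
						DMA_RESV_USAGE_KERNEL,
						ctx->interruptible,
						MAX_SCHEDULE_TIMEOUT);
		if (timeout < 0) {
			/*
			 * Interruptible wait was broken by a signal
			 * (-ERESTARTSYS); propagate it so the caller can
			 * unwind and retry.
			 */
			err = timeout;
			goto err_unwind;
		}
	}
```

With an interruptible wait the call can now fail, which is the late
point-of-failure mentioned above, but in practice it should only trigger when a
signal arrives mid-eviction.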

> 
> /Thomas
> 
> 

