[Intel-xe] [PATCH v2 08/31] drm/xe: VM LRU bulk move
Thomas Hellström
thomas.hellstrom at linux.intel.com
Fri May 12 09:03:30 UTC 2023
On 5/11/23 16:11, Matthew Brost wrote:
> On Thu, May 11, 2023 at 09:24:05AM +0200, Thomas Hellström wrote:
>> On 5/10/23 20:40, Matthew Brost wrote:
>>> On Wed, May 10, 2023 at 10:14:12AM +0200, Thomas Hellström wrote:
>>>> On 5/10/23 00:05, Matthew Brost wrote:
>>>>> On Tue, May 09, 2023 at 02:47:54PM +0200, Thomas Hellström wrote:
>>>>>> On 5/2/23 02:17, Matthew Brost wrote:
>>>>>>> Use the TTM LRU bulk move for BOs tied to a VM. Update the bulk move's
>>>>>>> LRU position on every exec.
>>>>>>>
>>>>>>> Signed-off-by: Matthew Brost <matthew.brost at intel.com>
>>>>>>> ---
>>>>>>> drivers/gpu/drm/xe/xe_bo.c | 32 ++++++++++++++++++++++++++++----
>>>>>>> drivers/gpu/drm/xe/xe_bo.h | 4 ++--
>>>>>>> drivers/gpu/drm/xe/xe_dma_buf.c | 2 +-
>>>>>>> drivers/gpu/drm/xe/xe_exec.c | 6 ++++++
>>>>>>> drivers/gpu/drm/xe/xe_vm_types.h | 3 +++
>>>>>>> 5 files changed, 40 insertions(+), 7 deletions(-)
>>>>>>>
>>>>>>> diff --git a/drivers/gpu/drm/xe/xe_bo.c b/drivers/gpu/drm/xe/xe_bo.c
>>>>>>> index 3ab404e33fae..da99ee53e7d7 100644
>>>>>>> --- a/drivers/gpu/drm/xe/xe_bo.c
>>>>>>> +++ b/drivers/gpu/drm/xe/xe_bo.c
>>>>>>> @@ -985,6 +985,23 @@ static void xe_gem_object_free(struct drm_gem_object *obj)
>>>>>>> ttm_bo_put(container_of(obj, struct ttm_buffer_object, base));
>>>>>>> }
>>>>>>> +static void xe_gem_object_close(struct drm_gem_object *obj,
>>>>>>> + struct drm_file *file_priv)
>>>>>>> +{
>>>>>>> + struct xe_bo *bo = gem_to_xe_bo(obj);
>>>>>>> +
>>>>>>> + if (bo->vm && !xe_vm_no_dma_fences(bo->vm)) {
>>>>>> Is there a reason we don't use bulk moves for LR vms? Admittedly bumping LRU
>>>>>> doesn't make much sense when we support user-space command buffer chaining,
>>>>>> but I think we should be doing it on exec at least, no?
>>>>> Maybe you could make the argument for compute VMs, the preempt worker in
>>>>> that case should probably do a bulk move. I can change this if desired.
>>>> Yes, please.
>>>>> For a fault VM it makes no sense, as the fault handler updates the LRU
>>>>> for individual BOs.
>>>> Yes that makes sense.
>>>>>>> + struct ww_acquire_ctx ww;
>>>>>>> +
>>>>>>> + XE_BUG_ON(!xe_bo_is_user(bo));
>>>>>> Also why can't we use this for kernel objects as well? At some point we want
>>>>>> to get to evictable page-table objects? Could we do this in the
>>>>>> release_notify() callback to cover all potential bos?
>>>>>>
>>>>> xe_gem_object_close is a user call, right? We can't call this on kernel
>>>>> BOs. This also could be outside the if statement.
>>>> Hmm, yes the question was can we stop doing this in xe_gem_object_close()
>>>> and instead do it in release_notify() to cover also kernel objects. Since
>>>> release_notify() is called just after individualizing dma_resv, it makes
>>>> sense to individualize also LRU at that point?
>>>>
>>> If we ever support moving kernel BOs, then yes. We need to do a lot of
>>> work to get there, so I'd rather leave this where it is, but I'll add a
>>> comment indicating that if we want to support kernel BO eviction, this
>>> should be updated.
>>>
>>> Sound good?
>> Well, I can't see the motivation to have it in gem close? Are other drivers
>> doing that? Whether the object should be bulk moved or not is tied to
>> whether it's a vm private object or not and that is closely tied to whether
>> the reservation object is the vm resv or the object resv?
>>
> AMDGPU does this via amdgpu_gem_object_close -> amdgpu_vm_bo_del, so yes.
>
> I also think I moved it here because, before release_notify(), I think
> there is an assert in TTM that the bulk move is NULL. Let me find that.
>
> static void ttm_bo_release(struct kref *kref)
> {
> 	struct ttm_buffer_object *bo =
> 		container_of(kref, struct ttm_buffer_object, kref);
> 	struct ttm_device *bdev = bo->bdev;
> 	int ret;
>
> 	WARN_ON_ONCE(bo->pin_count);
> 	WARN_ON_ONCE(bo->bulk_move);
Ugh, that's unfortunate.
In any case, it looks like if a client has multiple handles to the
object, the close() callback will be called multiple times, and the bulk
object released on the first, right?
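To illustrate the concern, here is a minimal user-space sketch, not kernel
code. It assumes the close() callback runs once per handle and drops the
bulk-move membership unconditionally, as the patch does for non-LR VMs. The
names model_bo, model_close and model_lost_bulk_move_early are hypothetical
and exist only for this illustration.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a BO; this is not the real struct xe_bo. */
struct model_bo {
	int handle_count;   /* open user-space handles to this object */
	bool in_bulk_move;  /* true while the BO is part of the VM's bulk move */
};

/* Models a close() callback that is invoked once per closed handle. */
static void model_close(struct model_bo *bo)
{
	/* Assumption: membership is dropped on every close, with no handle check. */
	bo->in_bulk_move = false;
	bo->handle_count--;
}

/* True if the BO is still reachable via a handle but already left the bulk move. */
static bool model_lost_bulk_move_early(const struct model_bo *bo)
{
	return bo->handle_count > 0 && !bo->in_bulk_move;
}
```

With two handles open, a single model_close() call leaves one handle alive
while the object has already left the bulk move, which is the situation
described above.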
The second-best option would then, I guess, be to have it in
xe_gem_object_free(), but the problem with that is the
potentially sleeping, uninterruptible object lock :(.
If we could have it in release_notify() we already have the object lock.
So can we have it in xe_gem_object_free() for now and later perhaps ping
Christian about moving that WARN_ON_ONCE?
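The ordering problem can be sketched in user space as follows. This is only
a model under stated assumptions: the release path checks the bulk-move
pointer before it calls release_notify(), as in the ttm_bo_release() excerpt
quoted above, and the free path runs before that final release. All model_*
names are hypothetical and are not TTM or xe functions.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a BO; not struct ttm_buffer_object. */
struct model_bo {
	bool bulk_move_set; /* stands in for bo->bulk_move != NULL */
	int warnings;       /* counts how often the modelled WARN would fire */
};

typedef void (*model_hook)(struct model_bo *bo);

/* Drops the bulk-move membership. */
static void model_clear_bulk_move(struct model_bo *bo)
{
	bo->bulk_move_set = false;
}

/* Models the release order: the check comes first, the notify hook later. */
static void model_release(struct model_bo *bo, model_hook release_notify)
{
	if (bo->bulk_move_set)
		bo->warnings++; /* stands in for WARN_ON_ONCE(bo->bulk_move) */
	if (release_notify)
		release_notify(bo);
}

/* Models the free path: an optional hook runs before the final release. */
static void model_free(struct model_bo *bo, model_hook before_release,
		       model_hook release_notify)
{
	if (before_release)
		before_release(bo);
	model_release(bo, release_notify);
}
```

Passing model_clear_bulk_move as release_notify trips the modelled warning,
while passing it as before_release does not. That is why the free path is
suggested for now, with release_notify() only becoming an option if the check
in TTM is moved.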
/Thomas
> Matt
>
>> /Thomas
>>
>>> Matt
>>>
>>>> /Thomas
>>>>
>>>>
>>>>> Matt
>>>>>
>>>>>> /Thomas
>>>>>>
>>>>>>