[PATCH] ttm: wait mem space if user allow while gpu busy

zhoucm1 zhoucm1 at amd.com
Wed Apr 24 08:11:40 UTC 2019



On 2019-04-24 16:07, Christian König wrote:
> This is used in a work item, so you don't need to check for signals.
Will remove.
>
> And checking if the LRU is populated is mandatory here
How do we check that outside of TTM? The calling code is in DM.

> or otherwise you can run into an endless loop.
I already added a timeout for that.

-David
>
> Christian.
>
> On 2019-04-24 09:59, zhoucm1 wrote:
>>
>> How about the new attached patch?
>>
>>
>> -David
>>
>>
>> On 2019-04-24 15:30, Christian König wrote:
>>> That would change the semantics of ttm_bo_mem_space() and so could 
>>> change the return code in an IOCTL as well. Not a good idea, because 
>>> that could have a lot of side effects.
>>>
>>> Instead, in the calling DC code you could check whether you get an 
>>> -ENOMEM and then call schedule().
>>>
>>> If, after the schedule(), we see that we now have BOs on the LRU, we 
>>> can try again and see whether pinning the frame buffer now succeeds.
>>>
>>> Christian.
>>>
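A minimal sketch of the retry suggested above, at the DM call site. The
helper name dm_pin_fb_with_retry(), the bo/domain parameters and the retry
bound are illustrative assumptions; checking that the LRU actually has BOs
again would need a TTM-side helper and is approximated here by a bounded
retry.

/*
 * Sketch only: retry the pin after schedule() when TTM reports that no
 * memory is currently available (assumes the amdgpu helpers of this era).
 */
static int dm_pin_fb_with_retry(struct amdgpu_bo *rbo, u32 domain)
{
	unsigned int tries = 100;	/* arbitrary bound against an endless loop */
	int r;

	do {
		r = amdgpu_bo_reserve(rbo, false);
		if (r)
			return r;
		r = amdgpu_bo_pin(rbo, domain);
		amdgpu_bo_unreserve(rbo);

		if (r != -ENOMEM)
			return r;	/* pinned, or a non-retryable error */

		/* Let evictions and frees in other contexts make progress. */
		schedule();
	} while (--tries);

	return r;
}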
>>> On 2019-04-24 09:17, zhoucm1 wrote:
>>>>
>>>> I made a patch, as attached.
>>>>
>>>> I'm not sure how to write a patch following your proposal. Could you 
>>>> make one if mine isn't enough?
>>>>
>>>> -David
>>>>
>>>> On 2019-04-24 15:12, Christian König wrote:
>>>>>> how about just adding a wrapper for the pin function as below?
>>>>> I considered this as well and don't think it will work reliably.
>>>>>
>>>>> We could use it as a band-aid for this specific problem, but in 
>>>>> general we need to improve the handling in TTM to resolve these 
>>>>> kinds of resource conflicts.
>>>>>
>>>>> Regards,
>>>>> Christian.
>>>>>
>>>>> On 2019-04-23 17:09, Zhou, David(ChunMing) wrote:
>>>>>> >3. If we have a ticket we grab a reference to the first BO on 
>>>>>> the LRU, drop the LRU lock and try to grab the reservation lock 
>>>>>> with the ticket.
>>>>>>
>>>>>> The BO on the LRU is already locked by the CS user; can it be 
>>>>>> dropped here by the DC user? And if the DC user then grabs its lock 
>>>>>> with the ticket, how does CS grab it again?
>>>>>>
>>>>>> If you think waiting in TTM has this risk, how about just adding 
>>>>>> a wrapper for the pin function as below?
>>>>>> amdgpu_get_pin_bo_timeout()
>>>>>> {
>>>>>>     do {
>>>>>>         amdgpu_bo_reserve();
>>>>>>         r = amdgpu_bo_pin();
>>>>>>         if (!r)
>>>>>>             break;
>>>>>>         amdgpu_bo_unreserve();
>>>>>>         timeout--;
>>>>>>     } while (timeout > 0);
>>>>>> }
>>>>>>
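Filled out with an actual timeout (see the "I already added a timeout for
that" reply near the top), the wrapper above might look roughly like this.
The jiffies-based deadline, the msleep() back-off and the exact signature
are assumptions for illustration, not the attached patch.

/*
 * Rough, self-contained version of the wrapper sketched above; needs
 * <linux/delay.h> and <linux/jiffies.h> plus the amdgpu BO helpers.
 */
static int amdgpu_get_pin_bo_timeout(struct amdgpu_bo *bo, u32 domain,
				     unsigned long timeout_ms)
{
	unsigned long deadline = jiffies + msecs_to_jiffies(timeout_ms);
	int r;

	do {
		r = amdgpu_bo_reserve(bo, false);
		if (r)
			return r;

		r = amdgpu_bo_pin(bo, domain);
		amdgpu_bo_unreserve(bo);
		if (r != -ENOMEM)
			return r;	/* pinned, or a non-retryable error */

		/* Back off briefly so eviction/frees elsewhere can proceed. */
		msleep(10);
	} while (time_before(jiffies, deadline));

	return r;	/* still out of space after the timeout */
}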
>>>>>> -------- Original Message --------
>>>>>> Subject: Re: [PATCH] ttm: wait mem space if user allow while gpu busy
>>>>>> From: Christian König
>>>>>> To: "Zhou, David(ChunMing)" ,"Koenig, Christian" ,"Liang, Prike" 
>>>>>> ,dri-devel at lists.freedesktop.org
>>>>>> CC:
>>>>>>
>>>>>> Well, that's not so easy off-hand.
>>>>>>
>>>>>> The basic problem here is that when you busy-wait at this place 
>>>>>> you can easily run into situations where application A busy-waits 
>>>>>> for B while B busy-waits for A -> deadlock.
>>>>>>
>>>>>> So what we need here is the deadlock detection logic of the 
>>>>>> ww_mutex. To use this we at least need to do the following steps:
>>>>>>
>>>>>> 1. Reserve the BO in DC using a ww_mutex ticket (trivial).
>>>>>>
>>>>>> 2. If we then run into this EBUSY condition in TTM, check whether 
>>>>>> the BO we need memory for (or rather the ww_mutex of its 
>>>>>> reservation object) has a ticket assigned.
>>>>>>
>>>>>> 3. If we have a ticket we grab a reference to the first BO on the 
>>>>>> LRU, drop the LRU lock and try to grab the reservation lock with 
>>>>>> the ticket.
>>>>>>
>>>>>> 4. If getting the reservation lock with the ticket succeeded, we 
>>>>>> check whether the BO is still the first one on the LRU in question 
>>>>>> (the BO could have moved).
>>>>>>
>>>>>> 5. If the BO is still the first one on the LRU in question we try 
>>>>>> to evict it as we would evict any other BO.
>>>>>>
>>>>>> 6. If any of the "if"s above fails, we just back off and return 
>>>>>> -EBUSY.
>>>>>>
>>>>>> Steps 2-5 are certainly not trivial, but doable as far as I can see.
>>>>>>
>>>>>> Have fun :)
>>>>>> Christian.
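A rough sketch of how steps 2-6 above might look inside ttm_bo.c of that
era follows. The helper name ttm_mem_evict_busy_bo(), the way the ticket is
passed in, the direct call to ttm_bo_evict() and the exact locking/field
names are simplifying assumptions, not the eventual implementation.

/*
 * Sketch only: try to evict the first (busy) BO on the LRU using the
 * caller's ww_acquire_ctx so ww_mutex deadlock detection applies.
 */
static int ttm_mem_evict_busy_bo(struct ttm_bo_device *bdev,
				 struct ttm_mem_type_manager *man,
				 struct ttm_operation_ctx *ctx,
				 struct ww_acquire_ctx *ticket)
{
	struct ttm_bo_global *glob = bdev->glob;
	struct ttm_buffer_object *busy_bo;
	int ret;

	/* Step 2/6: without a ticket there is no deadlock detection, so
	 * just back off. */
	if (!ticket)
		return -EBUSY;

	/* Step 3: take a reference to the first BO on the LRU and drop the
	 * LRU lock before sleeping on its reservation lock. */
	spin_lock(&glob->lru_lock);
	busy_bo = list_first_entry_or_null(&man->lru[0],
					   struct ttm_buffer_object, lru);
	if (!busy_bo) {
		spin_unlock(&glob->lru_lock);
		return -EBUSY;
	}
	ttm_bo_get(busy_bo);
	spin_unlock(&glob->lru_lock);

	/* Blocking acquire with the ticket lets ww_mutex resolve the
	 * A-waits-for-B-waits-for-A cycle by backing one side off. */
	ret = reservation_object_lock(busy_bo->resv, ticket);
	if (ret) {
		ttm_bo_put(busy_bo);
		return -EBUSY;		/* step 6, e.g. on -EDEADLK */
	}

	/* Step 4: the BO may no longer be first on the LRU (it could have
	 * moved while we slept). */
	spin_lock(&glob->lru_lock);
	if (busy_bo != list_first_entry_or_null(&man->lru[0],
					struct ttm_buffer_object, lru)) {
		spin_unlock(&glob->lru_lock);
		ret = -EBUSY;		/* step 6 */
		goto out_unlock;
	}
	spin_unlock(&glob->lru_lock);

	/* Step 5: evict it like any other BO (validity and cleanup checks
	 * omitted in this sketch). */
	ret = ttm_bo_evict(busy_bo, ctx);

out_unlock:
	reservation_object_unlock(busy_bo->resv);
	ttm_bo_put(busy_bo);
	return ret;
}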
>>>>>>
>>>>>> On 2019-04-23 15:19, Zhou, David(ChunMing) wrote:
>>>>>>> How about adding another condition on ctx->resv inline to address 
>>>>>>> your concern? As long as we don't wait on the same user, it 
>>>>>>> shouldn't lead to deadlock.
>>>>>>>
>>>>>>> Otherwise, any other idea?
>>>>>>>
>>>>>>> -------- Original Message --------
>>>>>>> Subject: Re: [PATCH] ttm: wait mem space if user allow while gpu 
>>>>>>> busy
>>>>>>> From: Christian König
>>>>>>> To: "Liang, Prike" ,"Zhou, David(ChunMing)" 
>>>>>>> ,dri-devel at lists.freedesktop.org
>>>>>>> CC:
>>>>>>>
>>>>>>> Well, that is certainly a NAK because it can lead to deadlocks in 
>>>>>>> the memory management.
>>>>>>>
>>>>>>> You can't just busy-wait with all those locks held.
>>>>>>>
>>>>>>> Regards,
>>>>>>> Christian.
>>>>>>>
>>>>>>> On 2019-04-23 03:45, Liang, Prike wrote:
>>>>>>> > Acked-by: Prike Liang <Prike.Liang at amd.com>
>>>>>>> >
>>>>>>> > Thanks,
>>>>>>> > Prike
>>>>>>> > -----Original Message-----
>>>>>>> > From: Chunming Zhou <david1.zhou at amd.com>
>>>>>>> > Sent: Monday, April 22, 2019 6:39 PM
>>>>>>> > To: dri-devel at lists.freedesktop.org
>>>>>>> > Cc: Liang, Prike <Prike.Liang at amd.com>; Zhou, David(ChunMing) 
>>>>>>> <David1.Zhou at amd.com>
>>>>>>> > Subject: [PATCH] ttm: wait mem space if user allow while gpu busy
>>>>>>> >
>>>>>>> > A heavy GPU job can occupy memory for a long time, which can 
>>>>>>> > lead to other users failing to get memory.
>>>>>>> >
>>>>>>> > Change-Id: I0b322d98cd76e5ac32b00462bbae8008d76c5e11
>>>>>>> > Signed-off-by: Chunming Zhou <david1.zhou at amd.com>
>>>>>>> > ---
>>>>>>> >   drivers/gpu/drm/ttm/ttm_bo.c | 6 ++++--
>>>>>>> >   1 file changed, 4 insertions(+), 2 deletions(-)
>>>>>>> >
>>>>>>> > diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
>>>>>>> > index 7c484729f9b2..6c596cc24bec 100644
>>>>>>> > --- a/drivers/gpu/drm/ttm/ttm_bo.c
>>>>>>> > +++ b/drivers/gpu/drm/ttm/ttm_bo.c
>>>>>>> > @@ -830,8 +830,10 @@ static int ttm_bo_mem_force_space(struct ttm_buffer_object *bo,
>>>>>>> >                if (mem->mm_node)
>>>>>>> >                        break;
>>>>>>> >                ret = ttm_mem_evict_first(bdev, mem_type, place, ctx);
>>>>>>> > -             if (unlikely(ret != 0))
>>>>>>> > -                     return ret;
>>>>>>> > +             if (unlikely(ret != 0)) {
>>>>>>> > +                     if (!ctx || ctx->no_wait_gpu || ret != -EBUSY)
>>>>>>> > +                             return ret;
>>>>>>> > +             }
>>>>>>> >        } while (1);
>>>>>>> >        mem->mem_type = mem_type;
>>>>>>> >        return ttm_bo_add_move_fence(bo, man, mem);
>>>>>>> > --
>>>>>>> > 2.17.1
>>>>>>> >
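For context, with the hunk above applied the eviction loop in
ttm_bo_mem_force_space() behaves roughly as sketched below; the surrounding
lines are paraphrased from the TTM code of that era, and the comments spell
out the retry that Christian NAKs as a busy wait earlier in the thread.

/* Paraphrased context for the hunk above, not a verbatim copy of ttm_bo.c. */
do {
	/* Try to allocate space in this memory type. */
	ret = (*man->func->get_node)(man, bo, place, mem);
	if (unlikely(ret != 0))
		return ret;
	if (mem->mm_node)
		break;			/* got space, done */

	/* No space: evict the first suitable BO from the LRU. */
	ret = ttm_mem_evict_first(bdev, mem_type, place, ctx);
	if (unlikely(ret != 0)) {
		/* With the patch, -EBUSY no longer aborts when the caller
		 * allows waiting; the loop simply runs again, busy-waiting
		 * until eviction succeeds or another error occurs. */
		if (!ctx || ctx->no_wait_gpu || ret != -EBUSY)
			return ret;
	}
} while (1);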
