[PATCH] ttm: wait mem space if user allow while gpu busy

zhoucm1 zhoucm1 at amd.com
Wed Apr 24 09:03:59 UTC 2019


OK, let's drop mine then. Could you draft a solution for that?


-David


On 2019-04-24 16:59, Koenig, Christian wrote:
> On 24.04.19 at 10:11, zhoucm1 wrote:
>>
>>
>>
>> On 2019-04-24 16:07, Christian König wrote:
>>> This is used in a work item, so you don't need to check for signals.
>> will remove.
>>>
>>> And checking if the LRU is populated is mandatory here
>> How do we check it outside of TTM? The code is in dm.
>
> Well just use a static cast on the first entry of the LRU.
>
> We can't upstream that solution anyway, so just a hack should do for now.
>
>>
>>> or otherwise you can run into an endless loop.
>> I already added a timeout for that.
>
> Thinking more about it, we most likely actually need to grab the lock 
> on the first BO entry from the LRU; a rough sketch follows below.
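
A minimal sketch of that hack (my reading of Christian's suggestion,
assuming the 2019-era TTM structures; the man/glob pointers being in
scope, and the use of ww_mutex_trylock(), are assumptions):

/* Hack, not upstreamable: peek at the first BO on the LRU via a cast
 * and try to take its reservation lock without blocking. */
struct ttm_buffer_object *bo;

spin_lock(&glob->lru_lock);
bo = list_first_entry_or_null(&man->lru[0],
			      struct ttm_buffer_object, lru);
if (bo)
	ttm_bo_get(bo);	/* hold a reference before dropping the LRU lock */
spin_unlock(&glob->lru_lock);

if (bo) {
	if (ww_mutex_trylock(&bo->resv->lock)) {
		/* ... wait for the BO to become idle, then retry ... */
		ww_mutex_unlock(&bo->resv->lock);
	}
	ttm_bo_put(bo);
}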
>
>>
>> -David
>>>
>>> Christian.
>>>
>>> On 24.04.19 at 09:59, zhoucm1 wrote:
>>>>
>>>> How about the newly attached patch?
>>>>
>>>>
>>>> -David
>>>>
>>>>
>>>> On 2019-04-24 15:30, Christian König wrote:
>>>>> That would change the semantics of ttm_bo_mem_space() and so could 
>>>>> change the return code in an IOCTL as well. Not a good idea, because 
>>>>> that could have a lot of side effects.
>>>>>
>>>>> Instead, in the calling DC code you could check if you get an 
>>>>> -ENOMEM and then call schedule().
>>>>>
>>>>> If after the schedule() we see that we now have BOs on the LRU, we 
>>>>> can try again and see if pinning the frame buffer succeeds.
>>>>>
>>>>> Christian.
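
A rough illustration of that retry loop (hypothetical: dm_pin_fb_retry()
is a made-up name; amdgpu_bo_reserve()/amdgpu_bo_pin()/amdgpu_bo_unreserve()
are used as in the 2019-era amdgpu code):

static int dm_pin_fb_retry(struct amdgpu_bo *rbo)
{
	int r;

	for (;;) {
		r = amdgpu_bo_reserve(rbo, false);
		if (r)
			return r;
		r = amdgpu_bo_pin(rbo, AMDGPU_GEM_DOMAIN_VRAM);
		amdgpu_bo_unreserve(rbo);
		if (r != -ENOMEM)
			return r;
		/* Out of memory: give other tasks a chance to finish and
		 * free BOs, then retry once the LRU is populated again. */
		schedule();
	}
}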
>>>>>
>>>>> On 24.04.19 at 09:17, zhoucm1 wrote:
>>>>>>
>>>>>> I made a patch, attached.
>>>>>>
>>>>>> I'm not sure how to turn your proposal into a patch; could you make 
>>>>>> one for that if mine isn't enough?
>>>>>>
>>>>>> -David
>>>>>>
>>>>>> On 2019-04-24 15:12, Christian König wrote:
>>>>>>>> how about just adding a wrapper for the pin function as below?
>>>>>>> I considered this as well and don't think it will work reliably.
>>>>>>>
>>>>>>> We could use it as a band-aid for this specific problem, but in 
>>>>>>> general we need to improve the handling in TTM to resolve these 
>>>>>>> kinds of resource conflicts.
>>>>>>>
>>>>>>> Regards,
>>>>>>> Christian.
>>>>>>>
>>>>>>> On 23.04.19 at 17:09, Zhou, David(ChunMing) wrote:
>>>>>>>> >3. If we have a ticket we grab a reference to the first BO on 
>>>>>>>> the LRU, drop the LRU lock and try to grab the reservation lock 
>>>>>>>> with the ticket.
>>>>>>>>
>>>>>>>> The BO on the LRU is already locked by the CS user; can it be 
>>>>>>>> dropped here by the DC user? And if the DC user then grabs its 
>>>>>>>> lock with a ticket, how does CS grab it again?
>>>>>>>>
>>>>>>>> If you think waiting in TTM has this risk, how about just 
>>>>>>>> adding a wrapper for the pin function as below?
>>>>>>>> amdgpu_get_pin_bo_timeout()
>>>>>>>> {
>>>>>>>>         do {
>>>>>>>>                 amdgpu_bo_reserve();
>>>>>>>>                 r = amdgpu_bo_pin();
>>>>>>>>                 amdgpu_bo_unreserve();
>>>>>>>>                 if (!r)
>>>>>>>>                         break;
>>>>>>>>                 timeout--;
>>>>>>>>         } while (timeout > 0);
>>>>>>>>
>>>>>>>>         return r;
>>>>>>>> }
>>>>>>>>
>>>>>>>> -------- Original Message --------
>>>>>>>> Subject: Re: [PATCH] ttm: wait mem space if user allow while gpu busy
>>>>>>>> From: Christian König
>>>>>>>> To: "Zhou, David(ChunMing)" ,"Koenig, Christian" ,"Liang, 
>>>>>>>> Prike" ,dri-devel at lists.freedesktop.org
>>>>>>>> CC:
>>>>>>>>
>>>>>>>> Well, that's not so easy offhand.
>>>>>>>>
>>>>>>>> The basic problem here is that when you busy wait at this place 
>>>>>>>> you can easily run into situations where application A busy 
>>>>>>>> waits for B while B busy waits for A -> deadlock.
>>>>>>>>
>>>>>>>> So what we need here is the deadlock detection logic of the 
>>>>>>>> ww_mutex. To use this we at least need to do the following steps:
>>>>>>>>
>>>>>>>> 1. Reserve the BO in DC using a ww_mutex ticket (trivial).
>>>>>>>>
>>>>>>>> 2. If we then run into this EBUSY condition in TTM check if the 
>>>>>>>> BO we need memory for (or rather the ww_mutex of its 
>>>>>>>> reservation object) has a ticket assigned.
>>>>>>>>
>>>>>>>> 3. If we have a ticket we grab a reference to the first BO on 
>>>>>>>> the LRU, drop the LRU lock and try to grab the reservation lock 
>>>>>>>> with the ticket.
>>>>>>>>
>>>>>>>> 4. If getting the reservation lock with the ticket succeeded we 
>>>>>>>> check if the BO is still the first one on the LRU in question 
>>>>>>>> (the BO could have moved).
>>>>>>>>
>>>>>>>> 5. If the BO is still the first one on the LRU in question we 
>>>>>>>> try to evict it as we would evict any other BO.
>>>>>>>>
>>>>>>>> 6. If any of the "If's" above fail we just back off and return 
>>>>>>>> -EBUSY.
>>>>>>>>
>>>>>>>> Steps 2-5 are certainly not trivial, but doable as far as I can 
>>>>>>>> see; a rough sketch follows below.
>>>>>>>>
>>>>>>>> Have fun :)
>>>>>>>> Christian.
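
A very rough sketch of steps 3-5 could look like this (untested; the
structure and field names follow the 2019-era ttm_bo.c, and the function
name is made up; step 2, checking whether the BO we need memory for has a
ticket via bo->resv->lock.ctx, would happen in the caller):

/* Evict the first BO on the LRU using the caller's ww_acquire_ctx
 * ticket, so deadlocks are detected by the ww_mutex instead of hit. */
static int ttm_mem_evict_first_ticketed(struct ttm_bo_device *bdev,
					struct ttm_mem_type_manager *man,
					struct ww_acquire_ctx *ticket,
					struct ttm_operation_ctx *ctx)
{
	struct ttm_bo_global *glob = bdev->glob;
	struct ttm_buffer_object *bo;
	int ret;

	/* Step 3: reference the first BO on the LRU, drop the LRU lock. */
	spin_lock(&glob->lru_lock);
	bo = list_first_entry_or_null(&man->lru[0],
				      struct ttm_buffer_object, lru);
	if (!bo) {
		spin_unlock(&glob->lru_lock);
		return -EBUSY;			/* step 6: back off */
	}
	ttm_bo_get(bo);
	spin_unlock(&glob->lru_lock);

	/* Step 3 continued: take the reservation with the caller's ticket
	 * so the ww_mutex deadlock detection kicks in instead of us busy
	 * waiting on a BO held by another context. */
	ret = ww_mutex_lock_interruptible(&bo->resv->lock, ticket);
	if (ret) {				/* -EDEADLK or -EINTR */
		ttm_bo_put(bo);
		return ret;
	}

	/* Step 4: the BO may have moved while we slept; recheck. */
	spin_lock(&glob->lru_lock);
	if (bo != list_first_entry_or_null(&man->lru[0],
					   struct ttm_buffer_object, lru)) {
		spin_unlock(&glob->lru_lock);
		ret = -EBUSY;			/* step 6 */
		goto out_unlock;
	}
	spin_unlock(&glob->lru_lock);

	/* Step 5: evict it as we would evict any other BO. */
	ret = ttm_bo_evict(bo, ctx);

out_unlock:
	ww_mutex_unlock(&bo->resv->lock);
	ttm_bo_put(bo);
	return ret;
}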
>>>>>>>>
>>>>>>>> On 23.04.19 at 15:19, Zhou, David(ChunMing) wrote:
>>>>>>>>> How about adding an additional ctx->resv condition inline to 
>>>>>>>>> address your concern? If we also don't wait on BOs from the same 
>>>>>>>>> user, it shouldn't lead to a deadlock; see the sketch below.
>>>>>>>>>
>>>>>>>>> Otherwise, any other idea?
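
Something along these lines on top of the patch below (hypothetical; the
exact placement in ttm_bo_mem_force_space() and whether comparing
ctx->resv against bo->resv is sufficient are assumptions, which is
exactly the open question here):

		ret = ttm_mem_evict_first(bdev, mem_type, place, ctx);
		if (unlikely(ret != 0)) {
			/* Only retry if the user allows waiting and the
			 * request does not come from the same user (same
			 * reservation object) we would be waiting on. */
			if (!ctx || ctx->no_wait_gpu || ret != -EBUSY ||
			    (ctx->resv && ctx->resv == bo->resv))
				return ret;
		}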
>>>>>>>>>
>>>>>>>>> -------- Original Message --------
>>>>>>>>> Subject: Re: [PATCH] ttm: wait mem space if user allow while gpu busy
>>>>>>>>> From: Christian König
>>>>>>>>> To: "Liang, Prike" ,"Zhou, David(ChunMing)" 
>>>>>>>>> ,dri-devel at lists.freedesktop.org
>>>>>>>>> CC:
>>>>>>>>>
>>>>>>>>> Well that is certainly a NAK, because it can lead to deadlocks 
>>>>>>>>> in the memory management.
>>>>>>>>>
>>>>>>>>> You can't just busy wait with all those locks held.
>>>>>>>>>
>>>>>>>>> Regards,
>>>>>>>>> Christian.
>>>>>>>>>
>>>>>>>>> On 23.04.19 at 03:45, Liang, Prike wrote:
>>>>>>>>> > Acked-by: Prike Liang <Prike.Liang at amd.com>
>>>>>>>>> >
>>>>>>>>> > Thanks,
>>>>>>>>> > Prike
>>>>>>>>> > -----Original Message-----
>>>>>>>>> > From: Chunming Zhou <david1.zhou at amd.com>
>>>>>>>>> > Sent: Monday, April 22, 2019 6:39 PM
>>>>>>>>> > To: dri-devel at lists.freedesktop.org
>>>>>>>>> > Cc: Liang, Prike <Prike.Liang at amd.com>; Zhou, David(ChunMing) <David1.Zhou at amd.com>
>>>>>>>>> > Subject: [PATCH] ttm: wait mem space if user allow while gpu busy
>>>>>>>>> >
>>>>>>>>> > A heavy GPU job can occupy memory for a long time, which can 
>>>>>>>>> > cause other users to fail to get memory.
>>>>>>>>> >
>>>>>>>>> > Change-Id: I0b322d98cd76e5ac32b00462bbae8008d76c5e11
>>>>>>>>> > Signed-off-by: Chunming Zhou <david1.zhou at amd.com>
>>>>>>>>> > ---
>>>>>>>>> >   drivers/gpu/drm/ttm/ttm_bo.c | 6 ++++--
>>>>>>>>> >   1 file changed, 4 insertions(+), 2 deletions(-)
>>>>>>>>> >
>>>>>>>>> > diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
>>>>>>>>> > index 7c484729f9b2..6c596cc24bec 100644
>>>>>>>>> > --- a/drivers/gpu/drm/ttm/ttm_bo.c
>>>>>>>>> > +++ b/drivers/gpu/drm/ttm/ttm_bo.c
>>>>>>>>> > @@ -830,8 +830,10 @@ static int ttm_bo_mem_force_space(struct ttm_buffer_object *bo,
>>>>>>>>> >  		if (mem->mm_node)
>>>>>>>>> >  			break;
>>>>>>>>> >  		ret = ttm_mem_evict_first(bdev, mem_type, place, ctx);
>>>>>>>>> > -		if (unlikely(ret != 0))
>>>>>>>>> > -			return ret;
>>>>>>>>> > +		if (unlikely(ret != 0)) {
>>>>>>>>> > +			if (!ctx || ctx->no_wait_gpu || ret != -EBUSY)
>>>>>>>>> > +				return ret;
>>>>>>>>> > +		}
>>>>>>>>> >  	} while (1);
>>>>>>>>> >  	mem->mem_type = mem_type;
>>>>>>>>> >  	return ttm_bo_add_move_fence(bo, man, mem);
>>>>>>>>> > --
>>>>>>>>> > 2.17.1
>>>>>>>>> >
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>>
>>>
>>
>
