[PATCH] ttm: wait mem space if user allow while gpu busy

Christian König ckoenig.leichtzumerken at gmail.com
Wed Apr 24 10:03:56 UTC 2019


No, I'm busy with P2P and recoverable page faults.

Christian.

On 24.04.19 at 11:03, zhoucm1 wrote:
>
> OK, let's drop mine then. Could you draft a solution for that?
>
>
> -David
>
>
> On 2019-04-24 16:59, Koenig, Christian wrote:
>> On 24.04.19 at 10:11, zhoucm1 wrote:
>>>
>>>
>>>
>>> On 2019-04-24 16:07, Christian König wrote:
>>>> This is used in a work item, so you don't need to check for signals.
>>> will remove.
>>>>
>>>> And checking if the LRU is populated is mandatory here
>>> How can we check that outside of TTM? The code is in DM.
>>
>> Well just use a static cast on the first entry of the LRU.
>>
>> We can't upstream that solution anyway, so just a hack should do for now.
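>>
>> For the check itself, something along these lines could do as a sketch of that hack (amdgpu_vram_lru_populated is a made-up name, and it pokes directly into TTM's per-priority LRU lists, which is exactly why it can't go upstream):
>>
>> static bool amdgpu_vram_lru_populated(struct amdgpu_device *adev)
>> {
>>         struct ttm_bo_device *bdev = &adev->mman.bdev;
>>         struct ttm_mem_type_manager *man = &bdev->man[TTM_PL_VRAM];
>>         bool populated = false;
>>         unsigned i;
>>
>>         /* Peek at the VRAM LRU lists to see if there is anything
>>          * left that could still be evicted. */
>>         spin_lock(&bdev->glob->lru_lock);
>>         for (i = 0; i < TTM_MAX_BO_PRIORITY; ++i)
>>                 populated |= !list_empty(&man->lru[i]);
>>         spin_unlock(&bdev->glob->lru_lock);
>>
>>         return populated;
>> }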
>>
>>>
>>>> or otherwise you can run into an endless loop.
>>> I already added a timeout for that.
>>
>> Thinking more about it, we most likely actually need to grab the lock 
>> on the first BO entry from the LRU.
>>
>>>
>>> -David
>>>>
>>>> Christian.
>>>>
>>>> On 24.04.19 at 09:59, zhoucm1 wrote:
>>>>>
>>>>> How about the new attached patch?
>>>>>
>>>>>
>>>>> -David
>>>>>
>>>>>
>>>>> On 2019-04-24 15:30, Christian König wrote:
>>>>>> That would change the semantics of ttm_bo_mem_space() and so 
>>>>>> could change the return code in an IOCTL as well. Not a good 
>>>>>> idea, because that could have a lot of side effects.
>>>>>>
>>>>>> Instead, in the calling DC code you could check if you get an 
>>>>>> -ENOMEM and then call schedule().
>>>>>>
>>>>>> If, after the schedule(), we see that there are now BOs on the LRU 
>>>>>> again, we can retry and see whether pinning the frame buffer succeeds.
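>>>>>>
>>>>>> A rough sketch of what that could look like on the DC side (the function name and retry bound are made up for illustration; whether the LRU actually has BOs again is not checked here):
>>>>>>
>>>>>> static int dm_pin_fb_with_retry(struct amdgpu_bo *rbo, u32 domain)
>>>>>> {
>>>>>>         unsigned int tries = 10;        /* arbitrary bound, sketch only */
>>>>>>         int r;
>>>>>>
>>>>>>         do {
>>>>>>                 r = amdgpu_bo_reserve(rbo, true);
>>>>>>                 if (r)
>>>>>>                         return r;
>>>>>>                 r = amdgpu_bo_pin(rbo, domain);
>>>>>>                 amdgpu_bo_unreserve(rbo);
>>>>>>                 if (r != -ENOMEM)
>>>>>>                         return r;
>>>>>>                 /* All BOs were busy; give the jobs holding the memory
>>>>>>                  * a chance to finish and free their BOs, then retry. */
>>>>>>                 schedule();
>>>>>>         } while (--tries);
>>>>>>
>>>>>>         return r;
>>>>>> }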
>>>>>>
>>>>>> Christian.
>>>>>>
>>>>>> On 24.04.19 at 09:17, zhoucm1 wrote:
>>>>>>>
>>>>>>> I made a patch as attached.
>>>>>>>
>>>>>>> I'm not sure how to turn your proposal into a patch. Could you 
>>>>>>> make a patch for it if mine isn't enough?
>>>>>>>
>>>>>>> -David
>>>>>>>
>>>>>>> On 2019-04-24 15:12, Christian König wrote:
>>>>>>>>> how about just adding a wrapper for pin function as below?
>>>>>>>> I considered this as well and don't think it will work reliably.
>>>>>>>>
>>>>>>>> We could use it as a band-aid for this specific problem, but in 
>>>>>>>> general we need to improve the handling in TTM to resolve those 
>>>>>>>> kinds of resource conflicts.
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>> Christian.
>>>>>>>>
>>>>>>>> On 23.04.19 at 17:09, Zhou, David(ChunMing) wrote:
>>>>>>>>> >3. If we have a ticket we grab a reference to the first BO on 
>>>>>>>>> the LRU, drop the LRU lock and try to grab the reservation 
>>>>>>>>> lock with the ticket.
>>>>>>>>>
>>>>>>>>> The BO on the LRU is already locked by the CS user; can it be 
>>>>>>>>> dropped here by the DC user? And if the DC user then grabs its 
>>>>>>>>> lock with the ticket, how does CS grab it again?
>>>>>>>>>
>>>>>>>>> If you think waiting in TTM has this risk, how about just 
>>>>>>>>> adding a wrapper for the pin function as below?
>>>>>>>>> amdgpu_get_pin_bo_timeout()
>>>>>>>>> {
>>>>>>>>>         do {
>>>>>>>>>                 amdgpu_bo_reserve();
>>>>>>>>>                 r = amdgpu_bo_pin();
>>>>>>>>>
>>>>>>>>>                 if (!r)
>>>>>>>>>                         break;
>>>>>>>>>                 amdgpu_bo_unreserve();
>>>>>>>>>                 timeout--;
>>>>>>>>>         } while (timeout > 0);
>>>>>>>>> }
>>>>>>>>>
>>>>>>>>> -------- Original Message --------
>>>>>>>>> Subject: Re: [PATCH] ttm: wait mem space if user allow while 
>>>>>>>>> gpu busy
>>>>>>>>> From: Christian König
>>>>>>>>> To: "Zhou, David(ChunMing)" ,"Koenig, Christian" ,"Liang, 
>>>>>>>>> Prike" ,dri-devel at lists.freedesktop.org
>>>>>>>>> CC:
>>>>>>>>>
>>>>>>>>> Well, that's not so easy off hand.
>>>>>>>>>
>>>>>>>>> The basic problem here is that when you busy-wait at this 
>>>>>>>>> place, you can easily run into situations where application A 
>>>>>>>>> busy-waits for B while B busy-waits for A -> deadlock.
>>>>>>>>>
>>>>>>>>> So what we need here is the deadlock detection logic of the 
>>>>>>>>> ww_mutex. To use this we at least need to do the following steps:
>>>>>>>>>
>>>>>>>>> 1. Reserve the BO in DC using a ww_mutex ticket (trivial).
>>>>>>>>>
>>>>>>>>> 2. If we then run into this EBUSY condition in TTM check if 
>>>>>>>>> the BO we need memory for (or rather the ww_mutex of its 
>>>>>>>>> reservation object) has a ticket assigned.
>>>>>>>>>
>>>>>>>>> 3. If we have a ticket we grab a reference to the first BO on 
>>>>>>>>> the LRU, drop the LRU lock and try to grab the reservation 
>>>>>>>>> lock with the ticket.
>>>>>>>>>
>>>>>>>>> 4. If getting the reservation lock with the ticket succeeded 
>>>>>>>>> we check if the BO is still the first one on the LRU in 
>>>>>>>>> question (the BO could have moved).
>>>>>>>>>
>>>>>>>>> 5. If the BO is still the first one on the LRU in question we 
>>>>>>>>> try to evict it as we would evict any other BO.
>>>>>>>>>
>>>>>>>>> 6. If any of the "If's" above fail we just back off and return 
>>>>>>>>> -EBUSY.
>>>>>>>>>
>>>>>>>>> Steps 2-5 are certainly not trivial, but doable as far as I 
>>>>>>>>> can see.
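>>>>>>>>>
>>>>>>>>> To make steps 3-6 a bit more concrete, a pseudocode-level sketch as if it lived in ttm_bo.c (field and helper names are only approximate; this is certainly not a finished implementation):
>>>>>>>>>
>>>>>>>>> static int ttm_evict_first_busy(struct ttm_bo_device *bdev,
>>>>>>>>>                                 struct list_head *lru,
>>>>>>>>>                                 struct ww_acquire_ctx *ticket,
>>>>>>>>>                                 struct ttm_operation_ctx *ctx)
>>>>>>>>> {
>>>>>>>>>         struct ttm_buffer_object *bo;
>>>>>>>>>         int ret;
>>>>>>>>>
>>>>>>>>>         spin_lock(&bdev->glob->lru_lock);
>>>>>>>>>         if (list_empty(lru)) {
>>>>>>>>>                 spin_unlock(&bdev->glob->lru_lock);
>>>>>>>>>                 return -EBUSY;
>>>>>>>>>         }
>>>>>>>>>
>>>>>>>>>         /* Step 3: take a reference so the BO can't vanish, then
>>>>>>>>>          * drop the LRU lock before sleeping on the reservation. */
>>>>>>>>>         bo = list_first_entry(lru, struct ttm_buffer_object, lru);
>>>>>>>>>         ttm_bo_get(bo);
>>>>>>>>>         spin_unlock(&bdev->glob->lru_lock);
>>>>>>>>>
>>>>>>>>>         /* Lock with the caller's ticket so the ww_mutex deadlock
>>>>>>>>>          * detection applies. */
>>>>>>>>>         ret = ww_mutex_lock_interruptible(&bo->resv->lock, ticket);
>>>>>>>>>         if (ret) {
>>>>>>>>>                 /* Step 6: any failure -> just back off. */
>>>>>>>>>                 ttm_bo_put(bo);
>>>>>>>>>                 return -EBUSY;
>>>>>>>>>         }
>>>>>>>>>
>>>>>>>>>         /* Step 4: the BO could have moved while we slept. */
>>>>>>>>>         spin_lock(&bdev->glob->lru_lock);
>>>>>>>>>         if (bo != list_first_entry(lru, struct ttm_buffer_object, lru)) {
>>>>>>>>>                 spin_unlock(&bdev->glob->lru_lock);
>>>>>>>>>                 /* We never took it off the LRU, plain unlock is enough. */
>>>>>>>>>                 ww_mutex_unlock(&bo->resv->lock);
>>>>>>>>>                 ttm_bo_put(bo);
>>>>>>>>>                 return -EBUSY;
>>>>>>>>>         }
>>>>>>>>>         ttm_bo_del_from_lru(bo);
>>>>>>>>>         spin_unlock(&bdev->glob->lru_lock);
>>>>>>>>>
>>>>>>>>>         /* Step 5: evict it as we would evict any other BO, then
>>>>>>>>>          * unreserve, which also puts it back on its LRU. */
>>>>>>>>>         ret = ttm_bo_evict(bo, ctx);
>>>>>>>>>         ttm_bo_unreserve(bo);
>>>>>>>>>         ttm_bo_put(bo);
>>>>>>>>>         return ret;
>>>>>>>>> }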
>>>>>>>>>
>>>>>>>>> Have fun :)
>>>>>>>>> Christian.
>>>>>>>>>
>>>>>>>>> On 23.04.19 at 15:19, Zhou, David(ChunMing) wrote:
>>>>>>>>>> How about adding an additional ctx->resv condition inline to 
>>>>>>>>>> address your concern? As long as we don't wait on the same 
>>>>>>>>>> user's BOs, it shouldn't lead to a deadlock.
>>>>>>>>>>
>>>>>>>>>> Otherwise, any other idea?
>>>>>>>>>>
>>>>>>>>>> -------- Original Message --------
>>>>>>>>>> Subject: Re: [PATCH] ttm: wait mem space if user allow while 
>>>>>>>>>> gpu busy
>>>>>>>>>> From: Christian König
>>>>>>>>>> To: "Liang, Prike" ,"Zhou, David(ChunMing)" 
>>>>>>>>>> ,dri-devel at lists.freedesktop.org
>>>>>>>>>> CC:
>>>>>>>>>>
>>>>>>>>>> Well, that is certainly a NAK because it can lead to deadlocks
>>>>>>>>>> in the memory management.
>>>>>>>>>>
>>>>>>>>>> You can't just busy wait with all those locks held.
>>>>>>>>>>
>>>>>>>>>> Regards,
>>>>>>>>>> Christian.
>>>>>>>>>>
>>>>>>>>>> On 23.04.19 at 03:45, Liang, Prike wrote:
>>>>>>>>>> > Acked-by: Prike Liang <Prike.Liang at amd.com>
>>>>>>>>>> >
>>>>>>>>>> > Thanks,
>>>>>>>>>> > Prike
>>>>>>>>>> > -----Original Message-----
>>>>>>>>>> > From: Chunming Zhou <david1.zhou at amd.com>
>>>>>>>>>> > Sent: Monday, April 22, 2019 6:39 PM
>>>>>>>>>> > To: dri-devel at lists.freedesktop.org
>>>>>>>>>> > Cc: Liang, Prike <Prike.Liang at amd.com>; Zhou, 
>>>>>>>>>> David(ChunMing) <David1.Zhou at amd.com>
>>>>>>>>>> > Subject: [PATCH] ttm: wait mem space if user allow while 
>>>>>>>>>> gpu busy
>>>>>>>>>> >
>>>>>>>>>> > A heavy GPU job can occupy memory for a long time, which can
>>>>>>>>>> > lead to other users failing to get memory.
>>>>>>>>>> >
>>>>>>>>>> > Change-Id: I0b322d98cd76e5ac32b00462bbae8008d76c5e11
>>>>>>>>>> > Signed-off-by: Chunming Zhou <david1.zhou at amd.com>
>>>>>>>>>> > ---
>>>>>>>>>> >   drivers/gpu/drm/ttm/ttm_bo.c | 6 ++++--
>>>>>>>>>> >   1 file changed, 4 insertions(+), 2 deletions(-)
>>>>>>>>>> >
>>>>>>>>>> > diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
>>>>>>>>>> > index 7c484729f9b2..6c596cc24bec 100644
>>>>>>>>>> > --- a/drivers/gpu/drm/ttm/ttm_bo.c
>>>>>>>>>> > +++ b/drivers/gpu/drm/ttm/ttm_bo.c
>>>>>>>>>> > @@ -830,8 +830,10 @@ static int ttm_bo_mem_force_space(struct ttm_buffer_object *bo,
>>>>>>>>>> >  		if (mem->mm_node)
>>>>>>>>>> >  			break;
>>>>>>>>>> >  		ret = ttm_mem_evict_first(bdev, mem_type, place, ctx);
>>>>>>>>>> > -		if (unlikely(ret != 0))
>>>>>>>>>> > -			return ret;
>>>>>>>>>> > +		if (unlikely(ret != 0)) {
>>>>>>>>>> > +			if (!ctx || ctx->no_wait_gpu || ret != -EBUSY)
>>>>>>>>>> > +				return ret;
>>>>>>>>>> > +		}
>>>>>>>>>> >  	} while (1);
>>>>>>>>>> >  	mem->mem_type = mem_type;
>>>>>>>>>> >  	return ttm_bo_add_move_fence(bo, man, mem);
>>>>>>>>>> > --
>>>>>>>>>> > 2.17.1
>>>>>>>>>> >
