[PATCH] ttm: wait mem space if user allow while gpu busy
zhoucm1
zhoucm1 at amd.com
Wed Apr 24 08:11:40 UTC 2019
On 2019/04/24 16:07, Christian König wrote:
> This is used in a work item, so you don't need to check for signals.
will remove.
>
> And checking if the LRU is populated is mandatory here
How can we check that outside of TTM? The calling code is in DM.
> or otherwise you can run into an endless loop.
I already added a timeout for that.
-David
>
> Christian.
>
> On 24.04.19 at 09:59, zhoucm1 wrote:
>>
>> How about the newly attached patch?
>>
>>
>> -David
>>
>>
>>> On 2019/04/24 15:30, Christian König wrote:
>>> That would change the semantics of ttm_bo_mem_space() and so could
>>> change the return code in an IOCTL as well. Not a good idea, because
>>> that could have a lot of side effects.
>>>
>>> Instead, in the calling DC code you could check whether you get an
>>> -ENOMEM and then call schedule().
>>>
>>> If, after the schedule(), we see that there are now BOs on the LRU, we
>>> can try again and see if pinning the frame buffer now succeeds.
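>>>
>>> A minimal sketch of that idea on the DM side, assuming a hypothetical
>>> helper built on the existing amdgpu_bo_reserve()/amdgpu_bo_pin()/
>>> amdgpu_bo_unreserve() calls; the retry bound is arbitrary and the LRU
>>> check is deliberately left out:
>>>
>>> /* Sketch only: retry pinning while TTM keeps reporting -ENOMEM. */
>>> static int dm_pin_fb_retry(struct amdgpu_bo *rbo, u32 domain)
>>> {
>>>     int r, tries = 10;    /* arbitrary bound, just for illustration */
>>>
>>>     do {
>>>         r = amdgpu_bo_reserve(rbo, true);
>>>         if (r)
>>>             return r;
>>>         r = amdgpu_bo_pin(rbo, domain);
>>>         amdgpu_bo_unreserve(rbo);
>>>         if (r != -ENOMEM)
>>>             return r;
>>>         /* give other clients a chance to put BOs back on the LRU */
>>>         schedule();
>>>     } while (--tries);
>>>
>>>     return r;
>>> }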
>>>
>>> Christian.
>>>
>>> On 24.04.19 at 09:17, zhoucm1 wrote:
>>>>
>>>> I made a patch as attached.
>>>>
>>>> I'm not sure how to turn your proposal into a patch. Could you make
>>>> one for that if mine isn't enough?
>>>>
>>>> -David
>>>>
>>>> On 2019/04/24 15:12, Christian König wrote:
>>>>>> how about just adding a wrapper for the pin function as below?
>>>>> I considered this as well and don't think it will work reliably.
>>>>>
>>>>> We could use it as a band-aid for this specific problem, but in
>>>>> general we need to improve the handling in TTM to resolve those
>>>>> kinds of resource conflicts.
>>>>>
>>>>> Regards,
>>>>> Christian.
>>>>>
>>>>> On 23.04.19 at 17:09, Zhou, David(ChunMing) wrote:
>>>>>> >3. If we have a ticket we grab a reference to the first BO on
>>>>>> the LRU, drop the LRU lock and try to grab the reservation lock
>>>>>> with the ticket.
>>>>>>
>>>>>> The BO on the LRU is already locked by the CS user; can it be dropped
>>>>>> here by the DC user? And if the DC user then grabs its lock with the
>>>>>> ticket, how does CS grab it again?
>>>>>>
>>>>>> If you think waiting in TTM carries this risk, how about just adding
>>>>>> a wrapper for the pin function, as below?
>>>>>> /* retry pinning until it succeeds or the timeout expires */
>>>>>> amdgpu_get_pin_bo_timeout()
>>>>>> {
>>>>>>     do {
>>>>>>         amdgpu_bo_reserve();
>>>>>>         r = amdgpu_bo_pin();
>>>>>>         if (!r)
>>>>>>             break;    /* pinned; keep the reservation held */
>>>>>>         amdgpu_bo_unreserve();
>>>>>>         timeout--;
>>>>>>     } while (timeout > 0);
>>>>>> }
>>>>>>
>>>>>> -------- Original Message --------
>>>>>> Subject: Re: [PATCH] ttm: wait mem space if user allow while gpu busy
>>>>>> From: Christian König
>>>>>> To: "Zhou, David(ChunMing)" ,"Koenig, Christian" ,"Liang, Prike"
>>>>>> ,dri-devel at lists.freedesktop.org
>>>>>> CC:
>>>>>>
>>>>>> Well that's not so easy off hand.
>>>>>>
>>>>>> The basic problem here is that when you busy wait at this place
>>>>>> you can easily run into situations where application A busy waits
>>>>>> for B while B busy waits for A -> deadlock.
>>>>>>
>>>>>> So what we need here is the deadlock detection logic of the
>>>>>> ww_mutex. To use this we at least need to do the following steps:
>>>>>>
>>>>>> 1. Reserve the BO in DC using a ww_mutex ticket (trivial).
>>>>>>
>>>>>> 2. If we then run into this EBUSY condition in TTM check if the
>>>>>> BO we need memory for (or rather the ww_mutex of its reservation
>>>>>> object) has a ticket assigned.
>>>>>>
>>>>>> 3. If we have a ticket we grab a reference to the first BO on the
>>>>>> LRU, drop the LRU lock and try to grab the reservation lock with
>>>>>> the ticket.
>>>>>>
>>>>>> 4. If getting the reservation lock with the ticket succeeded we
>>>>>> check if the BO is still the first one on the LRU in question
>>>>>> (the BO could have moved).
>>>>>>
>>>>>> 5. If the BO is still the first one on the LRU in question we try
>>>>>> to evict it as we would evict any other BO.
>>>>>>
>>>>>> 6. If any of the "If's" above fail we just back off and return
>>>>>> -EBUSY.
>>>>>>
>>>>>> Steps 2-5 are certainly not trivial, but doable as far as I can see.
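>>>>>>
>>>>>> Very roughly, steps 2-5 could look like the sketch below; the two
>>>>>> *_on_lru() helpers are made up for illustration, while ttm_bo_get()/
>>>>>> ttm_bo_put(), the global LRU spinlock and the ww_mutex calls are the
>>>>>> existing ones:
>>>>>>
>>>>>> /* Illustrative sketch only, not the actual TTM implementation. */
>>>>>> static int ttm_evict_busy_bo(struct ttm_bo_device *bdev,
>>>>>>                              unsigned mem_type,
>>>>>>                              struct ttm_operation_ctx *ctx,
>>>>>>                              struct ww_acquire_ctx *ticket)
>>>>>> {
>>>>>>     struct ttm_buffer_object *first;
>>>>>>     int r;
>>>>>>
>>>>>>     if (!ticket)        /* step 2: caller has no ticket, give up */
>>>>>>         return -EBUSY;
>>>>>>
>>>>>>     /* step 3: reference the first BO on the LRU, drop the LRU lock */
>>>>>>     spin_lock(&bdev->glob->lru_lock);
>>>>>>     first = first_bo_on_lru(bdev, mem_type);    /* hypothetical helper */
>>>>>>     if (!first) {
>>>>>>         spin_unlock(&bdev->glob->lru_lock);
>>>>>>         return -EBUSY;
>>>>>>     }
>>>>>>     ttm_bo_get(first);
>>>>>>     spin_unlock(&bdev->glob->lru_lock);
>>>>>>
>>>>>>     /* the ticket is what gives us ww_mutex deadlock detection here */
>>>>>>     r = ww_mutex_lock_interruptible(&first->resv->lock, ticket);
>>>>>>     if (r) {
>>>>>>         r = -EBUSY;        /* step 6: back off */
>>>>>>         goto out_put;
>>>>>>     }
>>>>>>
>>>>>>     /* step 4: the BO could have moved while we were sleeping */
>>>>>>     if (!bo_still_first_on_lru(bdev, mem_type, first)) {    /* hypothetical */
>>>>>>         r = -EBUSY;
>>>>>>         goto out_unlock;
>>>>>>     }
>>>>>>
>>>>>>     /* step 5: evict it as we would evict any other BO */
>>>>>>     r = ttm_bo_evict(first, ctx);
>>>>>>
>>>>>> out_unlock:
>>>>>>     ww_mutex_unlock(&first->resv->lock);
>>>>>> out_put:
>>>>>>     ttm_bo_put(first);
>>>>>>     return r;
>>>>>> }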
>>>>>>
>>>>>> Have fun :)
>>>>>> Christian.
>>>>>>
>>>>>> On 23.04.19 at 15:19, Zhou, David(ChunMing) wrote:
>>>>>>> How about adding an extra ctx->resv condition inline to address your
>>>>>>> concern? As long as we don't wait on a BO from the same user, it
>>>>>>> shouldn't lead to a deadlock.
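>>>>>>>
>>>>>>> Roughly, the extra condition in the wait loop could look like this
>>>>>>> ("busy_bo" is hypothetical and stands for whichever BO blocked the
>>>>>>> eviction):
>>>>>>>
>>>>>>>     /* only keep waiting if the blocking BO belongs to someone else */
>>>>>>>     if (ret == -EBUSY && ctx && busy_bo->resv != ctx->resv)
>>>>>>>         continue;    /* another user holds it, retry */
>>>>>>>     return ret;      /* our own BO (or no ctx), bail out */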
>>>>>>>
>>>>>>> Otherwise, any other idea?
>>>>>>>
>>>>>>> -------- Original Message --------
>>>>>>> Subject: Re: [PATCH] ttm: wait mem space if user allow while gpu busy
>>>>>>> From: Christian König
>>>>>>> To: "Liang, Prike" ,"Zhou, David(ChunMing)"
>>>>>>> ,dri-devel at lists.freedesktop.org
>>>>>>> CC:
>>>>>>>
>>>>>>> Well that is certainly a NAK because it can lead to deadlock in the
>>>>>>> memory management.
>>>>>>>
>>>>>>> You can't just busy wait with all those locks held.
>>>>>>>
>>>>>>> Regards,
>>>>>>> Christian.
>>>>>>>
>>>>>>> On 23.04.19 at 03:45, Liang, Prike wrote:
>>>>>>> > Acked-by: Prike Liang <Prike.Liang at amd.com>
>>>>>>> >
>>>>>>> > Thanks,
>>>>>>> > Prike
>>>>>>> > -----Original Message-----
>>>>>>> > From: Chunming Zhou <david1.zhou at amd.com>
>>>>>>> > Sent: Monday, April 22, 2019 6:39 PM
>>>>>>> > To: dri-devel at lists.freedesktop.org
>>>>>>> > Cc: Liang, Prike <Prike.Liang at amd.com>; Zhou, David(ChunMing)
>>>>>>> <David1.Zhou at amd.com>
>>>>>>> > Subject: [PATCH] ttm: wait mem space if user allow while gpu busy
>>>>>>> >
>>>>>>> > A heavy GPU job can occupy memory for a long time, which can lead
>>>>>>> > to other users failing to get memory.
>>>>>>> >
>>>>>>> > Change-Id: I0b322d98cd76e5ac32b00462bbae8008d76c5e11
>>>>>>> > Signed-off-by: Chunming Zhou <david1.zhou at amd.com>
>>>>>>> > ---
>>>>>>> > drivers/gpu/drm/ttm/ttm_bo.c | 6 ++++--
>>>>>>> > 1 file changed, 4 insertions(+), 2 deletions(-)
>>>>>>> >
>>>>>>> > diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
>>>>>>> > index 7c484729f9b2..6c596cc24bec 100644
>>>>>>> > --- a/drivers/gpu/drm/ttm/ttm_bo.c
>>>>>>> > +++ b/drivers/gpu/drm/ttm/ttm_bo.c
>>>>>>> > @@ -830,8 +830,10 @@ static int ttm_bo_mem_force_space(struct ttm_buffer_object *bo,
>>>>>>> >  		if (mem->mm_node)
>>>>>>> >  			break;
>>>>>>> >  		ret = ttm_mem_evict_first(bdev, mem_type, place, ctx);
>>>>>>> > -		if (unlikely(ret != 0))
>>>>>>> > -			return ret;
>>>>>>> > +		if (unlikely(ret != 0)) {
>>>>>>> > +			if (!ctx || ctx->no_wait_gpu || ret != -EBUSY)
>>>>>>> > +				return ret;
>>>>>>> > +		}
>>>>>>> >  	} while (1);
>>>>>>> >  	mem->mem_type = mem_type;
>>>>>>> >  	return ttm_bo_add_move_fence(bo, man, mem);
>>>>>>> > --
>>>>>>> > 2.17.1
>>>>>>> >