[PATCH] drm/ttm: Schedule out if possible in bo delayed delete worker
Pan, Xinhui
Xinhui.Pan at amd.com
Fri Apr 10 01:21:19 UTC 2020
> On Apr 9, 2020, at 22:59, Koenig, Christian <Christian.Koenig at amd.com> wrote:
>
>> Why do we break out of the loop when there are pending BOs to be released?
>
> We do this anyway if we can't acquire the necessary locks. Freeing already deleted BOs is just very lazy background work.
That is true. Eviction will reclaim the BO resources too.
>
>> So this patch does not break anything, I think.
>
> Oh, the patch will certainly work. I'm just not sure if it's the ideal behavior.
>
>> https://elixir.bootlin.com/linux/latest/source/mm/slab.c#L4026
>>
>> This is another example of the use of cond_resched().
>
> Yes, and that is also a good example of what I mean here:
>
>>     if (!mutex_trylock(&slab_mutex))
>>         /* Give up. Setup the next iteration. */
>>         goto out;
>
> If the function can't acquire the lock immediately, it gives up and waits for the next iteration.
>
> I think it would be better if we did this in TTM as well when we spend too much time cleaning up old BOs.
fair enough.
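Something like this then, on top of the current loop (a rough sketch only; the 10 ms budget and the "deadline" local are made up for illustration):

    static bool ttm_bo_delayed_delete(struct ttm_bo_device *bdev, bool remove_all)
    {
        /* Hypothetical time budget; 10 ms is an arbitrary pick. */
        unsigned long deadline = jiffies + msecs_to_jiffies(10);

        /* ... list and lock setup as today ... */
        while (!list_empty(&bdev->ddestroy) &&
               (remove_all || time_before(jiffies, deadline))) {
            /* existing per-BO cleanup body, unchanged */
        }
        /* anything left over is picked up by the next worker cycle */
        /* ... */
    }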
>
> On the other hand you are right that cond_resched() has the advantage that we could spend more time on cleaning up old BOs if there is nothing else for the CPU to do.
>
> Regards,
> Christian.
>
> Am 09.04.20 um 16:24 schrieb Pan, Xinhui:
>> https://elixir.bootlin.com/linux/latest/source/mm/slab.c#L4026
>>
>> This is another example of the use of cond_resched().
>> From: Pan, Xinhui <Xinhui.Pan at amd.com>
>> Sent: Thursday, April 9, 2020 10:11:08 PM
>> To: Lucas Stach <l.stach at pengutronix.de>; amd-gfx at lists.freedesktop.org <amd-gfx at lists.freedesktop.org>; Koenig, Christian <Christian.Koenig at amd.com>
>> Cc: dri-devel at lists.freedesktop.org <dri-devel at lists.freedesktop.org>
>> Subject: Re: [PATCH] drm/ttm: Schedule out if possible in bo delayed delete worker
>>
>> I think it doesn't matter if the work item schedules out. Even if we did not schedule out, the workqueue itself will schedule out later.
>> So this patch does not break anything, I think.
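>>
>> For reference, process_one_work() in kernel/workqueue.c does roughly the
>> following (paraphrased; the exact code differs between kernel versions):
>>
>>     static void process_one_work(struct worker *worker,
>>                                  struct work_struct *work)
>>     {
>>         /* ... bookkeeping and lockdep annotations ... */
>>
>>         worker->current_func(work);  /* run the work item callback */
>>
>>         /*
>>          * Keep a kworker from hogging the CPU on non-preemptible
>>          * kernels; this also reports an RCU quiescent state.
>>          */
>>         cond_resched();
>>
>>         /* ... more bookkeeping ... */
>>     }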
>> From: Pan, Xinhui <Xinhui.Pan at amd.com>
>> Sent: Thursday, April 9, 2020 10:07:09 PM
>> To: Lucas Stach <l.stach at pengutronix.de>; amd-gfx at lists.freedesktop.org <amd-gfx at lists.freedesktop.org>; Koenig, Christian <Christian.Koenig at amd.com>
>> Cc: dri-devel at lists.freedesktop.org <dri-devel at lists.freedesktop.org>
>> Subject: Re: [PATCH] drm/ttm: Schedule out if possible in bo delayed delete worker
>>
>> Why do we break out of the loop when there are pending BOs to be released?
>>
>> And I just checked process_one_work(). Right after the work item callback is called, the workqueue itself will call cond_resched(). So I think
>> From: Koenig, Christian <Christian.Koenig at amd.com>
>> Sent: Thursday, April 9, 2020 9:38:24 PM
>> To: Lucas Stach <l.stach at pengutronix.de>; Pan, Xinhui <Xinhui.Pan at amd.com>; amd-gfx at lists.freedesktop.org <amd-gfx at lists.freedesktop.org>
>> Cc: dri-devel at lists.freedesktop.org <dri-devel at lists.freedesktop.org>
>> Subject: Re: [PATCH] drm/ttm: Schedule out if possible in bo delayed delete worker
>>
>> Am 09.04.20 um 15:25 schrieb Lucas Stach:
>> > Am Donnerstag, den 09.04.2020, 14:35 +0200 schrieb Christian König:
>> >> Am 09.04.20 um 03:31 schrieb xinhui pan:
>> >>> The delayed delete list is per device and might be very large. In a
>> >>> heavy workload test, the list may never be empty. That can trigger
>> >>> RCU stall warnings or soft lockups in non-preemptible kernels.
>> >>> Let's schedule out if possible in that case.
>> >> Mhm, I'm not sure if that is actually allowed. This is called from a
>> >> work item and those are not really supposed to be scheduled away.
>> > Huh? Work items can schedule out just fine, otherwise they would be
>> > horribly broken when it comes to sleeping locks.
>>
>> Let me refine the sentence: Work items are not really supposed to be
>> scheduled away on purpose. E.g. you shouldn't call schedule() or
>> cond_resched() as in the case here.
>>
>> Getting scheduled away because we wait for a lock is of course perfectly
>> fine.
>>
>> > The workqueue code
>> > even has measures to keep the workqueues at the expected concurrency
>> > level by starting other work items when one of them goes to sleep.
>>
>> Yeah, and exactly that's what I would say we should avoid here :)
>>
>> In other words work items can be scheduled away, but they should not be
>> unless really necessary (e.g. waiting for a lock).
>>
>> Otherwise, as you said, new threads for work item processing are started
>> up, and I don't think we want that.
>>
>> Just returning from the work item and waiting for the next cycle is most
>> likely the better option.
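>>
>> The delayed work already re-arms itself while the list is non-empty,
>> roughly like this (paraphrased from ttm_bo.c; details may vary by
>> kernel version):
>>
>>     static void ttm_bo_delayed_workqueue(struct work_struct *work)
>>     {
>>         struct ttm_bo_device *bdev =
>>             container_of(work, struct ttm_bo_device, wq.work);
>>
>>         /* not everything freed yet: try again in ~10 ms */
>>         if (!ttm_bo_delayed_delete(bdev, false))
>>             schedule_delayed_work(&bdev->wq,
>>                                   ((HZ / 100) < 1) ? 1 : HZ / 100);
>>     }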
>>
>> Regards,
>> Christian.
>>
>> >
>> > Regards,
>> > Lucas
>> >
>> >> Maybe rather change the while into while (!list_empty(&bdev->ddestroy)
>> >> && !should_reschedule(0)).
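>> >>
>> >> In code, something like this (an untested sketch; "should_reschedule"
>> >> above standing in for the kernel's need_resched() here):
>> >>
>> >>     while (!list_empty(&bdev->ddestroy) && !need_resched()) {
>> >>         /* existing per-BO cleanup, unchanged */
>> >>     }
>> >>     /* leftover BOs stay on ddestroy for the next cycle */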
>> >>
>> >> Christian.
>> >>
>> >>> Signed-off-by: xinhui pan <xinhui.pan at amd.com>
>> >>> ---
>> >>> drivers/gpu/drm/ttm/ttm_bo.c | 1 +
>> >>> 1 file changed, 1 insertion(+)
>> >>>
>> >>> diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
>> >>> index 9e07c3f75156..b8d853cab33b 100644
>> >>> --- a/drivers/gpu/drm/ttm/ttm_bo.c
>> >>> +++ b/drivers/gpu/drm/ttm/ttm_bo.c
>> >>> @@ -541,6 +541,7 @@ static bool ttm_bo_delayed_delete(struct ttm_bo_device *bdev, bool remove_all)
>> >>>                 }
>> >>>
>> >>>                 ttm_bo_put(bo);
>> >>> +               cond_resched();
>> >>>                 spin_lock(&glob->lru_lock);
>> >>>         }
>> >>>         list_splice_tail(&removed, &bdev->ddestroy);
>> >> _______________________________________________
>> >> dri-devel mailing list
>> >> dri-devel at lists.freedesktop.org
>> >> https://lists.freedesktop.org/mailman/listinfo/dri-devel
>>
>