[PATCH 2/2] drm: Revert syncobj timeline changes.
zhoucm1
zhoucm1 at amd.com
Tue Nov 13 06:18:37 UTC 2018
On 2018年11月12日 18:16, Christian König wrote:
> Am 09.11.18 um 23:26 schrieb Eric Anholt:
>> Eric Anholt<eric at anholt.net> writes:
>>
>>> [ Unknown signature status ]
>>> zhoucm1<zhoucm1 at amd.com> writes:
>>>
>>>> On 2018年11月09日 00:52, Christian König wrote:
>>>>> Am 08.11.18 um 17:07 schrieb Koenig, Christian:
>>>>>> Am 08.11.18 um 17:04 schrieb Eric Anholt:
>>>>>>> Daniel suggested I submit this, since we're still seeing regressions
>>>>>>> from it. This is a revert to before 48197bc564c7 ("drm: add syncobj
>>>>>>> timeline support v9") and its followon fixes.
>>>>>> This is a harmless false positive from lockdep, Chouming and I are
>>>>>> already working on a fix.
>>>>> On the other hand we had enough trouble with that patch, so if it
>>>>> really bothers you feel free to add my Acked-by: Christian König
>>>>> <christian.koenig at amd.com> and push it.
>>>> NAK, please no, I don't think this needed, the Warning totally isn't
>>>> related to syncobj timeline, but fence-array implementation flaw, just
>>>> exposed by syncobj.
>>>> In addition, Christian already has a fix for this Warning, I've tested.
>>>> Please Christian send to public review.
>>> I backed out my revert of #2 (#1 still necessary) after adding the
>>> lockdep regression fix, and now my CTS run got oomkilled after just a
>>> few hours, with these notable lines in the unreclaimable slab info list:
>>>
>>> [ 6314.373099] drm_sched_fence 69095KB 69095KB
>>> [ 6314.373653] kmemleak_object 428249KB 428384KB
>>> [ 6314.373736] kmalloc-262144 256KB 256KB
>>> [ 6314.373743] kmalloc-131072 128KB 128KB
>>> [ 6314.373750] kmalloc-65536 64KB 64KB
>>> [ 6314.373756] kmalloc-32768 1472KB 1728KB
>>> [ 6314.373763] kmalloc-16384 64KB 64KB
>>> [ 6314.373770] kmalloc-8192 208KB 208KB
>>> [ 6314.373778] kmalloc-4096 2408KB 2408KB
>>> [ 6314.373784] kmalloc-2048 288KB 336KB
>>> [ 6314.373792] kmalloc-1024 1457KB 1512KB
>>> [ 6314.373800] kmalloc-512 854KB 1048KB
>>> [ 6314.373808] kmalloc-256 188KB 268KB
>>> [ 6314.373817] kmalloc-192 69141KB 69142KB
>>> [ 6314.373824] kmalloc-64 47703KB 47704KB
>>> [ 6314.373886] kmalloc-128 46396KB 46396KB
>>> [ 6314.373894] kmem_cache 31KB 35KB
>>>
>>> No results from kmemleak, though.
>> OK, it looks like the #2 revert probably isn't related to the OOM issue.
>> Running a single job on otherwise unused DRM, watching /proc/slabinfo
>> every second for drm_sched_fence, I get:
>>
>> drm_sched_fence 0 0 192 21 1 : tunables 32 16 8 : slabdata 0 0 0 : globalstat 0 0 0 0 0 0 0 0 0 : cpustat 0 0 0 0
>> drm_sched_fence 16 21 192 21 1 : tunables 32 16 8 : slabdata 1 1 0 : globalstat 16 16 1 0 0 0 0 0 0 : cpustat 5 1 6 0
>> drm_sched_fence 13 21 192 21 1 : tunables 32 16 8 : slabdata 1 1 0 : globalstat 16 16 1 0 0 0 0 0 0 : cpustat 5 1 6 0
>> drm_sched_fence 6 21 192 21 1 : tunables 32 16 8 : slabdata 1 1 0 : globalstat 16 16 1 0 0 0 0 0 0 : cpustat 5 1 6 0
>> drm_sched_fence 4 21 192 21 1 : tunables 32 16 8 : slabdata 1 1 0 : globalstat 16 16 1 0 0 0 0 0 0 : cpustat 5 1 6 0
>> drm_sched_fence 2 21 192 21 1 : tunables 32 16 8 : slabdata 1 1 0 : globalstat 16 16 1 0 0 0 0 0 0 : cpustat 5 1 6 0
>> drm_sched_fence 0 21 192 21 1 : tunables 32 16 8 : slabdata 0 1 0 : globalstat 16 16 1 0 0 0 0 0 0 : cpustat 5 1 6 0
>>
>> So we generate a ton of fences, and I guess free them slowly because of
>> RCU? And presumably kmemleak was sucking up lots of memory because of
>> how many of these objects were laying around.
Hi Eric,
Thanks for testing, I checked the code, we could forget signal
fence-array immediately after many reviews without callback for
fence-array, everything is waiting fence_wait or syncobj free, that's
why you see "free them slowly".
Maybe we just need one line change as attahced, Could you please have a
try with it on your tests?
btw, I also didn't find there is fence-leak, or you can point me.
Thanks,
David
>
> That is certainly possible. Another possibility is that we don't drop
> the reference in dma-fence-array early enough.
>
> E.g. the dma-fence-array will keep the reference to its fences until
> it is destroyed, which is a bit late when you chain multiple
> dma-fence-array objects together.
>
> David can you take a look at this and propose a fix? That would
> probably be good to have fixed in dma-fence-array separately to the
> timeline work.
>
> Thanks,
> Christian.
>
>>
>> _______________________________________________
>> dri-devel mailing list
>> dri-devel at lists.freedesktop.org
>> https://lists.freedesktop.org/mailman/listinfo/dri-devel
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.freedesktop.org/archives/dri-devel/attachments/20181113/5dff8b33/attachment-0001.html>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0001-drm-syncobj-enable-signaling-for-fence_array-of-sign.patch
Type: text/x-patch
Size: 890 bytes
Desc: not available
URL: <https://lists.freedesktop.org/archives/dri-devel/attachments/20181113/5dff8b33/attachment-0001.bin>
More information about the dri-devel
mailing list