[PATCH 5/5] [RFC]drm: add syncobj timeline support v3
Christian König
christian.koenig at amd.com
Tue Sep 4 07:00:14 UTC 2018
Am 04.09.2018 um 06:04 schrieb zhoucm1:
>
>
> On 2018年09月03日 19:19, Christian König wrote:
>> Am 03.09.2018 um 12:07 schrieb Chunming Zhou:
>>>
>>>
>>> 在 2018/9/3 16:50, Christian König 写道:
>>>> Am 03.09.2018 um 06:13 schrieb Chunming Zhou:
>>>>>
>>>>>
>>>>> 在 2018/8/30 19:32, Christian König 写道:
>>>>>> [SNIP]
>>>>>>>>>>
>>>>>>>>>>> +
>>>>>>>>>>> +struct drm_syncobj_wait_pt {
>>>>>>>>>>> + struct drm_syncobj_stub_fence base;
>>>>>>>>>>> + u64 value;
>>>>>>>>>>> + struct rb_node node;
>>>>>>>>>>> +};
>>>>>>>>>>> +struct drm_syncobj_signal_pt {
>>>>>>>>>>> + struct drm_syncobj_stub_fence base;
>>>>>>>>>>> + struct dma_fence *signal_fence;
>>>>>>>>>>> + struct dma_fence *pre_pt_base;
>>>>>>>>>>> + struct dma_fence_cb signal_cb;
>>>>>>>>>>> + struct dma_fence_cb pre_pt_cb;
>>>>>>>>>>> + struct drm_syncobj *syncobj;
>>>>>>>>>>> + u64 value;
>>>>>>>>>>> + struct list_head list;
>>>>>>>>>>> +};
>>>>>>>>>>
>>>>>>>>>> What are those two structures good for
>>>>>>>>> They are used to record wait ops points and signal ops points.
>>>>>>>>> For timeline, they are connected by timeline value, works like:
>>>>>>>>> a. signal pt increase timeline value to signal_pt value
>>>>>>>>> when signal_pt is signaled. signal_pt is depending on previous
>>>>>>>>> pt fence and itself signal fence from signal ops.
>>>>>>>>> b. wait pt tree checks if timeline value over itself value.
>>>>>>>>>
>>>>>>>>> For normal, works like:
>>>>>>>>> a. signal pt increase 1 for syncobj_timeline->signal_point
>>>>>>>>> every time when signal ops is performed.
>>>>>>>>> b. when wait ops is comming, wait pt will fetch above last
>>>>>>>>> signal pt value as its wait point. wait pt will be only
>>>>>>>>> signaled by equal point value signal_pt.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>> and why is the stub fence their base?
>>>>>>>>> Good question, I tried to kzalloc for them as well when I
>>>>>>>>> debug them, I encountered a problem:
>>>>>>>>> I lookup/find wait_pt or signal_pt successfully, but when I
>>>>>>>>> tried to use them, they are freed sometimes, and results in
>>>>>>>>> NULL point.
>>>>>>>>> and generally, when lookup them, we often need their stub
>>>>>>>>> fence as well, and in the meantime, their lifecycle are same.
>>>>>>>>> Above reason, I make stub fence as their base.
>>>>>>>>
>>>>>>>> That sounds like you only did this because you messed up the
>>>>>>>> lifecycle.
>>>>>>>>
>>>>>>>> Additional to that I don't think you correctly considered the
>>>>>>>> lifecycle of the waits and the sync object itself. E.g.
>>>>>>>> blocking in drm_syncobj_timeline_fini() until all waits are
>>>>>>>> done is not a good idea.
>>>>>>>>
>>>>>>>> What you should do instead is to create a fence_array object
>>>>>>>> with all the fence we need to wait for when a wait point is
>>>>>>>> requested.
>>>>>>> Yeah, this is our initial discussion result, but when I tried to
>>>>>>> do that, I found that cannot meet the advance signal requirement:
>>>>>>> a. We need to consider the wait and signal pt value are not
>>>>>>> one-to-one match, it's difficult to find dependent point, at
>>>>>>> least, there is some overhead.
>>>>>>
>>>>>> As far as I can see that is independent of using a fence array
>>>>>> here. See you can either use a ring buffer or an rb-tree, but
>>>>>> when you want to wait for a specific point we need to condense
>>>>>> the not yet signaled fences into an array.
>>>>> again, need to find the range of where the specific point in, we
>>>>> should close to timeline semantics, I also refered the sw_sync.c
>>>>> timeline, usally wait_pt is signalled by timeline point. And I
>>>>> agree we can implement it with several methods, but I don't think
>>>>> there are basical differences.
>>>>
>>>> The problem is that with your current approach you need the
>>>> sync_obj alive for the synchronization to work. That is most likely
>>>> not a good idea.
>>> Indeed, I will fix that. How abount only creating fence array for
>>> every wait pt when syncobj release? when syncobj release, wait pt
>>> must have waited the signal opertation, then we can easily condense
>>> fences for every wait pt. And meantime, we can take timeline based
>>> wait pt advantage.
>>
>> That could work, but I don't see how you want to replace the already
>> issued fence with a fence_array when the sync object is destroyed.
>>
>> Additional to that I would rather prefer a consistent handling, e.g.
>> without extra rarely used code paths.
> Ah, I find a easy way, we just need to make syncobj_timeline structure
> as a reference. This way syncobj itself can be released first,
> wait_pt/signal_pt don't need syncobj at all.
> every wait_pt/signal_pt keep a reference of syncobj_timeline.
I've thought about that as well, but came to the conclusion that you run
into problems because of circle dependencies.
E.g. sync_obj references sync_point and sync_point references sync_obj.
Additional to that it is quite a bit larger memory footprint because you
need to keep the sync_obj around as well.
>
>>
>>>
>>>>
>>>> Additional to that you enable signaling without a need from the
>>>> waiting side. That is rather bad for implementations which need
>>>> that optimization.
>>> Do you mean increasing timeline based on signal fence is not better?
>>> only update timeline value when requested by a wait pt?
>>
>> Yes, exactly.
>>
>>> This way, we will not update timeline value immidiately and cannot
>>> free signal pt immidiately, and we also need to consider it to CPU
>>> query and wait.
>>
>> That is actually the better coding style. We usually try to avoid
>> doing things in interrupt handlers as much as possible.
> OK, I see your concern, how about to delay handling to a workqueue?
> this way, we only increase timeline value and wake up workqueue in
> fence cb, is that acceptable?
A little bit, but not much better. The nice thing with garbage
collection is that the CPU time is accounted to the process causing the
garbage.
>>
>> How about this idea:
>> 1. Each signaling point is a fence implementation with an rb node.
>> 2. Each node keeps a reference to the last previously inserted node.
>> 3. Each node is referenced by the sync object itself.
>> 4. Before each signal/wait operation we do a garbage collection and
>> remove the first node from the tree as long as it is signaled.
>>
>> 5. When enable_signaling is requested for a node we cascade that to
>> the left using rb_prev.
>> This ensures that signaling is enabled for the current fence as
>> well as all previous fences.
>>
>> 6. A wait just looks into the tree for the signal point lower or
>> equal of the requested sequence number.
>>
>> 7. When the sync object is released we use
>> rbtree_postorder_for_each_entry_safe() and drop the extra reference
>> to each node, but never call rb_erase!
>> This way the rb_tree stays in memory, but without a root (e.g.
>> the sync object). It only destructs itself when the looked up
>> references to the nodes are dropped.
>>
>> Well that is quite a bunch of logic, but I think that should work fine.
> Yeah, it could work, simple timeline reference also can solve 'free'
> problem.
I think this approach is still quite a bit better, e.g. you don't run
into circle dependency problems, it needs less memory and each node has
always the same size which means we can use a kmem_cache for it.
Regards,
Christian.
>
> Thanks,
> David Zhou
More information about the amd-gfx
mailing list