[PATCH 5/5] [RFC]drm: add syncobj timeline support v3

Christian König christian.koenig at amd.com
Tue Sep 4 07:00:14 UTC 2018


Am 04.09.2018 um 06:04 schrieb zhoucm1:
>
>
> On 2018年09月03日 19:19, Christian König wrote:
>> Am 03.09.2018 um 12:07 schrieb Chunming Zhou:
>>>
>>>
>>> 在 2018/9/3 16:50, Christian König 写道:
>>>> Am 03.09.2018 um 06:13 schrieb Chunming Zhou:
>>>>>
>>>>>
>>>>> 在 2018/8/30 19:32, Christian König 写道:
>>>>>> [SNIP]
>>>>>>>>>>
>>>>>>>>>>> +
>>>>>>>>>>> +struct drm_syncobj_wait_pt {
>>>>>>>>>>> +    struct drm_syncobj_stub_fence base;
>>>>>>>>>>> +    u64    value;
>>>>>>>>>>> +    struct rb_node   node;
>>>>>>>>>>> +};
>>>>>>>>>>> +struct drm_syncobj_signal_pt {
>>>>>>>>>>> +    struct drm_syncobj_stub_fence base;
>>>>>>>>>>> +    struct dma_fence *signal_fence;
>>>>>>>>>>> +    struct dma_fence *pre_pt_base;
>>>>>>>>>>> +    struct dma_fence_cb signal_cb;
>>>>>>>>>>> +    struct dma_fence_cb pre_pt_cb;
>>>>>>>>>>> +    struct drm_syncobj *syncobj;
>>>>>>>>>>> +    u64    value;
>>>>>>>>>>> +    struct list_head list;
>>>>>>>>>>> +};
>>>>>>>>>>
>>>>>>>>>> What are those two structures good for
>>>>>>>>> They are used to record wait ops points and signal ops points.
>>>>>>>>> For timeline, they are connected by timeline value, works like:
>>>>>>>>>     a. signal pt increase timeline value to signal_pt value 
>>>>>>>>> when signal_pt is signaled. signal_pt is depending on previous 
>>>>>>>>> pt fence and itself signal fence from signal ops.
>>>>>>>>>     b. wait pt tree checks if timeline value over itself value.
>>>>>>>>>
>>>>>>>>> For normal, works like:
>>>>>>>>>     a. signal pt increase 1 for syncobj_timeline->signal_point 
>>>>>>>>> every time when signal ops is performed.
>>>>>>>>>     b. when wait ops is comming, wait pt will fetch above last 
>>>>>>>>> signal pt value as its wait point. wait pt will be only 
>>>>>>>>> signaled by equal point value signal_pt.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>> and why is the stub fence their base?
>>>>>>>>> Good question, I tried to kzalloc for them as well when I 
>>>>>>>>> debug them, I encountered a problem:
>>>>>>>>> I lookup/find wait_pt or signal_pt successfully, but when I 
>>>>>>>>> tried to use them, they are freed sometimes, and results in 
>>>>>>>>> NULL point.
>>>>>>>>> and generally, when lookup them, we often need their stub 
>>>>>>>>> fence as well, and in the meantime, their lifecycle are same.
>>>>>>>>> Above reason, I make stub fence as their base.
>>>>>>>>
>>>>>>>> That sounds like you only did this because you messed up the 
>>>>>>>> lifecycle.
>>>>>>>>
>>>>>>>> Additional to that I don't think you correctly considered the 
>>>>>>>> lifecycle of the waits and the sync object itself. E.g. 
>>>>>>>> blocking in drm_syncobj_timeline_fini() until all waits are 
>>>>>>>> done is not a good idea.
>>>>>>>>
>>>>>>>> What you should do instead is to create a fence_array object 
>>>>>>>> with all the fence we need to wait for when a wait point is 
>>>>>>>> requested.
>>>>>>> Yeah, this is our initial discussion result, but when I tried to 
>>>>>>> do that, I found that cannot meet the advance signal requirement:
>>>>>>>     a. We need to consider the wait and signal pt value are not 
>>>>>>> one-to-one match, it's difficult to find dependent point, at 
>>>>>>> least, there is some overhead.
>>>>>>
>>>>>> As far as I can see that is independent of using a fence array 
>>>>>> here. See you can either use a ring buffer or an rb-tree, but 
>>>>>> when you want to wait for a specific point we need to condense 
>>>>>> the not yet signaled fences into an array.
>>>>> again, need to find the range of where the specific point in, we 
>>>>> should close to timeline semantics, I also refered the sw_sync.c 
>>>>> timeline, usally wait_pt is signalled by timeline point. And I 
>>>>> agree we can implement it with several methods, but I don't think 
>>>>> there are basical differences.
>>>>
>>>> The problem is that with your current approach you need the 
>>>> sync_obj alive for the synchronization to work. That is most likely 
>>>> not a good idea.
>>> Indeed, I will fix that. How abount only creating fence array for 
>>> every wait pt when syncobj release? when syncobj release, wait pt 
>>> must have waited the signal opertation, then we can easily condense 
>>> fences for every wait pt. And meantime, we can take timeline based 
>>> wait pt advantage.
>>
>> That could work, but I don't see how you want to replace the already 
>> issued fence with a fence_array when the sync object is destroyed.
>>
>> Additional to that I would rather prefer a consistent handling, e.g. 
>> without extra rarely used code paths.
> Ah, I find a easy way, we just need to make syncobj_timeline structure 
> as a reference. This way syncobj itself can be released first, 
> wait_pt/signal_pt don't need syncobj at all.
> every wait_pt/signal_pt keep a reference of syncobj_timeline.

I've thought about that as well, but came to the conclusion that you run 
into problems because of circle dependencies.

E.g. sync_obj references sync_point and sync_point references sync_obj.

Additional to that it is quite a bit larger memory footprint because you 
need to keep the sync_obj around as well.

>
>>
>>>
>>>>
>>>> Additional to that you enable signaling without a need from the 
>>>> waiting side. That is rather bad for implementations which need 
>>>> that optimization.
>>> Do you mean increasing timeline based on signal fence is not better? 
>>> only update timeline value when requested by a wait pt?
>>
>> Yes, exactly.
>>
>>> This way, we will not update timeline value immidiately and cannot 
>>> free signal pt immidiately, and we also need to consider it to CPU 
>>> query and wait.
>>
>> That is actually the better coding style. We usually try to avoid 
>> doing things in interrupt handlers as much as possible.
> OK, I see your concern, how about to delay handling to a workqueue? 
> this way, we only increase timeline value and wake up workqueue in 
> fence cb, is that acceptable?

A little bit, but not much better. The nice thing with garbage 
collection is that the CPU time is accounted to the process causing the 
garbage.

>>
>> How about this idea:
>> 1. Each signaling point is a fence implementation with an rb node.
>> 2. Each node keeps a reference to the last previously inserted node.
>> 3. Each node is referenced by the sync object itself.
>> 4. Before each signal/wait operation we do a garbage collection and 
>> remove the first node from the tree as long as it is signaled.
>>
>> 5. When enable_signaling is requested for a node we cascade that to 
>> the left using rb_prev.
>>     This ensures that signaling is enabled for the current fence as 
>> well as all previous fences.
>>
>> 6. A wait just looks into the tree for the signal point lower or 
>> equal of the requested sequence number.
>>
>> 7. When the sync object is released we use 
>> rbtree_postorder_for_each_entry_safe() and drop the extra reference 
>> to each node, but never call rb_erase!
>>     This way the rb_tree stays in memory, but without a root (e.g. 
>> the sync object). It only destructs itself when the looked up 
>> references to the nodes are dropped.
>>
>> Well that is quite a bunch of logic, but I think that should work fine.
> Yeah, it could work, simple timeline reference also can solve 'free' 
> problem.

I think this approach is still quite a bit better, e.g. you don't run 
into circle dependency problems, it needs less memory and each node has 
always the same size which means we can use a kmem_cache for it.

Regards,
Christian.

>
> Thanks,
> David Zhou



More information about the amd-gfx mailing list