[PATCH 5/5] [RFC]drm: add syncobj timeline support v3
Christian König
ckoenig.leichtzumerken at gmail.com
Mon Sep 3 11:19:16 UTC 2018
Am 03.09.2018 um 12:07 schrieb Chunming Zhou:
>
>
> 在 2018/9/3 16:50, Christian König 写道:
>> Am 03.09.2018 um 06:13 schrieb Chunming Zhou:
>>>
>>>
>>> 在 2018/8/30 19:32, Christian König 写道:
>>>> [SNIP]
>>>>>>>>
>>>>>>>>> +
>>>>>>>>> +struct drm_syncobj_wait_pt {
>>>>>>>>> + struct drm_syncobj_stub_fence base;
>>>>>>>>> + u64 value;
>>>>>>>>> + struct rb_node node;
>>>>>>>>> +};
>>>>>>>>> +struct drm_syncobj_signal_pt {
>>>>>>>>> + struct drm_syncobj_stub_fence base;
>>>>>>>>> + struct dma_fence *signal_fence;
>>>>>>>>> + struct dma_fence *pre_pt_base;
>>>>>>>>> + struct dma_fence_cb signal_cb;
>>>>>>>>> + struct dma_fence_cb pre_pt_cb;
>>>>>>>>> + struct drm_syncobj *syncobj;
>>>>>>>>> + u64 value;
>>>>>>>>> + struct list_head list;
>>>>>>>>> +};
>>>>>>>>
>>>>>>>> What are those two structures good for
>>>>>>> They are used to record wait ops points and signal ops points.
>>>>>>> For timeline, they are connected by timeline value, works like:
>>>>>>> a. signal pt increase timeline value to signal_pt value when
>>>>>>> signal_pt is signaled. signal_pt is depending on previous pt
>>>>>>> fence and itself signal fence from signal ops.
>>>>>>> b. wait pt tree checks if timeline value over itself value.
>>>>>>>
>>>>>>> For normal, works like:
>>>>>>> a. signal pt increase 1 for syncobj_timeline->signal_point
>>>>>>> every time when signal ops is performed.
>>>>>>> b. when wait ops is comming, wait pt will fetch above last
>>>>>>> signal pt value as its wait point. wait pt will be only signaled
>>>>>>> by equal point value signal_pt.
>>>>>>>
>>>>>>>
>>>>>>>> and why is the stub fence their base?
>>>>>>> Good question, I tried to kzalloc for them as well when I debug
>>>>>>> them, I encountered a problem:
>>>>>>> I lookup/find wait_pt or signal_pt successfully, but when I
>>>>>>> tried to use them, they are freed sometimes, and results in NULL
>>>>>>> point.
>>>>>>> and generally, when lookup them, we often need their stub fence
>>>>>>> as well, and in the meantime, their lifecycle are same.
>>>>>>> Above reason, I make stub fence as their base.
>>>>>>
>>>>>> That sounds like you only did this because you messed up the
>>>>>> lifecycle.
>>>>>>
>>>>>> Additional to that I don't think you correctly considered the
>>>>>> lifecycle of the waits and the sync object itself. E.g. blocking
>>>>>> in drm_syncobj_timeline_fini() until all waits are done is not a
>>>>>> good idea.
>>>>>>
>>>>>> What you should do instead is to create a fence_array object with
>>>>>> all the fence we need to wait for when a wait point is requested.
>>>>> Yeah, this is our initial discussion result, but when I tried to
>>>>> do that, I found that cannot meet the advance signal requirement:
>>>>> a. We need to consider the wait and signal pt value are not
>>>>> one-to-one match, it's difficult to find dependent point, at
>>>>> least, there is some overhead.
>>>>
>>>> As far as I can see that is independent of using a fence array
>>>> here. See you can either use a ring buffer or an rb-tree, but when
>>>> you want to wait for a specific point we need to condense the not
>>>> yet signaled fences into an array.
>>> again, need to find the range of where the specific point in, we
>>> should close to timeline semantics, I also refered the sw_sync.c
>>> timeline, usally wait_pt is signalled by timeline point. And I agree
>>> we can implement it with several methods, but I don't think there
>>> are basical differences.
>>
>> The problem is that with your current approach you need the sync_obj
>> alive for the synchronization to work. That is most likely not a good
>> idea.
> Indeed, I will fix that. How abount only creating fence array for
> every wait pt when syncobj release? when syncobj release, wait pt must
> have waited the signal opertation, then we can easily condense fences
> for every wait pt. And meantime, we can take timeline based wait pt
> advantage.
That could work, but I don't see how you want to replace the already
issued fence with a fence_array when the sync object is destroyed.
Additional to that I would rather prefer a consistent handling, e.g.
without extra rarely used code paths.
>
>>
>> Additional to that you enable signaling without a need from the
>> waiting side. That is rather bad for implementations which need that
>> optimization.
> Do you mean increasing timeline based on signal fence is not better?
> only update timeline value when requested by a wait pt?
Yes, exactly.
> This way, we will not update timeline value immidiately and cannot
> free signal pt immidiately, and we also need to consider it to CPU
> query and wait.
That is actually the better coding style. We usually try to avoid doing
things in interrupt handlers as much as possible.
How about this idea:
1. Each signaling point is a fence implementation with an rb node.
2. Each node keeps a reference to the last previously inserted node.
3. Each node is referenced by the sync object itself.
4. Before each signal/wait operation we do a garbage collection and
remove the first node from the tree as long as it is signaled.
5. When enable_signaling is requested for a node we cascade that to the
left using rb_prev.
This ensures that signaling is enabled for the current fence as
well as all previous fences.
6. A wait just looks into the tree for the signal point lower or equal
of the requested sequence number.
7. When the sync object is released we use
rbtree_postorder_for_each_entry_safe() and drop the extra reference to
each node, but never call rb_erase!
This way the rb_tree stays in memory, but without a root (e.g. the
sync object). It only destructs itself when the looked up references to
the nodes are dropped.
Well that is quite a bunch of logic, but I think that should work fine.
Regards,
Christian.
>
> Thanks,
> David Zhou
>
>>
>> I suggest to either condense all the fences you need to wait for in
>> an array during the wait operation, or reference count the sync_obj
>> and only enable the signaling on the fences when requested by a wait.
>>
>>>
>>>>
>>>>> b. because we allowed "wait-before-signal", if
>>>>> "wait-before-signal" happens, there isn't signal fence which can
>>>>> be used to create fence array.
>>>>
>>>> Well, again we DON'T support wait-before-signal here. I will
>>>> certainly NAK any implementation which tries to do this until we
>>>> haven't figured out all the resource management constraints and I
>>>> still don't see how we can do this.
>>> yes, we don't support real wait-before-signal in patch now, just a
>>> fake wait-before-signal, which still wait on CS submission until
>>> signal operation coming through wait_event, which is the conclusion
>>> we disscussed before.
>>
>> Well in this case we should call it wait-for-signal and not
>> wait-before-signal :)
>>
>>>
>>>>
>>>>> So timeline value is good to resolve that.
>>>>>
>>>>>>
>>>>>> Otherwise somebody can easily construct a situation where
>>>>>> timeline sync obj A waits on B and B waits on A.
>>>>> Same situation also can happens with fence-array, we only can see
>>>>> this is a programming bug with incorrect use.
>>>>
>>>> No, fence-array is initialized only once with a static list of
>>>> fences. This way it is impossible to add the fence-array to itself
>>>> for example.
>>>>
>>>> E.g. you can't build circle dependencies with that.
>>> we already wait for signal operation event, so never build circle
>>> dependencies with that. The theory is same.
>>
>> Yeah, ok that is certainly true.
>>
>> Regards,
>> Christian.
>>
>>>
>>>
>>>>
>>>>> Better use rbtree_postorder_for_each_entry_safe() for this.
>>>>>>> From the comments, seems we cannot use it here, we need erase
>>>>>>> node here.
>>>>>>> Comments:
>>>>>>> * Note, however, that it cannot handle other modifications that
>>>>>>> re-order the
>>>>>>> * rbtree it is iterating over. This includes calling rb_erase()
>>>>>>> on @pos, as
>>>>>>> * rb_erase() may rebalance the tree, causing us to miss some
>>>>>>> nodes.
>>>>>>> */
>>>>>>
>>>>>> Yeah, but your current implementation has the same problem :)
>>>>>>
>>>>>> You need to use something like "while (node = rb_first(...))"
>>>>>> instead if you want to destruct the whole tree.
>>> OK, got it, I will change it in v4.
>>>
>>> Thanks,
>>> David Zhou
>>
>
> _______________________________________________
> amd-gfx mailing list
> amd-gfx at lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
More information about the dri-devel
mailing list