<div dir="ltr"><br><div class="gmail_extra"><br><div class="gmail_quote">On Wed, Aug 9, 2017 at 11:25 AM, Christian König <span dir="ltr"><<a href="mailto:deathsimple@vodafone.de" target="_blank">deathsimple@vodafone.de</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div class="HOEnZb"><div class="h5">On 09.08.2017 at 19:57, Chris Wilson wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
Quoting Jason Ekstrand (2017-08-09 18:00:54)<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
Vulkan VkFence semantics require that the application be able to perform<br>
a CPU wait on work which may not yet have been submitted. This is<br>
perfectly safe because the CPU wait has a timeout which will get<br>
triggered eventually if no work is ever submitted. This behavior is<br>
advantageous for multi-threaded workloads because, so long as all of the<br>
threads agree on what fences to use up-front, you don't have the extra<br>
cross-thread synchronization cost of thread A telling thread B that it<br>
has submitted its dependent work and thread B is now free to wait.<br>
<br>
Within a single process, this can be implemented in the userspace driver<br>
by doing exactly the same kind of tracking the app would have to do<br>
using posix condition variables or similar. However, in order for this<br>
to work cross-process (as is required by VK_KHR_external_fence), we need<br>
to handle this in the kernel.<br>
<br>
This commit adds a WAIT_FOR_SUBMIT flag to DRM_IOCTL_SYNCOBJ_WAIT which<br>
instructs the IOCTL to wait for the syncobj to have a non-null fence and<br>
then wait on the fence. Combined with DRM_IOCTL_SYNCOBJ_RESET, you can<br>
easily get the Vulkan behavior.<br>
<br>
v2:<br>
- Fix a bug in the invalid syncobj error path<br>
- Unify the wait-all and wait-any cases<br>
<br>
Signed-off-by: Jason Ekstrand <<a href="mailto:jason@jlekstrand.net" target="_blank">jason@jlekstrand.net</a>><br>
Cc: Dave Airlie <<a href="mailto:airlied@redhat.com" target="_blank">airlied@redhat.com</a>><br>
---<br>
<br>
I realized today (this is what happens when you sleep) that it takes almost<br>
no work to make the wait-any path also handle wait-all. By unifying the<br>
two, I deleted over a hundred lines of code from the implementation.<br>
<br>
drivers/gpu/drm/drm_syncobj.c | 243 ++++++++++++++++++++++++++++++++++--------<br>
include/uapi/drm/drm.h | 1 +<br>
2 files changed, 199 insertions(+), 45 deletions(-)<br>
<br>
diff --git a/drivers/gpu/drm/drm_syncobj.c b/drivers/gpu/drm/drm_syncobj.c<br>
index 510dfc2..5e7f654 100644<br>
--- a/drivers/gpu/drm/drm_syncobj.c<br>
+++ b/drivers/gpu/drm/drm_syncobj.c<br>
@@ -51,6 +51,7 @@<br>
#include <linux/fs.h><br>
#include <linux/anon_inodes.h><br>
#include <linux/sync_file.h><br>
+#include <linux/sched/signal.h><br>
#include "drm_internal.h"<br>
#include <drm/drm_syncobj.h><br>
@@ -104,6 +105,27 @@ static void drm_syncobj_add_callback_locked(struct drm_syncobj *syncobj,<br>
</blockquote>
This is a bit of the puzzle that is missing.<br></blockquote></div></div></blockquote><div><br></div><div>? It's in the previous patch.<br></div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div class="HOEnZb"><div class="h5"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
list_add_tail(&cb->node, &syncobj->cb_list);<br>
}<br>
+static int drm_syncobj_fence_get_or_add_callback(struct drm_syncobj *syncobj,<br>
+ struct dma_fence **fence,<br>
+ struct drm_syncobj_cb *cb,<br>
+ drm_syncobj_func_t func)<br>
+{<br>
+ int ret;<br>
+<br>
+ spin_lock(&syncobj->lock);<br>
+ if (syncobj->fence) {<br>
+ *fence = dma_fence_get(syncobj->fence);<br>
+ ret = 1;<br>
+ } else {<br>
+ *fence = NULL;<br>
+ drm_syncobj_add_callback_locked(syncobj, cb, func);<br>
+ ret = 0;<br>
+ }<br>
+ spin_unlock(&syncobj->lock);<br>
+<br>
+ return ret;<br>
+}<br>
+<br>
/**<br>
* drm_syncobj_add_callback - adds a callback to syncobj::cb_list<br>
* @syncobj: Sync object to which to add the callback<br>
@@ -526,6 +548,159 @@ drm_syncobj_fd_to_handle_ioctl(struct drm_device *dev, void *data,<br>
&args->handle);<br>
}<br>
+static int drm_syncobj_signaled(struct drm_syncobj *syncobj,<br>
+ uint32_t wait_flags)<br>
+{<br>
+ struct dma_fence *fence;<br>
+ int ret;<br>
+<br>
+ fence = drm_syncobj_fence_get(syncobj);<br>
+ if (!fence) {<br>
+ if (wait_flags & DRM_SYNCOBJ_WAIT_FLAGS_WAIT_FOR_SUBMIT)<br>
+ return 0;<br>
+ else<br>
+ return -EINVAL;<br>
+ }<br>
+<br>
+ ret = dma_fence_is_signaled(fence) ? 1 : 0;<br>
+<br>
+ dma_fence_put(fence);<br>
+<br>
+ return ret;<br>
+}<br>
+<br>
+struct syncobj_wait_entry {<br>
+ struct task_struct *task;<br>
+ struct dma_fence *fence;<br>
+ struct dma_fence_cb fence_cb;<br>
+ struct drm_syncobj_cb syncobj_cb;<br>
+};<br>
+<br>
+static void syncobj_wait_fence_func(struct dma_fence *fence,<br>
+ struct dma_fence_cb *cb)<br>
+{<br>
+ struct syncobj_wait_entry *wait =<br>
+ container_of(cb, struct syncobj_wait_entry, fence_cb);<br>
+<br>
+ wake_up_process(wait->task);<br>
+}<br>
+<br>
+static void syncobj_wait_syncobj_func(struct drm_syncobj *syncobj,<br>
+ struct drm_syncobj_cb *cb)<br>
+{<br>
+ struct syncobj_wait_entry *wait =<br>
+ container_of(cb, struct syncobj_wait_entry, syncobj_cb);<br>
+<br>
+ /* This happens inside the syncobj lock */<br>
+ wait->fence = dma_fence_get(syncobj->fence);<br>
+ wake_up_process(wait->task);<br>
+}<br>
+<br>
+static signed long drm_syncobj_array_wait_timeout(struct drm_syncobj **syncobjs,<br>
+ uint32_t count,<br>
+ uint32_t flags,<br>
+ signed long timeout,<br>
+ uint32_t *idx)<br>
+{<br>
+ struct syncobj_wait_entry *entries;<br>
+ struct dma_fence *fence;<br>
+ signed long ret;<br>
+ uint32_t signaled_count, i;<br>
+<br>
+ if (timeout == 0) {<br>
+ signaled_count = 0;<br>
+ for (i = 0; i < count; ++i) {<br>
+ ret = drm_syncobj_signaled(syncobjs[i], flags);<br>
+ if (ret < 0)<br>
+ return ret;<br>
+ if (ret == 0)<br>
+ continue;<br>
+ if (signaled_count == 0 && idx)<br>
+ *idx = i;<br>
+ signaled_count++;<br>
+ }<br>
+<br>
+ if (flags & DRM_SYNCOBJ_WAIT_FLAGS_WAIT_ALL)<br>
+ return signaled_count == count ? 1 : 0;<br>
+ else<br>
+ return signaled_count > 0 ? 1 : 0;<br>
+ }<br>
+<br>
+ entries = kcalloc(count, sizeof(*entries), GFP_KERNEL);<br>
+ if (!entries)<br>
+ return -ENOMEM;<br>
+<br>
+ for (i = 0; i < count; ++i) {<br>
+ entries[i].task = current;<br>
+<br>
+ if (flags & DRM_SYNCOBJ_WAIT_FLAGS_WAIT_FOR_SUBMIT) {<br>
+ drm_syncobj_fence_get_or_add_callback(syncobjs[i],<br>
+ &entries[i].fence,<br>
+ &entries[i].syncobj_cb,<br>
+ syncobj_wait_syncobj_func);<br>
+ } else {<br>
+ entries[i].fence = drm_syncobj_fence_get(syncobjs[i]);<br>
+ if (!entries[i].fence) {<br>
+ ret = -EINVAL;<br>
+ goto err_cleanup_entries;<br>
+ }<br>
</blockquote>
So we are taking a snapshot here. It looks like this could have been<br>
done using a dma_fence_array + dma_fence_proxy for capturing the future<br>
fence.<br></blockquote></div></div></blockquote><div><br></div><div>I'm not sure what you mean. Is that something in the future fence series that I should be looking at?<br></div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div class="HOEnZb"><div class="h5"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
+ ret = timeout;<br>
+ while (ret > 0) {<br>
+ signaled_count = 0;<br>
+ for (i = 0; i < count; ++i) {<br>
+ fence = entries[i].fence;<br>
+ if (!fence)<br>
+ continue;<br>
+<br>
+ if (dma_fence_is_signaled(fence) ||<br>
+ (!entries[i].fence_cb.func &&<br>
+ dma_fence_add_callback(fence,<br>
+ &entries[i].fence_cb,<br>
+ syncobj_wait_fence_func))) {<br>
+ /* The fence has been signaled */<br>
+ if (flags & DRM_SYNCOBJ_WAIT_FLAGS_WAIT_ALL) {<br>
+ signaled_count++;<br>
+ } else {<br>
+ if (idx)<br>
+ *idx = i;<br>
+ goto done_waiting;<br>
+ }<br>
+ }<br>
+ }<br>
+<br>
+ if ((flags & DRM_SYNCOBJ_WAIT_FLAGS_WAIT_ALL) &&<br>
+ signaled_count == count)<br>
+ goto done_waiting;<br>
+<br>
+ set_current_state(TASK_INTERRUPTIBLE);<br>
</blockquote>
This should be the first step in the wait-loop: you set TASK_INTERRUPTIBLE<br>
before doing any checks, so that if any state changes while you are in<br>
the middle of those checks, the schedule_timeout() becomes a no-op and you<br>
can check again.<br>
</blockquote>
<br></div></div>
Yeah, completely agree.<br>
<br>
I would rather drop the custom-wait approach and try to use wait_event_interruptible here.<br>
<br>
As far as I can see, that should make the whole patch much cleaner in general.<br></blockquote><div><br></div><div>Sounds good to me.<br><br></div><div>--Jason <br></div></div><br></div></div>
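<div>[Editor's note] Christian's suggested rework could look roughly as follows. This is a kernel-style pseudocode sketch only; `syncobj_wait_queue` and `syncobj_array_signaled()` are hypothetical names standing in for a per-wait wait queue and the signaled-count check from the patch.</div>

```
/* wait_event_interruptible_timeout() performs the set_current_state() /
 * check-condition / schedule_timeout() dance internally, so the
 * "set state before checking" ordering Chris points out comes for free. */
timeout = wait_event_interruptible_timeout(
		syncobj_wait_queue,
		syncobj_array_signaled(entries, count, flags, idx),
		timeout);
if (timeout < 0)
	return timeout;   /* -ERESTARTSYS: interrupted by a signal */
if (timeout == 0)
	return -ETIME;    /* nothing signaled before the timeout */
return timeout;           /* remaining jiffies */
```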