<div dir="ltr"><br><div class="gmail_extra"><br><div class="gmail_quote">On Wed, Aug 9, 2017 at 11:25 AM, Christian König <span dir="ltr"><<a href="mailto:deathsimple@vodafone.de" target="_blank">deathsimple@vodafone.de</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div class="HOEnZb"><div class="h5">Am 09.08.2017 um 19:57 schrieb Chris Wilson:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
Quoting Jason Ekstrand (2017-08-09 18:00:54)<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
Vulkan VkFence semantics require that the application be able to perform<br>
a CPU wait on work which may not yet have been submitted.  This is<br>
perfectly safe because the CPU wait has a timeout which will get<br>
triggered eventually if no work is ever submitted.  This behavior is<br>
advantageous for multi-threaded workloads because, so long as all of the<br>
threads agree up-front on which fences to use, you avoid the extra<br>
cross-thread synchronization cost of thread A telling thread B that its<br>
dependent work has been submitted and that B is now free to wait.<br>
<br>
Within a single process, this can be implemented in the userspace driver<br>
by doing exactly the same kind of tracking the app would have to do<br>
using posix condition variables or similar.  However, in order for this<br>
to work cross-process (as is required by VK_KHR_external_fence), we need<br>
to handle this in the kernel.<br>
<br>
This commit adds a WAIT_FOR_SUBMIT flag to DRM_IOCTL_SYNCOBJ_WAIT which<br>
instructs the IOCTL to wait for the syncobj to have a non-null fence and<br>
then wait on the fence.  Combined with DRM_IOCTL_SYNCOBJ_RESET, you can<br>
easily get the Vulkan behavior.<br>
<br>
v2:<br>
  - Fix a bug in the invalid syncobj error path<br>
  - Unify the wait-all and wait-any cases<br>
<br>
Signed-off-by: Jason Ekstrand <<a href="mailto:jason@jlekstrand.net" target="_blank">jason@jlekstrand.net</a>><br>
Cc: Dave Airlie <<a href="mailto:airlied@redhat.com" target="_blank">airlied@redhat.com</a>><br>
---<br>
<br>
I realized today (this is what happens when you sleep) that it takes almost<br>
no work to make the wait-any path also handle wait-all.  By unifying the<br>
two, I deleted over a hundred lines of code from the implementation.<br>
<br>
  drivers/gpu/drm/drm_syncobj.c | 243 ++++++++++++++++++++++++++++++++++--------<br>
  include/uapi/drm/drm.h        |   1 +<br>
  2 files changed, 199 insertions(+), 45 deletions(-)<br>
<br>
diff --git a/drivers/gpu/drm/drm_syncobj.c b/drivers/gpu/drm/drm_syncobj.c<br>
index 510dfc2..5e7f654 100644<br>
--- a/drivers/gpu/drm/drm_syncobj.c<br>
+++ b/drivers/gpu/drm/drm_syncobj.c<br>
@@ -51,6 +51,7 @@<br>
  #include <linux/fs.h><br>
  #include <linux/anon_inodes.h><br>
  #include <linux/sync_file.h><br>
+#include <linux/sched/signal.h><br>
    #include "drm_internal.h"<br>
  #include <drm/drm_syncobj.h><br>
@@ -104,6 +105,27 @@ static void drm_syncobj_add_callback_locked(struct drm_syncobj *syncobj,<br>
</blockquote>
This is a bit of the puzzle that is missing.<br></blockquote></div></div></blockquote><div><br></div><div>?  It's in the previous patch.<br></div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div class="HOEnZb"><div class="h5"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
         list_add_tail(&cb->node, &syncobj->cb_list);<br>
  }<br>
  +static int drm_syncobj_fence_get_or_add_callback(struct drm_syncobj *syncobj,<br>
+                                                struct dma_fence **fence,<br>
+                                                struct drm_syncobj_cb *cb,<br>
+                                                drm_syncobj_func_t func)<br>
+{<br>
+       int ret;<br>
+<br>
+       spin_lock(&syncobj->lock);<br>
+       if (syncobj->fence) {<br>
+               *fence = dma_fence_get(syncobj->fence);<br>
+               ret = 1;<br>
+       } else {<br>
+               *fence = NULL;<br>
+               drm_syncobj_add_callback_locked(syncobj, cb, func);<br>
+               ret = 0;<br>
+       }<br>
+       spin_unlock(&syncobj->lock);<br>
+<br>
+       return ret;<br>
+}<br>
+<br>
  /**<br>
   * drm_syncobj_add_callback - adds a callback to syncobj::cb_list<br>
   * @syncobj: Sync object to which to add the callback<br>
@@ -526,6 +548,159 @@ drm_syncobj_fd_to_handle_ioctl<wbr>(struct drm_device *dev, void *data,<br>
                                         &args->handle);<br>
  }<br>
  +static int drm_syncobj_signaled(struct drm_syncobj *syncobj,<br>
+                               uint32_t wait_flags)<br>
+{<br>
+       struct dma_fence *fence;<br>
+       int ret;<br>
+<br>
+       fence = drm_syncobj_fence_get(syncobj);<br>
+       if (!fence) {<br>
+               if (wait_flags & DRM_SYNCOBJ_WAIT_FLAGS_WAIT_FOR_SUBMIT)<br>
+                       return 0;<br>
+               else<br>
+                       return -EINVAL;<br>
+       }<br>
+<br>
+       ret = dma_fence_is_signaled(fence) ? 1 : 0;<br>
+<br>
+       dma_fence_put(fence);<br>
+<br>
+       return ret;<br>
+}<br>
+<br>
+struct syncobj_wait_entry {<br>
+       struct task_struct *task;<br>
+       struct dma_fence *fence;<br>
+       struct dma_fence_cb fence_cb;<br>
+       struct drm_syncobj_cb syncobj_cb;<br>
+};<br>
+<br>
+static void syncobj_wait_fence_func(struct dma_fence *fence,<br>
+                                   struct dma_fence_cb *cb)<br>
+{<br>
+       struct syncobj_wait_entry *wait =<br>
+               container_of(cb, struct syncobj_wait_entry, fence_cb);<br>
+<br>
+       wake_up_process(wait->task);<br>
+}<br>
+<br>
+static void syncobj_wait_syncobj_func(struct drm_syncobj *syncobj,<br>
+                                     struct drm_syncobj_cb *cb)<br>
+{<br>
+       struct syncobj_wait_entry *wait =<br>
+               container_of(cb, struct syncobj_wait_entry, syncobj_cb);<br>
+<br>
+       /* This happens inside the syncobj lock */<br>
+       wait->fence = dma_fence_get(syncobj->fence);<br>
+       wake_up_process(wait->task);<br>
+}<br>
+<br>
+static signed long drm_syncobj_array_wait_timeout(struct drm_syncobj **syncobjs,<br>
+                                                 uint32_t count,<br>
+                                                 uint32_t flags,<br>
+                                                 signed long timeout,<br>
+                                                 uint32_t *idx)<br>
+{<br>
+       struct syncobj_wait_entry *entries;<br>
+       struct dma_fence *fence;<br>
+       signed long ret;<br>
+       uint32_t signaled_count, i;<br>
+<br>
+       if (timeout == 0) {<br>
+               signaled_count = 0;<br>
+               for (i = 0; i < count; ++i) {<br>
+                       ret = drm_syncobj_signaled(syncobjs[i], flags);<br>
+                       if (ret < 0)<br>
+                               return ret;<br>
+                       if (ret == 0)<br>
+                               continue;<br>
+                       if (signaled_count == 0 && idx)<br>
+                               *idx = i;<br>
+                       signaled_count++;<br>
+               }<br>
+<br>
+               if (flags & DRM_SYNCOBJ_WAIT_FLAGS_WAIT_ALL)<br>
+                       return signaled_count == count ? 1 : 0;<br>
+               else<br>
+                       return signaled_count > 0 ? 1 : 0;<br>
+       }<br>
+<br>
+       entries = kcalloc(count, sizeof(*entries), GFP_KERNEL);<br>
+       if (!entries)<br>
+               return -ENOMEM;<br>
+<br>
+       for (i = 0; i < count; ++i) {<br>
+               entries[i].task = current;<br>
+<br>
+               if (flags & DRM_SYNCOBJ_WAIT_FLAGS_WAIT_FOR_SUBMIT) {<br>
+                       drm_syncobj_fence_get_or_add_callback(syncobjs[i],<br>
+                                                             &entries[i].fence,<br>
+                                                             &entries[i].syncobj_cb,<br>
+                                                             syncobj_wait_syncobj_func);<br>
+               } else {<br>
+                       entries[i].fence = drm_syncobj_fence_get(syncobjs[i]);<br>
+                       if (!entries[i].fence) {<br>
+                               ret = -EINVAL;<br>
+                               goto err_cleanup_entries;<br>
+                       }<br>
</blockquote>
So we are taking a snapshot here. It looks like this could have been<br>
done using a dma_fence_array + dma_fence_proxy for capturing the future<br>
fence.<br></blockquote></div></div></blockquote><div><br></div><div>I'm not sure what you mean.  Is that something in the future fence series that I should be looking at?<br></div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div class="HOEnZb"><div class="h5"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
+       ret = timeout;<br>
+       while (ret > 0) {<br>
+               signaled_count = 0;<br>
+               for (i = 0; i < count; ++i) {<br>
+                       fence = entries[i].fence;<br>
+                       if (!fence)<br>
+                               continue;<br>
+<br>
+                       if (dma_fence_is_signaled(fence) ||<br>
+                           (!entries[i].fence_cb.func &&<br>
+                            dma_fence_add_callback(fence,<br>
+                                                   &entries[i].fence_cb,<br>
+                                                   syncobj_wait_fence_func))) {<br>
+                               /* The fence has been signaled */<br>
+                               if (flags & DRM_SYNCOBJ_WAIT_FLAGS_WAIT_ALL) {<br>
+                                       signaled_count++;<br>
+                               } else {<br>
+                                       if (idx)<br>
+                                               *idx = i;<br>
+                                       goto done_waiting;<br>
+                               }<br>
+                       }<br>
+               }<br>
+<br>
+               if ((flags & DRM_SYNCOBJ_WAIT_FLAGS_WAIT_ALL) &&<br>
+                   signaled_count == count)<br>
+                       goto done_waiting;<br>
+<br>
+               set_current_state(TASK_INTERRUPTIBLE);<br>
</blockquote>
Setting TASK_INTERRUPTIBLE should be the first step in the wait-loop:<br>
set it before doing any checks, so that if any state changes whilst you<br>
are in the middle of those checks, the schedule_timeout() becomes a nop<br>
and you can check again.<br>
</blockquote>
<br></div></div>
Yeah, completely agree.<br>
<br>
I would rather drop the custom wait-loop approach and try to use wait_event_interruptible() here.<br>
<br>
As far as I can see that should make the whole patch much cleaner in general.<br></blockquote><div><br></div><div>Sounds good to me.<br><br></div><div>--Jason <br></div></div><br></div></div>