<div dir="ltr"><div class="gmail_extra"><div class="gmail_quote">On Mon, Jul 10, 2017 at 9:13 AM, Christian König <span dir="ltr"><<a href="mailto:deathsimple@vodafone.de" target="_blank">deathsimple@vodafone.de</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div text="#000000" bgcolor="#FFFFFF"><span class="">
<div class="m_3510272536457704976moz-cite-prefix">On 10.07.2017 at 17:58, Xie,
AlexBin wrote:<br>
</div>
<blockquote type="cite">
<div class="m_3510272536457704976WordSection1">
<p class="MsoNormal"><a name="m_3510272536457704976__MailEndCompose"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1f497d">I
understand this discussion from closed-source driver
terminology.<u></u><u></u></span></a></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1f497d"><u></u> <u></u></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1f497d">If
a process is killed before it sends out the signaling
command, will some part of the GPU be in a waiting situation
forever?</span></p>
</div>
</blockquote>
<br></span>
Yes, exactly. That's the problem here, and the reason why even
Microsoft forbids that under Windows.<span class="HOEnZb"><font color="#888888"><br></font></span></div></blockquote><div><br></div><div>To be clear, we are only discussing wait-before-submit for client CPU waits. The GPU will not be waiting, only some userspace process. If that process is a compositor, it should take measures to avoid getting hung up by bad clients. What this will *not* cause is the GPU (or kernel GPU scheduling) to get hung up on waiting for some unsubmitted thing.<br></div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div text="#000000" bgcolor="#FFFFFF"><span class="HOEnZb"><font color="#888888">
Christian.</font></span><div><div class="h5"><br>
<br>
<blockquote type="cite">
<div class="m_3510272536457704976WordSection1">
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1f497d"><u></u><u></u></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1f497d"><u></u> <u></u></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1f497d">Alex
Bin Xie<u></u><u></u></span></p>
<p class="MsoNormal"><b><span style="font-size:11.0pt;font-family:"Calibri",sans-serif">From:</span></b><span style="font-size:11.0pt;font-family:"Calibri",sans-serif">
amd-gfx [<a class="m_3510272536457704976moz-txt-link-freetext" href="mailto:amd-gfx-bounces@lists.freedesktop.org" target="_blank">mailto:amd-gfx-bounces@lists.<wbr>freedesktop.org</a>]
<b>On Behalf Of </b>Jason Ekstrand<br>
<b>Sent:</b> Monday, July 10, 2017 11:53 AM<br>
<b>To:</b> Christian König <a class="m_3510272536457704976moz-txt-link-rfc2396E" href="mailto:deathsimple@vodafone.de" target="_blank"><deathsimple@vodafone.de></a><br>
<b>Cc:</b> Dave Airlie <a class="m_3510272536457704976moz-txt-link-rfc2396E" href="mailto:airlied@gmail.com" target="_blank"><airlied@gmail.com></a>; Mailing
list - DRI developers
<a class="m_3510272536457704976moz-txt-link-rfc2396E" href="mailto:dri-devel@lists.freedesktop.org" target="_blank"><dri-devel@lists.freedesktop.<wbr>org></a>; amd-gfx mailing
list <a class="m_3510272536457704976moz-txt-link-rfc2396E" href="mailto:amd-gfx@lists.freedesktop.org" target="_blank"><amd-gfx@lists.freedesktop.<wbr>org></a><br>
<b>Subject:</b> Re: [PATCH] drm/syncobj: add sync obj wait
interface. (v6)<u></u><u></u></span></p>
<p class="MsoNormal"><u></u> <u></u></p>
<div>
<div>
<div>
<p class="MsoNormal">On Mon, Jul 10, 2017 at 8:45 AM,
Christian König <<a href="mailto:deathsimple@vodafone.de" target="_blank">deathsimple@vodafone.de</a>>
wrote:<u></u><u></u></p>
<blockquote style="border:none;border-left:solid #cccccc 1.0pt;padding:0in 0in 0in 6.0pt;margin-left:4.8pt;margin-right:0in">
<div>
<div>
<div>
<div>
<p class="MsoNormal">On 10.07.2017 at 17:28,
Jason Ekstrand wrote:<u></u><u></u></p>
</div>
<blockquote style="margin-top:5.0pt;margin-bottom:5.0pt">
<div>
<div>
<div>
<p class="MsoNormal">On Wed, Jul 5, 2017
at 6:04 PM, Dave Airlie <<a href="mailto:airlied@gmail.com" target="_blank">airlied@gmail.com</a>>
wrote:<u></u><u></u></p>
<blockquote style="border:none;border-left:solid #cccccc 1.0pt;padding:0in 0in 0in 6.0pt;margin-left:4.8pt;margin-right:0in">
<p class="MsoNormal">From: Dave Airlie
<<a href="mailto:airlied@redhat.com" target="_blank">airlied@redhat.com</a>><br>
<br>
This interface will allow sync objects
to be used to back<br>
Vulkan fences. This API is pretty much
the Vulkan fence waiting
API, and I've ported the code from
amdgpu.<br>
<br>
v2: accept relative timeout, pass
remaining time back<br>
to userspace.<br>
v3: return to absolute timeouts.<br>
v4: absolute zero = poll,<br>
rewrite any/all code to have same
operation for arrays<br>
return -EINVAL for 0 fences.<br>
v4.1: fixup fences allocation check,
use u64_to_user_ptr<br>
v5: move to sec/nsec, and use
timespec64 for calcs.<br>
v6: use -ETIME and drop the out status
flag. (-ETIME<br>
is suggested by ickle, I can feel a
shed painting)<br>
<br>
Signed-off-by: Dave Airlie <<a href="mailto:airlied@redhat.com" target="_blank">airlied@redhat.com</a>><br>
---<br>
drivers/gpu/drm/drm_internal.<wbr>h | 2
+<br>
drivers/gpu/drm/drm_ioctl.c | 2
+<br>
drivers/gpu/drm/drm_syncobj.<wbr>c | 142
++++++++++++++++++++++++++++++<wbr>+++++++++++<br>
include/uapi/drm/drm.h | 13
++++<br>
4 files changed, 159 insertions(+)<br>
<br>
diff --git
a/drivers/gpu/drm/drm_<wbr>internal.h
b/drivers/gpu/drm/drm_<wbr>internal.h<br>
index 5cecc97..d71b50d 100644<br>
--- a/drivers/gpu/drm/drm_<wbr>internal.h<br>
+++ b/drivers/gpu/drm/drm_<wbr>internal.h<br>
@@ -157,3 +157,5 @@ int
drm_syncobj_handle_to_fd_<wbr>ioctl(struct
drm_device *dev, void *data,<br>
struct drm_file *file_private);<br>
int
drm_syncobj_fd_to_handle_<wbr>ioctl(struct
drm_device *dev, void *data,<br>
struct drm_file *file_private);<br>
+int drm_syncobj_wait_ioctl(struct
drm_device *dev, void *data,<br>
+ struct
drm_file *file_private);<br>
diff --git
a/drivers/gpu/drm/drm_ioctl.c
b/drivers/gpu/drm/drm_ioctl.c<br>
index f1e5681..385ce74 100644<br>
--- a/drivers/gpu/drm/drm_ioctl.c<br>
+++ b/drivers/gpu/drm/drm_ioctl.c<br>
@@ -657,6 +657,8 @@ static const
struct drm_ioctl_desc drm_ioctls[] = {<br>
DRM_UNLOCKED|DRM_RENDER_ALLOW)<wbr>,<br>
DRM_IOCTL_DEF(DRM_IOCTL_<wbr>SYNCOBJ_FD_TO_HANDLE,
drm_syncobj_fd_to_handle_<wbr>ioctl,<br>
DRM_UNLOCKED|DRM_RENDER_ALLOW)<wbr>,<br>
+
DRM_IOCTL_DEF(DRM_IOCTL_<wbr>SYNCOBJ_WAIT,
drm_syncobj_wait_ioctl,<br>
+
DRM_UNLOCKED|DRM_RENDER_<wbr>ALLOW),<br>
};<br>
<br>
#define DRM_CORE_IOCTL_COUNT
ARRAY_SIZE( drm_ioctls )<br>
diff --git
a/drivers/gpu/drm/drm_syncobj.<wbr>c
b/drivers/gpu/drm/drm_syncobj.<wbr>c<br>
index 89441bc..2d5a7a1 100644<br>
--- a/drivers/gpu/drm/drm_syncobj.<wbr>c<br>
+++ b/drivers/gpu/drm/drm_syncobj.<wbr>c<br>
@@ -1,5 +1,7 @@<br>
/*<br>
* Copyright 2017 Red Hat<br>
+ * Parts ported from amdgpu (fence
wait code).<br>
+ * Copyright 2016 Advanced Micro
Devices, Inc.<br>
*<br>
* Permission is hereby granted, free
of charge, to any person obtaining a<br>
* copy of this software and
associated documentation files (the
"Software"),<br>
@@ -31,6 +33,9 @@<br>
* that contain an optional fence.
The fence can be updated with a new<br>
* fence, or be NULL.<br>
*<br>
+ * A syncobj can be waited upon;
the wait completes when the<br>
+ * underlying fence signals.<br>
+ *<br>
* syncobj's can be export to fd's
and back, these fd's are opaque and<br>
* have no other use case, except
passing the syncobj between processes.<br>
*<br>
@@ -451,3 +456,140 @@
drm_syncobj_fd_to_handle_<wbr>ioctl(struct
drm_device *dev, void *data,<br>
return
drm_syncobj_fd_to_handle(file_<wbr>private,
args->fd,<br>
&args->handle);<br>
}<br>
+<br>
+/**<br>
+ * drm_timeout_abs_to_jiffies -
calculate jiffies timeout from
absolute value<br>
+ *<br>
+ * @timeout_sec: timeout sec
component, 0 for poll<br>
+ * @timeout_nsec: timeout nsec
component in ns, 0 for poll<br>
+ * both must be 0 for poll.<br>
+ *<br>
+ * Calculate the timeout in jiffies
from an absolute time in sec/nsec.<br>
+ */<br>
+static unsigned long
drm_timeout_abs_to_jiffies(<wbr>int64_t
timeout_sec, uint64_t timeout_nsec)<br>
+{<br>
+ struct timespec64 abs_timeout,
timeout, max_jiffy_timespec;<br>
+ unsigned long timeout_jiffies;<br>
+<br>
+        /* a 0 timeout means poll -
an absolute time of 0 doesn't seem valid */<br>
+ if (timeout_sec == 0
&& timeout_nsec == 0)<br>
+ return 0;<br>
+<br>
+ abs_timeout.tv_sec =
timeout_sec;<br>
+ abs_timeout.tv_nsec =
timeout_nsec;<br>
+<br>
+        /* clamp timeout if it's too
large */<br>
+ if
(!timespec64_valid_strict(&<wbr>abs_timeout))<br>
+ return
MAX_SCHEDULE_TIMEOUT - 1;<br>
+<br>
+ timeout =
timespec64_sub(abs_timeout,
ktime_to_timespec64(ktime_get(<wbr>)));<br>
+ if
(!timespec64_valid(&timeout))<br>
+ return 0;<br>
+<br>
+
jiffies_to_timespec64(MAX_<wbr>JIFFY_OFFSET,
&max_jiffy_timespec);<br>
+ if
(timespec64_compare(&timeout,
&max_jiffy_timespec) >= 0)<br>
+ return
MAX_SCHEDULE_TIMEOUT - 1;<br>
+<br>
+ timeout_jiffies =
timespec64_to_jiffies(&<wbr>timeout);<br>
+ /* clamp timeout to avoid
infinite timeout */<br>
+ if (timeout_jiffies >=
MAX_SCHEDULE_TIMEOUT)<br>
+ return
MAX_SCHEDULE_TIMEOUT - 1;<br>
+<br>
+ return timeout_jiffies + 1;<br>
+}<br>
+<br>
+static int
drm_syncobj_wait_fences(struct
drm_device *dev,<br>
+
struct drm_file *file_private,<br>
+
struct drm_syncobj_wait *wait,<br>
+
struct dma_fence **fences)<br>
+{<br>
+ unsigned long timeout =
drm_timeout_abs_to_jiffies(<wbr>wait->timeout_sec,
wait->timeout_nsec);<br>
+ int ret = 0;<br>
+ uint32_t first = ~0;<br>
+<br>
+ if (wait->flags &
DRM_SYNCOBJ_WAIT_FLAGS_WAIT_<wbr>ALL) {<br>
+ int i;<br>
+ for (i = 0; i <
wait->count_handles; i++) {<br>
+ ret =
dma_fence_wait_timeout(fences[<wbr>i],
true, timeout);<br>
+<br>
+ if (ret <
0)<br>
+ return
ret;<br>
+ if (ret == 0)<br>
+ break;<br>
+ timeout = ret;<br>
+ }<br>
+ first = 0;<br>
+ } else {<br>
+ ret =
dma_fence_wait_any_timeout(<wbr>fences,<br>
+
wait->count_handles,<br>
+
true, timeout,<br>
+
&first);<br>
+ }<br>
+<br>
+ if (ret < 0)<br>
+ return ret;<br>
+<br>
+ wait->first_signaled =
first;<br>
+ if (ret == 0)<br>
+ return -ETIME;<br>
+ return 0;<br>
+}<br>
+<br>
+int<br>
+drm_syncobj_wait_ioctl(struct
drm_device *dev, void *data,<br>
+ struct drm_file
*file_private)<br>
+{<br>
+ struct drm_syncobj_wait *args
= data;<br>
+ uint32_t *handles;<br>
+ struct dma_fence **fences;<br>
+ int ret = 0;<br>
+ int i;<br>
+<br>
+ if
(!drm_core_check_feature(dev,
DRIVER_SYNCOBJ))<br>
+ return -ENODEV;<br>
+<br>
+ if (args->flags != 0
&& args->flags !=
DRM_SYNCOBJ_WAIT_FLAGS_WAIT_<wbr>ALL)<br>
+ return -EINVAL;<br>
+<br>
+ if (args->count_handles ==
0)<br>
+ return -EINVAL;<br>
+<br>
+ /* Get the handles from
userspace */<br>
+ handles =
kmalloc_array(args->count_<wbr>handles,
sizeof(uint32_t),<br>
+
GFP_KERNEL);<br>
+ if (handles == NULL)<br>
+ return -ENOMEM;<br>
+<br>
+ if (copy_from_user(handles,<br>
+
u64_to_user_ptr(args->handles)<wbr>,<br>
+
sizeof(uint32_t) *
args->count_handles)) {<br>
+ ret = -EFAULT;<br>
+ goto err_free_handles;<br>
+ }<br>
+<br>
+ fences =
kcalloc(args->count_handles,<br>
+ sizeof(struct
dma_fence *), GFP_KERNEL);<br>
+ if (!fences) {<br>
+ ret = -ENOMEM;<br>
+ goto err_free_handles;<br>
+ }<br>
+<br>
+ for (i = 0; i <
args->count_handles; i++) {<br>
+ ret =
drm_syncobj_fence_get(file_<wbr>private,
handles[i],<br>
+
&fences[i]);<br>
+ if (ret)<br>
+ goto
err_free_fence_array;<br>
+ }<br>
+<br>
+ ret =
drm_syncobj_wait_fences(dev,
file_private,<br>
+
args, fences);<u></u><u></u></p>
</blockquote>
<div>
<p class="MsoNormal"><u></u> <u></u></p>
</div>
<div>
<p class="MsoNormal">So, reading some
CTS tests again, I think we have a
problem here. The Vulkan spec allows
you to wait on a fence that is in the
unsignaled state.<u></u><u></u></p>
</div>
</div>
</div>
</div>
</blockquote>
<p class="MsoNormal"><u></u> <u></u></p>
</div>
</div>
<p class="MsoNormal">At least on the closed source
driver that would be illegal as far as I know.<u></u><u></u></p>
</div>
</blockquote>
<div>
<p class="MsoNormal"><u></u> <u></u></p>
</div>
<div>
<p class="MsoNormal">Then they are doing workarounds in
userspace. There are definitely CTS tests for this:<br>
<br>
<a href="https://github.com/KhronosGroup/VK-GL-CTS/blob/master/external/vulkancts/modules/vulkan/synchronization/vktSynchronizationBasicFenceTests.cpp#L74" target="_blank">https://github.com/<wbr>KhronosGroup/VK-GL-CTS/blob/<wbr>master/external/vulkancts/<wbr>modules/vulkan/<wbr>synchronization/<wbr>vktSynchronizationBasicFenceTe<wbr>sts.cpp#L74</a><u></u><u></u></p>
</div>
<div>
<p class="MsoNormal"> <u></u><u></u></p>
</div>
<blockquote style="border:none;border-left:solid #cccccc 1.0pt;padding:0in 0in 0in 6.0pt;margin-left:4.8pt;margin-right:0in">
<div>
<p class="MsoNormal">You can't wait on a semaphore
before the signal operation is sent down to the
kernel.<u></u><u></u></p>
</div>
</blockquote>
<div>
<p class="MsoNormal"><u></u> <u></u></p>
</div>
<div>
<p class="MsoNormal" style="margin-bottom:12.0pt">We
(Intel) deal with this today by tracking whether or
not the fence has been submitted and using a condition
variable in userspace to sort it all out. If we ever
want to share fences across processes (which we do),
then this needs to be sorted in the kernel. <u></u><u></u></p>
</div>
<div>
<p class="MsoNormal">--Jason<u></u><u></u></p>
</div>
<div>
<p class="MsoNormal"><u></u> <u></u></p>
</div>
<div>
<p class="MsoNormal"> <u></u><u></u></p>
</div>
<blockquote style="border:none;border-left:solid #cccccc 1.0pt;padding:0in 0in 0in 6.0pt;margin-left:4.8pt;margin-right:0in">
<div>
<p class="MsoNormal">Regards,<br>
Christian.<u></u><u></u></p>
<div>
<div>
<p class="MsoNormal"><br>
<br>
<br>
<u></u><u></u></p>
<blockquote style="margin-top:5.0pt;margin-bottom:5.0pt">
<div>
<div>
<div>
<div>
<p class="MsoNormal"> In theory, you
could have thread A start waiting on a
fence before thread B submits the work
which triggers that fence. This means
that the dma_fence may not exist yet
when vkWaitForFences gets called. If
we really want to support the full
Vulkan usage, we need to somehow
support missing dma_fences by waiting
for the dma_fence to show up.
Unfortunately, I don't know enough
about the internal kernel APIs to know
what that would look like.<u></u><u></u></p>
</div>
</div>
</div>
</div>
</blockquote>
</div>
</div>
<blockquote style="margin-top:5.0pt;margin-bottom:5.0pt">
<div>
<div>
<div>
<div>
<div>
<div>
<p class="MsoNormal"> <u></u><u></u></p>
</div>
<blockquote style="border:none;border-left:solid #cccccc 1.0pt;padding:0in 0in 0in 6.0pt;margin-left:4.8pt;margin-right:0in">
<p class="MsoNormal">+<br>
+err_free_fence_array:<br>
+ for (i = 0; i <
args->count_handles; i++)<br>
+
dma_fence_put(fences[i]);<br>
+ kfree(fences);<br>
+err_free_handles:<br>
+ kfree(handles);<br>
+<br>
+ return ret;<br>
+}<br>
diff --git a/include/uapi/drm/drm.h
b/include/uapi/drm/drm.h<br>
index 101593a..91746a7 100644<br>
--- a/include/uapi/drm/drm.h<br>
+++ b/include/uapi/drm/drm.h<br>
@@ -718,6 +718,18 @@ struct
drm_syncobj_handle {<br>
__u32 pad;<br>
};<br>
<br>
+#define
DRM_SYNCOBJ_WAIT_FLAGS_WAIT_<wbr>ALL (1
<< 0)<br>
+struct drm_syncobj_wait {<br>
+ __u64 handles;<br>
+ /* absolute timeout */<br>
+ __s64 timeout_sec;<br>
+ __s64 timeout_nsec;<br>
+ __u32 count_handles;<br>
+ __u32 flags;<br>
+ __u32 first_signaled; /* only
valid when not waiting all */<br>
+ __u32 pad;<br>
+};<br>
+<br>
#if defined(__cplusplus)<br>
}<br>
#endif<br>
@@ -840,6 +852,7 @@ extern "C" {<br>
#define DRM_IOCTL_SYNCOBJ_DESTROY
DRM_IOWR(0xC0, struct
drm_syncobj_destroy)<br>
#define
DRM_IOCTL_SYNCOBJ_HANDLE_TO_FD
DRM_IOWR(0xC1, struct
drm_syncobj_handle)<br>
#define
DRM_IOCTL_SYNCOBJ_FD_TO_HANDLE
DRM_IOWR(0xC2, struct
drm_syncobj_handle)<br>
+#define DRM_IOCTL_SYNCOBJ_WAIT
DRM_IOWR(0xC3, struct
drm_syncobj_wait)<br>
<br>
/**<br>
* Device specific ioctls should only
be in their respective headers<br>
<span class="m_3510272536457704976gmail-m-226855681158848481hoenzb"><span style="color:#888888">--</span></span><span style="color:#888888"><br>
<span class="m_3510272536457704976gmail-m-226855681158848481hoenzb">2.9.4</span><br>
<br>
<span class="m_3510272536457704976gmail-m-226855681158848481hoenzb">______________________________<wbr>_________________</span><br>
<span class="m_3510272536457704976gmail-m-226855681158848481hoenzb">dri-devel
mailing list</span><br>
<span class="m_3510272536457704976gmail-m-226855681158848481hoenzb"><a href="mailto:dri-devel@lists.freedesktop.org" target="_blank">dri-devel@lists.freedesktop.<wbr>org</a></span><br>
<span class="m_3510272536457704976gmail-m-226855681158848481hoenzb"><a href="https://lists.freedesktop.org/mailman/listinfo/dri-devel" target="_blank">https://lists.freedesktop.org/<wbr>mailman/listinfo/dri-devel</a></span></span><u></u><u></u></p>
</blockquote>
</div>
<p class="MsoNormal"><u></u> <u></u></p>
</div>
</div>
<p class="MsoNormal" style="margin-bottom:12.0pt"><u></u> <u></u></p>
</div>
</div>
<pre>______________________________<wbr>_________________<u></u><u></u></pre>
<pre>amd-gfx mailing list<u></u><u></u></pre>
<pre><a href="mailto:amd-gfx@lists.freedesktop.org" target="_blank">amd-gfx@lists.freedesktop.org</a><u></u><u></u></pre>
<pre><a href="https://lists.freedesktop.org/mailman/listinfo/amd-gfx" target="_blank">https://lists.freedesktop.org/<wbr>mailman/listinfo/amd-gfx</a><u></u><u></u></pre>
</blockquote>
<p><u></u> <u></u></p>
</div>
</blockquote>
</div>
<p class="MsoNormal"><u></u> <u></u></p>
</div>
</div>
</div>
</blockquote>
<p><br>
</p>
</div></div></div>
</blockquote></div><br></div></div>