<html>
<head>
<meta http-equiv="Content-Type" content="text/html;
charset=windows-1252">
</head>
<body text="#000000" bgcolor="#FFFFFF">
<div class="moz-cite-prefix">Following earlier discussions in
particular with James Jones at Nvidia, I think we established this
patch/feature is not needed.</div>
<div class="moz-cite-prefix"><br>
</div>
<div class="moz-cite-prefix">This feature was indented to fix a
failing test on our implementation.<br>
</div>
<div class="moz-cite-prefix">I've just submitted a MR to delete that
test : <a
href="https://gitlab.freedesktop.org/mesa/crucible/merge_requests/55">https://gitlab.freedesktop.org/mesa/crucible/merge_requests/55</a></div>
<div class="moz-cite-prefix">I think it is invalid. </div>
<div class="moz-cite-prefix"><br>
</div>
<div class="moz-cite-prefix">We should be able to workaround the
submission thread race condition issue by just resetting a binary
semaphore to be signaled in vkQueueSubmit before submitting the
workload, so that further waits happen on the right dma-fence.</div>
<div class="moz-cite-prefix">This might be a bit more costly (more
ioctls) than the feature in this patch, so I'm looking for your
feedback on this.</div>
<div class="moz-cite-prefix"><br>
</div>
<div class="moz-cite-prefix">Thanks a lot,</div>
<div class="moz-cite-prefix"><br>
</div>
<div class="moz-cite-prefix">-Lionel<br>
</div>
<div class="moz-cite-prefix"><br>
</div>
<div class="moz-cite-prefix">On 17/09/2019 16:06, Lionel Landwerlin
wrote:<br>
</div>
<blockquote type="cite"
cite="mid:c7d310e4-7865-6cb3-5730-f896886c7ce5@intel.com">
<meta http-equiv="Content-Type" content="text/html;
charset=windows-1252">
<div class="moz-cite-prefix">Thanks David,</div>
<div class="moz-cite-prefix"><br>
</div>
<div class="moz-cite-prefix">I'll try to fix the test to match
AMD's restrictions.</div>
<div class="moz-cite-prefix"><br>
</div>
<div class="moz-cite-prefix">The v7 here was to fix another
existing test :
dEQP-VK.api.external.fence.sync_fd.transference_temporary</div>
<div class="moz-cite-prefix"><br>
</div>
<div class="moz-cite-prefix">Cheers,</div>
<div class="moz-cite-prefix"><br>
</div>
<div class="moz-cite-prefix">-Lionel<br>
</div>
<div class="moz-cite-prefix"><br>
</div>
<div class="moz-cite-prefix">On 17/09/2019 15:36, Zhou,
David(ChunMing) wrote:<br>
</div>
<blockquote type="cite"
cite="mid:MWHPR12MB14060F7960E92E46B0CCC63BB48F0@MWHPR12MB1406.namprd12.prod.outlook.com">
<style type="text/css" style="display:none;"> P {margin-top:0;margin-bottom:0;} </style>
<div style="font-family: Calibri, Arial, Helvetica, sans-serif;
font-size: 12pt; color: rgb(0, 0, 0);"> Hi Lionel,</div>
<div style="font-family: Calibri, Arial, Helvetica, sans-serif;
font-size: 12pt; color: rgb(0, 0, 0);"> The update looks good
to me.</div>
<div style="font-family: Calibri, Arial, Helvetica, sans-serif;
font-size: 12pt; color: rgb(0, 0, 0);"> I tried your
signal-order test, seems it isn't ready to run, not sure if I
can reproduce your this issue.</div>
<div style="font-family: Calibri, Arial, Helvetica, sans-serif;
font-size: 12pt; color: rgb(0, 0, 0);"> <br>
</div>
<div style="font-family: Calibri, Arial, Helvetica, sans-serif;
font-size: 12pt; color: rgb(0, 0, 0);"> -David</div>
<hr style="display:inline-block;width:98%" tabindex="-1">
<div id="divRplyFwdMsg" dir="ltr"><font style="font-size:11pt"
face="Calibri, sans-serif" color="#000000"><b>From:</b>
Lionel Landwerlin <a class="moz-txt-link-rfc2396E"
href="mailto:lionel.g.landwerlin@intel.com"
moz-do-not-send="true"><lionel.g.landwerlin@intel.com></a><br>
<b>Sent:</b> Tuesday, September 17, 2019 7:03 PM<br>
<b>To:</b> <a class="moz-txt-link-abbreviated"
href="mailto:dri-devel@lists.freedesktop.org"
moz-do-not-send="true">dri-devel@lists.freedesktop.org</a>
<a class="moz-txt-link-rfc2396E"
href="mailto:dri-devel@lists.freedesktop.org"
moz-do-not-send="true"><dri-devel@lists.freedesktop.org></a><br>
<b>Cc:</b> Lionel Landwerlin <a
class="moz-txt-link-rfc2396E"
href="mailto:lionel.g.landwerlin@intel.com"
moz-do-not-send="true"><lionel.g.landwerlin@intel.com></a>;
Zhou, David(ChunMing) <a class="moz-txt-link-rfc2396E"
href="mailto:David1.Zhou@amd.com" moz-do-not-send="true"><David1.Zhou@amd.com></a>;
Koenig, Christian <a class="moz-txt-link-rfc2396E"
href="mailto:Christian.Koenig@amd.com"
moz-do-not-send="true"><Christian.Koenig@amd.com></a>;
Jason Ekstrand <a class="moz-txt-link-rfc2396E"
href="mailto:jason@jlekstrand.net" moz-do-not-send="true"><jason@jlekstrand.net></a><br>
<b>Subject:</b> [PATCH 1/1] drm/syncobj: add sideband
payload</font>
<div> </div>
</div>
<div class="BodyFragment"><font size="2"><span
style="font-size:11pt;">
<div class="PlainText">The Vulkan timeline semaphores
allow signaling to happen on the point<br>
of the timeline without all of the its dependencies to
be created.<br>
<br>
The current 2 implementations (AMD/Intel) of the Vulkan
spec on top of<br>
the Linux kernel are using a thread to wait on the
dependencies of a<br>
given point to materialize and delay actual submission
to the kernel<br>
driver until the wait completes.<br>
<br>
If a binary semaphore is submitted for signaling along
the side of a<br>
timeline semaphore waiting for completion that means
that the drm<br>
syncobj associated with that binary semaphore will not
have a DMA<br>
fence associated with it by the time vkQueueSubmit()
returns. This and<br>
the fact that a binary semaphore can be signaled and
unsignaled as<br>
before its DMA fences materialize mean that we cannot
just rely on the<br>
fence within the syncobj but we also need a sideband
payload verifying<br>
that the fence in the syncobj matches the last
submission from the<br>
Vulkan API point of view.<br>
<br>
This change adds a sideband payload that is incremented
with signaled<br>
syncobj when vkQueueSubmit() is called. The next
vkQueueSubmit()<br>
waiting on a the syncobj will read the sideband payload
and wait for a<br>
fence chain element with a seqno superior or equal to
the sideband<br>
payload value to be added into the fence chain and use
that fence to<br>
trigger the submission on the kernel driver.<br>
<br>
v2: Use a separate ioctl to get/set the sideband value
(Christian)<br>
<br>
v3: Use 2 ioctls for get/set (Christian)<br>
<br>
v4: Use a single new ioctl<br>
<br>
v5: a bunch of blattant mistakes<br>
Store payload atomically (Chris)<br>
<br>
v6: Only touch atomic value once (Jason)<br>
<br>
v7: Updated atomic value when importing sync file<br>
<br>
Signed-off-by: Lionel Landwerlin <a
class="moz-txt-link-rfc2396E"
href="mailto:lionel.g.landwerlin@intel.com"
moz-do-not-send="true"><lionel.g.landwerlin@intel.com></a><br>
Reviewed-by: David Zhou <a
class="moz-txt-link-rfc2396E"
href="mailto:David1.Zhou@amd.com"
moz-do-not-send="true"><David1.Zhou@amd.com></a>
(v6)<br>
Cc: Christian Koenig <a class="moz-txt-link-rfc2396E"
href="mailto:Christian.Koenig@amd.com"
moz-do-not-send="true"><Christian.Koenig@amd.com></a><br>
Cc: Jason Ekstrand <a class="moz-txt-link-rfc2396E"
href="mailto:jason@jlekstrand.net"
moz-do-not-send="true"><jason@jlekstrand.net></a><br>
Cc: David(ChunMing) Zhou <a
class="moz-txt-link-rfc2396E"
href="mailto:David1.Zhou@amd.com"
moz-do-not-send="true"><David1.Zhou@amd.com></a><br>
---<br>
drivers/gpu/drm/drm_internal.h | 2 ++<br>
drivers/gpu/drm/drm_ioctl.c | 3 ++<br>
drivers/gpu/drm/drm_syncobj.c | 64
++++++++++++++++++++++++++++++++--<br>
include/drm/drm_syncobj.h | 9 +++++<br>
include/uapi/drm/drm.h | 17 +++++++++<br>
5 files changed, 93 insertions(+), 2 deletions(-)<br>
<br>
diff --git a/drivers/gpu/drm/drm_internal.h
b/drivers/gpu/drm/drm_internal.h<br>
index 51a2055c8f18..e297dfd85019 100644<br>
--- a/drivers/gpu/drm/drm_internal.h<br>
+++ b/drivers/gpu/drm/drm_internal.h<br>
@@ -208,6 +208,8 @@ int
drm_syncobj_timeline_signal_ioctl(struct drm_device
*dev, void *data,<br>
struct drm_file
*file_private);<br>
int drm_syncobj_query_ioctl(struct drm_device *dev,
void *data,<br>
struct drm_file
*file_private);<br>
+int drm_syncobj_binary_ioctl(struct drm_device *dev,
void *data,<br>
+ struct drm_file
*file_private);<br>
<br>
/* drm_framebuffer.c */<br>
void drm_framebuffer_print_info(struct drm_printer *p,
unsigned int indent,<br>
diff --git a/drivers/gpu/drm/drm_ioctl.c
b/drivers/gpu/drm/drm_ioctl.c<br>
index f675a3bb2c88..644d0bc800a4 100644<br>
--- a/drivers/gpu/drm/drm_ioctl.c<br>
+++ b/drivers/gpu/drm/drm_ioctl.c<br>
@@ -703,6 +703,9 @@ static const struct drm_ioctl_desc
drm_ioctls[] = {<br>
DRM_RENDER_ALLOW),<br>
DRM_IOCTL_DEF(DRM_IOCTL_SYNCOBJ_QUERY,
drm_syncobj_query_ioctl,<br>
DRM_RENDER_ALLOW),<br>
+ DRM_IOCTL_DEF(DRM_IOCTL_SYNCOBJ_BINARY,
drm_syncobj_binary_ioctl,<br>
+ DRM_RENDER_ALLOW),<br>
+<br>
DRM_IOCTL_DEF(DRM_IOCTL_CRTC_GET_SEQUENCE,
drm_crtc_get_sequence_ioctl, 0),<br>
DRM_IOCTL_DEF(DRM_IOCTL_CRTC_QUEUE_SEQUENCE,
drm_crtc_queue_sequence_ioctl, 0),<br>
DRM_IOCTL_DEF(DRM_IOCTL_MODE_CREATE_LEASE,
drm_mode_create_lease_ioctl, DRM_MASTER),<br>
diff --git a/drivers/gpu/drm/drm_syncobj.c
b/drivers/gpu/drm/drm_syncobj.c<br>
index 4b5c7b0ed714..2de8f1380890 100644<br>
--- a/drivers/gpu/drm/drm_syncobj.c<br>
+++ b/drivers/gpu/drm/drm_syncobj.c<br>
@@ -418,8 +418,10 @@ int drm_syncobj_create(struct
drm_syncobj **out_syncobj, uint32_t flags,<br>
if (flags & DRM_SYNCOBJ_CREATE_SIGNALED)<br>
drm_syncobj_assign_null_handle(syncobj);<br>
<br>
- if (fence)<br>
+ if (fence) {<br>
drm_syncobj_replace_fence(syncobj,
fence);<br>
+
atomic64_set(&syncobj->binary_payload,
fence->seqno);<br>
+ }<br>
<br>
*out_syncobj = syncobj;<br>
return 0;<br>
@@ -604,6 +606,7 @@ static int
drm_syncobj_import_sync_file_fence(struct drm_file
*file_private,<br>
}<br>
<br>
drm_syncobj_replace_fence(syncobj, fence);<br>
+ atomic64_set(&syncobj->binary_payload,
fence->seqno);<br>
dma_fence_put(fence);<br>
drm_syncobj_put(syncobj);<br>
return 0;<br>
@@ -1224,8 +1227,10 @@ drm_syncobj_reset_ioctl(struct
drm_device *dev, void *data,<br>
if (ret < 0)<br>
return ret;<br>
<br>
- for (i = 0; i < args->count_handles; i++)<br>
+ for (i = 0; i < args->count_handles; i++)
{<br>
drm_syncobj_replace_fence(syncobjs[i],
NULL);<br>
+
atomic64_set(&syncobjs[i]->binary_payload, 0);<br>
+ }<br>
<br>
drm_syncobj_array_free(syncobjs,
args->count_handles);<br>
<br>
@@ -1395,6 +1400,61 @@ int
drm_syncobj_query_ioctl(struct drm_device *dev, void
*data,<br>
if (ret)<br>
break;<br>
}<br>
+<br>
+ drm_syncobj_array_free(syncobjs,
args->count_handles);<br>
+<br>
+ return ret;<br>
+}<br>
+<br>
+int drm_syncobj_binary_ioctl(struct drm_device *dev,
void *data,<br>
+ struct drm_file
*file_private)<br>
+{<br>
+ struct drm_syncobj_binary_array *args = data;<br>
+ struct drm_syncobj **syncobjs;<br>
+ u32 __user *access_flags =
u64_to_user_ptr(args->access_flags);<br>
+ u64 __user *values =
u64_to_user_ptr(args->values);<br>
+ u32 i;<br>
+ int ret;<br>
+<br>
+ if (!drm_core_check_feature(dev,
DRIVER_SYNCOBJ_TIMELINE))<br>
+ return -EOPNOTSUPP;<br>
+<br>
+ if (args->pad != 0)<br>
+ return -EINVAL;<br>
+<br>
+ if (args->count_handles == 0)<br>
+ return -EINVAL;<br>
+<br>
+ ret = drm_syncobj_array_find(file_private,<br>
+
u64_to_user_ptr(args->handles),<br>
+
args->count_handles,<br>
+ &syncobjs);<br>
+ if (ret < 0)<br>
+ return ret;<br>
+<br>
+ for (i = 0; i < args->count_handles; i++)
{<br>
+ u32 flags;<br>
+ u64 value;<br>
+<br>
+ if (get_user(flags,
&access_flags[i])) {<br>
+ ret = -EFAULT;<br>
+ break;<br>
+ }<br>
+<br>
+ if (flags &
DRM_SYNCOBJ_BINARY_VALUE_INC)<br>
+ value =
atomic64_inc_return(&syncobjs[i]->binary_payload)
- 1;<br>
+ else if (flags &
DRM_SYNCOBJ_BINARY_VALUE_READ)<br>
+ value =
atomic64_read(&syncobjs[i]->binary_payload);<br>
+<br>
+ if (flags &
DRM_SYNCOBJ_BINARY_VALUE_READ) {<br>
+ if (put_user(value,
&values[i])) {<br>
+ ret = -EFAULT;<br>
+ break;<br>
+ }<br>
+ }<br>
+<br>
+ }<br>
+<br>
drm_syncobj_array_free(syncobjs,
args->count_handles);<br>
<br>
return ret;<br>
diff --git a/include/drm/drm_syncobj.h
b/include/drm/drm_syncobj.h<br>
index 6cf7243a1dc5..aa76cb3f9107 100644<br>
--- a/include/drm/drm_syncobj.h<br>
+++ b/include/drm/drm_syncobj.h<br>
@@ -61,6 +61,15 @@ struct drm_syncobj {<br>
* @file: A file backing for this syncobj.<br>
*/<br>
struct file *file;<br>
+ /**<br>
+ * @binary_payload: A 64bit payload for binary
syncobjs.<br>
+ *<br>
+ * We use the payload value to wait on binary
syncobj fences to<br>
+ * materialize. It is a reservation mechanism
for the signaler to<br>
+ * express that at some point in the future a
dma fence with the same<br>
+ * seqno will be put into the syncobj.<br>
+ */<br>
+ atomic64_t binary_payload;<br>
};<br>
<br>
void drm_syncobj_free(struct kref *kref);<br>
diff --git a/include/uapi/drm/drm.h
b/include/uapi/drm/drm.h<br>
index 8a5b2f8f8eb9..78a0a413b788 100644<br>
--- a/include/uapi/drm/drm.h<br>
+++ b/include/uapi/drm/drm.h<br>
@@ -785,6 +785,22 @@ struct drm_syncobj_timeline_array {<br>
__u32 pad;<br>
};<br>
<br>
+struct drm_syncobj_binary_array {<br>
+ /* A pointer to an array of u32 syncobj handles.
*/<br>
+ __u64 handles;<br>
+ /* A pointer to an array of u32 access flags for
each handle. */<br>
+ __u64 access_flags;<br>
+ /* The binary value of a syncobj is read before
it is incremented. */<br>
+#define DRM_SYNCOBJ_BINARY_VALUE_READ (1u << 0)<br>
+#define DRM_SYNCOBJ_BINARY_VALUE_INC (1u << 1)<br>
+ /* A pointer to an array of u64 values written
to by the kernel if the<br>
+ * handle is flagged for reading.<br>
+ */<br>
+ __u64 values;<br>
+ /* The length of the 3 arrays above. */<br>
+ __u32 count_handles;<br>
+ __u32 pad;<br>
+};<br>
<br>
/* Query current scanout sequence number */<br>
struct drm_crtc_get_sequence {<br>
@@ -946,6 +962,7 @@ extern "C" {<br>
#define DRM_IOCTL_SYNCOBJ_QUERY DRM_IOWR(0xCB,
struct drm_syncobj_timeline_array)<br>
#define DRM_IOCTL_SYNCOBJ_TRANSFER DRM_IOWR(0xCC,
struct drm_syncobj_transfer)<br>
#define DRM_IOCTL_SYNCOBJ_TIMELINE_SIGNAL
DRM_IOWR(0xCD, struct drm_syncobj_timeline_array)<br>
+#define DRM_IOCTL_SYNCOBJ_BINARY DRM_IOWR(0xCE,
struct drm_syncobj_binary_array)<br>
<br>
/**<br>
* Device specific ioctls should only be in their
respective headers<br>
-- <br>
2.23.0<br>
<br>
</div>
</span></font></div>
</blockquote>
<p><br>
</p>
</blockquote>
<p><br>
</p>
</body>
</html>