<html><head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
</head>
<body>
<p>Hi Amber,</p>
<p>Some general remarks about the multi-client support. You added a
global client ID that's separate from the file descriptor. That's
problematic for two reasons:</p>
<ol>
<li>A process could change a different process' event mask</li>
<li>The FD is already unique within a process; there's no need to invent
another ID</li>
</ol>
<p>If we want to allow one process to register for events multiple
times (multiple FDs per process), then the list of clients should
be per process. Each process should only be allowed to change the
event masks of its own clients. The client could be identified by
its FD. No need for another client ID.</p>
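<p>Roughly something like this (just a sketch with hypothetical names
such as kfd_smi_client and smi_clients, not your exact structures):</p>
<pre>
/* One entry per event FD opened by this process. The FD itself
 * identifies the client, so no separate client ID is needed.
 */
struct kfd_smi_client {
	struct list_head list;
	int anon_fd;		/* identifies the client within the process */
	u64 events;		/* enabled event mask */
	struct kfifo fifo;	/* queued event messages */
};

/* Added to struct kfd_process: only this process' clients live here,
 * so one process can never touch another process' event masks.
 */
struct list_head smi_clients;
spinlock_t smi_clients_lock;
</pre>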
<p>But you could also simplify it further by allowing only one event
client per process. Then you don't need the client ID lookup at
all. Just have a single event client in the kfd_process.</p>
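<p>In that case the ENABLE/DISABLE handling doesn't need any lookup at
all, e.g. (again just a sketch, assuming a hypothetical smi_client
member):</p>
<pre>
/* In struct kfd_process: at most one SMI event client */
struct kfd_smi_client *smi_client;

/* In the ioctl handler, with p being the calling kfd_process */
if (!p-&gt;smi_client)
	return -EINVAL;
WRITE_ONCE(p-&gt;smi_client-&gt;events, args-&gt;events);
</pre>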
<p>Another approach would be to make enable/disable functions of the
event FD, rather than the KFD FD ioctl. It could be an ioctl of
the event FD, or even simpler, you could use the write
file-operation to write an event mask (of arbitrary length if you
want to enable growth in the future). That way everything would be
neatly encapsulated in the event FD private data.<br>
</p>
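<p>A sketch of what that write-based variant could look like, assuming
the event FD's private data is the (hypothetical) client structure
from above:</p>
<pre>
#include &lt;linux/fs.h&gt;
#include &lt;linux/minmax.h&gt;
#include &lt;linux/uaccess.h&gt;

static ssize_t kfd_smi_ev_write(struct file *filep, const char __user *user,
				size_t size, loff_t *offset)
{
	struct kfd_smi_client *client = filep-&gt;private_data;
	u64 events = 0;

	if (!size)
		return -EINVAL;

	/* Copy as much of the mask as this kernel understands; extra
	 * bytes from a future, longer mask are simply ignored.
	 */
	if (copy_from_user(&amp;events, user, min(size, sizeof(events))))
		return -EFAULT;

	WRITE_ONCE(client-&gt;events, events);
	return size;
}
</pre>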
<p>Two more comments inline ...<br>
</p>
<p><br>
</p>
<div class="moz-cite-prefix">Am 2020-04-14 um 5:30 p.m. schrieb
Amber Lin:<br>
</div>
<blockquote type="cite" cite="mid:1586899842-28131-1-git-send-email-Amber.Lin@amd.com">
<pre class="moz-quote-pre" wrap="">When the compute is malfunctioning or performance drops, the system admin
will use SMI (System Management Interface) tool to monitor/diagnostic what
went wrong. This patch provides an event watch interface for the user
space to register devices and subscribe events they are interested. After
registered, the user can use annoymous file descriptor's poll function
with wait-time specified and wait for events to happen. Once an event
happens, the user can use read() to retrieve information related to the
event.
VM fault event is done in this patch.
v2: - remove UNREGISTER and add event ENABLE/DISABLE
- correct kfifo usage
- move event message API to kfd_ioctl.h
v3: send the event msg in text rather than in binary
v4: support multiple clients
Signed-off-by: Amber Lin <a class="moz-txt-link-rfc2396E" href="mailto:Amber.Lin@amd.com"><Amber.Lin@amd.com></a>
</pre>
</blockquote>
<p>[snip]</p>
<blockquote type="cite" cite="mid:1586899842-28131-1-git-send-email-Amber.Lin@amd.com">
<pre class="moz-quote-pre" wrap="">diff --git a/include/uapi/linux/kfd_ioctl.h b/include/uapi/linux/kfd_ioctl.h
index 4f66764..8146437 100644
--- a/include/uapi/linux/kfd_ioctl.h
+++ b/include/uapi/linux/kfd_ioctl.h
@@ -442,6 +442,36 @@ struct kfd_ioctl_import_dmabuf_args {
__u32 dmabuf_fd; /* to KFD */
};
+/*
+ * KFD SMI(System Management Interface) events
+ */
+enum kfd_smi_events_op {
+ KFD_SMI_EVENTS_REGISTER = 1,
+ KFD_SMI_EVENTS_ENABLE,
+ KFD_SMI_EVENTS_DISABLE
+};
+
+/* Event type (defined by bitmask) */
+#define KFD_SMI_EVENT_VMFAULT 0x0000000000000001
+
+struct kfd_ioctl_smi_events_args {
+ __u32 op; /* to KFD */
+ __u64 events; /* to KFD */</pre>
</blockquote>
<p>The binary layout of the ioctl args structure should be the same
on 32-bit and 64-bit. That means the 64-bit members should be 64-bit
aligned. On 32-bit x86, __u64 only gets 4-byte alignment inside a
struct, so the current ordering results in different offsets and
sizes for 32-bit and 64-bit user mode. The best way to ensure a
consistent layout is to put all the 64-bit members first.<br>
</p>
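<p>For example, with the same members just reordered (independent of
whether client_id stays at all):</p>
<pre>
struct kfd_ioctl_smi_events_args {
	__u64 events;		/* to KFD */
	__u64 gpuids_array_ptr;	/* to KFD */
	__u32 op;		/* to KFD */
	__u32 num_gpuids;	/* to KFD */
	__u32 anon_fd;		/* from KFD */
	__u32 client_id;	/* to/from KFD */
};
</pre>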
<br>
<blockquote type="cite" cite="mid:1586899842-28131-1-git-send-email-Amber.Lin@amd.com">
<pre class="moz-quote-pre" wrap="">
+ __u64 gpuids_array_ptr; /* to KFD */
+ __u32 num_gpuids; /* to KFD */
+ __u32 anon_fd; /* from KFD */
+ __u32 client_id; /* to/from KFD */
+};
+
+/* 1. All messages must start with (hex)uint64_event(16) + space(1) +
+ * (hex)gpuid(8) + space(1) = 26 bytes
+ * 2. VmFault msg = (hex)uint32_pid(8) + space(1) + task name(16) = 25
+ * When a new event msg uses more memory, change the calculation here.
+ * 3. End with \n(1)
+ * 26 + 25 + 1 = 52
+ */
+#define KFD_SMI_MAX_EVENT_MSG 52</pre>
</blockquote>
<p>If you define the maximum message length here, clients may start
depending on it, and it gets harder to change it later. I would not
define this in the API header. It's not needed for writing correct
clients, and if used badly, it may even encourage incorrect clients
that break when messages get longer in the future.<br>
</p>
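<p>A client can just read into a reasonably large buffer and rely on
the '\n' terminator, without any compile-time maximum, e.g. (sketch):</p>
<pre>
#include &lt;errno.h&gt;
#include &lt;unistd.h&gt;

/* Reads pending SMI event text; buf only needs to be "big enough",
 * it is not tied to any kernel-side maximum message length.
 */
static ssize_t read_smi_events(int anon_fd, char *buf, size_t bufsize)
{
	ssize_t len = read(anon_fd, buf, bufsize - 1);

	if (len &lt; 0)
		return -errno;
	buf[len] = '\0';	/* messages are '\n'-terminated text */
	return len;
}
</pre>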
<p>Regards,<br>
Felix</p>
<p><br>
</p>
<blockquote type="cite" cite="mid:1586899842-28131-1-git-send-email-Amber.Lin@amd.com">
<pre class="moz-quote-pre" wrap="">
+
/* Register offset inside the remapped mmio page
*/
enum kfd_mmio_remap {
@@ -546,7 +576,10 @@ enum kfd_mmio_remap {
#define AMDKFD_IOC_ALLOC_QUEUE_GWS \
AMDKFD_IOWR(0x1E, struct kfd_ioctl_alloc_queue_gws_args)
+#define AMDKFD_IOC_SMI_EVENTS \
+ AMDKFD_IOWR(0x1F, struct kfd_ioctl_smi_events_args)
+
#define AMDKFD_COMMAND_START 0x01
-#define AMDKFD_COMMAND_END 0x1F
+#define AMDKFD_COMMAND_END 0x20
#endif
</pre>
</blockquote>
</body>
</html>