[PATCH v4] drm/amdkfd: Provide SMI events watch
Amber Lin
Amber.Lin at amd.com
Wed Apr 15 13:36:42 UTC 2020
Thank you Felix. Now I understand the problem of global client ID is
leaking a hole for potential attackers. I didn't take that into
consideration. I'll change that following your advice below.
Hi Alex,
Thank you for the link. It's helpful. I have a question regarding the
versioning. One topic in the article talks about how the userspace can
figure out if the new ioctl is supported in a given kernel. Is it
correct that with dkms driver, we use the driver version coming from
AMDGPU_VERSION in amdgpu_drv.c, and in upstream kernel we use the kernel
version?
Thanks.
Amber
On 2020-04-14 11:03 p.m., Deucher, Alexander wrote:
>
> [AMD Public Use]
>
>
> Some good advice on getting ioctls right:
> https://www.kernel.org/doc/html/v5.4-preprc-cpu/ioctl/botching-up-ioctls.html
>
> Alex
>
> ------------------------------------------------------------------------
> *From:* amd-gfx <amd-gfx-bounces at lists.freedesktop.org> on behalf of
> Felix Kuehling <felix.kuehling at amd.com>
> *Sent:* Tuesday, April 14, 2020 10:40 PM
> *To:* Lin, Amber <Amber.Lin at amd.com>; amd-gfx at lists.freedesktop.org
> <amd-gfx at lists.freedesktop.org>
> *Subject:* Re: [PATCH v4] drm/amdkfd: Provide SMI events watch
>
> Hi Amber,
>
> I understand that different processes can get the same FD. My
> statement about FD being unique is relative to one process.
>
> The main problem with the global client ID is, that it allows process
> A to change the event mask of process B just by specifying process B's
> client ID. That can lead to denial of service attacks where process A
> can cause events not to be delivered to B or can flood process B with
> frequent events that it's not prepared to handle.
>
> Therefore you must make the lookup of the client from the client ID
> not from a global list, but from a per-process list. That way process
> A can only change event masks of process A clients, and not those of
> any other process.
>
> But if the client list is process-specific, you can use the FD as a
> unique identifier of the client within the process, so you don't need
> a separate client ID.
>
> Regards,
> Felix
>
> Am 2020-04-14 um 8:09 p.m. schrieb Lin, Amber:
>>
>> [AMD Official Use Only - Internal Distribution Only]
>>
>> Hi Felix,
>>
>> That was my assumption too that each registration will get different
>> file descriptor, but it turns out not. When I started two process and
>> both register gpu0 and gpu1, they both got fd=15. If I have process A
>> register gpu0+gpu1, and process B only register gpu0, process A gets
>> fd=15 and process B gets fd=9. That’s why I added client ID.
>>
>> By multiple clients, I mean multiple processes. The ask is users want
>> to have multiple tools and those different tools can use rsmi lib to
>> watch events at the same time. Due to the reason above that two
>> processes can actually get the same fd and I need to add client ID to
>> distinguish the registration, I don’t see the point of limiting one
>> registration per process unless I use pid to distinguish the client
>> instead, which was in my consideration too when I was writing the
>> code. But second thought is why adding this restriction when client
>> ID can allow the tool to watch different events on different devices
>> if they want to. Maybe client ID is a bad term and it misleads you. I
>> should call it register ID.
>>
>> Regards,
>>
>> Amber
>>
>> *From:* Kuehling, Felix <Felix.Kuehling at amd.com>
>> <mailto:Felix.Kuehling at amd.com>
>> *Sent:* Tuesday, April 14, 2020 7:04 PM
>> *To:* Lin, Amber <Amber.Lin at amd.com> <mailto:Amber.Lin at amd.com>;
>> amd-gfx at lists.freedesktop.org <mailto:amd-gfx at lists.freedesktop.org>
>> *Subject:* Re: [PATCH v4] drm/amdkfd: Provide SMI events watch
>>
>> Hi Amber,
>>
>> Some general remarks about the multi-client support. You added a
>> global client id that's separate from the file descriptor. That's
>> problematic for two reasons:
>>
>> 1. A process could change a different process' event mask
>> 2. The FD should already be unique per process, no need to invent
>> another ID
>>
>> If we want to allow one process to register for events multiple times
>> (multiple FDs per process), then the list of clients should be per
>> process. Each process should only be allowed to change the event
>> masks of its own clients. The client could be identified by its FD.
>> No need for another client ID.
>>
>> But you could also simplify it further by allowing only one event
>> client per process. Then you don't need the client ID lookup at all.
>> Just have a single event client in the kfd_process.
>>
>> Another approach would be to make enable/disable functions of the
>> event FD, rather than the KFD FD ioctl. It could be an ioctl of the
>> event FD, or even simpler, you could use the write file-operation to
>> write an event mask (of arbitrary length if you want to enable growth
>> in the future). That way everything would be neatly encapsulated in
>> the event FD private data.
>>
>> Two more comments inline ...
>>
>> Am 2020-04-14 um 5:30 p.m. schrieb Amber Lin:
>>
>> When the compute is malfunctioning or performance drops, the system admin
>>
>> will use SMI (System Management Interface) tool to monitor/diagnostic what
>>
>> went wrong. This patch provides an event watch interface for the user
>>
>> space to register devices and subscribe events they are interested. After
>>
>> registered, the user can use annoymous file descriptor's poll function
>>
>> with wait-time specified and wait for events to happen. Once an event
>>
>> happens, the user can use read() to retrieve information related to the
>>
>> event.
>>
>>
>>
>> VM fault event is done in this patch.
>>
>>
>>
>> v2: - remove UNREGISTER and add event ENABLE/DISABLE
>>
>> - correct kfifo usage
>>
>> - move event message API to kfd_ioctl.h
>>
>> v3: send the event msg in text than in binary
>>
>> v4: support multiple clients
>>
>>
>>
>> Signed-off-by: Amber Lin<Amber.Lin at amd.com> <mailto:Amber.Lin at amd.com>
>>
>> [snip]
>>
>> diff --git a/include/uapi/linux/kfd_ioctl.h b/include/uapi/linux/kfd_ioctl.h
>>
>> index 4f66764..8146437 100644
>>
>> --- a/include/uapi/linux/kfd_ioctl.h
>>
>> +++ b/include/uapi/linux/kfd_ioctl.h
>>
>> @@ -442,6 +442,36 @@ struct kfd_ioctl_import_dmabuf_args {
>>
>> __u32 dmabuf_fd; /* to KFD */
>>
>> };
>>
>>
>>
>> +/*
>>
>> + * KFD SMI(System Management Interface) events
>>
>> + */
>>
>> +enum kfd_smi_events_op {
>>
>> + KFD_SMI_EVENTS_REGISTER = 1,
>>
>> + KFD_SMI_EVENTS_ENABLE,
>>
>> + KFD_SMI_EVENTS_DISABLE
>>
>> +};
>>
>> +
>>
>> +/* Event type (defined by bitmask) */
>>
>> +#define KFD_SMI_EVENT_VMFAULT 0x0000000000000001
>>
>> +
>>
>> +struct kfd_ioctl_smi_events_args {
>>
>> + __u32 op; /* to KFD */
>>
>> + __u64 events; /* to KFD */
>>
>> The binary layout of the ioctl args structure should be the same on
>> 32/64-bit. That means the 64-bit members should be 64-bit aligned.
>> The best way to ensure this is to put all the 64-bit members first.
>>
>>
>>
>> + __u64 gpuids_array_ptr; /* to KFD */
>>
>> + __u32 num_gpuids; /* to KFD */
>>
>> + __u32 anon_fd; /* from KFD */
>>
>> + __u32 client_id; /* to/from KFD */
>>
>> +};
>>
>> +
>>
>> +/* 1. All messages must start with (hex)uint64_event(16) + space(1) +
>>
>> + * (hex)gpuid(8) + space(1) = 26 bytes
>>
>> + * 2. VmFault msg = (hex)uint32_pid(8) + space(1) + task name(16) = 25
>>
>> + * When a new event msg uses more memory, change the calculation here.
>>
>> + * 3. End with \n(1)
>>
>> + * 26 + 25 + 1 = 52
>>
>> + */
>>
>> +#define KFD_SMI_MAX_EVENT_MSG 52
>>
>> If you define the maximum message length here, clients may start
>> depending on it, and it gets harder to change it later. I'd not
>> define this in the API header. It's not necessary to write correct
>> clients. And if used badly, it may encourage writing incorrect
>> clients that break with longer messages in the future.
>>
>> Regards,
>> Felix
>>
>>
>>
>> +
>>
>> /* Register offset inside the remapped mmio page
>>
>> */
>>
>> enum kfd_mmio_remap {
>>
>> @@ -546,7 +576,10 @@ enum kfd_mmio_remap {
>>
>> #define AMDKFD_IOC_ALLOC_QUEUE_GWS \
>>
>> AMDKFD_IOWR(0x1E, struct kfd_ioctl_alloc_queue_gws_args)
>>
>>
>>
>> +#define AMDKFD_IOC_SMI_EVENTS \
>>
>> + AMDKFD_IOWR(0x1F, struct kfd_ioctl_smi_events_args)
>>
>> +
>>
>> #define AMDKFD_COMMAND_START 0x01
>>
>> -#define AMDKFD_COMMAND_END 0x1F
>>
>> +#define AMDKFD_COMMAND_END 0x20
>>
>>
>>
>> #endif
>>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.freedesktop.org/archives/amd-gfx/attachments/20200415/2d021dd2/attachment-0001.htm>
More information about the amd-gfx
mailing list