[PATCH v4] drm/amdkfd: Provide SMI events watch

Amber Lin Amber.Lin at amd.com
Wed Apr 15 13:36:42 UTC 2020


Thank you Felix. Now I understand the problem of global client ID is 
leaking a hole for potential attackers. I didn't take that into 
consideration. I'll change that following your advice below.

Hi Alex,

Thank you for the link. It's helpful. I have a question regarding the 
versioning. One topic in the article talks about how the userspace can 
figure out if the new ioctl is supported in a given kernel. Is it 
correct that with dkms driver, we use the driver version coming from 
AMDGPU_VERSION in amdgpu_drv.c, and in upstream kernel we use the kernel 
version?

Thanks.

Amber

On 2020-04-14 11:03 p.m., Deucher, Alexander wrote:
>
> [AMD Public Use]
>
>
> Some good advice on getting ioctls right:
> https://www.kernel.org/doc/html/v5.4-preprc-cpu/ioctl/botching-up-ioctls.html
>
> Alex
>
> ------------------------------------------------------------------------
> *From:* amd-gfx <amd-gfx-bounces at lists.freedesktop.org> on behalf of 
> Felix Kuehling <felix.kuehling at amd.com>
> *Sent:* Tuesday, April 14, 2020 10:40 PM
> *To:* Lin, Amber <Amber.Lin at amd.com>; amd-gfx at lists.freedesktop.org 
> <amd-gfx at lists.freedesktop.org>
> *Subject:* Re: [PATCH v4] drm/amdkfd: Provide SMI events watch
>
> Hi Amber,
>
> I understand that different processes can get the same FD. My 
> statement about FD being unique is relative to one process.
>
> The main problem with the global client ID is, that it allows process 
> A to change the event mask of process B just by specifying process B's 
> client ID. That can lead to denial of service attacks where process A 
> can cause events not to be delivered to B or can flood process B with 
> frequent events that it's not prepared to handle.
>
> Therefore you must make the lookup of the client from the client ID 
> not from a global list, but from a per-process list. That way process 
> A can only change event masks of process A clients, and not those of 
> any other process.
>
> But if the client list is process-specific, you can use the FD as a 
> unique identifier of the client within the process, so you don't need 
> a separate client ID.
>
> Regards,
>   Felix
>
> Am 2020-04-14 um 8:09 p.m. schrieb Lin, Amber:
>>
>> [AMD Official Use Only - Internal Distribution Only]
>>
>> Hi Felix,
>>
>> That was my assumption too that each registration will get different 
>> file descriptor, but it turns out not. When I started two process and 
>> both register gpu0 and gpu1, they both got fd=15. If I have process A 
>> register gpu0+gpu1, and process B only register gpu0, process A gets 
>> fd=15 and process B gets fd=9. That’s why I added client ID.
>>
>> By multiple clients, I mean multiple processes. The ask is users want 
>> to have multiple tools and those different tools can use rsmi lib to 
>> watch events at the same time. Due to the reason above that two 
>> processes can actually get the same fd and I need to add client ID to 
>> distinguish the registration, I don’t see the point of limiting one 
>> registration per process unless I use pid to distinguish the client 
>> instead, which was in my consideration too when I was writing the 
>> code. But second thought is why adding this restriction when client 
>> ID can allow the tool to watch different events on different devices 
>> if they want to. Maybe client ID is a bad term and it misleads you. I 
>> should call it register ID.
>>
>> Regards,
>>
>> Amber
>>
>> *From:* Kuehling, Felix <Felix.Kuehling at amd.com> 
>> <mailto:Felix.Kuehling at amd.com>
>> *Sent:* Tuesday, April 14, 2020 7:04 PM
>> *To:* Lin, Amber <Amber.Lin at amd.com> <mailto:Amber.Lin at amd.com>; 
>> amd-gfx at lists.freedesktop.org <mailto:amd-gfx at lists.freedesktop.org>
>> *Subject:* Re: [PATCH v4] drm/amdkfd: Provide SMI events watch
>>
>> Hi Amber,
>>
>> Some general remarks about the multi-client support. You added a 
>> global client id that's separate from the file descriptor. That's 
>> problematic for two reasons:
>>
>>  1. A process could change a different process' event mask
>>  2. The FD should already be unique per process, no need to invent
>>     another ID
>>
>> If we want to allow one process to register for events multiple times 
>> (multiple FDs per process), then the list of clients should be per 
>> process. Each process should only be allowed to change the event 
>> masks of its own clients. The client could be identified by its FD. 
>> No need for another client ID.
>>
>> But you could also simplify it further by allowing only one event 
>> client per process. Then you don't need the client ID lookup at all. 
>> Just have a single event client in the kfd_process.
>>
>> Another approach would be to make enable/disable functions of the 
>> event FD, rather than the KFD FD ioctl. It could be an ioctl of the 
>> event FD, or even simpler, you could use the write file-operation to 
>> write an event mask (of arbitrary length if you want to enable growth 
>> in the future). That way everything would be neatly encapsulated in 
>> the event FD private data.
>>
>> Two more comments inline ...
>>
>> Am 2020-04-14 um 5:30 p.m. schrieb Amber Lin:
>>
>>     When the compute is malfunctioning or performance drops, the system admin
>>
>>     will use SMI (System Management Interface) tool to monitor/diagnostic what
>>
>>     went wrong. This patch provides an event watch interface for the user
>>
>>     space to register devices and subscribe events they are interested. After
>>
>>     registered, the user can use annoymous file descriptor's poll function
>>
>>     with wait-time specified and wait for events to happen. Once an event
>>
>>     happens, the user can use read() to retrieve information related to the
>>
>>     event.
>>
>>       
>>
>>     VM fault event is done in this patch.
>>
>>       
>>
>>     v2: - remove UNREGISTER and add event ENABLE/DISABLE
>>
>>          - correct kfifo usage
>>
>>          - move event message API to kfd_ioctl.h
>>
>>     v3: send the event msg in text than in binary
>>
>>     v4: support multiple clients
>>
>>       
>>
>>     Signed-off-by: Amber Lin<Amber.Lin at amd.com>  <mailto:Amber.Lin at amd.com>
>>
>> [snip]
>>
>>     diff --git a/include/uapi/linux/kfd_ioctl.h b/include/uapi/linux/kfd_ioctl.h
>>
>>     index 4f66764..8146437 100644
>>
>>     --- a/include/uapi/linux/kfd_ioctl.h
>>
>>     +++ b/include/uapi/linux/kfd_ioctl.h
>>
>>     @@ -442,6 +442,36 @@ struct kfd_ioctl_import_dmabuf_args {
>>
>>        __u32 dmabuf_fd;       /* to KFD */
>>
>>       };
>>
>>       
>>
>>     +/*
>>
>>     + * KFD SMI(System Management Interface) events
>>
>>     + */
>>
>>     +enum kfd_smi_events_op {
>>
>>     + KFD_SMI_EVENTS_REGISTER = 1,
>>
>>     + KFD_SMI_EVENTS_ENABLE,
>>
>>     + KFD_SMI_EVENTS_DISABLE
>>
>>     +};
>>
>>     +
>>
>>     +/* Event type (defined by bitmask) */
>>
>>     +#define KFD_SMI_EVENT_VMFAULT     0x0000000000000001
>>
>>     +
>>
>>     +struct kfd_ioctl_smi_events_args {
>>
>>     + __u32 op;              /* to KFD */
>>
>>     + __u64 events;          /* to KFD */
>>
>> The binary layout of the ioctl args structure should be the same on 
>> 32/64-bit. That means the 64-bit members should be 64-bit aligned. 
>> The best way to ensure this is to put all the 64-bit members first.
>>
>>       
>>
>>     + __u64 gpuids_array_ptr;        /* to KFD */
>>
>>     + __u32 num_gpuids;      /* to KFD */
>>
>>     + __u32 anon_fd;         /* from KFD */
>>
>>     + __u32 client_id;       /* to/from KFD */
>>
>>     +};
>>
>>     +
>>
>>     +/* 1. All messages must start with (hex)uint64_event(16) + space(1) +
>>
>>     + *    (hex)gpuid(8) + space(1) =  26 bytes
>>
>>     + * 2. VmFault msg = (hex)uint32_pid(8) + space(1) + task name(16) = 25
>>
>>     + *    When a new event msg uses more memory, change the calculation here.
>>
>>     + * 3. End with \n(1)
>>
>>     + * 26 + 25 + 1 = 52
>>
>>     + */
>>
>>     +#define KFD_SMI_MAX_EVENT_MSG 52
>>
>> If you define the maximum message length here, clients may start 
>> depending on it, and it gets harder to change it later. I'd not 
>> define this in the API header. It's not necessary to write correct 
>> clients. And if used badly, it may encourage writing incorrect 
>> clients that break with longer messages in the future.
>>
>> Regards,
>>   Felix
>>
>>       
>>
>>     +
>>
>>       /* Register offset inside the remapped mmio page
>>
>>        */
>>
>>       enum kfd_mmio_remap {
>>
>>     @@ -546,7 +576,10 @@ enum kfd_mmio_remap {
>>
>>       #define AMDKFD_IOC_ALLOC_QUEUE_GWS            \
>>
>>               AMDKFD_IOWR(0x1E, struct kfd_ioctl_alloc_queue_gws_args)
>>
>>       
>>
>>     +#define AMDKFD_IOC_SMI_EVENTS                 \
>>
>>     +        AMDKFD_IOWR(0x1F, struct kfd_ioctl_smi_events_args)
>>
>>     +
>>
>>       #define AMDKFD_COMMAND_START           0x01
>>
>>     -#define AMDKFD_COMMAND_END             0x1F
>>
>>     +#define AMDKFD_COMMAND_END             0x20
>>
>>       
>>
>>       #endif
>>

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.freedesktop.org/archives/amd-gfx/attachments/20200415/2d021dd2/attachment-0001.htm>


More information about the amd-gfx mailing list