[PATCH] drm/amdkfd: explicitly create/destroy queue attributes under /sys

Christian König ckoenig.leichtzumerken at gmail.com
Fri Dec 10 07:22:18 UTC 2021


Am 09.12.21 um 23:27 schrieb Felix Kuehling:
> Am 2021-12-09 um 5:14 p.m. schrieb Chen, Xiaogang:
>> On 12/9/2021 12:40 PM, Felix Kuehling wrote:
>>> Am 2021-12-09 um 2:49 a.m. schrieb Xiaogang.Chen:
>>>> From: Xiaogang Chen <xiaogang.chen at amd.com>
>>>>
>>>> When application is about finish it destroys queues it has created by
>>>> an ioctl. Driver deletes queue
>>>> entry(/sys/class/kfd/kfd/proc/pid/queues/queueid/)
>>>> which is directory including this queue all attributes. Low level
>>>> kernel
>>>> code deletes all attributes under this directory. The lock from
>>>> kernel is
>>>> on queue entry, not its attributes. At meantime another user space
>>>> application
>>>> can read the attributes. There is possibility that the application can
>>>> hold/read the attributes while kernel is deleting the queue entry,
>>>> cause
>>>> the application have invalid memory access, then killed by kernel.
>>>>
>>>> Driver changes: explicitly create/destroy each attribute for each
>>>> queue,
>>>> let kernel put lock on each attribute too.
>>> Is this working around a bug in kobject_del? Shouldn't that code take
>>> care of the necessary locking itself?
>>>
>>> Regards,
>>>     Felix
>> The patches do not change kobject/kernfs that are too low level and
>> would involve deeper discussions.
>> Made changes at higher level(kfd) instead.
>>
>> Have tested with MSF tool overnight.
> OK. I'm OK with your changes. The patch is
>
> Reviewed-by: Felix Kuehling <Felix.Kuehling at amd.com>
>
> But I think we should let the kernfs folks know that there is a problem
> anyway. It might save someone else a lot of time and headaches down the
> line. Ideally we'd come up with a small reproducer (dummy driver and a
> user mode tool (could just be a bash script)) that doesn't require
> special AMD hardware and the whole ROCm stack.

I think we could do this in the DKMS/release branches, but for upstream 
we should rather fix the underlying problem.

Additional to that this is explicitely what we should not do if I 
understood Greg correctly in previous discussions, but take that with a 
grain of salt since I'm not an expert on the topic.

Regards,
Christian.

>
> Regards,
>    Felix
>
>
>> Thanks
>> Xiaogang
>>
>>>> Signed-off-by: Xiaogang Chen <xiaogang.chen at amd.com>
>>>> ---
>>>>    drivers/gpu/drm/amd/amdkfd/kfd_priv.h    |  3 +++
>>>>    drivers/gpu/drm/amd/amdkfd/kfd_process.c | 33
>>>> +++++++-----------------
>>>>    2 files changed, 13 insertions(+), 23 deletions(-)
>>>>
>>>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
>>>> b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
>>>> index 0c3f911e3bf4..045da300749e 100644
>>>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
>>>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
>>>> @@ -546,6 +546,9 @@ struct queue {
>>>>          /* procfs */
>>>>        struct kobject kobj;
>>>> +    struct attribute attr_guid;
>>>> +    struct attribute attr_size;
>>>> +    struct attribute attr_type;
>>>>    };
>>>>      enum KFD_MQD_TYPE {
>>>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process.c
>>>> b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
>>>> index 9158f9754a24..04a5638f9196 100644
>>>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_process.c
>>>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
>>>> @@ -73,6 +73,8 @@ static void evict_process_worker(struct
>>>> work_struct *work);
>>>>    static void restore_process_worker(struct work_struct *work);
>>>>      static void kfd_process_device_destroy_cwsr_dgpu(struct
>>>> kfd_process_device *pdd);
>>>> +static void kfd_sysfs_create_file(struct kobject *kobj, struct
>>>> attribute *attr,
>>>> +                char *name);
>>>>      struct kfd_procfs_tree {
>>>>        struct kobject *kobj;
>>>> @@ -441,35 +443,12 @@ static ssize_t kfd_sysfs_counters_show(struct
>>>> kobject *kobj,
>>>>        return 0;
>>>>    }
>>>>    -static struct attribute attr_queue_size = {
>>>> -    .name = "size",
>>>> -    .mode = KFD_SYSFS_FILE_MODE
>>>> -};
>>>> -
>>>> -static struct attribute attr_queue_type = {
>>>> -    .name = "type",
>>>> -    .mode = KFD_SYSFS_FILE_MODE
>>>> -};
>>>> -
>>>> -static struct attribute attr_queue_gpuid = {
>>>> -    .name = "gpuid",
>>>> -    .mode = KFD_SYSFS_FILE_MODE
>>>> -};
>>>> -
>>>> -static struct attribute *procfs_queue_attrs[] = {
>>>> -    &attr_queue_size,
>>>> -    &attr_queue_type,
>>>> -    &attr_queue_gpuid,
>>>> -    NULL
>>>> -};
>>>> -
>>>>    static const struct sysfs_ops procfs_queue_ops = {
>>>>        .show = kfd_procfs_queue_show,
>>>>    };
>>>>      static struct kobj_type procfs_queue_type = {
>>>>        .sysfs_ops = &procfs_queue_ops,
>>>> -    .default_attrs = procfs_queue_attrs,
>>>>    };
>>>>      static const struct sysfs_ops procfs_stats_ops = {
>>>> @@ -511,6 +490,10 @@ int kfd_procfs_add_queue(struct queue *q)
>>>>            return ret;
>>>>        }
>>>>    +    kfd_sysfs_create_file(&q->kobj, &q->attr_guid, "guid");
>>>> +    kfd_sysfs_create_file(&q->kobj, &q->attr_size, "size");
>>>> +    kfd_sysfs_create_file(&q->kobj, &q->attr_type, "type");
>>>> +
>>>>        return 0;
>>>>    }
>>>>    @@ -655,6 +638,10 @@ void kfd_procfs_del_queue(struct queue *q)
>>>>        if (!q)
>>>>            return;
>>>>    +    sysfs_remove_file(&q->kobj, &q->attr_guid);
>>>> +    sysfs_remove_file(&q->kobj, &q->attr_size);
>>>> +    sysfs_remove_file(&q->kobj, &q->attr_type);
>>>> +
>>>>        kobject_del(&q->kobj);
>>>>        kobject_put(&q->kobj);
>>>>    }



More information about the amd-gfx mailing list