[PATCH] drm/amdkfd: explicitly create/destroy queue attributes under /sys
Chen, Xiaogang
xiaogang.chen at amd.com
Fri Dec 10 17:08:12 UTC 2021
On 12/10/2021 10:49 AM, Felix Kuehling wrote:
> On 2021-12-10 2:22 a.m., Christian König wrote:
>> Am 09.12.21 um 23:27 schrieb Felix Kuehling:
>>> Am 2021-12-09 um 5:14 p.m. schrieb Chen, Xiaogang:
>>>> On 12/9/2021 12:40 PM, Felix Kuehling wrote:
>>>>> Am 2021-12-09 um 2:49 a.m. schrieb Xiaogang.Chen:
>>>>>> From: Xiaogang Chen <xiaogang.chen at amd.com>
>>>>>>
>>>>>> When application is about finish it destroys queues it has
>>>>>> created by
>>>>>> an ioctl. Driver deletes queue
>>>>>> entry(/sys/class/kfd/kfd/proc/pid/queues/queueid/)
>>>>>> which is directory including this queue all attributes. Low level
>>>>>> kernel
>>>>>> code deletes all attributes under this directory. The lock from
>>>>>> kernel is
>>>>>> on queue entry, not its attributes. At meantime another user space
>>>>>> application
>>>>>> can read the attributes. There is possibility that the
>>>>>> application can
>>>>>> hold/read the attributes while kernel is deleting the queue entry,
>>>>>> cause
>>>>>> the application have invalid memory access, then killed by kernel.
>>>>>>
>>>>>> Driver changes: explicitly create/destroy each attribute for each
>>>>>> queue,
>>>>>> let kernel put lock on each attribute too.
>>>>> Is this working around a bug in kobject_del? Shouldn't that code take
>>>>> care of the necessary locking itself?
>>>>>
>>>>> Regards,
>>>>> Felix
>>>> The patches do not change kobject/kernfs that are too low level and
>>>> would involve deeper discussions.
>>>> Made changes at higher level(kfd) instead.
>>>>
>>>> Have tested with MSF tool overnight.
>>> OK. I'm OK with your changes. The patch is
>>>
>>> Reviewed-by: Felix Kuehling <Felix.Kuehling at amd.com>
>>>
>>> But I think we should let the kernfs folks know that there is a problem
>>> anyway. It might save someone else a lot of time and headaches down the
>>> line. Ideally we'd come up with a small reproducer (dummy driver and a
>>> user mode tool (could just be a bash script)) that doesn't require
>>> special AMD hardware and the whole ROCm stack.
>>
>> I think we could do this in the DKMS/release branches, but for
>> upstream we should rather fix the underlying problem.
>
> Sounds reasonable.
>
Some customers(ex, MSF) use upstream based kernel, not our release. The
changes do not affect low level code and should work even after kernfs
got changed.
Changing too low level code would involve more discussions and delay.
>
>>
>> Additional to that this is explicitely what we should not do if I
>> understood Greg correctly in previous discussions, but take that with
>> a grain of salt since I'm not an expert on the topic.
>
> Sorry, I'm not following. What's the thing we should not do:
>
> * make a simple reproducer?
> * work around a bug somewhere else?
>
> And what's the topic you discussed with Greg (KH, I presume)? Kernfs?
> Workarounds? Is there a record of those discussions for reference?
>
> Thanks,
> Felix
>
I am not sure either what the rules should be in this case.
Thanks
Xiaogang
>
>>
>> Regards,
>> Christian.
>>
>>>
>>> Regards,
>>> Felix
>>>
>>>
>>>> Thanks
>>>> Xiaogang
>>>>
>>>>>> Signed-off-by: Xiaogang Chen <xiaogang.chen at amd.com>
>>>>>> ---
>>>>>> drivers/gpu/drm/amd/amdkfd/kfd_priv.h | 3 +++
>>>>>> drivers/gpu/drm/amd/amdkfd/kfd_process.c | 33
>>>>>> +++++++-----------------
>>>>>> 2 files changed, 13 insertions(+), 23 deletions(-)
>>>>>>
>>>>>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
>>>>>> b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
>>>>>> index 0c3f911e3bf4..045da300749e 100644
>>>>>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
>>>>>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
>>>>>> @@ -546,6 +546,9 @@ struct queue {
>>>>>> /* procfs */
>>>>>> struct kobject kobj;
>>>>>> + struct attribute attr_guid;
>>>>>> + struct attribute attr_size;
>>>>>> + struct attribute attr_type;
>>>>>> };
>>>>>> enum KFD_MQD_TYPE {
>>>>>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process.c
>>>>>> b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
>>>>>> index 9158f9754a24..04a5638f9196 100644
>>>>>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_process.c
>>>>>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
>>>>>> @@ -73,6 +73,8 @@ static void evict_process_worker(struct
>>>>>> work_struct *work);
>>>>>> static void restore_process_worker(struct work_struct *work);
>>>>>> static void kfd_process_device_destroy_cwsr_dgpu(struct
>>>>>> kfd_process_device *pdd);
>>>>>> +static void kfd_sysfs_create_file(struct kobject *kobj, struct
>>>>>> attribute *attr,
>>>>>> + char *name);
>>>>>> struct kfd_procfs_tree {
>>>>>> struct kobject *kobj;
>>>>>> @@ -441,35 +443,12 @@ static ssize_t kfd_sysfs_counters_show(struct
>>>>>> kobject *kobj,
>>>>>> return 0;
>>>>>> }
>>>>>> -static struct attribute attr_queue_size = {
>>>>>> - .name = "size",
>>>>>> - .mode = KFD_SYSFS_FILE_MODE
>>>>>> -};
>>>>>> -
>>>>>> -static struct attribute attr_queue_type = {
>>>>>> - .name = "type",
>>>>>> - .mode = KFD_SYSFS_FILE_MODE
>>>>>> -};
>>>>>> -
>>>>>> -static struct attribute attr_queue_gpuid = {
>>>>>> - .name = "gpuid",
>>>>>> - .mode = KFD_SYSFS_FILE_MODE
>>>>>> -};
>>>>>> -
>>>>>> -static struct attribute *procfs_queue_attrs[] = {
>>>>>> - &attr_queue_size,
>>>>>> - &attr_queue_type,
>>>>>> - &attr_queue_gpuid,
>>>>>> - NULL
>>>>>> -};
>>>>>> -
>>>>>> static const struct sysfs_ops procfs_queue_ops = {
>>>>>> .show = kfd_procfs_queue_show,
>>>>>> };
>>>>>> static struct kobj_type procfs_queue_type = {
>>>>>> .sysfs_ops = &procfs_queue_ops,
>>>>>> - .default_attrs = procfs_queue_attrs,
>>>>>> };
>>>>>> static const struct sysfs_ops procfs_stats_ops = {
>>>>>> @@ -511,6 +490,10 @@ int kfd_procfs_add_queue(struct queue *q)
>>>>>> return ret;
>>>>>> }
>>>>>> + kfd_sysfs_create_file(&q->kobj, &q->attr_guid, "guid");
>>>>>> + kfd_sysfs_create_file(&q->kobj, &q->attr_size, "size");
>>>>>> + kfd_sysfs_create_file(&q->kobj, &q->attr_type, "type");
>>>>>> +
>>>>>> return 0;
>>>>>> }
>>>>>> @@ -655,6 +638,10 @@ void kfd_procfs_del_queue(struct queue *q)
>>>>>> if (!q)
>>>>>> return;
>>>>>> + sysfs_remove_file(&q->kobj, &q->attr_guid);
>>>>>> + sysfs_remove_file(&q->kobj, &q->attr_size);
>>>>>> + sysfs_remove_file(&q->kobj, &q->attr_type);
>>>>>> +
>>>>>> kobject_del(&q->kobj);
>>>>>> kobject_put(&q->kobj);
>>>>>> }
>>
More information about the amd-gfx
mailing list