[PATCH 2/2] drm/scheduler: Remove obsolete spinlock.

Andrey Grodzovsky Andrey.Grodzovsky at amd.com
Wed May 16 14:25:49 UTC 2018



On 05/16/2018 09:16 AM, Lucas Stach wrote:
> Am Mittwoch, den 16.05.2018, 15:10 +0200 schrieb Christian König:
>> Am 16.05.2018 um 15:00 schrieb Lucas Stach:
>>> Am Mittwoch, den 16.05.2018, 14:32 +0200 schrieb Christian König:
>>>> Am 16.05.2018 um 14:28 schrieb Lucas Stach:
>>>>> Am Mittwoch, den 16.05.2018, 14:08 +0200 schrieb Christian König:
>>>>>> Yes, exactly.
>>>>>>
>>>>>> For normal user space command submission we should have tons of
>>>>>> locks
>>>>>> guaranteeing that (e.g. just the VM lock should do).
>>>>>>
>>>>>> For kernel moves we have the mutex for the GTT windows which
>>>>>> protects
>>>>>> it.
>>>>>>
>>>>>> The could be problems with the UVD/VCE queues to cleanup the
>>>>>> handles
>>>>>> when an application crashes.
>>>>> FWIW, etnaviv is currently completely unlocked in this path, but I
>>>>> believe that this isn't an issue as the sched fence seq numbers are
>>>>> per
>>>>> entity. So to actually end up with reversed seqnos one context has
>>>>> to
>>>>> preempt itself to do another submit, while the current one hasn't
>>>>> returned from kernel space, which I believe is a fairly theoretical
>>>>> issue. Is my understanding correct?
>>>> Yes. The problem is with the right timing this can be used to access
>>>> freed up memory.
>>>>
>>>> If you then manage to place a page table in that freed up memory
>>>> taking
>>>> over the system is just a typing exercise.
>>> Thanks. I believe we don't have this problem in etnaviv, as memory
>>> referencing is tied to the job and will only be unreferenced on
>>> free_job, but I'll re-check this when I've got some time.
>> Well that depends on how you use the sequence numbers.
>>
>> If you use them for job completion check somewhere then you certainly
>> have a problem if job A gets the 1 and B the 2, but B completes before A.
> We don't do this anymore. All the etnaviv internals are driven by the
> fence signal callbacks.
>
>> At bare minimum that's still a bug we need to fix because it confuses
>> functions like dma_fence_is_later() and dma_fence_later().
> Agreed. Also amending the documentation to state that
> drm_sched_job_init() and drm_sched_entity_push_job() need to be called
> under a common lock seems like a good idea.

I will respin the patch with added documentation.

Andrey

> Regards,
> Lucas



More information about the amd-gfx mailing list