[PATCH 1/3] drm/radeon: move ring locking out of reset path

Jerome Glisse j.glisse at gmail.com
Mon Jul 2 09:01:30 PDT 2012


On Mon, Jul 2, 2012 at 11:58 AM, Christian König
<deathsimple at vodafone.de> wrote:
> On 02.07.2012 17:41, Jerome Glisse wrote:
>>
>> On Fri, Jun 29, 2012 at 12:15 PM, Michel Dänzer <michel at daenzer.net>
>> wrote:
>>>
>>> On Fri, 2012-06-29 at 17:18 +0200, Christian König wrote:
>>>>
>>>> On 29.06.2012 17:09, Michel Dänzer wrote:
>>>>>
>>>>> On Fri, 2012-06-29 at 16:45 +0200, Christian König wrote:
>>>>>>
>>>>>> Hold the ring lock the whole time the reset is in progress,
>>>>>> otherwise another process can submit new jobs.
>>>>>
>>>>> Sounds good, but doesn't this create other paths (e.g. initialization,
>>>>> resume) where the ring is being accessed without holding the lock?
>>>>> Isn't that a problem?
>>>>
>>>> Thought about that also.
>>>>
>>>> For init I'm pretty sure that no application can submit commands before
>>>> we are done, otherwise we are doomed anyway.
>>>>
>>>> For resume I'm not really sure, but I think that applications are
>>>> resumed after the hardware driver had a chance of doing so.
>>>
>>> I hope you're right... but if it's not too much trouble, it might be
>>> better to be safe than sorry and take the lock for those paths as well.
>>>
>>>
>> NAK, this is the wrong way to solve the issue. We need a global lock on
>> all paths that can trigger GPU activity. Previously it was the cs
>> mutex, but I didn't think about it too much when it got removed. So
>> to fix the situation I am sending a patch with an rw semaphore.
>
> So what am I missing? What else can trigger GPU activity if not the rings?
>
> I'm currently working on ring-partial resets and also resets where you only
> skip over a single faulty IB instead of flushing the whole ring. And my
> current idea for that to work is that we hold the ring lock while we do
> suspend, ring_save, asic_reset, resume and ring_restore.
>
> Christian.
>

I just sent a patch. The idea is that you want GPU reset to be an
exclusive operation, like GPU init or GPU resume. By taking an rw
semaphore in write mode, the GPU reset becomes exclusive, so you know
nobody can trigger GPU activity during it, while concurrency is still
allowed when no GPU reset is ongoing.

Cheers,
Jerome

