Include request for reset-rework branch.

Christian König deathsimple at vodafone.de
Tue May 1 06:38:07 PDT 2012


On 30.04.2012 18:26, Jerome Glisse wrote:
> On Mon, Apr 30, 2012 at 11:37 AM, Christian König
> <deathsimple at vodafone.de>  wrote:
>> On 30.04.2012 17:12, Jerome Glisse wrote:
>>> On Mon, Apr 30, 2012 at 11:12 AM, Jerome Glisse<j.glisse at gmail.com>
>>>   wrote:
>>>> On Mon, Apr 30, 2012 at 10:50 AM, Christian König
>>>> <deathsimple at vodafone.de>    wrote:
>>>>> Hi Dave,
>>>>>
>>>>> If nobody has a last-moment concern, please include the following patches
>>>>> in drm-next.
>>>>>
>>>>> Except for some minor fixes they have already been on the list for quite
>>>>> some time, but I intentionally left out the debugfs-related patches because
>>>>> we haven't finished the discussion about them yet.
>>>>>
>>>>> If you prefer to merge them directly, I also made them available as the
>>>>> reset-rework branch here: git://people.freedesktop.org/~deathsimple/linux
>>>>>
>>>>> Cheers,
>>>>> Christian.
>>>>>
>>>> I am not completely OK with this; I am against patch 5. I need some time
>>>> to review it.
>>>>
>>>> Cheers,
>>>> Jerome
>>> Sorry, I meant patch 7.
>> I just started to wonder :) what's wrong with patch 7?
>>
>> Please keep in mind that implementing proper locking in the lower-level
>> objects allows us to remove quite a bit of locking in the upper layers.
>>
>> By the way, do you mind if I dig into the whole locking stuff of the
>> kernel driver a bit more? There seem to be a lot of possibilities to
>> clean up and simplify the overall driver.
>>
>> Cheers,
>> Christian.
> Well, when it comes to locking, I have an idea I have wanted to work on
> for a while. The GPU is transaction-based in a sense: we feed rings and
> it works on them, and 90% of the work is the cs ioctl, so we should be
> able to have one and only one lock (ignoring the whole modesetting path
> here, for which I believe one lock is enough too). Things like PM would
> need to take all locks, but I believe that's a good thing, as my
> understanding is that the PM we do right now needs most of the GPU idle.
>
> So here is what we have:
> ih spinlock (can be ignored, i.e. left as is)
> irq spinlock (can be ignored, i.e. left as is)
> blit mutex
> pm mutex
> cs mutex
> dc_hw_i2c mutex
> vram mutex
> ib mutex
> rings[] lock
>
> So the real issue is ttm calling back into the driver. The idea I had
> is to have a work thread that is the only one allowed to mess with the
> GPU. The work thread would use some locking in which it has preference
> over the writers (a writer being the cs ioctl, a ttm callback or
> anything else that needs to schedule GPU work). This would require only
> two locks. I am actually thinking of having two lists: one where the
> writers add their "transactions" and one onto which the worker empties
> the transactions, something like:
>
> cs_ioctl
> {
>     sa_alloc_for_trans
>     ....
>     lock(trans_lock)
>     list_add_tail(trans, trans_temp_list)
>     unlock(trans_lock)
> }
>
> worker
> {
>     lock(trans_lock)
>     // move everything the writers queued onto the worker's own list
>     list_splice_tail_init(trans_temp_list, trans_list)
>     unlock(trans_lock)
>     // schedule ib
>     ....
>     // process fence & semaphore
> }
>
> So there would be one transaction lock and one lock for the
> transaction memory allocation (ib, semaphore and the like). The worker
> would also be responsible for GPU reset; there wouldn't be any ring
> lock, as I believe one worker is enough for all rings.
>
> For transactions the only real issue is ttm; cs is easy, it's an
> ib+fence+semaphore, while ttm can be more complex: it can be a bo move,
> bind, unbind, ... Anyway it's doable, and it's the design I have had in
> mind for a while.
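
(For reference, a minimal sketch of that two-list handoff using the standard
kernel list/work helpers; every name here, gpu_trans, trans_pending,
trans_queue and so on, is made up for the example and is not existing
radeon code.)

    #include <linux/list.h>
    #include <linux/spinlock.h>
    #include <linux/workqueue.h>

    struct gpu_trans {
            struct list_head list;
            /* ib, fence, semaphores, bo move, ... whatever the job needs */
    };

    static LIST_HEAD(trans_pending);            /* filled by the writers */
    static DEFINE_SPINLOCK(trans_lock);

    /* worker: the only context that actually touches the rings */
    static void trans_worker(struct work_struct *work)
    {
            struct gpu_trans *trans, *tmp;
            LIST_HEAD(local);

            spin_lock(&trans_lock);
            list_splice_tail_init(&trans_pending, &local);
            spin_unlock(&trans_lock);

            list_for_each_entry_safe(trans, tmp, &local, list) {
                    list_del(&trans->list);
                    /* schedule ib, process fence & semaphore, free SA memory */
            }
    }
    static DECLARE_WORK(trans_work, trans_worker);

    /* writer side: cs ioctl, ttm callback, anything that queues GPU work */
    static void trans_queue(struct gpu_trans *trans)
    {
            spin_lock(&trans_lock);
            list_add_tail(&trans->list, &trans_pending);
            spin_unlock(&trans_lock);
            schedule_work(&trans_work);
    }
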
Well, that sounds like the direction to go, but I would like to avoid
even the submission thread, since the GPU is working on the rings
asynchronously anyway.

The locking problems we are currently seeing are more a result of
abusing global variables for local data, or of locking too much code
with too few primitives, rather than of simply having too many locking
primitives overall. For example, I really can't see why we have a blit
mutex for the r600 shader blit code. Also, in retrospect, having one
mutex per ring does sound a bit superfluous.

To sum my ideas up:

1. I suggest that memory management be self-contained, i.e. you can
request small amounts of memory for IBs, fences, semaphores, blitting
vertex buffers etc. without worrying about others doing the same at the
same time as you. That really sounds like your "lock for the transaction
memory allocation", and I'm pretty sure that I've extended the SA far
enough to play that role pretty well.

2. Have exactly ONE ring submission mutex. This mutex is taken right
before a job (and it shouldn't matter whether that's a ttm move/blit or
an IB) is pushed onto a ring, and it is strictly forbidden to allocate
more SA memory, call into TTM etc. while this mutex is held. Everything
that's necessary to submit a job must happen before grabbing it and be
stored in thread-local memory, e.g. on the stack or in a kmalloc'ed
piece of memory.

3. Protect data, not code! I.e. have locks that protect specific data
structures, and don't just say: those two things shouldn't happen at the
same time, so acquire this and that lock. (A rough sketch of all three
points follows below.)
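
(A rough sketch of how points 1-3 could fit together; every name below,
sub_alloc, gpu_ring, ring_submit and so on, is made up for illustration
and is not the existing radeon/SA code, and the allocator is deliberately
oversimplified, a bump allocator that never frees.)

    #include <linux/mutex.h>
    #include <linux/spinlock.h>
    #include <linux/errno.h>

    /* 1. self-contained sub-allocator: callers never need an extra lock */
    struct sub_alloc {
            spinlock_t lock;        /* protects exactly this allocator's state */
            unsigned int head, size;
    };

    static int sub_alloc_get(struct sub_alloc *sa, unsigned int size,
                             unsigned int *offset)
    {
            int r = 0;

            spin_lock(&sa->lock);
            if (sa->head + size <= sa->size) {
                    *offset = sa->head;
                    sa->head += size;
            } else {
                    r = -ENOMEM;
            }
            spin_unlock(&sa->lock);
            return r;
    }

    /* 2. + 3. one submission mutex, embedded in the ring data it protects */
    struct gpu_ring {
            struct mutex submit_mutex;
            /* write pointer, ring buffer, ... */
    };

    struct job {
            unsigned int ib_offset; /* allocated from the SA beforehand */
            /* fence, semaphores, ... prepared on the stack or kmalloc'ed */
    };

    static void ring_submit(struct gpu_ring *ring, const struct job *job)
    {
            /* everything was prepared before this point; no SA or TTM
             * calls are allowed while the mutex is held */
            mutex_lock(&ring->submit_mutex);
            /* copy job->ib_offset etc. onto the ring and kick it off */
            mutex_unlock(&ring->submit_mutex);
    }

The point of the sketch is only that ring_submit() copies data that was
fully prepared beforehand, so nothing inside the mutex can block on TTM
or on the allocator.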

Overall it sounds like we are playing with similar ideas. I would
suggest that I just hack together some patches, probably starting with
the ring submission and the r600 blit mutex, and then we take another
look at where this leads us.

Cheers,
Christian.

