[Mesa-dev] [PATCH] radeonsi: add cs tracing v2

Tue Mar 26 10:44:22 PDT 2013

On 26.03.2013 18:02, Jerome Glisse wrote:
> On Tue, Mar 26, 2013 at 12:40 PM, Marek Olšák <maraeo at gmail.com> wrote:
>> On Tue, Mar 26, 2013 at 3:59 PM, Christian König
>> <deathsimple at vodafone.de> wrote:
>>> On 26.03.2013 15:34, Marek Olšák wrote:
>>>
>>>> Speaking of si_pm4_state, I think it's a horrible mechanism for
>>>> anything other than constant state objects (create/bind/delete
>>>> functions). For everything else (set/draw functions), you want to emit
>>>> directly into the command stream. It's not so different from the bad
>>>> state management which r600g used to have (which is now gone). If you
>>>> have to call malloc or calloc in a set_* or draw_* function, you're
>>>> doing it wrong. Are there plans to change it to something more
>>>> efficient (e.g. how r300g and r600g emit non-CSO states right now), or
>>>> will it be like this forever?
>>>
>>> Actually I hoped that r600g sooner or later moves into the same direction
>>> some more. The fact that we currently need to malloc every buffer indeed
>>> sucks badly, but that is still better than mixing packet generation with
>>> driver logic.
>> I don't understand the last sentence. What mixing? The set_* and
>> draw_* commands are supposed to be executed immediately, therefore
>> it's reasonable and preferable to write to the CS directly. Having any
>> intermediate storage for commands is a waste of time and space.
> I agree here, I don't think an uncached bo for the command stream on new hw
> would bring a huge perf increase, it will probably just be noise.
>
>>> Also I don't think that emitting directly into the command stream is such a
>>> good idea, we sooner or later want that buffer to be a buffer allocated in
>>> GART memory. And under this condition it is better to build up the commands
>>> in (heavily cached) system memory and then memcpy them to the destination
>>> buffer.
>> AFAIK, GART memory is cached on non-AGP systems, but even uncached
>> access shouldn't be a big issue, because the access pattern is
>> sequential and write-only. BTW, I have talked about emitting commands
>> into a buffer object with Dave and he thinks it's a bad idea due to
>> the map and unmap overhead. Also, we have to disallow writing to
>> certain unsafe registers anyway.
>>
>> Marek
> I think Christian is thinking about new hw (newer than cayman) where we can
> skip register checking because of vm and hardware register checking (the hw
> CP checks that a register in the user IB is not one of the privileged
> registers and, if it is, blocks the write and raises an irq). On this kind
> of hw you can have the cmd stream in a bo and skip the map/unmap.

Yes indeed, and my plan is to avoid the copying by referencing the state
directly with indirect buffer commands. That should also make things
like queries and predicated rendering a bit simpler (think of PM4
subroutine calls).

The problem on SI is that for embedded data and const IBs you need to
patch up the buffer quite a bit after it is written (at least if I
understand them correctly). But Marek is quite right that this only
applies to state objects and makes no sense for set_* and draw_* calls
(I'm currently thinking about how to avoid that and can't come up with a
proper solution). Anyway, it's definitely not an urgent problem for radeonsi.

I still think that writing into the command buffers directly (i.e.
without wrapper functions) is a bad idea, because that led to mixing
driver logic and packet building in r600g. For example, just try to
figure out how the relocations in NOPs work by reading the source (please
keep in mind that one of the primary reasons AMD supports this
driver is to provide good example code for customers who want to
implement that stuff on their own systems).
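
To illustrate the wrapper idea, here is a minimal sketch that keeps packet building in one small helper so that driver logic never touches raw dwords. Everything in it is hypothetical: the names (cmd_stream, cs_emit, pkt3_header, cs_set_context_reg) are not taken from radeonsi or r600g, the command stream is just a fixed array, and the type-3 header layout, opcode and register base are written from memory and should be checked against the real register headers.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified command stream: a fixed array of dwords. */
#define CS_MAX_DW 64

/* Assumed values for illustration; verify against the real headers. */
#define OP_SET_CONTEXT_REG 0x69u
#define CONTEXT_REG_BASE   0x28000u

struct cmd_stream {
    uint32_t buf[CS_MAX_DW];
    unsigned cdw; /* number of dwords written so far */
};

/* Append one dword; the only place that writes into the buffer. */
static void cs_emit(struct cmd_stream *cs, uint32_t dw)
{
    assert(cs->cdw < CS_MAX_DW);
    cs->buf[cs->cdw++] = dw;
}

/* Build a type-3 header: type in bits 31:30, (body dwords - 1) in
 * bits 29:16, opcode in bits 15:8 (assumed layout). */
static uint32_t pkt3_header(unsigned opcode, unsigned body_dw)
{
    return (3u << 30) | (((body_dw - 1) & 0x3fffu) << 16) |
           ((opcode & 0xffu) << 8);
}

/* Wrapper: write one context register. Callers pass the register's
 * byte address and value and never see the packet encoding. */
static void cs_set_context_reg(struct cmd_stream *cs, uint32_t reg,
                               uint32_t value)
{
    cs_emit(cs, pkt3_header(OP_SET_CONTEXT_REG, 2));
    cs_emit(cs, (reg - CONTEXT_REG_BASE) >> 2); /* dword offset */
    cs_emit(cs, value);
}
```

A set_* function would then only call cs_set_context_reg(cs, reg, value), which still writes straight into the command stream without any malloc, while the packet encoding stays readable in one place.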

Christian.

> Cheers,
> Jerome
>