[Nouveau] Dealing with opencl kernel parameters in nouveau now that RES support is gone
Ilia Mirkin
imirkin at alum.mit.edu
Mon Feb 22 16:59:04 UTC 2016
On Mon, Feb 22, 2016 at 11:50 AM, Hans de Goede <hdegoede at redhat.com> wrote:
> Hi,
>
>
> On 22-02-16 17:13, Ilia Mirkin wrote:
>>
>> On Mon, Feb 22, 2016 at 11:00 AM, Ilia Mirkin <imirkin at alum.mit.edu>
>> wrote:
>>>
>>> On Mon, Feb 22, 2016 at 10:50 AM, Hans de Goede <hdegoede at redhat.com>
>>> wrote:
>>>>>
>>>>> But assuming I'm right, what I'm proposing is that instead of passing
>>>>> the input in as a global buffer, to instead pass it in as a const
>>>>> buffer. As such instead of sticking it into ->set_global_binding,
>>>>> you'd stick it into ->set_constant_buffer, and then you'll be able to
>>>>> refer to it as CONST[0], CONST[1], etc. (Which are, implicitly,
>>>>> CONST[0][0], CONST[0][1], etc -- it doesn't print the second dim when
>>>>> it's 0.) You don't even have to load these, you can use them as args
>>>>> directly anywhere you like (except as indirect addresses).
>>>>>
>>>>> The old code would actually take the supplied inputs, stick them into
>>>>> a constbuf, and then lower RINPUT accesses to load from that constbuf.
>>>>> I'm suggesting we cut out the middleman.
>>>>>
>>>>> By the way, another term for "constant buffer" is "uniform buffer", on
>>>>> the off chance it helps. Basically it's super-cached by the shader for
>>>>> values that never change across shader invocations. [And there's
>>>>> special stuff in the hw to allow running multiple sets of shader
>>>>> invocations with different "constant" values... or so we think.]
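As a rough illustration of "cut out the middleman", here is a minimal Python sketch of how a host-side helper might pack kernel arguments into a constant buffer so that each argument lands at a fixed CONST[n] slot and component. The function name `pack_kernel_args`, the natural-alignment rule and the little-endian vec4 layout are all assumptions for illustration. They are not what clover or nouveau actually do.

```python
import struct

def pack_kernel_args(args):
    """Pack (struct format, value) pairs into a constant buffer image.

    Assumptions (hypothetical layout): each argument is aligned to its own
    size, and each 16-byte CONST[n] slot holds four 32-bit components
    x, y, z, w. Returns the packed bytes and, per argument, (n, components).
    """
    buf = bytearray()
    layout = []
    for fmt, value in args:
        size = struct.calcsize("<" + fmt)
        if size not in (4, 8):
            raise ValueError("only 32- and 64-bit arguments are handled")
        buf += b"\0" * (-len(buf) % size)   # pad to natural alignment
        offset = len(buf)
        buf += struct.pack("<" + fmt, value)
        slot, first = divmod(offset // 4, 4)
        layout.append((slot, "xyzw"[first:first + size // 4]))
    buf += b"\0" * (-len(buf) % 16)          # round up to whole vec4 slots
    return bytes(buf), layout
```

For example, a 64-bit pointer followed by an int and a float all fit in CONST[0], as .xy, .z and .w respectively. The shader could then use those components directly as operands, without any load.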
>>>>
>>>>
>>>>
>>>> I'm fine with using constant buffers for the input. It is not the
>>>> mechanism I'm worried about; it is the TGSI syntax to express things.
>>>> I think it would be beneficial for the TGSI syntax to be abstract and
>>>> not worry about the underlying mechanism. This will, e.g., allow us
>>>> to use shared memory for input on Tesla and const bufs on later
>>>> generations without the part generating the TGSI code needing to
>>>> worry about this.
>>>
>>>
>>> Yeah, I think you're right. I didn't realize that tesla had a special
>>> form of input for user params, I assumed it was just the usual thing.
>>> So forget about constbufs, go with the INPUT thing. Which is great,
>>> since we had one value left over in that (future) 2-bit field :)
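To make the "one value left over in that 2-bit field" remark concrete, here is a sketch of such an encoding, plus the kind of backend-only decision that abstract INPUT memory allows. The numeric values, the `LOCAL` name, `input_backing` and the chipset family strings are all hypothetical. They illustrate the idea and are not the actual TGSI token layout.

```python
from enum import IntEnum

class MemoryType(IntEnum):
    """Hypothetical encoding of the 2-bit memory-type field of a
    DCL MEMORY[n], <type> declaration (numeric values are assumed)."""
    GLOBAL = 0
    SHARED = 1   # OpenCL local memory
    LOCAL = 2    # per-thread memory (assumed meaning here)
    INPUT = 3    # kernel parameters: the value that was left over

FIELD_BITS = 2
assert len(MemoryType) <= 1 << FIELD_BITS  # all four types fit in 2 bits

def input_backing(chipset_family):
    """Where a backend might place MEMORY[n], INPUT (hypothetical).

    The TGSI producer only ever emits INPUT; per the discussion, Tesla
    would use shared memory and later generations a constant buffer."""
    if chipset_family == "tesla":
        return "shared memory"
    return "constant buffer"
```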
>>>
>>>>
>>>> ###
>>>>
>>>> Somewhat unrelated to the input problem, I'm also somewhat worried
>>>> about the addressing method for MEMORY type registers.
>>>>
>>>> Looking at the old RES stuff, the "index" passed into, say, a LOAD
>>>> was not so much an index as simply a 32-bit GPU virtual memory
>>>> address, which fits well with the OpenCL way of doing things (the
>>>> register number, as in the 55 in RES[55], was more or less ignored).
>>>>
>>>> Whereas with the new BUFFER-style "registers", the index really
>>>> is an index, e.g. doing:
>>>> LOAD TEMP[0].x, BUFFER[0], IMM[0]
>>>> vs.
>>>> LOAD TEMP[0].x, BUFFER[1], IMM[0]
>>>>
>>>> will read from a different memory address, correct?
>>>
>>>
>>> Correct -- BUFFER[0] refers to the buffer at binding point 0, and
>>> BUFFER[1] refers to the buffer at binding point 1. They might, in
>>> fact, overlap, or even be the same buffer. But the code doesn't know
>>> about that.
>
>
> Ack.
>
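The BUFFER semantics described above can be modelled in a few lines. The following is a toy sketch under stated assumptions: memory is a flat dict from byte address to value, and `bindings` is a hypothetical table from binding point to the base address that the state tracker bound there. The shader only ever supplies the binding index and an offset.

```python
def buffer_load(memory, bindings, index, offset):
    """Toy model of LOAD TEMP[0].x, BUFFER[index], offset.

    memory:   dict from byte address to 32-bit value (a toy flat memory)
    bindings: dict from binding point to the base address of the buffer
              bound there (set outside the shader; hypothetical)
    """
    # The base comes from the binding point; the shader adds the offset.
    return memory.get(bindings[index] + offset, 0)
```

Because the base address lives in `bindings`, two binding points may overlap or refer to the same buffer, and the shader code cannot tell.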
>>>> So how will this work for MEMORY type registers? For OpenCL, having the
>>>> 1-dimensional behavior of RES really is quite useful, and having the
>>>> address be composed of a hidden base address, which gets determined under
>>>> the hood from the register number, with an index added on top of
>>>> it, does not fit so well.
>>>
>>>
>>> Not sure what the question is... you have code like
>>>
>>> int *foo = [pointer value from user input];
>>> *foo = *(foo + 5);
>>>
>>> right?
>>>
>>> So that'd just become
>>>
>>> MOV TEMP[0].x, <val from user input, wherever it is>
>>> ADD TEMP[0].y, TEMP[0].x, 5 * 4
>>> LOAD TEMP[1].x, MEMORY[0] (which is global), TEMP[0].y
>>> STORE MEMORY[0], TEMP[0].x, TEMP[1].x
>>>
>>> or perhaps I'm misunderstanding something?
>>>
>>> MEMORY, GLOBAL == the global virtual memory address space, not some
>>> specific buffer. Trying to load address 0 from it will likely lead to
>>> sadness, unless you happen to have something mapped there. BUFFER has
>>> an implied base address, based on the binding point, but MEMORY has no
>>> such thing.
>
>
> OK, that answers my questions / worries, I was worried that MEMORY
> too would have an implied base address, which would more or less only
> get in the way with opencl, but if the memory register file takes
> a virtual memory address as second operand to LOAD then I'm happy.
>
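Ilia's `*foo = *(foo + 5)` sequence can be stepped through in a small simulation. This sketch makes a few assumptions: a dict stands in for the MEMORY[0], GLOBAL address space, with no implied base. Addresses are 32-bit, as in the example. `run_example` is a hypothetical name. In real TGSI, the address arithmetic would presumably be an integer UADD against an IMM rather than a float ADD.

```python
MASK32 = 0xFFFFFFFF

def run_example(global_mem, foo):
    """Mirror the TGSI sequence for `*foo = *(foo + 5)` on 32-bit ints.

    global_mem: dict from byte address to 32-bit value (MEMORY[0], GLOBAL)
    foo:        the pointer value taken from the kernel's user input
    """
    temp0_x = foo                          # MOV   TEMP[0].x, <user input>
    temp0_y = (temp0_x + 5 * 4) & MASK32   # ADD   TEMP[0].y, TEMP[0].x, 5 * 4
    temp1_x = global_mem.get(temp0_y, 0)   # LOAD  TEMP[1].x, MEMORY[0], TEMP[0].y
    global_mem[temp0_x] = temp1_x          # STORE MEMORY[0], TEMP[0].x, TEMP[1].x
    return global_mem
```

The address operand is used as-is. This is exactly why loading address 0 from MEMORY only works if something happens to be mapped there.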
> So I guess that if we mix in, say, TGSI-shared / OpenCL-local memory,
> then I would do:
>
> DCL MEMORY[0], GLOBAL
> DCL MEMORY[1], SHARED
>
> And then to load something from global mem at offset TEMP[0].y:
>
> LOAD TEMP[0].x, MEMORY[0], TEMP[0].yyyy
>
> And to load something from the shared mem at offset TEMP[0].y:
>
> LOAD TEMP[0].x, MEMORY[1], TEMP[0].yyyy
>
> Correct? And the shared mem too will take shared virtual memory
> addresses, just like global takes global virtual memory
> addresses?
That's how I see it. You may have to add LOAD64/STORE64 for 64-bit
addresses though. Or we could decree that all addressing on global
memory shall be 64-bit (and thus read the .xy components of the
address source).
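Two points from the exchange above can be modelled together in a sketch. First, each DCL MEMORY[n], <type> names its own address space, so the same address in GLOBAL and SHARED refers to different storage. Second, if global addressing were decreed to be 64-bit, the address would come from the .xy components. The low/high order of .x/.y, the class name and the helper `add_offset` are assumptions for illustration.

```python
MASK32 = 0xFFFFFFFF

class AddressSpaces:
    """Toy model: each DCL MEMORY[n], <type> names one address space, and
    LOAD/STORE take an address *within* that space (no implied base).
    Addresses are (x, y) = (low, high) 32-bit halves, i.e. the .xy
    components of the address operand (an assumed convention)."""

    def __init__(self, decls):
        self.decls = decls  # e.g. {0: "GLOBAL", 1: "SHARED"}
        self.spaces = {kind: {} for kind in set(decls.values())}

    @staticmethod
    def join(x, y):
        """Combine the .x (low) and .y (high) halves into one address."""
        return ((y & MASK32) << 32) | (x & MASK32)

    def store(self, reg, x, y, value):
        self.spaces[self.decls[reg]][self.join(x, y)] = value

    def load(self, reg, x, y):
        return self.spaces[self.decls[reg]].get(self.join(x, y), 0)


def add_offset(x, y, offset):
    """Add a 32-bit offset to a 64-bit (x, y) address using two 32-bit
    adds with carry, as hardware limited to 32-bit integer ops would."""
    lo = x + (offset & MASK32)
    carry = lo >> 32
    return lo & MASK32, (y + carry) & MASK32
```

The carry handling shows why 64-bit addressing is more than a syntax change: pointer arithmetic on the .x half alone would silently wrap.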
>
>> Another way of looking at it is that instead of having the hacky
>> RES[12345] being hardcoded to mean something special, you now have a
>> dedicated file called 'MEMORY', which has identical semantics.
>
>
> I'm all for getting rid of the RES[12345] hack :)
>
> I guess where you write "you now have a dedicated file called 'MEMORY'",
> you mean up to X dedicated MEMORY[#] files, one for each of GLOBAL, SHARED
> and LOCAL at least, and probably, as discussed, one for INPUT?
>
> This all sounds good to me. As I said, my worry was that MEMORY would have
> an implied base address like BUFFER has; now that you've
> made clear that MEMORY does not have this, I'm happy :)
There's a bit of a wrinkle here, and it's questionable whether we want
to allow for this somehow, but... Tesla actually has no way to address
global memory directly. It's always done with a base offset (which can be set
to 0). The trick is that it can only address 32 bits at a time;
there's no 64-bit addressing. But it has *16* such "global" memory
spaces, each of which is a base + up to a 32-bit offset [and ultimately
only 40 bits of addressability]. I don't know if OpenCL provides
something good for that; if it does, we can use semantic indices on the
GLOBAL to make it like
DCL MEMORY[0], GLOBAL[0]
DCL MEMORY[1], GLOBAL[1]
etc. But again, this is pretty optional.
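The Tesla arrangement described above, with GLOBAL[i] semantic indices selecting one of 16 windows, might be modelled as follows. This is a sketch of the addressing model only. The `bases` list, the function name and the choice to raise on out-of-range values are assumptions. They do not describe what the hardware does on overflow.

```python
NUM_WINDOWS = 16   # "global" memory spaces on Tesla
OFFSET_BITS = 32   # each window is a base + up to a 32-bit offset
VA_BITS = 40       # ultimately only 40 bits of addressability

def tesla_global_address(bases, window, offset):
    """Effective address for an access through GLOBAL[window] + offset.

    bases: hypothetical list of the 16 per-window base addresses
    """
    if not 0 <= window < NUM_WINDOWS:
        raise ValueError("Tesla has 16 global windows")
    if not 0 <= offset < (1 << OFFSET_BITS):
        raise ValueError("offset must fit in 32 bits")
    addr = bases[window] + offset
    if addr >= (1 << VA_BITS):
        raise ValueError("beyond the 40-bit address space")
    return addr
```

A frontend that only needs flat 32-bit addressing could set every base to 0 and always use GLOBAL[0].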
-ilia