[Nouveau] Dealing with opencl kernel parameters in nouveau now that RES support is gone
Ilia Mirkin
imirkin at alum.mit.edu
Fri Feb 19 19:43:04 UTC 2016
On Fri, Feb 19, 2016 at 5:36 AM, Hans de Goede <hdegoede at redhat.com> wrote:
> Hi,
>
> On 18-02-16 17:39, Ilia Mirkin wrote:
>>
>> On Thu, Feb 18, 2016 at 9:45 AM, Hans de Goede <hdegoede at redhat.com>
>> wrote:
>>>
>>> But this does not seem to be hooked up yet for nouveau.
>>
>>
>> Samuel has patches. See
>> https://cgit.freedesktop.org/~hakzsam/mesa/log/?h=arb_compute_shader_v3
>
>
> Cool, I will take a look at those.
>
>>> So some questions:
>>> -The commit by Samual says:
>>> This introduces TGSI_FILE_MEMORY for shared, global and local memory.
>>> Only
>>> shared memory is currently supported.
>>>
>>> The commit introduces MEMORY[x] and MEMORY[x],SHARED so in reality it
>>> also
>>> introduces
>>> a second option next to shared, so what are we going to use plain
>>> MEMORY[x]
>>> for?
>>> I suggest using it for global memory but we need to be in agreement on
>>> this.
>>
>>
>> That sounds fine to me. However what I had in mind was switching the
>> SHARED field into a 2-bit field and making it
>>
>> 1 = SHARED
>> 2 = GLOBAL
>> 3 = LOCAL
>>
>> (since for OpenCL you also need to be able to address local or private
>> memory). I sorta wanted Samuel to do it, but since I had no idea where
>> you were at, or if you were even still working on this, I figured it
>> should be fixed up by the first person who needed it.
>
>
> Sounds good, only the naming is somewhat unfortunate since opencl uses
> different
> naming. I.e. it has no "shared"
Sad. Well, "shared" is what OpenGL compute shaders use, which is why I
proposed it.
>
> OpenCL has:
> -global: accessible by all worker-groups as well as by the CPU
> -const: read-only global
> -local: shared by worker-items in the same worker-group, not shared
> between worker-groups
> -private: accessible only to a single worker-item
>
> So how do these map to the TGSI:
>
>> 1 = SHARED
>> 2 = GLOBAL
>> 3 = LOCAL
OpenCL global = TGSI global
OpenCL const = TGSI global
OpenCL local = TGSI shared
OpenCL private = TGSI local
Not sure what the distinction is between OpenCL const and global is.
If the const stuff is actually just user-supplied uniforms (and
doesn't need to be in a particular place in memory), then those should
go into TGSI CONST somehow.
>>
>>>
>>> -What about kernel input parameters, so far these have been using
>>> RES[32764]
>>> I must admit that I do not understand where the file_index of 32764
>>> comes
>>> from (or where any of the file indexes come from for
>>> src/gallium/tests/trivial/compute.c ?
>>> I have the feeling that these are not used at all, and everything
>>> simply
>>> goes
>>> to a flat (virtual) memory space, with the params at address 0, correct
>>> ?
>>
>>
>> It was never particularly well-specified, which was one of the reasons
>> it went away. It also didn't map nicely onto the OpenGL model. There
>> is a remaining question of how to do addressing in memory... there's
>> 40 bits of address space. Should these implicitly be U64
>> (dual-component in TGSI) addresses that are passed around? Not sure
>> what the OpenCL position on all this is.
>
>
> So far I've been using U32 for addresses as that is what Francisco's
> original
> code was using. And this also is what things like the tgsi LOAD instruction
> take. If you're doing a LOAD on a 1d buffer then you will use TEMP[#].x to
> specify the index, and the way how this currently works with OpenCL is that
> clCreateBuffer() will return a cl_mem type which then gets passed into
> the kernel as input parameter and gets treated as a pointer by the compiler,
> so e.g. global mem gets treated as a single address space even if there
> are multiple global buffers and TEMP[#].x contains the value passed in
> via cl_mem as start offset for the buffer + the index into the buffer.
>
> So this means that currently we are limited to U32 since TEMP[#].x is
> only 32 bits wide. Internally 40 bits addresses can and should probably
> be used so that at least the different memory spaces each have the full
> 32 bits available.
>
> Note that we could fix this by adding some sort of LOAD64 opcode, which
> uses TEMP[#].x and TEMP[#].y as address for 1d buffers, I'm not sure
> how this would work for 3d buffers though. I foresee the llvm backend
> eventually getting a 64 bit mode where it will use 64 bits for all
> pointers and use something like a LOAD64 opcode to indicate that
> the indexes (which it effectively uses as addresses / pointers)
> are 64 bit wide.
Well, this LOAD64/STORE64 would just only be defined for MEMORY[]
src/dest, so you don't need to worry about 3d or anything like that. I
believe this is a good solution to the problem.
>
>> Also it would be highly preferable to avoid using GLOBAL at all in the
>> first place, and just use BUFFER[] or IMAGE[] or whatever. However
>> after having a look at how OpenCL works, I don't think that's
>> possible. But perhaps I missed something?
>
>
> OpenCL is very C-like and as such uses a flat address space per memory
> space, getting around that is going to be very tricky, if possible at
> all.
That was my assessment as well.
>
>> Perhaps I didn't make it clear enough though -- sorry. The reality is
>> that TGSI is a living thing, and things change every so often. I don't
>> think people would be comfortable with locking down TGSI in such a
>> way.
>
>
> Hmm, I do think that if we end up using a llvm-ir -> tgsi step in some
> cases that we do need to lock TGSI down. Taken the RES thing for example,
> it would have been pretty trivial actually to not nuke it, and instead just
> add the image support next to it. Note I'm fine with making changes while
> we are figuring this out, but once I start pushing the llvm bits to llvm
> upstream (which is still months away) we are going to need some sort of
> TGSI stability.
I can't unilaterally make such promises. This would have to be
discussed and approved by the other gallium/tgsi stakeholders.
-ilia
More information about the Nouveau
mailing list