[Nouveau] Dealing with opencl kernel parameters in nouveau now that RES support is gone

Ilia Mirkin imirkin at alum.mit.edu
Tue Feb 23 20:36:56 UTC 2016


On Tue, Feb 23, 2016 at 7:12 AM, Pierre Moreau <pierre.morrow at free.fr> wrote:
> On 11:43 AM - Feb 23 2016, Hans de Goede wrote:
>> Hi,
>>
>
> [snip]
>
>>
>> >You may have to add LOAD64/STORE64 for 64-bit
>> >addresses though. Or we could decree that all addressing on global
>> >memory shall be 64-bit (and thus read the .xy components of the
>> >address source).
>>
>> I would prefer to keep LOAD / STORE semantics the same as with
>> other LOAD / STORE -s to / from 1d buffers.
>>
>> I think that in the end the tgsi backend for llvm will get both
>> a 32 bit and a 64 bit mode, like the nvptx backend already has.
>>
>> And then the 64 bit backend will use a new LOAD64 / STORE64.
>> Also do not forget that keeping 64 bit pointers takes twice as
>> many registers, so 32 bit will likely be optimal in a lot of
>> cases. I guess since OpenCL does not give the user a way
>> to select which mode to use we will end up with some sort
>> of heuristic based on the amount of memory on the card or
>> some such.
>>
>> After all using 64 bit pointers does not make a lot of sense
>> on a card with only 1 GB of RAM (yes I know we're talking virtual
>> address space here).
>>
>> Anyways this all really is too soon to tell. Maybe the performance
>> impact of using 64 bit pointers is negligible. But I think it would
>> be good (and consistent) to keep LOAD / STORE taking 32 bit addresses
>> even for MEMORY and add a LOAD64 / STORE64 when I get around to
>> implementing a 64 bit mode for the llvm tgsi backend (or when others
>> need them).
>>
>> >>>Another way of looking at it is that instead of having the hacky
>> >>>RES[12345] being hardcoded to mean something special, you now have a
>> >>>dedicated file called 'MEMORY', which has identical semantics.
>> >>
>> >>
>> >>I'm all for getting rid of the RES[12345] hack :)
>> >>
>> >>I guess where you write "you now have a dedicated file called 'MEMORY'"
>> >>You mean up to X dedicated MEMORY[#] files, one for each of GLOBAL, SHARED
>> >>and LOCAL at least, and probably as discussed one for INPUT ?
>> >>
>> >>This all sounds good to me, as said my worry was that MEMORY would have
>> >>an implied base address like BUFFER has, now that you've
>> >>made clear that MEMORY does not have this I'm happy :)
>> >
>> >There's a bit of a wrinkle here, and it's questionable whether we want
>> >to allow for this somehow, but... Tesla actually has no way to address
>> >global memory. It's always done with a base offset (which can be set
>> >to 0). The trick is that it can only address 32 bits at a time,
>> >there's no 64-bit addressing. But it has *16* such "global" memory
>> >spaces, i.e. which are each base + up to 32-bit offset [and ultimately
>> >only 40 bits of addressability]. I don't know if OpenCL provides
>> >something good for that, if it does we can use semantic indices on the
>> >GLOBAL to make it like
>> >
>> >DCL MEMORY[0], GLOBAL[0]
>> >DCL MEMORY[1], GLOBAL[1]
>> >
>> >etc. But again, this is pretty optional.
>>
>> I think that for Tesla we can just only support the tgsi32 target
>> and not the tgsi64 target, at least that is how I envision things
>> today, who knows what tomorrow will bring :)
>
> Fermi is the first family supporting 64 bit addresses, on top of 32 bit
> addresses, while Kepler can only do 64 bit IIRC (or is it Maxwell which dropped
> the 32 bit support?).

I've only ever seen the blob do 64-bit addressing in OpenGL shaders on
Fermi+. However, Hans assures me that 32-bit addressing actually works
fine; presumably the upper 8 bits are set to 0 or some constant
supplied via some register somewhere. All of the SM20+ ISAs have a
separate "wide address" bit in their gmem-using instructions (aka "E"
in nvdisasm).

Anyways, the only reason to add LOAD64 is not to muddy LOAD semantics.
LOAD should remain 32-bit (single-component), while LOAD64 would be
defined only for 1d entities and combine the first 2 components of the
address source.
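
To make the proposed split concrete, here is a minimal sketch in C of how
the two address forms could be derived from the address source. The helper
names are made up for illustration, and treating .x as the low word and .y
as the high word is an assumption; nothing above fixes that order.

```c
#include <stdint.h>

/* Hypothetical helper: the address a LOAD64 would use, built from the
 * first two 32-bit components of its address source.
 * Assumption: .x carries the low 32 bits and .y the high 32 bits. */
static uint64_t load64_address(uint32_t addr_x, uint32_t addr_y)
{
    return ((uint64_t)addr_y << 32) | addr_x;
}

/* Hypothetical helper: plain LOAD stays single-component, so only .x
 * contributes and the result is simply zero-extended. */
static uint64_t load32_address(uint32_t addr_x)
{
    return (uint64_t)addr_x;
}
```

Keeping the two as separate opcodes means a consumer never has to guess
whether .y of the address source is meaningful.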

I think Hans and I are in agreement on most points now... the only
issue is that we don't really control where in VA space buffers are
allocated, so even if you have 1MB of vram, it might end up being
somewhere in far-off 40-bit land. Ben has promised to provide such
control in the future, but who knows when that work will land.
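
As a rough illustration of the problem, a 32-bit mode would need some check
along these lines before it could safely be used for a given buffer. This
is only a sketch: the function name is hypothetical, and it assumes that
32-bit addresses are zero-extended (i.e. the upper bits are 0), which is
a guess as noted above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical check: can every byte of a buffer at GPU virtual address
 * `va` with length `size` be reached using a zero-extended 32-bit
 * address? Written to avoid overflow in va + size. */
static bool fits_in_32bit_va(uint64_t va, uint64_t size)
{
    const uint64_t limit = UINT64_C(1) << 32;

    return size <= limit && va <= limit - size;
}
```

A 1MB buffer placed at, say, bit 39 of the VA space fails this check no
matter how little vram the card has.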

  -ilia

