[Mesa-dev] [PATCH 2/2] mesa/st: pass 4-offset TG4 without lowering if supported

Wed May 7 11:33:33 PDT 2014

On Tue, May 6, 2014 at 1:36 PM, Ilia Mirkin <imirkin at alum.mit.edu> wrote:
> On Tue, May 6, 2014 at 1:29 PM, Roland Scheidegger <sroland at vmware.com> wrote:
>> Am 06.05.2014 17:03, schrieb Ilia Mirkin:
>>> On Tue, May 6, 2014 at 10:48 AM, Roland Scheidegger <sroland at vmware.com> wrote:
>>>> Looks good to me.
>>>
>>> Thanks!
>>>
>>>> Does that mean if also the GATHER_SM5 cap is supported you have to
>>>> support 4 independent, non-constant offsets?
>>>
>>> Not 100% sure what you're asking... but yes, for ARB_gs5 to work, you
>>> have to support independent non-constant offsets. And if you have
>>> PIPE_CAP_TEXTURE_GATHER_OFFSETS enabled, you're making the claim that
>>> you can handle multiple independent offsets in a single texgather.
>>> Without the cap, the 4 offsets get lowered into 4 separate texgathers
>>> (with only one of the returned components used).
>>>
>>> With nvc0, the offsets are passed in via a register, so non-constant
>>> is never an issue. And with nv50, the offsets must be immediates (and
>>> there can be only 1 set of them), but it also has no hope of
>>> supporting all of ARB_gs5.
>>>
>>>> Would it make sense to reorder the caps so the gather stuff is all
>>>> together (now 5 cap bits just for this...)?
>>>
>>> The quantity of caps for texgather is a little ridiculous. I'm of the
>>> opinion that this should be the default behaviour, and it should be up
>>> to the driver to lower it into 4 texgathers if it can't handle them
>>> directly. Furthermore, this functionality is only available (via GL)
>>> with ARB_gs5, which in turn will require a whole bunch of stuff, so I
>>> don't know whether the GATHER_SM5 cap is really that useful. And for
>>> someone with a DX tracker, this functionality would again not be
>>> useful on its own, the rest of SM5 would have to be supported as well
>>> (I assume).
>>>
>>> But that's not what got implemented, and I don't care to modify
>>> radeon, which can only support 1 offset at a time. (Although I don't
>>> think the radeon impl got pushed...) I anticipate that llvmpipe
>>> doesn't care one way or another (perhaps with even a minor preference
>>> towards having it all in one instruction).
>>>
>>> If there's concensus, happy to switch this on by default and get rid
>>> of the cap :) [And also get rid of the GATHER_SM5 cap.]
>> Well I think the point was that there's really hw which can only do
>> simple gather (what d3d10.1 could do or arb_texture_gather would do).
>> This hw will not be able to do other stuff from newer gl versions anyway
>> so it should not be required to support those new features.
>
> Right. But since that hw will only ever expose ARB_texture_gather and
> not ARB_gpu_shader5, it will never receive a TG4 instruciton with
> non-const offsets or multiple offsets. So the cap to indicate that
> non-const or quad offsets are supported isn't really necessary, since
> those will only appear if ARB_gs5 support is claimed, which requires
> more than just the texgather stuff. (The
> PIPE_CAP_TEXTURE_GATHER_COMPONENTS cap _is_ necessary since it
> indicates ARB_texture_gather support, and the value that should be
> returned by some GL query about what tex gather supports.)
>
>> I'm not entirely sure to what it's actually lowered but in any case
>> llvmpipe if it implemented this definitely would want a non-lowered
>> version.
>
> Right now, it'll get lowered to 4 texgathers, with only one of the
> returned 4 components used from each one. (And it can't use texfetch
> since the min/max offsets are different, and there's probably some
> other clever reason as well.)
>
>> I think though some radeon hw could really do SM5 version but
>> not independent offsets natively, though I'm not sure if it would really
>> be all that complicated to handle it in the driver.
>
> Well, I think the claim was that SM5 doesn't actually support the 4
> separate offsets, but GL4 does with textureGatherOffsets(). Also, I
> believe that radeon supports non-const natively, just not have 4
> offsets in one instruction. Same deal with i965 (which is why that
> lowering pass exists in the first place).

Getting back on topic... what should I do? :) Check this in with the
new cap? Or just make it the default behaviour and let drivers that
can't handle it do the lowering in the driver? FWIW, I believe Dave
Airlie was against that, but that might have been because he was
implementing it for r600, which can't handle 4 separate offsets. (BTW,
was that "looks good to me" == R-b?)

  -ilia