[Mesa-dev] [PATCH 2/2] mesa/st: pass 4-offset TG4 without lowering if supported
Dave Airlie
airlied at gmail.com
Wed May 7 15:06:07 PDT 2014
On 8 May 2014 04:33, Ilia Mirkin <imirkin at alum.mit.edu> wrote:
> On Tue, May 6, 2014 at 1:36 PM, Ilia Mirkin <imirkin at alum.mit.edu> wrote:
>> On Tue, May 6, 2014 at 1:29 PM, Roland Scheidegger <sroland at vmware.com> wrote:
>>> Am 06.05.2014 17:03, schrieb Ilia Mirkin:
>>>> On Tue, May 6, 2014 at 10:48 AM, Roland Scheidegger <sroland at vmware.com> wrote:
>>>>> Looks good to me.
>>>>
>>>> Thanks!
>>>>
>>>>> Does that mean if also the GATHER_SM5 cap is supported you have to
>>>>> support 4 independent, non-constant offsets?
>>>>
>>>> Not 100% sure what you're asking... but yes, for ARB_gs5 to work, you
>>>> have to support independent non-constant offsets. And if you have
>>>> PIPE_CAP_TEXTURE_GATHER_OFFSETS enabled, you're making the claim that
>>>> you can handle multiple independent offsets in a single texgather.
>>>> Without the cap, the 4 offsets get lowered into 4 separate texgathers
>>>> (with only one of the returned components used).
>>>>
>>>> With nvc0, the offsets are passed in via a register, so non-constant
>>>> is never an issue. And with nv50, the offsets must be immediates (and
>>>> there can be only 1 set of them), but it also has no hope of
>>>> supporting all of ARB_gs5.
>>>>
>>>>> Would it make sense to reorder the caps so the gather stuff is all
>>>>> together (now 5 cap bits just for this...)?
>>>>
>>>> The quantity of caps for texgather is a little ridiculous. I'm of the
>>>> opinion that this should be the default behaviour, and it should be up
>>>> to the driver to lower it into 4 texgathers if it can't handle them
>>>> directly. Furthermore, this functionality is only available (via GL)
>>>> with ARB_gs5, which in turn will require a whole bunch of stuff, so I
>>>> don't know whether the GATHER_SM5 cap is really that useful. And for
>>>> someone with a DX tracker, this functionality would again not be
>>>> useful on its own, the rest of SM5 would have to be supported as well
>>>> (I assume).
>>>>
>>>> But that's not what got implemented, and I don't care to modify
>>>> radeon, which can only support 1 offset at a time. (Although I don't
>>>> think the radeon impl got pushed...) I anticipate that llvmpipe
>>>> doesn't care one way or another (perhaps with even a minor preference
>>>> towards having it all in one instruction).
>>>>
>>>> If there's consensus, happy to switch this on by default and get rid
>>>> of the cap :) [And also get rid of the GATHER_SM5 cap.]
>>> Well I think the point was that there's really hw which can only do
>>> simple gather (what d3d10.1 could do or arb_texture_gather would do).
>>> This hw will not be able to do other stuff from newer gl versions anyway
>>> so it should not be required to support those new features.
>>
>> Right. But since that hw will only ever expose ARB_texture_gather and
>> not ARB_gpu_shader5, it will never receive a TG4 instruction with
>> non-const offsets or multiple offsets. So the cap to indicate that
>> non-const or quad offsets are supported isn't really necessary, since
>> those will only appear if ARB_gs5 support is claimed, which requires
>> more than just the texgather stuff. (The
>> PIPE_CAP_TEXTURE_GATHER_COMPONENTS cap _is_ necessary since it
>> indicates ARB_texture_gather support, and the value that should be
>> returned by some GL query about what tex gather supports.)
>>
>>> I'm not entirely sure what it's actually lowered to, but in any case
>>> llvmpipe if it implemented this definitely would want a non-lowered
>>> version.
>>
>> Right now, it'll get lowered to 4 texgathers, with only one of the
>> returned 4 components used from each one. (And it can't use texfetch
>> since the min/max offsets are different, and there's probably some
>> other clever reason as well.)
>>
>>> I think though some radeon hw could really do SM5 version but
>>> not independent offsets natively, though I'm not sure if it would really
>>> be all that complicated to handle it in the driver.
>>
>> Well, I think the claim was that SM5 doesn't actually support the 4
>> separate offsets, but GL4 does with textureGatherOffsets(). Also, I
>> believe that radeon supports non-const offsets natively, just not 4
>> offsets in one instruction. Same deal with i965 (which is why that
>> lowering pass exists in the first place).
>
> Getting back on topic... what should I do? :) Check this in with the
> new cap? Or just make it the default behaviour and let drivers that
> can't handle it do the lowering in the driver? FWIW, I believe Dave
> Airlie was against that, but that might have been because he was
> implementing it for r600, which can't handle 4 separate offsets. (BTW,
> was that "looks good to me" == R-b?)
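For readers unfamiliar with the lowering discussed in this thread: without PIPE_CAP_TEXTURE_GATHER_OFFSETS, a gather with four offsets becomes four single-offset gathers, of which only one returned component is kept. Below is a minimal CPU-side sketch of that idea, in Python purely for illustration. Everything in it is an assumption rather than Mesa code: the function names are hypothetical, the texture is a made-up 4x4 single-channel array, coordinates are in texel units, and addressing is clamp-to-edge. It assumes GL's gather component order (i0j1, i1j1, i1j0, i0j0) and that each result of a four-offset gather is the i0j0 texel of the footprint at the corresponding offset, so the lowered form keeps component .w of each gather.

```python
import math

# Hypothetical single-channel 4x4 texture: texel value = row * 4 + column.
W, H = 4, 4
TEX = [[float(j * W + i) for i in range(W)] for j in range(H)]


def fetch(i, j):
    """Read one texel with clamp-to-edge addressing (an assumption)."""
    i = min(max(i, 0), W - 1)
    j = min(max(j, 0), H - 1)
    return TEX[j][i]


def gather_offset(u, v, dx, dy):
    """One TG4 with a single (dx, dy) offset; u, v are in texel units.

    Component order follows the GL gather convention:
    (T[i0][j1], T[i1][j1], T[i1][j0], T[i0][j0]).
    """
    i0 = math.floor(u - 0.5) + dx
    j0 = math.floor(v - 0.5) + dy
    return (fetch(i0, j0 + 1), fetch(i0 + 1, j0 + 1),
            fetch(i0 + 1, j0), fetch(i0, j0))


def gather_offsets_lowered(u, v, offsets):
    """Lowered 4-offset gather: four single-offset gathers, keeping only
    component .w (the i0j0 texel) of each; the other three are discarded."""
    return [gather_offset(u, v, dx, dy)[3] for dx, dy in offsets]


def gather_offsets_direct(u, v, offsets):
    """What hardware with native support returns from one instruction:
    the i0j0 texel of the footprint at each of the four offsets."""
    i0 = math.floor(u - 0.5)
    j0 = math.floor(v - 0.5)
    return [fetch(i0 + dx, j0 + dy) for dx, dy in offsets]


def tg4_offsets(u, v, offsets, supports_gather_offsets):
    """Hypothetical dispatch mirroring the proposed cap: pass the 4-offset
    gather through if the driver claims support, otherwise lower it."""
    if supports_gather_offsets:
        return gather_offsets_direct(u, v, offsets)
    return gather_offsets_lowered(u, v, offsets)


offsets = [(0, 0), (1, 0), (0, 1), (-1, 2)]
print(tg4_offsets(1.5, 1.5, offsets, supports_gather_offsets=False))
print(tg4_offsets(1.5, 1.5, offsets, supports_gather_offsets=True))
```

Both paths return [5.0, 6.0, 9.0, 12.0] here. The lowered path simply issues four gathers and throws away three quarters of each result, which is the cost a driver that can handle four offsets in one instruction avoids by advertising the cap.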