[Mesa-dev] [PATCH 2/3] gallium: add texture gather support to gallium (v3)
Dave Airlie
airlied at gmail.com
Tue Feb 11 13:58:14 PST 2014
>> dst.z = texture_depth(unit, lod)
>>
>> +.. opcode:: TG4 - Texture Gather (as per ARB_texture_gather)
>> + Gathers the four texels to be used in a bi-linear
>> + filtering operation and packs them into a single register.
>> + Only works with 2D, 2D array, cubemaps, and cubemap arrays.
>> + For 2D textures, only the addressing modes of the sampler and
>> + the top level of any mip pyramid are used. Set W to zero.
>> + It behaves like the TEX instruction, but a filtered
>> + sample is not generated. The four samples that contribute
>> + to filtering are placed into xyzw in clockwise order,
>> + starting with the (u,v) texture coordinate delta at the
>> + following locations (-, +), (+, +), (+, -), (-, -), where
>> + the magnitude of the deltas is half a texel.
>> +
>> + PIPE_CAP_TEXTURE_SM5 enhances this instruction to support
>> + shadow per-sample depth compares, single component selection,
>> + and a non-constant offset. It doesn't allow support for the
>> + GL independent offset to get i0,j0. This would require another
>> + CAP if hw can do it natively. For now we lower that before
>> + TGSI.
>> +
>> +.. math::
>> +
>> + coord = src0
>> +
>> + component = src1
>> +
>> + dst = texture_gather4 (unit, coord, component)
>> +
>> +(with SM5 - cube array shadow)
>> +
>> + coord = src0
>> +
>> + compare = src1
>> +
>> + dst = texture_gather (unit, coord, compare)
>> +
> So how does component selection work with the latter version?
> I think it would be nice if you wouldn't really need two versions (so if
> you don't support comparisons, the src would just be unused).
That's the docs not being clear enough, if you read them like that. The
second version is only for cube array shadow compares, which have no
component selection. The first version is the same for non-shadow
compares.
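To make the first version concrete, here is a rough Python sketch of what the gather returns, following the ordering in the doc text above. It is only an illustration, not Mesa code: the gather4 name, the list-of-rows texture layout, and the clamp-to-edge addressing are all assumptions made for the example.

```python
import math

def gather4(texture, u, v, component=0):
    """Return the four texels of the bilinear footprint around (u, v).

    texture is a list of rows, each texel a tuple of channels (assumed
    layout); u and v are normalized coordinates. Addressing is clamp-to-edge
    here purely to keep the sketch short.
    """
    height = len(texture)
    width = len(texture[0])
    # Unnormalized coordinates of the sample point.
    x = u * width
    y = v * height

    def fetch(px, py):
        i = min(max(int(math.floor(px)), 0), width - 1)
        j = min(max(int(math.floor(py)), 0), height - 1)
        return texture[j][i][component]

    # Half-texel deltas in the order the doc gives for xyzw:
    # (-, +), (+, +), (+, -), (-, -)
    deltas = [(-0.5, +0.5), (+0.5, +0.5), (+0.5, -0.5), (-0.5, -0.5)]
    return tuple(fetch(x + dx, y + dy) for dx, dy in deltas)

# 2x2 single-channel texture, sampled at its centre.
tex = [[(0,), (1,)],
       [(2,), (3,)]]
print(gather4(tex, 0.5, 0.5))  # (2, 3, 1, 0)
```

The component argument stands in for src1 in the first version; the cube array shadow version would take a compare value there instead and has nothing to select.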
> Also, FWIW for llvmpipe you'd probably wanted a native 4 offsets
> versions, I don't think llvm could eliminate the huge amount of
> duplicated code completely if you generate 4 texture lookups. Of course,
> someone would need to implement it first (shouldn't be too difficult).
Yeah, llvmpipe might be in the category for using the extra CAP. I'm
really hoping nvidia hw does this, but the interface is kinda
arbitrary and maybe we should consider another opcode, since for the
SM5 non-constant offsets we have something like:
TG4 TEMP[1], TEMP[1], SAMP[0], TEMP[2].xyz
which will sample i0,j0 - i1,j1 around TEMP[1] at the offset in TEMP[2],
and
TG4 TEMP[1], TEMP[1], SAMP[0], TEMP[2].xyz, TEMP[3].xyz, TEMP[4].xyz, TEMP[5].xyz
which will sample i0,j0 around TEMP[1] at each of the respective offsets.
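The difference between the two forms can be sketched in Python roughly as follows. This is a hypothetical model, not how any driver implements it: the function names, the integer (x, y) texel offsets, the list-of-rows texture layout, and the clamp-to-edge addressing are assumptions for illustration only.

```python
import math

def texel(texture, i, j, component=0):
    # Clamp-to-edge fetch from a list-of-rows texture (assumed layout).
    height, width = len(texture), len(texture[0])
    i = min(max(i, 0), width - 1)
    j = min(max(j, 0), height - 1)
    return texture[j][i][component]

def footprint_origin(texture, u, v):
    # i0,j0 of the bilinear footprint: half a texel down and left of (u, v).
    height, width = len(texture), len(texture[0])
    return math.floor(u * width - 0.5), math.floor(v * height - 0.5)

def gather_offset(texture, u, v, offset, component=0):
    # Single-offset form: shift the whole 2x2 footprint by one offset and
    # return (i0,j1), (i1,j1), (i1,j0), (i0,j0).
    i0, j0 = footprint_origin(texture, u, v)
    i0 += offset[0]
    j0 += offset[1]
    return (texel(texture, i0, j0 + 1, component),
            texel(texture, i0 + 1, j0 + 1, component),
            texel(texture, i0 + 1, j0, component),
            texel(texture, i0, j0, component))

def gather_offsets(texture, u, v, offsets, component=0):
    # Four-offset form: each result is the i0,j0 texel fetched at its own
    # independent offset.
    i0, j0 = footprint_origin(texture, u, v)
    return tuple(texel(texture, i0 + ox, j0 + oy, component)
                 for ox, oy in offsets)
```

With this model, the four-offset form called with offsets (0,1), (1,1), (1,0), (0,0) gives the same result as the single-offset form with a zero offset, which is why lowering it to four separate lookups works but duplicates a lot of code.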
Dave.