[Mesa-dev] [PATCH] i965/gen7: always lower textureGrad() on gen7
Chia-I Wu
olvaffe at gmail.com
Mon Sep 9 22:37:41 PDT 2013
On Tue, Sep 10, 2013 at 4:01 AM, Ian Romanick <idr at freedesktop.org> wrote:
> On 09/06/2013 05:05 AM, Chia-I Wu wrote:
>> On Thu, Sep 5, 2013 at 9:57 PM, Chia-I Wu <olvaffe at gmail.com> wrote:
>>> On Thu, Sep 5, 2013 at 5:12 PM, Chris Forbes <chrisf at ijw.co.nz> wrote:
>>>> A possible explanation for the perf change is that Xonotic uses
>>>> anisotropic filtering at this quality level. Lowering to txl defeats
>>>> it.
>>> I had a look at that. gl_sampler->MaxAnisotropy is never greater than
>>> 1.0 in gen7_update_sampler_state() so there is no anisotropic
>>> filtering in this case.
>>>
>>> It makes sense to me that avoiding punting to SIMD8 helps the
>>> performance. But it is not clear to me why >10% performance change
>>> can still be observed when INTEL_DEBUG=no16 is specified. A
>>> reasonable explanation is that the image quality is degraded in some
>>> way, which is why I am still nervous about the change.
>> With INTEL_DEBUG=no16 set, the same trick hurts the performance on
>> Haswell by about 5%. That is, sample_d on Haswell is faster than the
>> one emulated with sample_l.
>
> What is the delta if sample_d is used for just SIMD8 shaders on HSW?
> Even when the shader can go SIMD16, some fragments will use the SIMD8 path.
brw_lower_texture_gradients applies on the IR so it is hard to
selectively apply it only for SIMD16 fs. I will see if I can work
something out here to get the numbers you need.
>> But since the trick makes SIMD16 possible, it gains 5% more fps when
>> INTEL_DEBUG=no16 is not set.
>>
>>> An alternative approach to avoid punting seems to emulate SIMD16
>>> sample_d with two SIMD8 sample_d. It will take longer to implement
>>> given my familiarity with the code, and may be less performant. BUt
>>> that would allow things like anisotropic filtering to be honored.
>>>
>>>
>>>> It would be worth doing an image quality comparison before and after the change.
>>> Yeah, that is worth doing. I will do that.
>>>
>>>>
>>>> -- Chris
>>>>
>>>> On Thu, Sep 5, 2013 at 8:35 PM, Chia-I Wu <olvaffe at gmail.com> wrote:
>>>>> sample_d is slower than the lowered version on gen7. For gen7, this improves
>>>>> Xonotic benchmark with Ultimate effects by as much as 25%:
>>>>>
>>>>> before the change: 40.06 fps
>>>>> after the change: 51.10 fps
>>>>> after the change with INTEL_DEBUG=no16: 44.46 fps
>>>>>
>>>>> As sample_d is not allowed in SIMD16 mode, I firstly thought the difference
>>>>> was from SIMD8 versus SIMD16. If that was the case, we would want to apply
>>>>> brw_lower_texture_gradients() only on fragment shaders in SIMD16 mode.
>>>>>
>>>>> But, as the numbers show, there is still 10% improvement when SIMD16 is forced
>>>>> off after the change. Thus textureGrad() is lowered unconditionally for now.
>>>>> Due to this and that I haven't tried it on Haswell, this is still RFC.
>>>>>
>>>>> No piglit regressions.
>>>>>
>>>>> Signed-off-by: Chia-I Wu <olvaffe at gmail.com>
>>>>> ---
>>>>> .../dri/i965/brw_lower_texture_gradients.cpp | 54 ++++++++++++++--------
>>>>> 1 file changed, 36 insertions(+), 18 deletions(-)
>>>>>
>>>>> diff --git a/src/mesa/drivers/dri/i965/brw_lower_texture_gradients.cpp b/src/mesa/drivers/dri/i965/brw_lower_texture_gradients.cpp
>>>>> index 1589a20..f3fcb56 100644
>>>>> --- a/src/mesa/drivers/dri/i965/brw_lower_texture_gradients.cpp
>>>>> +++ b/src/mesa/drivers/dri/i965/brw_lower_texture_gradients.cpp
>>>>> @@ -34,8 +34,8 @@ using namespace ir_builder;
>>>>>
>>>>> class lower_texture_grad_visitor : public ir_hierarchical_visitor {
>>>>> public:
>>>>> - lower_texture_grad_visitor(bool has_sample_d_c)
>>>>> - : has_sample_d_c(has_sample_d_c)
>>>>> + lower_texture_grad_visitor(bool has_sample_d, bool has_sample_d_c)
>>>>> + : has_sample_d(has_sample_d), has_sample_d_c(has_sample_d_c)
>>>>> {
>>>>> progress = false;
>>>>> }
>>>>> @@ -44,6 +44,7 @@ public:
>>>>>
>>>>>
>>>>> bool progress;
>>>>> + bool has_sample_d;
>>>>> bool has_sample_d_c;
>>>>>
>>>>> private:
>>>>> @@ -90,22 +91,33 @@ txs_type(const glsl_type *type)
>>>>> ir_visitor_status
>>>>> lower_texture_grad_visitor::visit_leave(ir_texture *ir)
>>>>> {
>>>>> - /* Only lower textureGrad with shadow samplers */
>>>>> - if (ir->op != ir_txd || !ir->shadow_comparitor)
>>>>> + if (ir->op != ir_txd)
>>>>> return visit_continue;
>>>>>
>>>>> - /* Lower textureGrad() with samplerCubeShadow even if we have the sample_d_c
>>>>> - * message. GLSL provides gradients for the 'r' coordinate. Unfortunately:
>>>>> - *
>>>>> - * From the Ivybridge PRM, Volume 4, Part 1, sample_d message description:
>>>>> - * "The r coordinate contains the faceid, and the r gradients are ignored
>>>>> - * by hardware."
>>>>> - *
>>>>> - * We likely need to do a similar treatment for samplerCube and
>>>>> - * samplerCubeArray, but we have insufficient testing for that at the moment.
>>>>> - */
>>>>> - bool need_lowering = !has_sample_d_c ||
>>>>> - ir->sampler->type->sampler_dimensionality == GLSL_SAMPLER_DIM_CUBE;
>>>>> + bool need_lowering = false;
>>>>> +
>>>>> + if (ir->shadow_comparitor) {
>>>>> + /* Lower textureGrad() with samplerCubeShadow even if we have the
>>>>> + * sample_d_c message. GLSL provides gradients for the 'r' coordinate.
>>>>> + * Unfortunately:
>>>>> + *
>>>>> + * From the Ivybridge PRM, Volume 4, Part 1, sample_d message
>>>>> + * description: "The r coordinate contains the faceid, and the r
>>>>> + * gradients are ignored by hardware."
>>>>> + */
>>>>> + if (ir->sampler->type->sampler_dimensionality == GLSL_SAMPLER_DIM_CUBE)
>>>>> + need_lowering = true;
>>>>> + else if (!has_sample_d_c)
>>>>> + need_lowering = true;
>>>>> + }
>>>>> + else {
>>>>> + /* We likely need to do a similar treatment for samplerCube and
>>>>> + * samplerCubeArray, but we have insufficient testing for that at the
>>>>> + * moment.
>>>>> + */
>>>>> + if (!has_sample_d)
>>>>> + need_lowering = true;
>>>>> + }
>>>>>
>>>>> if (!need_lowering)
>>>>> return visit_continue;
>>>>> @@ -154,7 +166,9 @@ lower_texture_grad_visitor::visit_leave(ir_texture *ir)
>>>>> expr(ir_unop_sqrt, dot(dPdy, dPdy)));
>>>>> }
>>>>>
>>>>> - /* lambda_base = log2(rho). We're ignoring GL state biases for now. */
>>>>> + /* lambda_base = log2(rho). It will be biased and clamped by values
>>>>> + * defined in SAMPLER_STATE to get the final lambda.
>>>>> + */
>>>>> ir->op = ir_txl;
>>>>> ir->lod_info.lod = expr(ir_unop_log2, rho);
>>>>>
>>>>> @@ -168,8 +182,12 @@ bool
>>>>> brw_lower_texture_gradients(struct brw_context *brw,
>>>>> struct exec_list *instructions)
>>>>> {
>>>>> + /* sample_d is slower than the lowered version on gen7, and is not allowed
>>>>> + * in SIMD16 mode. Treating it as unsupported improves the performance.
>>>>> + */
>>>>> + bool has_sample_d = brw->gen != 7;
>>>>> bool has_sample_d_c = brw->gen >= 8 || brw->is_haswell;
>>>>> - lower_texture_grad_visitor v(has_sample_d_c);
>>>>> + lower_texture_grad_visitor v(has_sample_d, has_sample_d_c);
>>>>>
>>>>> visit_list_elements(&v, instructions);
>>>>>
>>>>> --
>>>>> 1.8.3.1
>>>>>
>>>>> _______________________________________________
>>>>> mesa-dev mailing list
>>>>> mesa-dev at lists.freedesktop.org
>>>>> http://lists.freedesktop.org/mailman/listinfo/mesa-dev
>>>
>>> --
>>> olv at LunarG.com
>
--
olv at LunarG.com
More information about the mesa-dev
mailing list