[Mesa-dev] [PATCH] r600g: add htile support v8
Marek Olšák
maraeo at gmail.com
Sat Jul 14 14:23:38 PDT 2012
I have found yet another bug in your patch:
Whenever you add register writes into the emit functions, you must
also update the maximum number of dwords the state can emit. It's the
second last parameter of r600_init_atom.
Marek
On Sat, Jul 14, 2012 at 2:20 AM, Jerome Glisse <j.glisse at gmail.com> wrote:
> On Fri, Jul 13, 2012 at 8:02 PM, Marek Olšák <maraeo at gmail.com> wrote:
>> Hi Jerome,
>>
>> I have a lot of remarks.
>>
>> 1) The DB decompression fix (where you update DB_RENDER_CONTROL) could
>> be in a separate patch.
>>
>> 2) The fix with EARLY_Z_THEN_LATE_Z (in update_dual_export) could also
>> be in a separate patch.
>
>> 3) r600_context::use_hyperz is set to FALSE by default, why? If the
>> fast clear works and there are no piglit regressions, then please
>> enable it. If the fast clear doesn't work reliably, please put it in a
>> branch. Merging non-working code on purpose is a very bad idea.
>
> Again it does work, but it has a non null probability to lockup gpu,
> probability depends on the gpu and application, but it can be quite
> high. So again code works but given it leads to lockup i dont want it
> to enable it by default. And I do intend to merge this.
>
>
>> 4) The depth clear value should be per-layer in addition to being
>> per-level. Otherwise, cubemaps and texture arrays would be broken.
>
> Yeah overlook that case
>
>> 5) You added r600_db_misc_state::resummarize, but it's unused. Please
>> pay attention to detail.
>
> It's left over.
>
>> Also there are additional comments inline below.
>>
>> On Fri, Jul 13, 2012 at 8:30 PM, <j.glisse at gmail.com> wrote:
>>> From: Jerome Glisse <jglisse at redhat.com>
>>>
>>> htile is used for HiZ and HiS support and fast Z/S clears.
>>> This commit just adds the htile setup and Fast Z clear.
>>> We don't take full advantage of HiS with that patch.
>>>
>>> v2 really use fast clear, still random issue with some tiles
>>> need to try more flush combination, fix depth/stencil
>>> texture decompression
>>> v3 fix random issue on r6xx/r7xx
>>> v4 rebase on top of lastest mesa, disable CB export when clearing
>>> htile surface to avoid wasting bandwidth
>>> v5 resummarize htile surface when uploading z value. Fix z/stencil
>>> decompression, the custom blitter with custom dsa is no longer
>>> needed.
>>> v6 Reorganize render control/override update mecanism, fixing more
>>> issues in the process.
>>> v7 Add nop after depth surface base update to work around some htile
>>> flushing issue. For htile to 8x8 on r6xx/r7xx as other combination
>>> have issue. Do not enable hyperz when flushing/uncompressing
>>> depth buffer.
>>> v8 Fix htile surface, preload and prefetch setup. Only set preload
>>> and prefetch on htile surface clear like fglrx. Record depth
>>> clear value per level. Support several level for the htile
>>> surface. First depth clear can't be a fast clear.
>>>
>>> Signed-off-by: Pierre-Eric Pelloux-Prayer <pelloux at gmail.com>
>>> Signed-off-by: Alex Deucher <alexander.deucher at amd.com>
>>> Signed-off-by: Jerome Glisse <jglisse at redhat.com>
>>> ---
>>> src/gallium/drivers/r600/evergreen_hw_context.c | 8 +-
>>> src/gallium/drivers/r600/evergreen_state.c | 97 ++++++++++++-----
>>> src/gallium/drivers/r600/evergreend.h | 4 +
>>> src/gallium/drivers/r600/r600_blit.c | 37 +++++--
>>> src/gallium/drivers/r600/r600_hw_context.c | 25 +++++
>>> src/gallium/drivers/r600/r600_pipe.c | 1 +
>>> src/gallium/drivers/r600/r600_pipe.h | 13 ++-
>>> src/gallium/drivers/r600/r600_resource.h | 7 ++
>>> src/gallium/drivers/r600/r600_state.c | 133 ++++++++++++++++++++---
>>> src/gallium/drivers/r600/r600_state_common.c | 6 +
>>> src/gallium/drivers/r600/r600_texture.c | 98 +++++++++++++++++
>>> src/gallium/drivers/r600/r600d.h | 6 +
>>> 12 files changed, 376 insertions(+), 59 deletions(-)
>>>
>>> diff --git a/src/gallium/drivers/r600/evergreen_hw_context.c b/src/gallium/drivers/r600/evergreen_hw_context.c
>>> index 53d4582..546c884 100644
>>> --- a/src/gallium/drivers/r600/evergreen_hw_context.c
>>> +++ b/src/gallium/drivers/r600/evergreen_hw_context.c
>>> @@ -43,7 +43,6 @@ static const struct r600_reg evergreen_ctl_const_list[] = {
>>> };
>>>
>>> static const struct r600_reg evergreen_context_reg_list[] = {
>>> - {R_028000_DB_RENDER_CONTROL, 0, 0},
>>> {R_028008_DB_DEPTH_VIEW, 0, 0},
>>> {R_028010_DB_RENDER_OVERRIDE2, 0, 0},
>>> {GROUP_FORCE_NEW_BLOCK, 0, 0},
>>> @@ -63,6 +62,9 @@ static const struct r600_reg evergreen_context_reg_list[] = {
>>> {GROUP_FORCE_NEW_BLOCK, 0, 0},
>>> {R_028058_DB_DEPTH_SIZE, 0, 0},
>>> {R_02805C_DB_DEPTH_SLICE, 0, 0},
>>> + {R_02802C_DB_DEPTH_CLEAR, 0, 0},
>>> + {R_028ABC_DB_HTILE_SURFACE, 0, 0},
>>> + {R_028AC8_DB_PRELOAD_CONTROL, 0, 0},
>>> {R_028204_PA_SC_WINDOW_SCISSOR_TL, 0, 0},
>>> {R_028208_PA_SC_WINDOW_SCISSOR_BR, 0, 0},
>>> {R_028234_PA_SU_HARDWARE_SCREEN_OFFSET, 0, 0},
>>> @@ -301,7 +303,6 @@ static const struct r600_reg evergreen_context_reg_list[] = {
>>> };
>>>
>>> static const struct r600_reg cayman_context_reg_list[] = {
>>> - {R_028000_DB_RENDER_CONTROL, 0, 0},
>>> {R_028008_DB_DEPTH_VIEW, 0, 0},
>>> {R_028010_DB_RENDER_OVERRIDE2, 0, 0},
>>> {GROUP_FORCE_NEW_BLOCK, 0, 0},
>>> @@ -321,6 +322,9 @@ static const struct r600_reg cayman_context_reg_list[] = {
>>> {GROUP_FORCE_NEW_BLOCK, 0, 0},
>>> {R_028058_DB_DEPTH_SIZE, 0, 0},
>>> {R_02805C_DB_DEPTH_SLICE, 0, 0},
>>> + {R_02802C_DB_DEPTH_CLEAR, 0, 0},
>>> + {R_028ABC_DB_HTILE_SURFACE, 0, 0},
>>> + {R_028AC8_DB_PRELOAD_CONTROL, 0, 0},
>>> {R_028204_PA_SC_WINDOW_SCISSOR_TL, 0, 0},
>>> {R_028208_PA_SC_WINDOW_SCISSOR_BR, 0, 0},
>>> {R_028234_PA_SU_HARDWARE_SCREEN_OFFSET, 0, 0},
>>> diff --git a/src/gallium/drivers/r600/evergreen_state.c b/src/gallium/drivers/r600/evergreen_state.c
>>> index 404df02..9cadaa1 100644
>>> --- a/src/gallium/drivers/r600/evergreen_state.c
>>> +++ b/src/gallium/drivers/r600/evergreen_state.c
>>> @@ -710,13 +710,15 @@ static void *evergreen_create_blend_state(struct pipe_context *ctx,
>>> }
>>> blend->cb_target_mask = target_mask;
>>>
>>> - if (target_mask)
>>> + if (target_mask) {
>>> color_control |= S_028808_MODE(V_028808_CB_NORMAL);
>>> - else
>>> + } else {
>>> color_control |= S_028808_MODE(V_028808_CB_DISABLE);
>>> + }
>>>
>>> r600_pipe_state_add_reg(rstate, R_028808_CB_COLOR_CONTROL,
>>> color_control);
>>> +
>>> /* only have dual source on MRT0 */
>>> blend->dual_src_blend = util_blend_state_is_dual(state, 0);
>>> for (int i = 0; i < 8; i++) {
>>> @@ -759,7 +761,6 @@ static void *evergreen_create_dsa_state(struct pipe_context *ctx,
>>> struct r600_context *rctx = (struct r600_context *)ctx;
>>> struct r600_pipe_dsa *dsa = CALLOC_STRUCT(r600_pipe_dsa);
>>> unsigned db_depth_control, alpha_test_control, alpha_ref;
>>> - unsigned db_render_control;
>>> struct r600_pipe_state *rstate;
>>>
>>> if (dsa == NULL) {
>>> @@ -807,9 +808,7 @@ static void *evergreen_create_dsa_state(struct pipe_context *ctx,
>>> dsa->alpha_ref = alpha_ref;
>>>
>>> /* misc */
>>> - db_render_control = 0;
>>> r600_pipe_state_add_reg(rstate, R_028800_DB_DEPTH_CONTROL, db_depth_control);
>>> - r600_pipe_state_add_reg(rstate, R_028000_DB_RENDER_CONTROL, db_render_control);
>>> return rstate;
>>> }
>>>
>>> @@ -1671,6 +1670,28 @@ static void evergreen_db(struct r600_context *rctx, struct r600_pipe_state *rsta
>>> }
>>> }
>>>
>>> + /* hyperz */
>>> + if (rtex->hyperz) {
>>> + uint64_t htile_offset = rtex->hyperz->surface.level[level].offset;
>>> +
>>> + if (!rctx->db_misc_state.hyperz) {
>>> + rctx->db_misc_state.hyperz = true;
>>> + r600_atom_dirty(rctx, &rctx->db_misc_state.atom);
>>> + }
>>> + z_info |= S_028040_TILE_SURFACE_ENABLE(1);
>>> + r600_pipe_state_add_reg_bo(rstate, R_028014_DB_HTILE_DATA_BASE,
>>> + htile_offset >> 8, &rtex->hyperz->resource,
>>> + RADEON_USAGE_READWRITE);
>>> + /* FORCE_OFF means HiZ/HiS are determined by DB_SHADER_CONTROL */
>>> + r600_pipe_state_add_reg(rstate, R_028ABC_DB_HTILE_SURFACE, rtex->db_htile_surface);
>>> + r600_pipe_state_add_reg(rstate, R_028AC8_DB_PRELOAD_CONTROL, rtex->db_preload_control);
>>> + } else {
>>> + if (rctx->db_misc_state.hyperz) {
>>> + rctx->db_misc_state.hyperz = FALSE;
>>> + r600_atom_dirty(rctx, &rctx->db_misc_state.atom);
>>> + }
>>> + }
>>> +
>>> r600_pipe_state_add_reg_bo(rstate, R_028040_DB_Z_INFO, z_info,
>>> &rtex->resource, RADEON_USAGE_READWRITE);
>>> r600_pipe_state_add_reg(rstate, R_028058_DB_DEPTH_SIZE,
>>> @@ -1750,19 +1771,47 @@ static void evergreen_emit_db_misc_state(struct r600_context *rctx, struct r600_
>>> {
>>> struct radeon_winsys_cs *cs = rctx->cs;
>>> struct r600_db_misc_state *a = (struct r600_db_misc_state*)atom;
>>> + unsigned db_render_override = 0;
>>> + unsigned db_render_control = 0;
>>> unsigned db_count_control = 0;
>>> - unsigned db_render_override =
>>> - S_02800C_FORCE_HIZ_ENABLE(V_02800C_FORCE_DISABLE) |
>>> - S_02800C_FORCE_HIS_ENABLE0(V_02800C_FORCE_DISABLE) |
>>> - S_02800C_FORCE_HIS_ENABLE1(V_02800C_FORCE_DISABLE);
>>> -
>>> + unsigned cliprect_rule = 0xffff;
>>> +
>>> + db_render_override = S_02800C_FORCE_HIS_ENABLE0(V_02800C_FORCE_DISABLE) |
>>> + S_02800C_FORCE_HIS_ENABLE1(V_02800C_FORCE_DISABLE);
>>> + if (a->hyperz && !a->flush_depthstencil_through_cb) {
>>> + /* FORCE_OFF means HiZ/HiS are determined by DB_SHADER_CONTROL */
>>> + db_render_override |= S_02800C_FORCE_HIZ_ENABLE(V_02800C_FORCE_OFF);
>>> + if (a->clear_depthstencil) {
>>> + db_render_control |= S_028000_DEPTH_CLEAR_ENABLE(1);
>>> + /* need to disable cliprect for fast clear */
>>> + cliprect_rule = 0;
>>> + }
>>> + } else {
>>> + db_render_override |= S_02800C_FORCE_HIZ_ENABLE(V_02800C_FORCE_DISABLE);
>>> + }
>>> if (a->occlusion_query_enabled) {
>>> db_count_control |= S_028004_PERFECT_ZPASS_COUNTS(1);
>>> db_render_override |= S_02800C_NOOP_CULL_DISABLE(1);
>>> }
>>> + if (a->flush_depthstencil_through_cb) {
>>> + db_render_control |= S_028000_DEPTH_COPY_ENABLE(1) |
>>> + S_028000_STENCIL_COPY_ENABLE(1) |
>>> + S_028000_COPY_CENTROID(1);
>>> + }
>>>
>>> + if (rctx->framebuffer.zsbuf) {
>>> + struct r600_resource_texture *rtex;
>>> + unsigned level = rctx->framebuffer.zsbuf->u.tex.level;
>>> +
>>> + rtex = (struct r600_resource_texture*)rctx->framebuffer.zsbuf->texture;
>>> + r600_write_context_reg_seq(cs, R_02802C_DB_DEPTH_CLEAR, 1);
>>> + r600_write_value(cs, fui((float)rtex->depth_clear_value[level]));
>>> + }
>>
>> This code depends on the zbuffer, but you don't call r600_atom_dirty
>> when the zbuffer is changed. This should be fixed in
>> set_framebuffer_state. Same for r600_state.c.
>
> atom_dirty is call inside set_db only if there is hyperz and it's only
> usefull with hyperz so it's fine. I think i need to make atom dirty
> unconditional in set_db.
>
>>> + r600_write_context_reg(cs, R_028000_DB_RENDER_CONTROL, db_render_control);
>>> r600_write_context_reg(cs, R_028004_DB_COUNT_CONTROL, db_count_control);
>>> r600_write_context_reg(cs, R_02800C_DB_RENDER_OVERRIDE, db_render_override);
>>> + r600_write_context_reg_seq(cs, R_02820C_PA_SC_CLIPRECT_RULE, 1);
>>> + r600_write_value(cs, cliprect_rule);
>>
>> You could just use r600_write_context_reg here.
>
> k
>
>>> }
>>>
>>> static void evergreen_emit_vertex_buffers(struct r600_context *rctx, struct r600_atom *atom)
>>> @@ -2033,19 +2082,15 @@ static void cayman_init_atom_start_cs(struct r600_context *rctx)
>>> r600_store_value(cb, ~0); /* CM_R_028C38_PA_SC_AA_MASK_X0Y0_X1Y0 */
>>> r600_store_value(cb, ~0); /* CM_R_028C3C_PA_SC_AA_MASK_X0Y1_X1Y1 */
>>>
>>> - r600_store_context_reg_seq(cb, R_028028_DB_STENCIL_CLEAR, 2);
>>> - r600_store_value(cb, 0); /* R_028028_DB_STENCIL_CLEAR */
>>> - r600_store_value(cb, 0x3F800000); /* R_02802C_DB_DEPTH_CLEAR */
>>> + r600_store_context_reg_seq(cb, R_009830_DB_DEBUG, 3);
>>> + r600_store_value(cb, 0); /* R_009830_DB_DEBUG */
>>> + r600_store_value(cb, 0); /* R_009834_DB_DEBUG2 */
>>> + r600_store_value(cb, 0); /* R_009838_DB_DEBUG3 */
>>> + r600_store_config_reg(cb, R_009854_DB_WATERMARKS, 0x00420204);
>>>
>>> r600_store_context_reg(cb, R_0286DC_SPI_FOG_CNTL, 0);
>>>
>>> - r600_store_context_reg_seq(cb, R_028AC0_DB_SRESULTS_COMPARE_STATE0, 3);
>>> - r600_store_value(cb, 0); /* R_028AC0_DB_SRESULTS_COMPARE_STATE0 */
>>> - r600_store_value(cb, 0); /* R_028AC4_DB_SRESULTS_COMPARE_STATE1 */
>>> - r600_store_value(cb, 0); /* R_028AC8_DB_PRELOAD_CONTROL */
>>> -
>>> r600_store_context_reg(cb, R_028200_PA_SC_WINDOW_OFFSET, 0);
>>> - r600_store_context_reg(cb, R_02820C_PA_SC_CLIPRECT_RULE, 0xFFFF);
>>>
>>> r600_store_context_reg_seq(cb, R_0282D0_PA_SC_VPORT_ZMIN_0, 2);
>>> r600_store_value(cb, 0); /* R_0282D0_PA_SC_VPORT_ZMIN_0 */
>>> @@ -2520,7 +2565,6 @@ void evergreen_init_atom_start_cs(struct r600_context *rctx)
>>> r600_store_value(cb, 0x3F800000); /* R_02802C_DB_DEPTH_CLEAR */
>>>
>>> r600_store_context_reg(cb, R_028200_PA_SC_WINDOW_OFFSET, 0);
>>> - r600_store_context_reg(cb, R_02820C_PA_SC_CLIPRECT_RULE, 0xFFFF);
>>> r600_store_context_reg(cb, R_028230_PA_SC_EDGERULE, 0xAAAAAAAA);
>>>
>>> r600_store_context_reg_seq(cb, R_0282D0_PA_SC_VPORT_ZMIN_0, 2);
>>> @@ -2531,10 +2575,9 @@ void evergreen_init_atom_start_cs(struct r600_context *rctx)
>>> r600_store_context_reg(cb, R_028818_PA_CL_VTE_CNTL, 0x0000043F);
>>> r600_store_context_reg(cb, R_028820_PA_CL_NANINF_CNTL, 0);
>>>
>>> - r600_store_context_reg_seq(cb, R_028AC0_DB_SRESULTS_COMPARE_STATE0, 3);
>>> + r600_store_context_reg_seq(cb, R_028AC0_DB_SRESULTS_COMPARE_STATE0, 2);
>>> r600_store_value(cb, 0); /* R_028AC0_DB_SRESULTS_COMPARE_STATE0 */
>>> r600_store_value(cb, 0); /* R_028AC4_DB_SRESULTS_COMPARE_STATE1 */
>>> - r600_store_value(cb, 0); /* R_028AC8_DB_PRELOAD_CONTROL */
>>>
>>> r600_store_context_reg(cb, R_028B70_DB_ALPHA_TO_MASK, 0x0000AA00);
>>>
>>> @@ -2634,7 +2677,7 @@ void evergreen_pipe_shader_ps(struct pipe_context *ctx, struct r600_pipe_shader
>>>
>>> rstate->nregs = 0;
>>>
>>> - db_shader_control = S_02880C_Z_ORDER(V_02880C_EARLY_Z_THEN_LATE_Z);
>>> + db_shader_control = 0;
>>> for (i = 0; i < rshader->ninput; i++) {
>>> /* evergreen NUM_INTERP only contains values interpolated into the LDS,
>>> POSITION goes via GPRs from the SC so isn't counted */
>>> @@ -2840,11 +2883,6 @@ void *evergreen_create_db_flush_dsa(struct r600_context *rctx)
>>> memset(&dsa, 0, sizeof(dsa));
>>>
>>> rstate = rctx->context.create_depth_stencil_alpha_state(&rctx->context, &dsa);
>>> - r600_pipe_state_add_reg(rstate,
>>> - R_028000_DB_RENDER_CONTROL,
>>> - S_028000_DEPTH_COPY_ENABLE(1) |
>>> - S_028000_STENCIL_COPY_ENABLE(1) |
>>> - S_028000_COPY_CENTROID(1));
>>> /* Don't set the 'is_flush' flag in r600_pipe_dsa, evergreen doesn't need it. */
>>> return rstate;
>>> }
>>> @@ -2853,14 +2891,13 @@ void evergreen_update_dual_export_state(struct r600_context * rctx)
>>> {
>>> unsigned dual_export = rctx->export_16bpc && rctx->nr_cbufs &&
>>> !rctx->ps_shader->current->ps_depth_export;
>>> -
>>> unsigned db_source_format = dual_export ? V_02880C_EXPORT_DB_TWO :
>>> V_02880C_EXPORT_DB_FULL;
>>> -
>>> unsigned db_shader_control = rctx->ps_shader->current->db_shader_control |
>>> S_02880C_DUAL_EXPORT_ENABLE(dual_export) |
>>> S_02880C_DB_SOURCE_FORMAT(db_source_format);
>>>
>>> + db_shader_control |= S_02880C_Z_ORDER(V_02880C_EARLY_Z_THEN_LATE_Z);
>>> if (db_shader_control != rctx->db_shader_control) {
>>> struct r600_pipe_state rstate;
>>>
>>> diff --git a/src/gallium/drivers/r600/evergreend.h b/src/gallium/drivers/r600/evergreend.h
>>> index 6c4873c..1ac5944 100644
>>> --- a/src/gallium/drivers/r600/evergreend.h
>>> +++ b/src/gallium/drivers/r600/evergreend.h
>>> @@ -1589,6 +1589,10 @@
>>> #define S_028008_SLICE_MAX(x) (((x) & 0x7FF) << 13)
>>> #define G_028008_SLICE_MAX(x) (((x) >> 13) & 0x7FF)
>>> #define C_028008_SLICE_MAX 0xFF001FFF
>>> +#define R_009830_DB_DEBUG 0x00009830
>>> +#define R_009834_DB_DEBUG2 0x00009834
>>> +#define R_009838_DB_DEBUG3 0x00009838
>>> +#define R_009854_DB_WATERMARKS 0x00009854
>>> #define R_02800C_DB_RENDER_OVERRIDE 0x0002800C
>>> #define V_02800C_FORCE_OFF 0
>>> #define V_02800C_FORCE_ENABLE 1
>>> diff --git a/src/gallium/drivers/r600/r600_blit.c b/src/gallium/drivers/r600/r600_blit.c
>>> index fff48a4..4286cca 100644
>>> --- a/src/gallium/drivers/r600/r600_blit.c
>>> +++ b/src/gallium/drivers/r600/r600_blit.c
>>> @@ -24,6 +24,7 @@
>>> #include "util/u_surface.h"
>>> #include "util/u_blitter.h"
>>> #include "util/u_format.h"
>>> +#include "r600d.h"
>>>
>>> enum r600_blitter_op /* bitmask */
>>> {
>>> @@ -132,8 +133,7 @@ void r600_blit_uncompress_depth(struct pipe_context *ctx,
>>> rctx->family == CHIP_RV620 || rctx->family == CHIP_RV635)
>>> depth = 0.0f;
>>>
>>> - if (rctx->chip_class <= R700 &&
>>> - !rctx->db_misc_state.flush_depthstencil_through_cb) {
>>> + if (!rctx->db_misc_state.flush_depthstencil_through_cb) {
>>> /* Enable decompression in DB_RENDER_CONTROL */
>>> rctx->db_misc_state.flush_depthstencil_through_cb = true;
>>> r600_atom_dirty(rctx, &rctx->db_misc_state.atom);
>>> @@ -179,11 +179,9 @@ void r600_blit_uncompress_depth(struct pipe_context *ctx,
>>> }
>>> }
>>>
>>> - if (rctx->chip_class <= R700) {
>>> - /* Disable decompression in DB_RENDER_CONTROL */
>>> - rctx->db_misc_state.flush_depthstencil_through_cb = false;
>>> - r600_atom_dirty(rctx, &rctx->db_misc_state.atom);
>>> - }
>>> + /* reenable compression in DB_RENDER_CONTROL */
>>> + rctx->db_misc_state.flush_depthstencil_through_cb = false;
>>> + r600_atom_dirty(rctx, &rctx->db_misc_state.atom);
>>> }
>>>
>>> static void r600_flush_depth_textures(struct r600_context *rctx,
>>> @@ -223,6 +221,31 @@ static void r600_clear(struct pipe_context *ctx, unsigned buffers,
>>> struct r600_context *rctx = (struct r600_context *)ctx;
>>> struct pipe_framebuffer_state *fb = &rctx->framebuffer;
>>>
>>> + /* if hyperz enabled just clear hyperz */
>>> + if (fb->zsbuf && (buffers & PIPE_CLEAR_DEPTHSTENCIL)) {
>>> + struct r600_resource_texture *rtex;
>>> + unsigned level = fb->zsbuf->u.tex.level;
>>> +
>>> + rtex = (struct r600_resource_texture*)fb->zsbuf->texture;
>>> + if (rtex->hyperz) {
>>> + /* set clear value, as we use R600_CLEAR_SURFACE
>>> + * the framebuffer state will be reset with proper
>>> + * depth clear value
>>> + */
>>
>> This comment is wrong. We don't use R600_CLEAR_SURFACE in this function.
>
> left over
>
>>> + rtex->depth_clear_value[level] = depth;
>>> + if (buffers & PIPE_CLEAR_DEPTH) {
>>> + rctx->db_misc_state.hyperz = true;
>>> + if (rtex->htile_initialized[level]) {
>>> + rctx->db_misc_state.clear_depthstencil = true;
>>> + } else {
>>> + rtex->htile_initialized[level] = true;
>>> + rctx->db_misc_state.db_htile_surface_mask = 0xf;
>>
>> I don't like this 0xf magic value. Please use proper definitions from r600d.h.
>>
>
> Using define will make it a very big line.
>
>>> + }
>>> + r600_atom_dirty(rctx, &rctx->db_misc_state.atom);
>>> + }
>>> + }
>>> + }
>>> +
>>> r600_blitter_begin(ctx, R600_CLEAR);
>>> util_blitter_clear(rctx->blitter, fb->width, fb->height,
>>> fb->nr_cbufs, buffers, fb->nr_cbufs ? fb->cbufs[0]->format : PIPE_FORMAT_NONE,
>>> diff --git a/src/gallium/drivers/r600/r600_hw_context.c b/src/gallium/drivers/r600/r600_hw_context.c
>>> index b236069..d09e7d8 100644
>>> --- a/src/gallium/drivers/r600/r600_hw_context.c
>>> +++ b/src/gallium/drivers/r600/r600_hw_context.c
>>> @@ -180,6 +180,27 @@ static void r600_init_block(struct r600_context *ctx,
>>> (ctx->family < CHIP_RV770) && reg[i+j].flags & REG_FLAG_RV6XX_SBU) {
>>> block->pm4[block->pm4_ndwords++] = PKT3(PKT3_SURFACE_BASE_UPDATE, 0, 0);
>>> block->pm4[block->pm4_ndwords++] = reg[i+j].sbu_flags;
>>> + if (reg[i+j].sbu_flags & SURFACE_BASE_UPDATE_DEPTH) {
>>> + /* to work around flushing issue in htile surface */
>>> + block->pm4[block->pm4_ndwords++] = PKT3(PKT3_NOP, 16, 0);
>>> + block->pm4[block->pm4_ndwords++] = 0xcafedead;
>>> + block->pm4[block->pm4_ndwords++] = 0xcafedead;
>>> + block->pm4[block->pm4_ndwords++] = 0xcafedead;
>>> + block->pm4[block->pm4_ndwords++] = 0xcafedead;
>>> + block->pm4[block->pm4_ndwords++] = 0xcafedead;
>>> + block->pm4[block->pm4_ndwords++] = 0xcafedead;
>>> + block->pm4[block->pm4_ndwords++] = 0xcafedead;
>>> + block->pm4[block->pm4_ndwords++] = 0xcafedead;
>>> + block->pm4[block->pm4_ndwords++] = 0xcafedead;
>>> + block->pm4[block->pm4_ndwords++] = 0xcafedead;
>>> + block->pm4[block->pm4_ndwords++] = 0xcafedead;
>>> + block->pm4[block->pm4_ndwords++] = 0xcafedead;
>>> + block->pm4[block->pm4_ndwords++] = 0xcafedead;
>>> + block->pm4[block->pm4_ndwords++] = 0xcafedead;
>>> + block->pm4[block->pm4_ndwords++] = 0xcafedead;
>>> + block->pm4[block->pm4_ndwords++] = 0xcafedead;
>>> + block->pm4[block->pm4_ndwords++] = 0xcafedead;
>>> + }
>>> }
>>> }
>>> /* check that we stay in limit */
>>> @@ -364,7 +385,11 @@ static const struct r600_reg r600_context_reg_list[] = {
>>> {GROUP_FORCE_NEW_BLOCK, 0, 0},
>>> {R_028010_DB_DEPTH_INFO, REG_FLAG_NEED_BO, 0},
>>> {R_028A6C_VGT_GS_OUT_PRIM_TYPE, 0, 0},
>>> + {R_02802C_DB_DEPTH_CLEAR, 0, 0},
>>> + {GROUP_FORCE_NEW_BLOCK, 0, 0},
>>> + {R_028014_DB_HTILE_DATA_BASE, REG_FLAG_NEED_BO, 0},
>>> {R_028D24_DB_HTILE_SURFACE, 0, 0},
>>> + {R_028D30_DB_PRELOAD_CONTROL, 0, 0},
>>> {R_028D34_DB_PREFETCH_LIMIT, 0, 0},
>>> {R_028204_PA_SC_WINDOW_SCISSOR_TL, 0, 0},
>>> {R_028208_PA_SC_WINDOW_SCISSOR_BR, 0, 0},
>>> diff --git a/src/gallium/drivers/r600/r600_pipe.c b/src/gallium/drivers/r600/r600_pipe.c
>>> index a0a8a58..1639eab 100644
>>> --- a/src/gallium/drivers/r600/r600_pipe.c
>>> +++ b/src/gallium/drivers/r600/r600_pipe.c
>>> @@ -952,6 +952,7 @@ struct pipe_screen *r600_screen_create(struct radeon_winsys *ws)
>>>
>>> rscreen->use_surface_alloc = debug_get_bool_option("R600_SURF", TRUE);
>>> rscreen->glsl_feature_level = debug_get_bool_option("R600_GLSL130", TRUE) ? 130 : 120;
>>> + rscreen->use_hyperz = debug_get_bool_option("R600_HYPERZ", FALSE);
>>>
>>> rscreen->global_pool = compute_memory_pool_new(0, rscreen);
>>>
>>> diff --git a/src/gallium/drivers/r600/r600_pipe.h b/src/gallium/drivers/r600/r600_pipe.h
>>> index 877088b..1b66e03 100644
>>> --- a/src/gallium/drivers/r600/r600_pipe.h
>>> +++ b/src/gallium/drivers/r600/r600_pipe.h
>>> @@ -77,9 +77,13 @@ struct r600_surface_sync_cmd {
>>> };
>>>
>>> struct r600_db_misc_state {
>>> - struct r600_atom atom;
>>> - bool occlusion_query_enabled;
>>> - bool flush_depthstencil_through_cb;
>>> + struct r600_atom atom;
>>> + unsigned db_htile_surface_mask;
>>> + bool occlusion_query_enabled;
>>> + bool flush_depthstencil_through_cb;
>>> + bool clear_depthstencil;
>>> + bool hyperz;
>>> + bool resummarize;
>>> };
>>>
>>> struct r600_cb_misc_state {
>>> @@ -144,6 +148,7 @@ struct r600_screen {
>>> struct r600_pipe_fences fences;
>>>
>>> bool use_surface_alloc;
>>> + bool use_hyperz;
>>> int glsl_feature_level;
>>>
>>> /*for compute global memory binding, we allocate stuff here, instead of
>>> @@ -183,7 +188,7 @@ struct r600_pipe_dsa {
>>> unsigned alpha_ref;
>>> ubyte valuemask[2];
>>> ubyte writemask[2];
>>> - unsigned sx_alpha_test_control;
>>> + unsigned sx_alpha_test_control;
>>> };
>>>
>>> struct r600_vertex_element
>>> diff --git a/src/gallium/drivers/r600/r600_resource.h b/src/gallium/drivers/r600/r600_resource.h
>>> index a7570c7..59ec025 100644
>>> --- a/src/gallium/drivers/r600/r600_resource.h
>>> +++ b/src/gallium/drivers/r600/r600_resource.h
>>> @@ -64,6 +64,13 @@ struct r600_resource_texture {
>>> struct r600_resource_texture *flushed_depth_texture;
>>> boolean is_flushing_texture;
>>> struct radeon_surface surface;
>>> + unsigned db_prefetch_limit;
>>> + unsigned db_htile_surface;
>>> + unsigned db_preload_control;
>>> + struct r600_resource_texture *hyperz;
>>> + float depth_clear_value[PIPE_MAX_TEXTURE_LEVELS];
>>> + /* first depth clear initialize the htile buffer */
>>> + bool htile_initialized[PIPE_MAX_TEXTURE_LEVELS];
>>> };
>>>
>>> #define R600_TEX_IS_TILED(tex, level) ((tex)->array_mode[level] != V_038000_ARRAY_LINEAR_GENERAL && (tex)->array_mode[level] != V_038000_ARRAY_LINEAR_ALIGNED)
>>> diff --git a/src/gallium/drivers/r600/r600_state.c b/src/gallium/drivers/r600/r600_state.c
>>> index 6c0c0fe..1341254 100644
>>> --- a/src/gallium/drivers/r600/r600_state.c
>>> +++ b/src/gallium/drivers/r600/r600_state.c
>>> @@ -1581,6 +1581,7 @@ static void r600_db(struct r600_context *rctx, struct r600_pipe_state *rstate,
>>> struct r600_resource_texture *rtex;
>>> struct r600_surface *surf;
>>> unsigned level, pitch, slice, format, offset, array_mode;
>>> + unsigned db_depth_info;
>>>
>>> if (state->zsbuf == NULL)
>>> return;
>>> @@ -1625,6 +1626,29 @@ static void r600_db(struct r600_context *rctx, struct r600_pipe_state *rstate,
>>>
>>> format = r600_translate_dbformat(state->zsbuf->format);
>>> assert(format != ~0);
>>> + db_depth_info = S_028010_ARRAY_MODE(array_mode) | S_028010_FORMAT(format);
>>> +
>>> + /* hyperz */
>>> + if (rtex->hyperz) {
>>> + uint64_t htile_offset = rtex->hyperz->surface.level[level].offset;
>>> +
>>> + if (!rctx->db_misc_state.hyperz) {
>>> + rctx->db_misc_state.hyperz = true;
>>> + rctx->db_misc_state.db_htile_surface_mask = 0xffffffff;
>>> + r600_atom_dirty(rctx, &rctx->db_misc_state.atom);
>>> + }
>>> + db_depth_info |= S_028010_TILE_SURFACE_ENABLE(1);
>>> + r600_pipe_state_add_reg_bo(rstate, R_028014_DB_HTILE_DATA_BASE,
>>> + htile_offset >> 8, &rtex->hyperz->resource,
>>> + RADEON_USAGE_READWRITE);
>>> + r600_pipe_state_add_reg(rstate, R_028D30_DB_PRELOAD_CONTROL, rtex->db_preload_control);
>>> + r600_pipe_state_add_reg(rstate, R_028D34_DB_PREFETCH_LIMIT, rtex->db_prefetch_limit);
>>> + } else {
>>> + if (rctx->db_misc_state.hyperz) {
>>> + rctx->db_misc_state.hyperz = FALSE;
>>> + r600_atom_dirty(rctx, &rctx->db_misc_state.atom);
>>> + }
>>> + }
>>>
>>> r600_pipe_state_add_reg_bo(rstate, R_02800C_DB_DEPTH_BASE,
>>> offset >> 8, &rtex->resource, RADEON_USAGE_READWRITE);
>>> @@ -1638,8 +1662,8 @@ static void r600_db(struct r600_context *rctx, struct r600_pipe_state *rstate,
>>> S_028004_SLICE_MAX(state->zsbuf->u.tex.last_layer));
>>> }
>>> r600_pipe_state_add_reg_bo(rstate, R_028010_DB_DEPTH_INFO,
>>> - S_028010_ARRAY_MODE(array_mode) | S_028010_FORMAT(format),
>>> - &rtex->resource, RADEON_USAGE_READWRITE);
>>> + db_depth_info,
>>> + &rtex->resource, RADEON_USAGE_READWRITE);
>>> r600_pipe_state_add_reg(rstate, R_028D34_DB_PREFETCH_LIMIT,
>>> (surf->aligned_height / 8) - 1);
>>> }
>>> @@ -1723,10 +1747,34 @@ static void r600_emit_db_misc_state(struct r600_context *rctx, struct r600_atom
>>> struct radeon_winsys_cs *cs = rctx->cs;
>>> struct r600_db_misc_state *a = (struct r600_db_misc_state*)atom;
>>> unsigned db_render_control = 0;
>>> - unsigned db_render_override =
>>> - S_028D10_FORCE_HIZ_ENABLE(V_028D10_FORCE_DISABLE) |
>>> - S_028D10_FORCE_HIS_ENABLE0(V_028D10_FORCE_DISABLE) |
>>> - S_028D10_FORCE_HIS_ENABLE1(V_028D10_FORCE_DISABLE);
>>> + unsigned db_render_override = 0;
>>> + unsigned cliprect_rule = 0xffff;
>>> + unsigned db_htile_surface = 0;
>>> + struct r600_resource_texture *rtex;
>>> +
>>> + if (a->hyperz && rctx->framebuffer.zsbuf) {
>>> + rtex = (struct r600_resource_texture*)rctx->framebuffer.zsbuf->texture;
>>> +
>>> + db_htile_surface = rtex->db_htile_surface;
>>> + db_htile_surface &= rctx->db_misc_state.db_htile_surface_mask;
>>> + /* further htile surface without preload */
>>> + rctx->db_misc_state.db_htile_surface_mask = 0xf;
>>
>> Again, please use proper definitions from r600d.h in place of 0xf.
>>
>>> + }
>>> +
>>> + db_render_override |= S_028D10_FORCE_HIS_ENABLE0(V_028D10_FORCE_DISABLE) |
>>> + S_028D10_FORCE_HIS_ENABLE1(V_028D10_FORCE_DISABLE);
>>> + if (a->hyperz) {
>>> + /* FORCE_OFF means HiZ/HiS are determined by DB_SHADER_CONTROL */
>>> + db_render_override |= S_028D10_FORCE_HIZ_ENABLE(V_028D10_FORCE_OFF);
>>> + if (a->clear_depthstencil) {
>>> + db_render_control |= S_028D0C_DEPTH_CLEAR_ENABLE(1);
>>> + db_render_control |= S_028D0C_ZPASS_INCREMENT_DISABLE(1);
>>> + /* need to disable cliprect for fast clear */
>>> + cliprect_rule = 0;
>>> + }
>>> + } else {
>>> + db_render_override |= S_028D10_FORCE_HIZ_ENABLE(V_028D10_FORCE_DISABLE);
>>> + }
>>>
>>> if (a->occlusion_query_enabled) {
>>> if (rctx->chip_class >= R700) {
>>> @@ -1740,9 +1788,22 @@ static void r600_emit_db_misc_state(struct r600_context *rctx, struct r600_atom
>>> S_028D0C_COPY_CENTROID(1);
>>> }
>>>
>>> + if (rctx->framebuffer.zsbuf) {
>>> + struct r600_resource_texture *rtex;
>>> + unsigned level = rctx->framebuffer.zsbuf->u.tex.level;
>>> +
>>> + rtex = (struct r600_resource_texture*)rctx->framebuffer.zsbuf->texture;
>>> + r600_write_context_reg_seq(cs, R_02802C_DB_DEPTH_CLEAR, 1);
>>> + r600_write_value(cs, fui((float)rtex->depth_clear_value[level]));
>>> + }
>>> r600_write_context_reg_seq(cs, R_028D0C_DB_RENDER_CONTROL, 2);
>>> r600_write_value(cs, db_render_control); /* R_028D0C_DB_RENDER_CONTROL */
>>> r600_write_value(cs, db_render_override); /* R_028D10_DB_RENDER_OVERRIDE */
>>> + r600_write_context_reg_seq(cs, R_02820C_PA_SC_CLIPRECT_RULE, 1);
>>> + r600_write_value(cs, cliprect_rule);
>>> +
>>> + r600_write_context_reg_seq(cs, R_028D24_DB_HTILE_SURFACE, 1);
>>> + r600_write_value(cs, db_htile_surface);
>>
>> Again, you could use r600_write_context_reg here.
>>
>>> }
>>>
>>> static void r600_emit_vertex_buffers(struct r600_context *rctx, struct r600_atom *atom)
>>> @@ -1954,7 +2015,7 @@ void r600_init_atom_start_cs(struct r600_context *rctx)
>>> int num_es_stack_entries;
>>> enum radeon_family family;
>>> struct r600_command_buffer *cb = &rctx->start_cs_cmd;
>>> - uint32_t tmp;
>>> + uint32_t tmp, db_watermarks, db_debug;
>>> unsigned i;
>>>
>>> r600_init_command_buffer(cb, 256, EMIT_EARLY);
>>> @@ -1974,6 +2035,7 @@ void r600_init_atom_start_cs(struct r600_context *rctx)
>>> vs_prio = 1;
>>> gs_prio = 2;
>>> es_prio = 3;
>>> +
>>> switch (family) {
>>> case CHIP_R600:
>>> num_ps_gprs = 192;
>>> @@ -2140,15 +2202,55 @@ void r600_init_atom_start_cs(struct r600_context *rctx)
>>>
>>> if (rctx->chip_class >= R700) {
>>> r600_store_config_reg(cb, R_008D8C_SQ_DYN_GPR_CNTL_PS_FLUSH_REQ, 0x00004000);
>>> - r600_store_config_reg(cb, R_009830_DB_DEBUG, 0);
>>> - r600_store_config_reg(cb, R_009838_DB_WATERMARKS, 0x00420204);
>>> r600_store_context_reg(cb, R_0286C8_SPI_THREAD_GROUPING, 0);
>>> } else {
>>> r600_store_config_reg(cb, R_008D8C_SQ_DYN_GPR_CNTL_PS_FLUSH_REQ, 0);
>>> - r600_store_config_reg(cb, R_009830_DB_DEBUG, 0x82000000);
>>> - r600_store_config_reg(cb, R_009838_DB_WATERMARKS, 0x01020204);
>>> r600_store_context_reg(cb, R_0286C8_SPI_THREAD_GROUPING, 1);
>>> }
>>> +
>>> + /* FIXME db_watermarks & db_debug need adjustment for MSAA */
>>> + switch (family) {
>>> + case CHIP_R600:
>>> + db_debug = 0x82200000;
>>> + db_watermarks = 0x01020204;
>>> + break;
>>> + case CHIP_RV630:
>>> + db_debug = 0x92000000;
>>> + db_watermarks = 0x01020204;
>>> + break;
>>> + case CHIP_RV635:
>>> + db_debug = 0x82000000;
>>> + db_watermarks = 0x01020204;
>>> + break;
>>> + case CHIP_RV610:
>>> + db_debug = 0x82000000;
>>> + db_watermarks = 0x01020204;
>>> + break;
>>> + case CHIP_RV620:
>>> + default:
>>> + db_debug = 0x82000000;
>>> + db_watermarks = 0x01020204;
>>> + break;
>>> + case CHIP_RS780:
>>> + case CHIP_RS880:
>>> + db_debug = 0x88000000;
>>> + db_watermarks = 0x81020204;
>>> + break;
>>> + case CHIP_RV670:
>>> + db_debug = 0x80000000;
>>> + db_watermarks = 0x01020204;
>>> + break;
>>> + case CHIP_RV770:
>>> + case CHIP_RV730:
>>> + case CHIP_RV740:
>>> + case CHIP_RV710:
>>> + db_debug = 0x00000000;
>>> + db_watermarks = 0x00420204;
>>> + break;
>>> + }
>>> + r600_store_config_reg(cb, R_009830_DB_DEBUG, db_debug);
>>> + r600_store_config_reg(cb, R_009838_DB_WATERMARKS, db_watermarks);
>>> +
>>> r600_store_context_reg_seq(cb, R_0288A8_SQ_ESGS_RING_ITEMSIZE, 9);
>>> r600_store_value(cb, 0); /* R_0288A8_SQ_ESGS_RING_ITEMSIZE */
>>> r600_store_value(cb, 0); /* R_0288AC_SQ_GSVS_RING_ITEMSIZE */
>>> @@ -2192,9 +2294,8 @@ void r600_init_atom_start_cs(struct r600_context *rctx)
>>>
>>> r600_store_ctl_const(cb, R_03CFF0_SQ_VTX_BASE_VTX_LOC, 0);
>>>
>>> - r600_store_context_reg_seq(cb, R_028028_DB_STENCIL_CLEAR, 2);
>>> + r600_store_context_reg_seq(cb, R_028028_DB_STENCIL_CLEAR, 1);
>>> r600_store_value(cb, 0); /* R_028028_DB_STENCIL_CLEAR */
>>> - r600_store_value(cb, 0x3F800000); /* R_02802C_DB_DEPTH_CLEAR */
>>>
>>> r600_store_context_reg_seq(cb, R_0286DC_SPI_FOG_CNTL, 3);
>>> r600_store_value(cb, 0); /* R_0286DC_SPI_FOG_CNTL */
>>> @@ -2234,7 +2335,6 @@ void r600_init_atom_start_cs(struct r600_context *rctx)
>>> }
>>>
>>> r600_store_context_reg(cb, R_028200_PA_SC_WINDOW_OFFSET, 0);
>>> - r600_store_context_reg(cb, R_02820C_PA_SC_CLIPRECT_RULE, 0xFFFF);
>>>
>>> if (rctx->chip_class >= R700) {
>>> r600_store_context_reg(cb, R_028230_PA_SC_EDGERULE, 0xAAAAAAAA);
>>> @@ -2317,7 +2417,7 @@ void r600_pipe_shader_ps(struct pipe_context *ctx, struct r600_pipe_shader *shad
>>> tmp);
>>> }
>>>
>>> - db_shader_control = S_02880C_Z_ORDER(V_02880C_EARLY_Z_THEN_LATE_Z);
>>> + db_shader_control = 0;
>>> for (i = 0; i < rshader->noutput; i++) {
>>> if (rshader->output[i].name == TGSI_SEMANTIC_POSITION)
>>> z_export = 1;
>>> @@ -2478,13 +2578,14 @@ void *r600_create_db_flush_dsa(struct r600_context *rctx)
>>> return rctx->context.create_depth_stencil_alpha_state(&rctx->context, &dsa);
>>> }
>>>
>>> -void r600_update_dual_export_state(struct r600_context * rctx)
>>> +void r600_update_dual_export_state(struct r600_context *rctx)
>>> {
>>> unsigned dual_export = rctx->export_16bpc && rctx->nr_cbufs &&
>>> !rctx->ps_shader->current->ps_depth_export;
>>> unsigned db_shader_control = rctx->ps_shader->current->db_shader_control |
>>> S_02880C_DUAL_EXPORT_ENABLE(dual_export);
>>>
>>> + db_shader_control |= S_02880C_Z_ORDER(V_02880C_EARLY_Z_THEN_LATE_Z);
>>> if (db_shader_control != rctx->db_shader_control) {
>>> struct r600_pipe_state rstate;
>>>
>>> diff --git a/src/gallium/drivers/r600/r600_state_common.c b/src/gallium/drivers/r600/r600_state_common.c
>>> index d952220..59efdb9 100644
>>> --- a/src/gallium/drivers/r600/r600_state_common.c
>>> +++ b/src/gallium/drivers/r600/r600_state_common.c
>>> @@ -1004,6 +1004,12 @@ void r600_draw_vbo(struct pipe_context *ctx, const struct pipe_draw_info *dinfo)
>>> rtex->dirty_db_mask |= 1 << surf->u.tex.level;
>>> }
>>>
>>> + /* clear hyperz */
>>> + if (rctx->db_misc_state.clear_depthstencil) {
>>> + rctx->db_misc_state.clear_depthstencil = false;
>>> + r600_atom_dirty(rctx, &rctx->db_misc_state.atom);
>>> + }
>>> +
>>
>> Why did you add this code into draw_vbo? This looks like it should be
>> at the end of r600_clear.
>>
>>> pipe_resource_reference(&ib.buffer, NULL);
>>> }
>>>
>>> diff --git a/src/gallium/drivers/r600/r600_texture.c b/src/gallium/drivers/r600/r600_texture.c
>>> index d16c252..0bbddb9 100644
>>> --- a/src/gallium/drivers/r600/r600_texture.c
>>> +++ b/src/gallium/drivers/r600/r600_texture.c
>>> @@ -471,6 +471,10 @@ static void r600_texture_destroy(struct pipe_screen *screen,
>>> if (rtex->stencil)
>>> pipe_resource_reference((struct pipe_resource **)&rtex->stencil, NULL);
>>>
>>> + if (rtex->hyperz) {
>>> + pipe_resource_reference((struct pipe_resource **)&rtex->hyperz, NULL);
>>> + }
>>> +
>>> pb_reference(&resource->buf, NULL);
>>> FREE(rtex);
>>> }
>>> @@ -487,6 +491,53 @@ static const struct u_resource_vtbl r600_texture_vtbl =
>>> NULL /* transfer_inline_write */
>>> };
>>>
>>> +static void r600_htile_settings(struct r600_screen *rscreen,
>>> + struct r600_resource_texture *zbuf,
>>> + struct radeon_surface *hsurface)
>>> +{
>>> + unsigned max_pixels_per_db;
>>> + const unsigned k = 1024;
>>> + unsigned npix_x, npix_y;
>>> +
>>> + npix_x = hsurface->npix_x >> 5;
>>> + npix_y = hsurface->npix_y >> 5;
>>> + npix_x = npix_x ? npix_x - 1 : 0;
>>> + npix_y = npix_y ? npix_y - 1 : 0;
>>> + max_pixels_per_db = (hsurface->npix_x * hsurface->npix_y * rscreen->info.r600_num_backends);
>>> + max_pixels_per_db /= (rscreen->info.r600_num_tile_pipes * 2);
>>> + zbuf->db_prefetch_limit = (hsurface->npix_y / 8);
>>> + zbuf->db_prefetch_limit = zbuf->db_prefetch_limit ? zbuf->db_prefetch_limit - 1 : 0;
>>> + zbuf->db_preload_control = S_028D30_START_X(0) | S_028D30_START_Y(0) |
>>> + S_028D30_MAX_X(npix_x) |
>>> + S_028D30_MAX_Y(npix_y);
>>> + /* force htile to always 8x8 as there is bug with 4x4, 4x8 or 8x4 configuration */
>>> + zbuf->db_htile_surface = S_028D24_HTILE_WIDTH(1) | S_028D24_HTILE_HEIGHT(1);
>>> +// zbuf->db_htile_surface |= S_028D24_PRELOAD(1);
>>> +
>>> + if (max_pixels_per_db <= 64 * k) {
>>> + zbuf->db_htile_surface |= S_028D24_LINEAR(1);
>>> + } else if (max_pixels_per_db <= 512 * k) {
>>> + zbuf->db_htile_surface |= S_028D24_LINEAR(1);
>>> + zbuf->db_htile_surface |= S_028D24_FULL_CACHE(1);
>>> + } else {
>>> + zbuf->db_htile_surface |= S_028D24_FULL_CACHE(1);
>>> + if (hsurface->npix_x <= 512) {
>>> + zbuf->db_htile_surface |= S_028D24_PREFETCH_WIDTH(16);
>>> + zbuf->db_htile_surface |= S_028D24_PREFETCH_HEIGHT(4);
>>> + } else if (hsurface->npix_x <= 1024) {
>>> + zbuf->db_htile_surface |= S_028D24_PREFETCH_WIDTH(16);
>>> + zbuf->db_htile_surface |= S_028D24_PREFETCH_HEIGHT(2);
>>> + } else {
>>> + zbuf->db_htile_surface |= S_028D24_PREFETCH_WIDTH(16);
>>> + zbuf->db_htile_surface |= S_028D24_PREFETCH_HEIGHT(0);
>>> + }
>>> + /* r6xx, r7xx have issue with preload window, don't use it */
>>> + if (rscreen->family >= CHIP_CEDAR) {
>>> + zbuf->db_htile_surface |= S_028D24_HTILE_USES_PRELOAD_WIN(1);
>>> + }
>>> + }
>>> +}
>>> +
>>> static struct r600_resource_texture *
>>> r600_texture_create_object(struct pipe_screen *screen,
>>> const struct pipe_resource *base,
>>> @@ -513,6 +564,7 @@ r600_texture_create_object(struct pipe_screen *screen,
>>> resource->b.b.screen = screen;
>>> rtex->pitch_override = pitch_in_bytes_override;
>>> rtex->real_format = base->format;
>>> + rtex->hyperz = NULL;
>>>
>>> /* We must split depth and stencil into two separate buffers on Evergreen. */
>>> if ((base->bind & PIPE_BIND_DEPTH_STENCIL) &&
>>> @@ -574,6 +626,52 @@ r600_texture_create_object(struct pipe_screen *screen,
>>> }
>>> }
>>>
>>> + if (!(base->flags & R600_RESOURCE_FLAG_TRANSFER) &&
>>> + util_format_is_depth_or_stencil(base->format) &&
>>> + rscreen->use_surface_alloc &&
>>> + rscreen->use_hyperz &&
>>> + rscreen->info.drm_minor >= 14 &&
>>> + base->target == PIPE_TEXTURE_2D) {
>>> + struct pipe_resource hyperz;
>>> + struct radeon_surface hsurface;
>>> +
>>> + /* Allocate the hyperz buffer. */
>>> + hyperz = *base;
>>> + hyperz.format = PIPE_FORMAT_A8R8G8B8_UNORM;
>>> + hsurface = *surface;
>>> + hsurface.npix_x = rtex->surface.level[0].nblk_x * rtex->surface.blk_w;
>>> + hsurface.npix_y = rtex->surface.level[0].nblk_y * rtex->surface.blk_h;
>>> + hsurface.blk_w = 1;
>>> + hsurface.blk_h = 1;
>>> + hsurface.bpe = 4;
>>> + hsurface.flags = RADEON_SURF_CLR(hsurface.flags, MODE);
>>> + r600_htile_settings(rscreen, rtex, &hsurface);
>>> + hsurface.npix_x = align(hsurface.npix_x, 64);
>>> + hsurface.npix_y = align(hsurface.npix_y, 32);
>>> +// hsurface.npix_x = hsurface.npix_x / 8;
>>> +// hsurface.npix_y = hsurface.npix_y / 8;
>>> + hyperz.width0 = hsurface.npix_x;
>>> + hyperz.height0 = hsurface.npix_y;
>>> + hyperz.last_level = base->last_level;
>>> + hyperz.nr_samples = 1;
>>> + hyperz.bind = PIPE_BIND_RENDER_TARGET;
>>> + hyperz.flags = 0;
>>> +
>>> + rtex->hyperz = r600_texture_create_object(screen, &hyperz, array_mode, 0,
>>> + max_buffer_size, NULL, TRUE, &hsurface);
>>> + if (!rtex->hyperz) {
>>> + FREE(rtex);
>>> + return NULL;
>>> + }
>>> +
>>> + /* this is ugly but it's needed so that hyperz works without
>>> + * glitch. Otherwise various tile will have wrong hyperz value.
>>> + */
>>> + for (r = 0; r <= base->last_level; r++) {
>>> + rtex->htile_initialized[r] = false;
>>> + }
>>
>> rtex is calloc'd, which means the array is already initalized to false. :)
>>
>> Marek
>
> left over from debugging.
>
> Will respin with proper change
>
> Cheers,
> Jerome
More information about the mesa-dev
mailing list