[Mesa-dev] [PATCH] RFC: Workaround for gen9 hw astc5x5 sampler bug

Pohjolainen, Topi topi.pohjolainen at gmail.com
Tue Dec 5 11:00:28 UTC 2017

On Tue, Dec 05, 2017 at 10:26:33AM +0000, Rogovin, Kevin wrote:
> Hi,
> >> Here are my comments of the patch posted:
> >> 
> >>  1.  it is essentially replication and moving around of the code of the patch series posted earlier but missing various
> >>       important bits: preventing the sampler from using the auxiliary buffer (this requires to modify surface state
> >>       sent in brw_wm_surfaces.c). It also does not cover blorp sufficiently (blorp might read from an ASTC5x5
> >>       and there are more paths in blorp than blorp_surf_for_miptree() that sample from surfaces.
> >> 
> >Can you explain both more in detail? Resolves done in
> >brw_predraw_resolve_inputs() mean that there is nothing interesting in the aux buffers and surface setup won't therefore enable auxiliary for texture surfaces.
> That there is nothing interesting is irrelevant to the sampler if the SURFACE_STATE fed includes the auxiliary buffer, thus when one sets up the SURFACE_STATE for sampler, the auxiliary buffer cannot be mentioned in the GPU command; The sampler will always try to read the auxiliary buffer if it is given the opportunity to do so. Indeed, it is quite feasible that less bandwidth is consumed if the sampler is given the chance to read an auxiliary buffer in place of the buffer; as such even if the surface is resolved one may wish to feed the sampler the auxiliary buffer. Indeed, for HiZ, i965 programs to use the HiZ auxiliary buffer even if the depth buffer is fully resolved (see inte_mipmap_tree_sample_with_hiz() in intel_mipmap_tree.c).

If you take a look at brw_update_texture_surface(), just in the end before
brw_emit_surface_state() the logic explictly consults for
intel_miptree_texture_aux_usage(). This in turn tells if the auxiliary buffer
is resolved and it doesn't need to be programmed.

> > In case of blorp, as far as I know all operations sampling something should go thru blorp_surf_for_miptree(). Can you point out cases that don't?
> Blorp is used in more than blorp_surf_for_miptree(), for example implementing GetTexImage(). Indeed, it is possible for blorp to sample from an ASTC5x5 (you can see this handled in the patch series I posted). I chose the hammer that the default is to just assume blorp is going to access auxiliary buffers unless a flag is set when the caller knows that blorp is going to sample from an astc5x5 (against see my patch series).

This path goes thru brw_blorp_download_miptree() which in turn takes either
brw_blorp_copy_miptrees() or brw_blorp_blit_miptrees(). Both in turn consult

> >Right. In the case of sampling both aux and astc5x5 in the same draw cycle the only thing to do is to disable aux. With my question of direction I meant the texture 
> > cache flush between two cycles. Do we need to flush in both cases
> > 1) ASTC5x5 in first cycle and AUX in the following
> > 2) AUX in first cycle and ASTC5x5 in the following
> YES we need to flush in both cases. What is happening is that the sampler hardware is bugged. Let us suppose it was bugged in only 1 direction, take 1. Then if the sampler first samples from an ASTC5x5 then an AUX it would not hang, but the other way it would. However, if there are multiple draws in flight where one samples from an ASTC5x5 and the other does not, the command buffer order gives ZERO guarantee that the sampler will sample in that order because fragments get executed out-of-order even across draw calls even within a subslice (this is why sendc is needed at end of shader in GEN).

This would be a nice a piece of documentation to add into the code. Thanks
for explaining.

> >>  4. With 3 in mind, using the bit-masks is not a good idea as we want to then enforce at the code level
> >>       that only one of the two is possible without texture invalidates.
> > Can you elaborate this a little more? It tells if aux is/was used and it tells if astc5x5 is/was used. That is all we need, right?
> WRONG. We must enforce that a given draw call can have neither or only one. By having bitmasks it is possible to support a state having both.

I don't see how the bit mask prevents one from forcing anything. I now do see
a flaw in the RFC related to this. In brw_predraw_resolve_inputs() one would
actually need to smash down the recorded AUX mask bit when it reacts to ASTC5x5
being present.

> At any rate, please review the patch series I have posted and I am happy to take suggestions to improve that patch series that I have tested.

Well, if nothing else, I would really like to see you used
brw_predraw_resolve_inputs() for the resolves.

> -Kevin

More information about the mesa-dev mailing list