[Mesa-dev] [PATCH] RFC: Workaround for gen9 hw astc5x5 sampler bug

Jason Ekstrand jason at jlekstrand.net
Tue Dec 5 18:24:46 UTC 2017

On Tue, Dec 5, 2017 at 10:17 AM, Pohjolainen, Topi <
topi.pohjolainen at gmail.com> wrote:

> On Tue, Dec 05, 2017 at 10:26:33AM +0000, Rogovin, Kevin wrote:
> > Hi,
> >
> >
> > >> Here are my comments of the patch posted:
> > >>
> > >>  1.  it is essentially replication and moving around of the code of
> the patch series posted earlier but missing various
> > >>       important bits: preventing the sampler from using the auxiliary
> buffer (this requires to modify surface state
> > >>       sent in brw_wm_surfaces.c). It also does not cover blorp
> sufficiently (blorp might read from an ASTC5x5
> > >>       and there are more paths in blorp than blorp_surf_for_miptree()
> that sample from surfaces.
> > >>
> >
> > >Can you explain both more in detail? Resolves done in
> > >brw_predraw_resolve_inputs() mean that there is nothing interesting in
> the aux buffers and surface setup won't therefore enable auxiliary for
> texture surfaces.
> >
> > That there is nothing interesting is irrelevant to the sampler if the
> SURFACE_STATE fed includes the auxiliary buffer, thus when one sets up the
> SURFACE_STATE for sampler, the auxiliary buffer cannot be mentioned in the
> GPU command; The sampler will always try to read the auxiliary buffer if it
> is given the opportunity to do so. Indeed, it is quite feasible that less
> bandwidth is consumed if the sampler is given the chance to read an
> auxiliary buffer in place of the buffer; as such even if the surface is
> resolved one may wish to feed the sampler the auxiliary buffer. Indeed, for
> HiZ, i965 programs to use the HiZ auxiliary buffer even if the depth buffer
> is fully resolved (see inte_mipmap_tree_sample_with_hiz() in
> intel_mipmap_tree.c).

We do have code which checks for the resolve state and disables the
auxiliary buffer.  However, I don't like relying on it for correctness as
that's bitten us before.  I think it'd be better to check the W/A state for
ASTC5x5, assert that everything is resolved, and manually disable aux in
that case.

> > > In case of blorp, as far as I know all operations sampling something
> should go thru blorp_surf_for_miptree(). Can you point out cases that don't?
> >
> > Blorp is used in more than blorp_surf_for_miptree(), for example
> implementing GetTexImage(). Indeed, it is possible for blorp to sample from
> an ASTC5x5 (you can see this handled in the patch series I posted). I chose
> the hammer that the default is to just assume blorp is going to access
> auxiliary buffers unless a flag is set when the caller knows that blorp is
> going to sample from an astc5x5 (against see my patch series).

This isn't true.  100% of the intel_mipmap_tree -> blorp_surf translations
are handled by that function.  It's a perfectly reasonable place to handle
these things.  It could also be handled in genX(blorp_exec) if that makes
someone more comfortable.

> >Right. In the case of sampling both aux and astc5x5 in the same draw
> cycle the only thing to do is to disable aux. With my question of direction
> I meant the texture
> > > cache flush between two cycles. Do we need to flush in both cases
> > > 1) ASTC5x5 in first cycle and AUX in the following
> > > 2) AUX in first cycle and ASTC5x5 in the following
> >
> > YES we need to flush in both cases. What is happening is that the
> sampler hardware is bugged. Let us suppose it was bugged in only 1
> direction, take 1. Then if the sampler first samples from an ASTC5x5 then
> an AUX it would not hang, but the other way it would. However, if there are
> multiple draws in flight where one samples from an ASTC5x5 and the other
> does not, the command buffer order gives ZERO guarantee that the sampler
> will sample in that order because fragments get executed out-of-order even
> across draw calls even within a subslice (this is why sendc is needed at
> end of shader in GEN).
> >
> I need to clarify this bit a little more. In the current setup we have only
> one draw cycle in flight per context that uses sample engine. There may be
> blitter calls in parallel (although I'm not sure) but they don't use
> sampling
> engine anyway.
> The draw cycle itself may have multiple draws programmed into it but as
> they
> all are based on the same surface setup there is naturally no flushing
> problem.
> Surfaces with auxiliary would be resolved before the draw and programmed
> without auxiliary buffer.

The problem is that a single invalidate doesn't actually cause a
synchronization point in the rendering operation.  All it does is torch the
texture cache and it does so immediately and in parallel with currently
active rendering.  If you can't have ASTC5x5 in the texture cache with a
CCS_E surface, then we need to do a CS stall to ensure that the previous
draw is complete before we invalidate.  Otherwise, if the draw with ASTC5x5
is still in-flight, the texture cache will immediately start to get
populated with ASTC5x5 data again.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.freedesktop.org/archives/mesa-dev/attachments/20171205/17208f6a/attachment-0001.html>

More information about the mesa-dev mailing list