[Mesa-dev] swizzling in llvmpipe [was: other stuff]

Wed Sep 1 11:00:00 PDT 2010

On Wed, 2010-09-01 at 09:24 -0700, Luca Barbieri wrote:
> > It's an impressive amount of work you did here. I'll comment only on the
> > llvmpipe part of the changes for now.
> 
> Thanks for your feedback!
> 
> > Admittedly, always using a floating point is not ideal. A better
> > solution would be to choose a swizzled data type (unorm8, fixed point,
> > float, etc) that matched the color buffer format.
> 
> Exactly, also because we'll want unnormalized formats and 64-bit formats too.
> 
> > But we've been seeing some results which point that the whole color
> > buffer swizzling idea might be overrated: it increases memory bandwidth
> > usage substantially,
> 
> Why?
> It should decrease it due to a lower number of cache misses, thanks to
> having a 2D instead of a 1D neighborhood of a pixel in the cache.

We're talking about rendertarget swizzling, and not texture swizzling
(llvmpipe doesn't do texture swizzling yet -- all textures must be
linear). 

Operating in tiles already reduces cache misses when writing/blending to
the rendertarget. Having those tiles swizzled internally only makes
computing derivatives and SoA computations easier. Nothing more.
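
As a rough illustration of why a quad-swizzled tile makes derivatives
easy, here is a minimal sketch comparing a row-major index with a 2x2
quad-major index. The tile size, the function names and the exact
ordering are assumptions made for this example, not necessarily the
layout llvmpipe uses.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical tile edge in pixels; an assumption for this sketch. */
#define TILE_SIZE 64

/* Index of pixel (x, y) in a plain row-major (unswizzled) tile. */
size_t linear_index(unsigned x, unsigned y)
{
    assert(x < TILE_SIZE && y < TILE_SIZE);
    return (size_t)y * TILE_SIZE + x;
}

/* Index of pixel (x, y) when each 2x2 quad is stored contiguously
 * and the quads themselves are laid out in row-major order. */
size_t quad_index(unsigned x, unsigned y)
{
    assert(x < TILE_SIZE && y < TILE_SIZE);
    size_t quad = (size_t)(y / 2) * (TILE_SIZE / 2) + (x / 2);
    size_t within = (size_t)(y & 1) * 2 + (x & 1);
    return quad * 4 + within;
}
```

With this ordering the four pixels of a quad sit at consecutive
indices, so a horizontal difference is p[1] - p[0] and a vertical one
is p[2] - p[0]. In the row-major tile the vertical neighbour is
TILE_SIZE elements away.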

> The only increase is that due to unswizzling for transfer or
> presentation, but that should be much lower than the rendering load on
> complex applications.

It can surely be application dependent -- we haven't done enough
prototyping and testing to be sure.

It is also somewhat unintuitive -- a lot of what I'm writing here is a
rationalization of data points we gathered (i.e., an interpretation), and
not the final answers.

I suspect one thing that doesn't help currently is that the tiles are
too big to fit in the L1 cache. It's possible that with smaller tiles
the swizzled+unswizzled versions would both fit and there would be less
thrashing, making render-to-swizzled + unswizzle as competitive as
render-to-unswizzled.
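
The footprint argument can be checked with a little arithmetic. The
sketch below is only illustrative: the tile edges, the 4 bytes per
pixel and the cache budget are assumed figures, not measurements, and
the function names are made up for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Bytes occupied by one square tile of edge x edge pixels. */
size_t tile_bytes(size_t edge, size_t bytes_per_pixel)
{
    assert(edge > 0 && bytes_per_pixel > 0);
    return edge * edge * bytes_per_pixel;
}

/* Nonzero if a swizzled and an unswizzled copy of the same tile
 * both fit within a budget of 'budget' bytes of cache. */
int both_copies_fit(size_t edge, size_t bytes_per_pixel, size_t budget)
{
    return 2 * tile_bytes(edge, bytes_per_pixel) <= budget;
}
```

Under these assumptions a 64x64 tile at 4 bytes per pixel is 16 KiB per
copy, so two copies do not fit in a 16 KiB budget, while two copies of
a 32x32 tile take only 8 KiB.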

> > and for many shaders where derivatives aren't
> > necessary or can be determined from inputs there's no need to process
> > 2x2 quads of pixels or even computations in SoA. That is, we might end
> > up eliminating swizzling from llvmpipe altogether in the medium term.
> 
> Even with AoS, there is still the advantage in cache locality of an
> AoS-but-swizzled layout.
> The increased cost in address computation may well be much worse than
> the cache benefits though, unless some smart way of doing it can be
> found.

It still sounds like you're referring to sampling from a texture and not
rendering/blending to it. Of course they are related (we only want one
swizzled layout at most), but the current swizzled layout was chosen to
make blending easy, not to make texture sampling easy.

No SoA swizzled layout makes texture sampling easier or more efficient.
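
A small sketch of the difference, using made-up struct and function
names and an assumed RGBA8 quad: fetching one texel from an SoA block
means gathering one byte from each channel array, whereas in AoS the
texel is a single contiguous read.

```c
#include <assert.h>
#include <stdint.h>

/* One RGBA8 pixel stored as an array-of-structures element. */
struct pixel_aos { uint8_t r, g, b, a; };

/* Four RGBA8 pixels stored structure-of-arrays: one array per channel. */
struct quad_soa { uint8_t r[4], g[4], b[4], a[4]; };

/* AoS fetch: the whole texel is one contiguous 4-byte element. */
struct pixel_aos fetch_from_aos(const struct pixel_aos *px, unsigned i)
{
    return px[i];
}

/* SoA fetch: four separate reads, one per channel array. */
struct pixel_aos fetch_from_soa(const struct quad_soa *q, unsigned i)
{
    assert(i < 4);
    struct pixel_aos p = { q->r[i], q->g[i], q->b[i], q->a[i] };
    return p;
}
```

The SoA form suits blending, where the same operation is applied to one
channel of many pixels at once. A sampler wants all channels of a few
scattered texels instead.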

> I think vertex shaders might be the first place to look at for
> switching to AoS, since they don't have derivatives, and the input is
> much harder to keep in SoA form.
> 
> Were LLVM vertex shaders done as SoA just because the existing
> llvmpipe code did that, or because it's actually better?

The former: because the TGSI -> SoA code was there and already robust.

> > So instead of going through a lot of work to support multiple swizzled
> > types I'd prefer to keep the current simplistic (always 8bit unorm)
> > swizzled type, and simply ignore errors in the clamping/precision loss
> > when rendering to formats with higher precision or dynamic range.
> >
> > In summary, apart from your fragment clamping changes, I'd prefer to keep
> > the rest of llvmpipe unchanged (and inaccurate) for the time being.
> 
> Note that this totally destroys the ability to use llvmpipe for high
> dynamic range rendering, which fundamentally depends on the ability to
> store unclamped and relatively high precision values.
> 
> Using llvmpipe is important because softpipe can be so slow on
> advanced applications (e.g. the Unigine demos) as to be unusable even for
> testing.

I'm not giving a death sentence -- simply prioritizing. I'm hopeful
we'll get there before too long.
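
For reference, this is the kind of loss an always-8-bit-unorm
intermediate implies. The sketch below is a generic float to unorm8
round trip with hypothetical helper names, not llvmpipe's actual
conversion code.

```c
#include <stdint.h>

/* Convert a float colour channel to 8-bit unorm: clamp to [0, 1],
 * scale by 255 and round to nearest. NaN and negatives map to 0. */
uint8_t float_to_unorm8(float v)
{
    if (!(v > 0.0f))
        return 0;
    if (v > 1.0f)
        v = 1.0f;
    return (uint8_t)(v * 255.0f + 0.5f);
}

/* Convert an 8-bit unorm channel back to float in [0, 1]. */
float unorm8_to_float(uint8_t u)
{
    return (float)u / 255.0f;
}
```

An HDR value such as 2.5 comes back as 1.0 after the round trip, and
values inside [0, 1] are quantized to 256 levels, which is why a float
rendertarget loses its range and precision when it goes through such a
tile.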

> Perhaps a middle ground could be to have the option to choose the
> swizzled tile format at compilation time.
> Applying or not my patch will have this effect, but requires it to be
> maintained, and it will obviously conflict a lot with other changes.
> 
> Also some of the work will be required anyway to support rendering
> directly to textures of any format if one goes that route.

More than the floating point issue per se, I'm actually more concerned
with keeping this part of llvmpipe simple, to allow experimentation and
to rethink the role of swizzled formats.

Jose