[Mesa-dev] gallium: don't fallback on buffers clears for capable hw

Mon Apr 12 17:48:53 PDT 2010

On 12.04.2010 21:48, Christoph Bumiller wrote:
> On 04/12/2010 08:52 PM, Roland Scheidegger wrote:
>> On 10.04.2010 19:52, Christoph Bumiller wrote:
>>> Hi,
>>> this might not be the best implementation, but I thought I'd add a patch
>>> to the suggestion.
>>>
>>> Basically I want nv50 to be able to do hw clears in gallium even if
>>> scissors or color masks are active, since it's nicer than drawing a quad,
>>> especially for the HiZ buffer.
>>>
>>> The state change to deactivate scissors, should the state tracker not
>>> want them to affect clear, is probably less expensive than changing
>>> shaders and whipping up a vertex buffer.
>> I think this is generally a good idea, though I've got a couple of comments.
>> I think separating depth/stencil clears is a good idea, nowadays hw
>> often really has two hw buffers for them and can clear them separately
>> even with a special path.
> I thought it was the other way around ... nv50 only has a combined ZS
> buffer.
Hmm, might completely depend on hw.

>> I don't especially like the cap PIPE_CAP_OPENGL_CLEAR, neither the name
>> (is opengl really the only api which would want to do that?) nor the
> Well, I didn't see any mention of scissors or masks in D3D10s Clear*View
> functions, so this behaviour seemed OpenGL exclusive, thus the name, but
> I can't say I'm fond of it.
I wasn't really thinking about d3d here, but I don't know off-hand if
this would be useful to other state trackers.

> 
>> concept. What about having two clear functions instead, which would both
>> be mandatory for drivers, but they could just use a util default one if
>> they don't want to implement, for instance, the more complicated one?
> Seems like a good idea, you'd probably just need to copy the code from
> the mesa st to get the util function. Let's have 2 clear functions then.
> 
>> Also, as Keith has mentioned, we'd probably want to include the color
>> buffers to clear in the call itself rather than relying on the driver
>> trying to figure it out from the currently enabled buffers (and for the
>> complicated case maybe the scissor box too, as well as write masks?)
> There's nothing to figure out. We just validate, and the enabled buffers
> will have been bound by the st, scissors will have been set.
Well, it just looks like conceptually clear is somewhat separate from
normal drawing, hence could use separate state. Maybe that's not a good
idea though.
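
A minimal sketch of what passing the clear state in the call itself could
look like, and how a driver or util fallback might pick between a plain
clear and the more complicated one. Every name here (clear_params,
clear_is_simple, the CLEAR_* flags) is hypothetical and not an existing
gallium interface.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical buffer flags, not taken from gallium. */
enum { CLEAR_COLOR = 1 << 0, CLEAR_DEPTH = 1 << 1, CLEAR_STENCIL = 1 << 2 };

/* Hypothetical: everything a clear depends on, passed explicitly. */
struct clear_params {
    unsigned buffers;          /* CLEAR_* bits */
    int use_scissor;           /* nonzero if the scissor box applies */
    int x0, y0, x1, y1;        /* scissor box, half-open */
    uint8_t colormask;         /* RGBA write mask, 0xf = all channels */
    uint8_t stencil_writemask; /* 0xff = all stencil bits */
};

/*
 * Returns nonzero if nothing restricts the clear, so the simple
 * full-buffer path is enough; zero means the masked/scissored path
 * (or a quad-drawing fallback) is needed.
 */
static int clear_is_simple(const struct clear_params *p,
                           int fb_width, int fb_height)
{
    assert(p != 0);

    /* A scissor that does not cover the whole framebuffer restricts. */
    if (p->use_scissor &&
        (p->x0 > 0 || p->y0 > 0 || p->x1 < fb_width || p->y1 < fb_height))
        return 0;

    /* A partial color mask only matters if color is being cleared. */
    if ((p->buffers & CLEAR_COLOR) && p->colormask != 0xf)
        return 0;

    /* Likewise for the stencil write mask. */
    if ((p->buffers & CLEAR_STENCIL) && p->stencil_writemask != 0xff)
        return 0;

    return 1;
}
```

With something like this a GL state tracker would fill in the struct from
its current state, while a state tracker that never scissors clears would
simply leave use_scissor at zero.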

> 
> The annoying task (though not difficult) would be *if* the clear func
> got passed some surfaces (or, even resources) to put them in the RT
> state and call clear upon them. Yes, the D3D state tracker might like
> this, for GL it's nonsense.
> 
>> And, are you sure that in your nv50 patch you actually handle all state
>> correctly? I only see scissors, not color or stencil write masks.
>>
> look here:
> +		if (nv50->blend->pipe.independent_blend_enable)
> +			mode |= nv50->blend->pipe.rt[i].colormask << 2;

Oops right. Didn't look close enough :-).

> 
>> I think I'm not convinced though that a "clear" including scissors,
>> stencil and color write masks really is a "clear" and not effectively a
>> "draw quad" for most hardware any more. I know that at least for old
>> radeons, clears using stencil write masks would be impossible using the
>> specialized path (as that would just set the per-tile depth/stencil regs
>> to "cleared" - nowadays I think they have separate flags for stencil and
>> depth clear but I'd be surprised if you could specify which bits are
>> cleared). Old radeons didn't have special color clears so obviously
> and here:
> +	/* let CLEAR_BUFFERS honour SCISSOR_* and STENCIL_FRONT_MASK */
> +	BEGIN_RING(chan, screen->tesla, 0x143c, 1);
> +	OUT_RING  (chan, 0x01);
Got it :-)

> 
>> color write masks were a non-issue. But scissoring again would be a
>> problem - though presumably you could still write an optimized
>> depth/stencil clear, by using register clearing for the tiles completely
>> inside the scissor and drawing (max) 4 quads for the tiles at the tile
>> boundaries. I bet that would be faster than a simple quad draw but
>> probably too complicated to implement for you to bother...
> Is that radeon specific stuff ? 4 quads ? Or are you imagining scissors
> cutting *out* a part of the RT instead of cutting around the box ?
That should have been rectangles, not quads...
The problem is you could only clear (or not clear) complete tiles (which
were like 8x8 pixels or so? Can't remember off-hand) hence partially
cleared tiles would still need clearing by drawing a quad.
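
To make the rectangle splitting concrete, here is a rough sketch. It
assumes 8x8 tiles (the real size depends on the hardware) and uses
made-up names throughout; it only computes the regions and issues no
hardware commands.

```c
#include <assert.h>

#define TILE 8 /* assumed tile size, see above */

struct rect { int x0, y0, x1, y1; }; /* half-open: [x0,x1) x [y0,y1) */

static int rect_empty(struct rect r) { return r.x0 >= r.x1 || r.y0 >= r.y1; }
static int round_up(int v)   { return (v + TILE - 1) / TILE * TILE; }
static int round_down(int v) { return v / TILE * TILE; }

/*
 * Split scissor box s into a tile-aligned inner rectangle (candidate
 * for the per-tile fast clear) and at most 4 border rectangles that
 * still have to be drawn. Returns the number of border rectangles.
 */
static int split_scissor_clear(struct rect s, struct rect *inner,
                               struct rect border[4])
{
    int n = 0;

    assert(s.x0 >= 0 && s.y0 >= 0); /* rounding assumes non-negative coords */

    if (rect_empty(s)) {
        *inner = (struct rect){0, 0, 0, 0};
        return 0;
    }

    inner->x0 = round_up(s.x0);
    inner->y0 = round_up(s.y0);
    inner->x1 = round_down(s.x1);
    inner->y1 = round_down(s.y1);

    if (rect_empty(*inner)) {
        /* No tile is fully covered: draw the whole scissor box. */
        *inner = (struct rect){0, 0, 0, 0};
        border[n++] = s;
        return n;
    }

    /* Top and bottom strips span the full scissor width. */
    if (s.y0 < inner->y0)
        border[n++] = (struct rect){s.x0, s.y0, s.x1, inner->y0};
    if (inner->y1 < s.y1)
        border[n++] = (struct rect){s.x0, inner->y1, s.x1, s.y1};
    /* Left and right strips cover only the inner rows. */
    if (s.x0 < inner->x0)
        border[n++] = (struct rect){s.x0, inner->y0, inner->x0, inner->y1};
    if (inner->x1 < s.x1)
        border[n++] = (struct rect){inner->x1, inner->y0, s.x1, inner->y1};

    return n;
}
```

A scissor box that is already tile-aligned yields no border rectangles at
all, and one smaller than a tile degenerates to a single drawn rectangle.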

> 
>> If nvidia hw can do this though I wonder how their fast-z clear actually
>> works, or if there's some other hardware which can really do anything
>> like that (though I'm not sure what the bits the driver is using
>> currently actually mean; for all I know the NV50TCL_CLEAR_BUFFERS clear
>> could just mean "draw a quad with default clear shader"...).
>>
> The HiZ data is located in some inaccessible memory cache, they probably
> just clear the top level and then propagate when you draw.
Ok, I guess this really works differently from radeons. But clear commands
are actually faster than drawing a quad, right?
I think this is quite unusual that the hardware can do special clears
which still take color/stencil write masks and scissor into account.

Roland