[Freedreno] [Mesa-dev] tilers and out-of-order rendering..

Fri May 20 07:35:15 UTC 2016

On 20/05/16 00:34, Rob Clark wrote:
> On Thu, May 19, 2016 at 6:21 PM, Eric Anholt <eric at anholt.net> wrote:
>> Rob Clark <robdclark at gmail.com> writes:
>>
>>> So some rendering patterns that I've seen in apps turn out to be
>>> somewhat evil for tiling gpu's.. couple cases I've seen:
>>>
>>> 1) stk has some silliness where it binds an fbo, clears, binds other
>>> fbo clears, binds previous fbo and draws, and so on.  This one is
>>> probably not too hard to just fix in stk.
>>>
>>> 2) I've seen a render pattern in manhattan where app does a bunch of
>>> texture uploads mid-frame via a pbo (and then generates mipmap levels
>>> for the updated texture, which hits the blit path which changes fb
>>> state and forces a flush).  This one probably not something that can
>>> be fixed in the app ;-)
>>>
>>> There are probably other cases where this comes up which I haven't
>>> noticed yet.  I'm not entirely sure how common the pattern that I see
>>> in manhattan is.
>>>
>>> At one point, Eric Anholt mentioned the idea of tracking rendering
>>> cmdstream per render-target, as well as dependency information between
>>> these different sets of cmdstream (if you render to one fbo, then turn
>>> around and sample from it, the rendering needs to happen before the
>>> sampling).  I've been thinking a bit about how this would actually
>>> work, and trying to do some experiments to get an idea about how
>>> useful this would be.
>>
>> My plan was pretty much what you laid out here, except I was going to
>> just map to my CL struct with a little hash table from the FB state
>> members since FB state isn't a CSO.
>
> ok, yeah, I guess that solves the naming conflict (fd_batch(_state)
> sounds nicer for what its purpose really is than
> fd_framebuffer_state)
>
> BR,
> -R

llvmpipe is also a tiler and we've seen similar patterns.  Flushing
reduces caching effectiveness; however, in llvmpipe texture sampling is
quite often the bottleneck, so an additional flush doesn't make a huge
difference.

I think the internal hash table that Eric proposes is a better first step.

Later on we could try to make framebuffer state a first-class CSO, but I
suspect you'll probably want to walk all pending FBOs' CLs internally
anyway (to see which need to be flushed on transfers).

So changing the driver internals first, then abstracting if there are
commonalities, seems a more effective way forward.
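
To make the two pieces above concrete, here is a minimal sketch in C of
a driver-internal table that maps framebuffer state to a pending batch,
plus the walk over all pending batches that a transfer would need.  All
names here (fb_key, batch, batch_cache, batch_get, flush_for_resource)
are hypothetical and are not taken from freedreno, vc4 or llvmpipe.
Resources are assumed to be plain integer ids (0 = unbound), and the
"cmdstream" is reduced to a draw counter.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MAX_CBUFS   4
#define MAX_BATCHES 8

/* Hypothetical stand-in for framebuffer state: resource ids, 0 = unbound. */
struct fb_key {
    unsigned cbufs[MAX_CBUFS];
    unsigned zsbuf;
};

/* Hypothetical per-render-target batch of pending cmdstream. */
struct batch {
    bool in_use;
    struct fb_key key;
    unsigned num_draws;          /* stand-in for accumulated cmdstream */
};

struct batch_cache {
    struct batch batches[MAX_BATCHES];  /* open-addressed hash table */
    unsigned flushes;                   /* number of batches submitted */
};

unsigned fb_key_hash(const struct fb_key *k)
{
    unsigned h = 2166136261u;           /* FNV-1a style mix over the ids */
    for (int i = 0; i < MAX_CBUFS; i++)
        h = (h ^ k->cbufs[i]) * 16777619u;
    return (h ^ k->zsbuf) * 16777619u;
}

/* "Submit" a batch: here that only resets the counter and counts a flush. */
void batch_flush(struct batch_cache *c, struct batch *b)
{
    if (b->num_draws == 0)
        return;
    b->num_draws = 0;
    c->flushes++;
}

/* Look up (or create) the batch for this framebuffer state.  Switching
 * render targets does not flush; it only selects another batch. */
struct batch *batch_get(struct batch_cache *c, const struct fb_key *k)
{
    unsigned start = fb_key_hash(k) % MAX_BATCHES;
    for (unsigned i = 0; i < MAX_BATCHES; i++) {
        struct batch *b = &c->batches[(start + i) % MAX_BATCHES];
        if (!b->in_use) {               /* empty slot: claim it */
            b->in_use = true;
            b->key = *k;
            b->num_draws = 0;
            return b;
        }
        if (memcmp(&b->key, k, sizeof(*k)) == 0)
            return b;
    }
    /* Table full: flush everything, clear the table, and retry. */
    for (unsigned i = 0; i < MAX_BATCHES; i++)
        batch_flush(c, &c->batches[i]);
    memset(c->batches, 0, sizeof(c->batches));
    return batch_get(c, k);
}

bool batch_references(const struct batch *b, unsigned res)
{
    if (b->key.zsbuf == res)
        return true;
    for (int i = 0; i < MAX_CBUFS; i++)
        if (b->key.cbufs[i] == res)
            return true;
    return false;
}

/* Before a transfer (or sampling) touches 'res', walk all pending
 * batches and flush only those that render to it. */
void flush_for_resource(struct batch_cache *c, unsigned res)
{
    assert(res != 0);
    for (unsigned i = 0; i < MAX_BATCHES; i++) {
        struct batch *b = &c->batches[i];
        if (b->in_use && batch_references(b, res))
            batch_flush(c, b);
    }
}
```

With a zero-initialized batch_cache, the bind/clear/bind/draw pattern
from the start of the thread becomes a series of batch_get() calls that
alternate between two keys without submitting anything.  Only a later
flush_for_resource() on one of the render targets submits that one
batch.  Real dependency tracking between batches (render to one FBO,
then sample from it in another) would need more than this sketch shows.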

Jose