[Freedreno] [Mesa-dev] tilers and out-of-order rendering..

Fri May 20 14:51:25 UTC 2016

On Fri, May 20, 2016 at 3:35 AM, Jose Fonseca <jfonseca at vmware.com> wrote:
> On 20/05/16 00:34, Rob Clark wrote:
>>
>> On Thu, May 19, 2016 at 6:21 PM, Eric Anholt <eric at anholt.net> wrote:
>>>
>>> Rob Clark <robdclark at gmail.com> writes:
>>>
>>>> So some rendering patterns that I've seen in apps turn out to be
>>>> somewhat evil for tiling GPUs.. a couple of cases I've seen:
>>>>
>>>> 1) stk has some silliness where it binds an fbo, clears, binds another
>>>> fbo, clears, binds the previous fbo and draws, and so on.  This one is
>>>> probably not too hard to just fix in stk.
>>>>
>>>> 2) I've seen a render pattern in manhattan where app does a bunch of
>>>> texture uploads mid-frame via a pbo (and then generates mipmap levels
>>>> for the updated texture, which hits the blit path which changes fb
>>>> state and forces a flush).  This one is probably not something that can
>>>> be fixed in the app ;-)
>>>>
>>>> There are probably other cases where this comes up which I haven't
>>>> noticed yet.  I'm not entirely sure how common the pattern that I see
>>>> in manhattan is.
>>>>
>>>> At one point, Eric Anholt mentioned the idea of tracking rendering
>>>> cmdstream per render-target, as well as dependency information between
>>>> these different sets of cmdstream (if you render to one fbo, then turn
>>>> around and sample from it, the rendering needs to happen before the
>>>> sampling).  I've been thinking a bit about how this would actually
>>>> work, and trying to do some experiments to get an idea about how
>>>> useful this would be.
>>>
>>>
>>> My plan was pretty much what you laid out here, except I was going to
>>> just map to my CL struct with a little hash table from the FB state
>>> members since FB state isn't a CSO.
>>
>>
>> ok, yeah, I guess that solves the naming conflict (fd_batch(_state)
>> sounds nicer for what its purpose really is than
>> fd_framebuffer_state)
>>
>> BR,
>> -R
>
>
> llvmpipe is also a tiler and we've seen similar patterns.  Flushing reduces
> caching effectiveness; however, in llvmpipe texture sampling is quite often
> the bottleneck, and an additional flush doesn't make a huge difference.
>

interesting, I hadn't thought about llvmpipe

>
> I think the internal hash table Eric proposes is a better first step.
>
> Later on we could try to make framebuffer state a first-class CSO, but I
> suspect you'll probably want to internally walk all pending FBOs' CLs anyway
> (to see which need to be flushed on transfers.)
>
> So first changing the driver internals, then abstracting if there are
> commonalities, seems a more effective way forward.

yeah, makes sense.. and I'm planning to go w/ Eric's idea to keep
fd_batch separate from framebuffer state.
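A minimal sketch of the per-framebuffer lookup, assuming a simplified key (fb_key, batch, batch_cache, make_key and batch_lookup are hypothetical names, not actual freedreno code): hash the surfaces that identify the render target, and hand back the still-pending batch when the same fbo is bound again, so a bind/clear/rebind pattern like stk's keeps appending to the old cmdstream instead of flushing it.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define MAX_CBUFS  4
#define CACHE_SIZE 16  /* hypothetical; must be a power of two */

/* Hypothetical, simplified stand-in for pipe_framebuffer_state: only the
 * fields that identify a render target go into the lookup key. */
struct fb_key {
   unsigned width, height, nr_cbufs;
   const void *cbufs[MAX_CBUFS];
   const void *zsbuf;
};

/* Hypothetical batch: the cmdstream for one render target would live here. */
struct batch {
   struct fb_key key;
   unsigned id;
};

struct batch_cache {
   struct batch *slots[CACHE_SIZE];
   unsigned next_id;
};

/* Build a key; memset() first so padding and unused cbufs[] entries are
 * zero, which keeps hashing and memcmp() consistent. */
static void make_key(struct fb_key *k, unsigned w, unsigned h,
                     const void *const *cbufs, unsigned nr_cbufs,
                     const void *zsbuf)
{
   assert(nr_cbufs <= MAX_CBUFS);
   memset(k, 0, sizeof(*k));
   k->width = w;
   k->height = h;
   k->nr_cbufs = nr_cbufs;
   for (unsigned i = 0; i < nr_cbufs; i++)
      k->cbufs[i] = cbufs[i];
   k->zsbuf = zsbuf;
}

/* FNV-1a over the raw key bytes. */
static uint32_t key_hash(const struct fb_key *k)
{
   const unsigned char *p = (const unsigned char *)k;
   uint32_t h = 2166136261u;
   for (size_t i = 0; i < sizeof(*k); i++) {
      h ^= p[i];
      h *= 16777619u;
   }
   return h;
}

/* Return the pending batch for this framebuffer, creating it if needed.
 * Open addressing with linear probing. */
static struct batch *batch_lookup(struct batch_cache *c, const struct fb_key *k)
{
   uint32_t h = key_hash(k);
   for (unsigned i = 0; i < CACHE_SIZE; i++) {
      struct batch **slot = &c->slots[(h + i) & (CACHE_SIZE - 1)];
      if (!*slot) {
         struct batch *b = calloc(1, sizeof(*b));
         if (!b)
            return NULL;
         memcpy(&b->key, k, sizeof(*k));  /* copies padding bytes too */
         b->id = c->next_id++;
         *slot = b;
         return b;
      }
      if (memcmp(&(*slot)->key, k, sizeof(*k)) == 0)
         return *slot;
   }
   return NULL;  /* table full; a real driver would flush a batch here */
}
```

Binding fbo A, then B, then A again returns the same batch for both lookups of A, so the draws after the rebind land in A's existing cmdstream.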

It did occur to me that I forgot to think about the write-after-read
hazard case (a batch that renders to a buffer which an earlier, still
pending batch samples from).  Those need to be handled with an extra
dependency between batches too.
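A minimal sketch of how the extra write-after-read edge could sit next to the usual read-after-write one, assuming batches are small integer indices and each resource remembers its last pending writer plus a bitmask of pending readers (resource, batch_node, batch_read, batch_write and batch_flush are hypothetical names). It also assumes the dependency graph stays acyclic: when a new edge would close a cycle, a real driver would have to flush a batch early, which this sketch does not do (it asserts instead).

```c
#include <assert.h>
#include <stdint.h>

#define MAX_BATCHES 32  /* one bit per batch in a uint32_t mask */

/* Hypothetical per-resource tracking of unflushed accesses. */
struct resource {
   int writer;          /* pending batch that last wrote it, or -1 */
   uint32_t readers;    /* pending batches that sample it */
};

/* Hypothetical batch node in the dependency graph. */
struct batch_node {
   uint32_t deps;       /* batches that must execute before this one */
   int flushed;
   int visiting;        /* set while batch_flush() is recursing */
};

static struct batch_node batches[MAX_BATCHES];
static int flush_order[MAX_BATCHES];
static int nflushed;

/* Record "b runs after other", ignoring self-edges and flushed batches. */
static void add_dep(int b, int other)
{
   if (other >= 0 && other != b && !batches[other].flushed)
      batches[b].deps |= 1u << other;
}

/* Read-after-write: b samples r, so r's pending writer must run first. */
static void batch_read(int b, struct resource *r)
{
   add_dep(b, r->writer);
   r->readers |= 1u << b;
}

/* b renders into r: write-after-write on the previous writer, plus
 * write-after-read on every pending batch that still samples the old
 * contents. Later writers then only need to order against b. */
static void batch_write(int b, struct resource *r)
{
   add_dep(b, r->writer);
   for (int i = 0; i < MAX_BATCHES; i++)
      if (r->readers & (1u << i))
         add_dep(b, i);
   r->writer = b;
   r->readers = 0;
}

/* Submit b after recursively submitting its dependencies. */
static void batch_flush(int b)
{
   if (batches[b].flushed)
      return;
   assert(!batches[b].visiting && "cycle: would need an early flush");
   batches[b].visiting = 1;
   for (int i = 0; i < MAX_BATCHES; i++)
      if (batches[b].deps & (1u << i))
         batch_flush(i);
   batches[b].visiting = 0;
   batches[b].flushed = 1;
   flush_order[nflushed++] = b;
}
```

In the manhattan-style pattern (earlier draws in the main batch sample a texture, a mid-frame upload or mipmap blit writes it, later draws sample it again), these two rules produce exactly such a cycle, which is where discarding or cloning the old bo would come in.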

And at least for this particular case, I need some cleverness to
discard or clone the old bo, so that the write-after-read doesn't force
a flush.  (Maybe in transfer_map?  But I guess there are other paths..
hmm..)
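One possible shape for the discard/clone idea, sketched with hypothetical types and helpers (rsc, bo, batch_sample, batch_retire_read, resource_map_for_write) rather than freedreno's real ones: if a pending batch still samples the current bo, point the resource at a fresh bo instead of flushing that batch, and copy the old contents into it only when the write won't replace the whole buffer (the discard case is roughly what Gallium's PIPE_TRANSFER_DISCARD_WHOLE_RESOURCE usage flag signals).

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical buffer object; pending_reads counts unflushed batches that
 * hold a reference because they sample this bo. */
struct bo {
   int refcnt;
   int pending_reads;
   size_t size;
   unsigned char *data;
};

/* Hypothetical resource: just a pointer to its current backing bo. */
struct rsc {
   struct bo *bo;
};

static struct bo *bo_new(size_t size)
{
   struct bo *bo = calloc(1, sizeof(*bo));
   if (!bo)
      return NULL;
   bo->data = calloc(1, size);
   if (!bo->data) {
      free(bo);
      return NULL;
   }
   bo->refcnt = 1;
   bo->size = size;
   return bo;
}

static void bo_unref(struct bo *bo)
{
   if (bo && --bo->refcnt == 0) {
      free(bo->data);
      free(bo);
   }
}

/* A pending batch starts sampling the resource's current bo; the batch
 * keeps the returned reference until it is flushed and retired. */
static struct bo *batch_sample(struct rsc *r)
{
   r->bo->refcnt++;
   r->bo->pending_reads++;
   return r->bo;
}

/* Called once a batch that sampled bo has been flushed and completed. */
static void batch_retire_read(struct bo *bo)
{
   bo->pending_reads--;
   bo_unref(bo);
}

/* Return memory the CPU may write now without flushing any reader.
 * Idle bo: write in place.  Busy bo: rename the resource to a fresh bo,
 * cloning the old contents unless the whole buffer is being discarded. */
static unsigned char *resource_map_for_write(struct rsc *r, int discard_whole)
{
   struct bo *old = r->bo;
   if (old->pending_reads == 0)
      return old->data;

   struct bo *fresh = bo_new(old->size);
   if (!fresh)
      return NULL;  /* a real driver would fall back to flushing */
   if (!discard_whole)
      memcpy(fresh->data, old->data, old->size);
   r->bo = fresh;
   bo_unref(old);   /* drop the resource's ref; pending readers keep theirs */
   return fresh->data;
}
```

The trade-off is an extra allocation per rename (plus a copy in the clone case) in exchange for not draining the pending batch mid-frame.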

BR,
-R