[Mesa-dev] tilers and out-of-order rendering..

Rob Clark robdclark at gmail.com
Sat Jun 4 19:36:48 UTC 2016


On Fri, Jun 3, 2016 at 8:53 AM, Rob Clark <robdclark at gmail.com> wrote:
> Ok, so I had a really evil thought that I wanted to bounce off
> people..  it's a quite different approach from the more obvious one
> discussed below (and which I've already started implementing)
>
> Basically, idea is to have a wrapper pipe driver, similar to
> ddebug/rbug/trace/etc, which re-orders draw calls.  All the CSO
> objects would have to be wrapped in a refcounted thing, so
> pending-draw's could hang on to their associated state.  For things
> that are not refcounted (draw_info, and all the non-CSO state) there
> would unfortunately be some memcpy involved.. not sure how bad that
> would be, but it seems like the thing that could doom the idea?
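
To make the refcounting/memcpy part concrete, here is a rough sketch of what a queued draw might hold on to.  None of these names exist in gallium: wrapped_cso, draw_params and pending_draw are made up for illustration, and draw_params only stands in for pipe_draw_info.  The CSO gets an extra reference, the non-refcounted draw parameters get copied:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical wrapper around a driver CSO (blend, rasterizer, ...). */
struct wrapped_cso {
    int refcount;
    void *hw_cso;               /* opaque object owned by the real driver */
};

/* Hypothetical stand-in for pipe_draw_info: plain data, not refcounted. */
struct draw_params {
    unsigned start, count, instance_count;
};

/* A draw that has been queued but not yet passed to the real driver. */
struct pending_draw {
    struct wrapped_cso *blend;  /* held reference */
    struct draw_params info;    /* private copy */
};

static struct wrapped_cso *cso_create(void *hw_cso)
{
    struct wrapped_cso *cso = calloc(1, sizeof(*cso));
    if (!cso)
        return NULL;
    cso->refcount = 1;          /* reference owned by the creator */
    cso->hw_cso = hw_cso;
    return cso;
}

static void cso_ref(struct wrapped_cso *cso)
{
    cso->refcount++;
}

/* Drops one reference; returns 1 if this freed the wrapper. */
static int cso_unref(struct wrapped_cso *cso)
{
    assert(cso->refcount > 0);
    if (--cso->refcount == 0) {
        free(cso);
        return 1;
    }
    return 0;
}

/* Queue a draw: take a reference on the CSO, memcpy the rest. */
static void queue_draw(struct pending_draw *d, struct wrapped_cso *blend,
                       const struct draw_params *info)
{
    cso_ref(blend);
    d->blend = blend;
    memcpy(&d->info, info, sizeof(*info));
}

/* Called once the draw was emitted; returns 1 if the CSO was freed. */
static int retire_draw(struct pending_draw *d)
{
    int destroyed = cso_unref(d->blend);
    d->blend = NULL;
    return destroyed;
}
```

With this, the state tracker can delete the CSO (cso_unref) right after the draw call and the wrapper only really frees it once the pending draw is retired.  The memcpy cost is per draw, so that is the part that would need measuring.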

so the slightly awkward thing is how to deal with things like
u_blitter (pipe->blit/pipe->copy_region).. if we were re-ordering
things to avoid unnecessary render target switches, the wrapper layer
would have to handle these paths itself.  But looks like vc4 has some
special handling (vc4_tile_blit()).. not really sure how that would
work out.

(and in general, the wrapper layer would want to handle some cases, as
well as transfer_map, itself.. so it could generate ghost
pipe_resources for things like writing into a busy texture.. but that
probably isn't too hard since a wrapper pipe_resource could replace
the ref to hw driver's pipe_resource and schedule blits to copy from
previous pipe_resource where needed..  hopefully combination of
PIPE_TRANSFER_DISCARD* and pipe_draw_info::discard type hint (as I
mentioned below) could "DCE" those copy blits.  Except I somehow need
to deal w/ CSO's which have reference to the ghosted resource.. bleh)
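
Roughly what I have in mind for the ghosting, as a sketch only.  All types and flags here are hypothetical stand-ins (not the real pipe_resource or PIPE_TRANSFER_* values), and it assumes the mapped level is completely overwritten, so only the other levels need copying:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_BLITS 16

/* Hypothetical stand-in for the hw driver's pipe_resource. */
struct hw_resource {
    bool busy;                  /* still referenced by pending rendering */
    int id;
};

/* Wrapper resource: its backing hw resource can be swapped out. */
struct wrap_resource {
    struct hw_resource *backing;
    unsigned num_levels;
};

struct copy_blit {
    struct hw_resource *src, *dst;
    unsigned level;
    bool live;                  /* false once known to be unnecessary */
};

struct blit_queue {
    struct copy_blit blits[MAX_BLITS];
    unsigned count;
};

/* Hypothetical flags, loosely modelled on PIPE_TRANSFER_*. */
enum map_flags { MAP_WRITE = 1, MAP_DISCARD_WHOLE = 2 };

/*
 * Map one level for writing.  If the backing resource is busy, swap in
 * 'fresh' and queue copies of the other levels from the old backing,
 * unless the caller discards the whole resource.  Returns the hw
 * resource the write should go to.
 */
static struct hw_resource *
wrap_map_level(struct wrap_resource *r, struct hw_resource *fresh,
               unsigned level, unsigned flags, struct blit_queue *q)
{
    if (!r->backing->busy)
        return r->backing;

    struct hw_resource *old = r->backing;
    r->backing = fresh;

    if (!(flags & MAP_DISCARD_WHOLE)) {
        for (unsigned l = 0; l < r->num_levels; l++) {
            if (l == level || q->count == MAX_BLITS)
                continue;
            q->blits[q->count++] = (struct copy_blit){ old, fresh, l, true };
        }
    }
    return fresh;
}

/*
 * A later operation fully overwrites 'level' of 'dst' (e.g. mipmap
 * generation carrying a discard hint): kill the pending copies into it.
 * Returns the number of copies eliminated.
 */
static unsigned
discard_level(struct blit_queue *q, struct hw_resource *dst, unsigned level)
{
    unsigned killed = 0;
    for (unsigned i = 0; i < q->count; i++) {
        struct copy_blit *b = &q->blits[i];
        if (b->live && b->dst == dst && b->level == level) {
            b->live = false;
            killed++;
        }
    }
    return killed;
}
```

This only works if the copy blits are still queued (not yet emitted) when the discard hint shows up, which is what the re-ordering layer would give us.  It does nothing about CSOs that still point at the old backing resource.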

BR,
-R

> The nice thing is it becomes basically free to turn on/off for
> different drivers, at least at screen create time.. basically it gets
> 100% re-use, rather than having to re-implement the concepts in each
> (tiler) driver.
>
> Not sure if we need a way to turn it on/off at context create time,
> but either way it would be nice if it were somehow a driconf option so
> that it could be enabled/disabled per app, as to not penalize properly
> written apps.
>
> Thoughts?
>
> ----
>
> Semi-related issue, which applies to either of the draw-reordering
> approaches.  A frequent pattern is:
>
>    ... bunch of draws ...
>    glTexSubImage2D()
>    glGenerateMipmap()
>    ... bunch more draws ...
>    ... repeat sequence a bunch of times with same texture ...
>
> That glTexSubImage() comes to driver as transfer_map(DISCARD_RANGE).
> At this point the backing bo is likely to be busy (since above
> sequence repeats a bunch of times with the same texture).  So the best
> we can do is discard whole bo and schedule blit(s) for the remaining
> levels into the new bo.
>
> But then at the glGenerateMipmap() step, we overwrite the contents of
> all the other levels.  Which means if driver (or re-ordering wrapper
> layer) had some extra hints, the blits triggered by the transfer_map()
> could be skipped.
>
> What I'm thinking would be a simple solution is to have an extra field
> in pipe_draw_info so that internal blits (like mipmap generation)
> could hint to the driver that the entire previous contents of the
> render target are discarded.  (Or possibly we want it more
> fine-grained, to indicate which render-targets and z/s are discarded,
> if not all?  But this doesn't seem useful.)  This could help tell
> tilers that they could discard previous blits (and even skip
> system-memory -> tile transfer).
>
> (Hell, there might even be some use to apps to expose the "this draw
> discards previous contents" type extension..  given some of the wonky
> vendor extensions I've seen, I wouldn't be surprised if it already
> existed.)
>
> Thoughts?
>
> BR,
> -R
>
>
> On Fri, May 20, 2016 at 10:51 AM, Rob Clark <robdclark at gmail.com> wrote:
>> On Fri, May 20, 2016 at 3:35 AM, Jose Fonseca <jfonseca at vmware.com> wrote:
>>> On 20/05/16 00:34, Rob Clark wrote:
>>>>
>>>> On Thu, May 19, 2016 at 6:21 PM, Eric Anholt <eric at anholt.net> wrote:
>>>>>
>>>>> Rob Clark <robdclark at gmail.com> writes:
>>>>>
>>>>>> So some rendering patterns that I've seen in apps turn out to be
>>>>>> somewhat evil for tiling gpu's.. couple cases I've seen:
>>>>>>
>>>>>> 1) stk has some silliness where it binds an fbo, clears, binds other
>>>>>> fbo, clears, binds previous fbo and draws, and so on.  This one is
>>>>>> probably not too hard to just fix in stk.
>>>>>>
>>>>>> 2) I've seen a render pattern in manhattan where app does a bunch of
>>>>>> texture uploads mid-frame via a pbo (and then generates mipmap levels
>>>>>> for the updated texture, which hits the blit path which changes fb
>>>>>> state and forces a flush).  This one probably not something that can
>>>>>> be fixed in the app ;-)
>>>>>>
>>>>>> There are probably other cases where this comes up which I haven't
>>>>>> noticed yet.  I'm not entirely sure how common the pattern that I see
>>>>>> in manhattan is.
>>>>>>
>>>>>> At one point, Eric Anholt mentioned the idea of tracking rendering
>>>>>> cmdstream per render-target, as well as dependency information between
>>>>>> these different sets of cmdstream (if you render to one fbo, then turn
>>>>>> around and sample from it, the rendering needs to happen before the
>>>>>> sampling).  I've been thinking a bit about how this would actually
>>>>>> work, and trying to do some experiments to get an idea about how
>>>>>> useful this would be.
>>>>>
>>>>>
>>>>> My plan was pretty much what you laid out here, except I was going to
>>>>> just map to my CL struct with a little hash table from the FB state
>>>>> members since FB state isn't a CSO.
>>>>
>>>>
>>>> ok, yeah, I guess that solves the naming conflict (fd_batch(_state)
>>>> sounds nicer for what its purpose really is than
>>>> fd_framebuffer_state)
>>>>
>>>> BR,
>>>> -R
>>>
>>>
>>> llvmpipe is also a tiler and we've seen similar patterns.  Flushing reduces
>>> caching effectiveness, however in llvmpipe quite often texture sampling is
>>> the bottleneck, and an additional flush doesn't make a huge difference.
>>>
>>
>> interesting, it hadn't occurred to me about llvmpipe
>>
>>>
>>> I think the internal hash table as Eric proposes seems a better first step.
>>>
>>> Later on we could try make framebuffer state a first class cso, but I
>>> suspect you'll probably want to walk internally all pending FBOs CLs anyway
>>> (to see which need to be flushed on transfers.)
>>>
>>> So first changing the driver internals, then abstract if there are
>>> commonalities, seems more effective way forward.
>>
>>
>> yeah, makes sense.. and I'm planning to go w/ Eric's idea to keep
>> fd_batch separate from framebuffer state.
>>
>> It did occur to me that I forgot to think about the write-after-read
>> hazard case.  Those need to be handled with an extra dependency
>> between batches too.
>>
>> And at least for this particular case, I need somehow some cleverness
>> to discard or clone the old bo to avoid that write-after-read forcing
>> a flush.  (Maybe in transfer_map?  But I guess there are other paths..
>> hmm..)
>>
>> BR,
>> -R
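
For reference, the per-batch dependency tracking discussed in the quoted thread (including the write-after-read case) could be sketched like this.  It is not the actual freedreno or vc4 code: the struct names and the bitmask scheme are assumptions for illustration, batches are identified by a small index, and the dependency graph is assumed to be acyclic (a real driver would have to flush instead of creating a cycle):

```c
#include <assert.h>
#include <stdint.h>

#define MAX_BATCHES 8

/* Hypothetical per-resource tracking of pending (unflushed) access. */
struct resource {
    int writer;                 /* batch with a pending write, or -1 */
    uint32_t readers;           /* bitmask of batches with pending reads */
};

struct batch {
    uint32_t deps;              /* batches that must be flushed first */
    int flushed;
};

struct tracker {
    struct batch batches[MAX_BATCHES];
    int order[MAX_BATCHES];     /* order in which batches were flushed */
    int num_flushed;
};

/* Batch b samples from r: it must run after any pending writer. */
static void batch_read(struct tracker *t, int b, struct resource *r)
{
    if (r->writer >= 0 && r->writer != b)
        t->batches[b].deps |= 1u << r->writer;
    r->readers |= 1u << b;
}

/* Batch b renders to r. */
static void batch_write(struct tracker *t, int b, struct resource *r)
{
    /* write-after-read: pending readers must run before we overwrite */
    t->batches[b].deps |= r->readers & ~(1u << b);
    /* write-after-write: keep ordering against the previous writer */
    if (r->writer >= 0 && r->writer != b)
        t->batches[b].deps |= 1u << r->writer;
    r->writer = b;
    r->readers = 0;
}

/* Flush batch b after recursively flushing everything it depends on. */
static void batch_flush(struct tracker *t, int b)
{
    if (t->batches[b].flushed)
        return;
    t->batches[b].flushed = 1;  /* set early so recursion terminates */
    for (int i = 0; i < MAX_BATCHES; i++) {
        if (t->batches[b].deps & (1u << i))
            batch_flush(t, i);
    }
    t->order[t->num_flushed++] = b;
}
```

Discarding or cloning the old bo on write is what would let batch_write() skip the write-after-read dependency, since the new contents would then land in a different buffer than the one pending readers still use.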

