[Mesa-dev] [PATCH] r600g: remove one pointless flush

Sat Oct 29 00:47:05 PDT 2011

On Sat, 2011-10-29 at 02:32 +0200, Marek Olšák wrote:
> On Sat, Oct 29, 2011 at 12:58 AM, Vadim Girlin <vadimgirlin at gmail.com> wrote:
> > On Sat, 2011-10-29 at 01:29 +0400, Vadim Girlin wrote:
> >> On Fri, 2011-10-28 at 23:16 +0200, Marek Olšák wrote:
> >> > On Fri, Oct 28, 2011 at 10:52 PM, Vadim Girlin <vadimgirlin at gmail.com> wrote:
> >> > > On Fri, 2011-10-28 at 19:47 +0200, Marek Olšák wrote:
> >> > >> I've got no idea what the flush was good for, but it's useless from
> >> > >> the looks of it. The rest of the patch is just a cleanup resulting
> >> > >> from some of the variables being no longer used for anything useful.
> >> > >>
> >> > >> There are no piglit regressions.
> >> > >
> >> > > It was intended to handle multiple interleaved query and conditional
> >> > > render calls with single query object (in this case in theory we may
> >> > > have multiple outstanding queries in current CS and separate data block
> >> > > in the buffer for each query, with possible buffer overflow). I wasn't
> >> >
> >> > Do you mean this?
> >> >
> >> > while (1) {
> >> >   begin_query(q);
> >> >   draw();
> >> >   end_query(q);
> >> >   render_condition(q);
> >> >   draw();
> >> >   render_condition(NULL);
> >> > }
> >> >
> >> > begin_query always resets query results to 0 anyway, so in theory, we
> >> > could re-use the same data block over and over again.
> >>
> >> I think it's possible to run this loop without flushes, so we'll have
> >> multiple queries queued in current cs. From the driver point of view
> >> these queries will be executed simultaneously after flush, that's why we
> >> need to reserve and initialize separate data blocks for them.
> >>
> >
> > Probably we also need to use PIPE_TRANSFER_UNSYCHRONIZED to avoid the
> > flush when mapping the buffer for data block initialization in the
> > r600_query_begin. It seems I missed this, or the mapping semantics were
> > changed with winsys change.
> 
> Ah I see. I entirely missed the map/unmap part. Of course there is an
> implicit sync in the winsys (the flush is not so expensive, but the
> sync is). UNSYCHRONIZED would be dangerous in this particular case,
> because the GPU may still use the previous buffer data. UNSYCHRONIZED
> can only be used if we're 100% sure the GPU doesn't use the buffer
> range we're going to change. I see only two ways out of this:
> 
> 1) If the buffer is full, we can allocate another one and use that. I
> don't think we have any other choice with current Mesa master.
> 
> 2) We can program the GPU to memset the buffer. This would be very
> easy with transform feedback.
> 
> I prefer (2).
> 

(2) looks interesting, though currently I know next to nothing about
transform feedback on radeon hw.

I also have some ideas about possible improvements for this code, but I
never found the time to try them, and probably it will require more work
than your way. I was thinking about the following:

1) We can pre-initialize all data blocks in the buffer at once, e.g.
with memcpy, to avoid map/init/unmap overhead for every data block. 

2) We can use common data buffer(s) for all occlusion queries to reduce
allocation overhead and to avoid unnecessary initialization of multiple
data blocks for one-shot queries. Common buffer will be used
sequentially, similar to existing code.

3) We can use two big enough common buffers in turn, to reduce/avoid
sync overhead. With high probability (depending on the size) one buffer
will be completely processed by the GPU while we are emitting queries
with another. When current buffer is over, we'll initialize and use
another one. Probably having e.g. 32Kb buffers should be enough to avoid
waiting for GPU completely in most cases.

4) We can offload next buffer initialization to another thread.

5) We can adjust the size/number of the buffers dynamically, depending
on the usage.

Anyway, I'm OK with your solution.

Vadim