[Mesa-dev] Batchbuffer question

Fri Nov 8 07:46:07 PST 2013

On 8 November 2013 04:49, Rogovin, Kevin <kevin.rogovin at intel.com> wrote:

> Hi all,
>
>  As I was poking into the magicks for the batchbuffer, I saw the following
> logical bits of code, that make sense by themselves but get me paranoid
> together. Firstly in intel_batchbuffer_begin() [ intel_batchbuffer.h, and
> this is what BEGIN_BATCH maps to] there is a
> intel_batchbuffer_require_space() call that if too much room is needed then
> calls intel_batchbuffer_begin():
>
> from intel_batchbuffer_require_space():
>
>   115    if (intel_batchbuffer_space(brw) < sz)
>   116       intel_batchbuffer_flush(brw);
>
> and from intel_batchbuffer_space():
>
>    80 intel_batchbuffer_space(struct brw_context *brw)
>    81 {
>    82    return (brw->batch.state_batch_offset - brw->batch.reserved_space)
>    83       - brw->batch.used*4;
>    84 }
>
>
> Now, for allocating space for state, there is brw_state_batch():
>
>
>   128    offset = ROUND_DOWN_TO(batch->state_batch_offset - size,
> alignment);
>   129
>   130    /* If allocating from the top would wrap below the batchbuffer, or
>   131     * if the batch's used space (plus the reserved pad) collides
> with our
>   132     * space, then flush and try again.
>   133     */
>   134    if (batch->state_batch_offset < size ||
>   135        offset < 4*batch->used + batch->reserved_space) {
>   136       intel_batchbuffer_flush(brw);
>   137       offset = ROUND_DOWN_TO(batch->state_batch_offset - size,
> alignment);
>   138    }
>
>
> These taken together, I interpret as meaning that state and commands try
> to be separated by atleast batch->reserved_space bytes. I guess state could
> take up more than (batch->bo->size - batch->reserved_space) from that
> second ROUND_DOWN_TO, but that would only happen right after a flush and
> any state or command afterwards would flush too.
>
> Now my questions:
>   1) it looks like the reserved space is not to be used for either state
> or commands. Is that correct? What is it used for?
>

It is used by commands that have to go at the end of every batch.  When
we're deciding whether or not to go to a new batch buffer, those commands
haven't been emitted yet, so we have to account for the amount of space
they take up when we determine whether the next command is going to fill up
the batch buffer.  See the documentation for BATCH_RESERVED in
intel_batchbuffer.h.

>
>   2) if a function first calls brw_state_batch() to place state and it
> barely fits and then calls BEGIN_BATCH that does not fit, then the command
> will refer to an offset on the previous batch buffer, that cannot be good.
> Or does this never happen for other reasons? If so what are those reasons?
>

You are correct that this cannot be good.  It probably would lead to a GPU
hang.  The reason it doesn't happen is that we have an additional mechanism
for flushing the batch at a safe time, before we have started emitting the
state for a draw call.  Near the top of brw_try_draw_prims(), we estimate
how many bytes of batch buffer space will be needed to complete the draw
call (estimated_max_prim_size).  Then we pad it by 512 bytes to be safe,
and then we call intel_batchbuffer_require_space(), which flushes the
batchbuffer if there isn't enough space.  This is safe because we haven't
started emitting state yet.  And it ensures that once we do emit state,
there will be enough batch buffer space available to contain it without
needing to flush at an inopportune time.

There's only one problem.  As we continue to add features to the driver,
the maximum possible amount of batch buffer space needed to hold draw state
increases, and we rarely remember to update estimated_max_prim_size to
reflect that.  So there's a danger of not passing a large enough number to
intel_batchbuffer_require_space(), either now or in the future.

To reduce the risk of that, we have a safety mechanism in debug builds: if
we run out of batch buffer space while emitting draw state, we will assert
rather than flush.  Grep for "no_batch_wrap" to see the code that does this.

IMHO, we could do even better than this.  What we really should do is keep
track of the amount of batch buffer space used by every draw call and
compare it to estimated_max_prim_size.  If the difference between the
actual space used and the estimate is ever less than the 512 pad bytes in a
debug build, we should assertion fail.  That way the assertion failure
would happen if we're *ever* wrong in our estimate of how much batch space
is needed (rather than only happening if being wrong causes an unsafe
flush).  Kevin, would you be interested in submitting a patch to improve
this?  It might be a nice way for you to continue building up your
familiarity with the codebase.  If not I can add it to my to-do list.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.freedesktop.org/archives/mesa-dev/attachments/20131108/ccac35a6/attachment-0001.html>