[Mesa-dev] Batch buffer sizes, flushing questions

Thu Oct 31 13:31:21 CET 2013

On Thursday, October 31, 2013 09:22:23 AM Rogovin, Kevin wrote:

> but I do not quite follow the second upload; what
> is the magicks going on with batch->state_batch_offset and for that matter
> batch->bo->size ??

This is a stack-and-heap model for batchbuffer submission. Direct state, which 
usually consists of the commands, is allocated from the beginning of the 
batchbuffer, and indirect (dynamic) state data is allocated from the end. 
batch->state_batch_offset is the start location of the indirect state data. 
As far as I recall it starts out equal to batch->bo->size (the total size of 
the buffer) and moves down as state is allocated.
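
To make the two-ended layout concrete, here is a minimal sketch in C. It is 
not the Mesa code: the toy_* names, the fixed 4096-byte size and the return 
conventions are all made up for illustration. The only thing it is meant to 
show is commands growing up from offset 0 while state grows down from the 
end, with the two never allowed to cross.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_BATCH_SIZE 4096u   /* hypothetical buffer size in bytes */

/* Hypothetical stand-in for the batch bookkeeping (not Mesa's struct). */
struct toy_batch {
    uint32_t size;          /* total size; plays the role of batch->bo->size */
    uint32_t used;          /* bytes of commands written from the start */
    uint32_t state_offset;  /* start of indirect state; plays the role of
                             * batch->state_batch_offset */
};

/* Start a fresh batch: no commands yet, no state yet. */
static void toy_batch_reset(struct toy_batch *b)
{
    b->size = TOY_BATCH_SIZE;
    b->used = 0;
    b->state_offset = b->size;   /* state region is empty, begins at the end */
}

/* Reserve space for commands at the front ("stack" side).
 * Returns the offset of the reservation, or -1 if it would run into the
 * state region (a real driver would flush and start a new batch). */
static int64_t toy_alloc_commands(struct toy_batch *b, uint32_t bytes)
{
    if (bytes > b->state_offset - b->used)
        return -1;
    uint32_t offset = b->used;
    b->used += bytes;
    return offset;
}

/* Reserve space for indirect state at the back ("heap" side).
 * alignment must be a power of two.  Returns the offset of the
 * reservation, or -1 if it would run into the commands. */
static int64_t toy_alloc_state(struct toy_batch *b, uint32_t bytes,
                               uint32_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    if (bytes > b->state_offset)
        return -1;
    /* Move down by the requested size, then round down to the alignment. */
    uint32_t offset = (b->state_offset - bytes) & ~(alignment - 1);
    if (offset < b->used)
        return -1;
    b->state_offset = offset;
    return offset;
}
```

With these made-up numbers, a 100-byte state allocation aligned to 64 bytes 
in a fresh batch lands at offset 3968, and the free space at any moment is 
simply state_offset - used.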

-Abdiel

> 
> Going further down, I see that if the command is a blit it uses a different
> execution DRM command. I have not been able to find a reference of what
> each different DRM command does, the best I have found so far are:
> http://lwn.net/Articles/283798/ [Keith Packard's Article/Thread on LWN] 
> and https://www.kernel.org/doc/htmldocs/drm/ ; when I start to dig into the
> source code of DRM for what those functions do, I find they are set as
> function pointers and the chase eventually leads me to some ioctl like
> calls, but I still do not know what they do and the differences. Is there a
> reference or doc saying what these functions are expected to do?
> 
> > nr_prims is sometimes != 1 when the client is using the legacy
> > glBegin()/glEnd() technique to emit primitives.  I don't recall the exact
> > circumstances that cause it to happen, but here's one example:
> > 
> > glBegin(GL_LINE_STRIP);
> > glArrayElement(...);
> > ...
> > glEnd();
> > glBegin(GL_LINE_LOOP);
> > glArrayElement(...);
> > ...
> > glEnd();
> 
> That PITA old school begin/end. If the context is core profile, does that
> then imply nr_prims is always 1?
> 
> > Not that I'm aware of.  My intuition is that since GL apps typically do a
> > very large number of small-ish draw calls, this wouldn't be beneficial
> > most of the time, and it would be tricky to tune the heuristics to make
> > it effective in the rare circumstances where it mattered without
> > sacrificing performance elsewhere.
> 
> By small-ish calls, do you mean the batch buffer is small or the vertex or
> fragment load is small? Generally speaking, developers are supposed to keep
> the number of glDrawFoo() calls under 1000 per frame; on embedded they are
> in for a world of hurt if they go over 500 usually, and very often over 300
> ends up being CPU limited on many embedded platforms. The calls that I am
> thinking that are "heavy"-ish are instanced calls where there are a large
> number of instances of non-trivial geometry, the most typical example is a
> field of grass.
> 
> > drm_intel_bo_busy() will tell if a buffer object is still being used by
> > the GPU.  Also, calling drm_intel_bo_map() on a buffer will cause the CPU
> > to wait until the GPU is done with the buffer.  (In the rare cases where
> > we want to map a buffer object without waiting for the GPU we use
> > drm_intel_gem_bo_map_unsynchronized()).
> 
> Just to check: are then GL buffer objects and texture surfaces implemented
> as DRM BO's? [Looking at the various functions specified in
> intelInitTextureSubImageFuncs,  intelInitTextureImageFuncs and
> intelInitBufferObjectFuncs makes me guess so, but it still is just a
> guess].
> 
> Looking at intel_bufferobj_subdata(), why does the change of buffer object
> data that is not used only happen async when brw_context::has_llc true?
> Also why is preferring to stall more likely to hit that path than the
> delayed data blit?
> 
> Best Regards,
> -Kevin