[Mesa-dev] Gallium proposal: add a user pointer in pipe_resource

Sun Feb 6 12:01:01 PST 2011

Hi Keith,

1) Recreating user buffers is very expensive, even though it's only the
CALLOC overhead. Draw-call-heavy apps simply suffer hard there. It's one of
the things the gallium-varrays-optim branch tries to optimize, i.e. make the
user buffer content mutable. I can't see another way out.

2) The map/unmap overhead is partially hidden by the fact that:
- r300g doesn't unmap buffers when asked to; it defers unmapping until the
command stream is flushed. This optimization resulted in a frame rate
increase of about 70% in Nexuiz. The remaining overhead is mainly locking
and unlocking a mutex and doing some checks.
- r600g keeps all buffers mapped all the time, even textures. The only
disadvantage is that it consumes address space. This is a result of how
desperate we are with draw-call-heavy apps. (Do you remember that I wanted
to add spinlocks? Frankly, that was another desperate move.)
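
To illustrate the deferred-unmap idea from the first bullet, here is a
minimal sketch. It is not the actual r300g code: the sketch_buffer struct,
the sketch_* functions and the counters are hypothetical names made up for
this example, and the "expensive" map is only simulated by a pointer
assignment.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical buffer object; not the real r300g data structure. */
struct sketch_buffer {
    void *storage;    /* stands in for GPU-visible memory */
    void *cpu_ptr;    /* non-NULL while the (expensive) mapping is alive */
    int real_maps;    /* number of times the expensive map was performed */
    int real_unmaps;  /* number of times the expensive unmap was performed */
};

/* Map the buffer, reusing the existing mapping if there is one. */
static void *sketch_map(struct sketch_buffer *buf)
{
    assert(buf != NULL);
    if (buf->cpu_ptr == NULL) {
        buf->cpu_ptr = buf->storage;  /* pretend this is the costly call */
        buf->real_maps++;
    }
    return buf->cpu_ptr;
}

/* Unmap request: intentionally a no-op, the real unmap is deferred. */
static void sketch_unmap(struct sketch_buffer *buf)
{
    (void)buf;
}

/* Command stream flush: the only place where the mapping is dropped. */
static void sketch_flush(struct sketch_buffer *buf)
{
    assert(buf != NULL);
    if (buf->cpu_ptr != NULL) {
        buf->cpu_ptr = NULL;
        buf->real_unmaps++;
    }
}
```

With this shape, any number of map/unmap pairs between two flushes costs
one real map and one real unmap.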

But it's not enough. We must prevent any unnecessary calls to
transfer_map/unmap. If keeping the upload buffer mapped a little longer
results in a 4% performance increase, then I want it. I have measured the
real increase from this in Torcs and it's simply worth it. The problem with
inline transfers is that they behave like map/unmap, so they wouldn't
improve anything.

3) Not sure if you noticed, but constants are now set via user buffers as
well. IIRC, Radeon and Nouveau people welcomed this change. The thing is
every driver uses a different approach to uploading constants and all it
needs is a direct pointer to gl_program_parameter_list::ParameterValues to
do the best job. Previously, drivers stored constants in malloc'd memory,
which was basically just a temporary copy of ParameterValues. Eliminating
that copy was the main motivation for using user buffers for constants.
r300g copies the constants to the command stream directly, whereas r600g
uses u_upload_mgr, and I guess other drivers do something entirely
different. As you can see, we can't get rid of user buffers while keeping
all drivers on the fast path. But I'd be ok with a new
set_constant_buffer(data?) function which takes a pointer to constants
instead of a resource. With that, we could remove the overhead of
user_buffer_create for constants. The original set_constant_buffer function
can be reserved for ARB_uniform_buffer_object, but ideally shouldn't be used
for anything else.
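
A rough sketch of what such a pointer-based entry point could look like,
under loud assumptions: sketch_context, sketch_set_constants and
sketch_emit_constants are hypothetical names, not existing Gallium
functions, and the "command stream" is just a float array. The point is
only that the driver keeps the caller's pointer and copies the values once,
straight to their destination, without a temporary malloc'd copy.

```c
#include <assert.h>
#include <string.h>

#define SKETCH_CS_FLOATS 256

/* Hypothetical driver context; not a real pipe_context. */
struct sketch_context {
    const float *constants;       /* caller-owned pointer, never copied here */
    unsigned num_constants;       /* number of floats behind the pointer */
    float cs[SKETCH_CS_FLOATS];   /* stand-in for the command stream */
    unsigned cs_used;             /* floats already written to cs */
};

/* Hypothetical setter: remember the pointer, create no resource. */
static void sketch_set_constants(struct sketch_context *ctx,
                                 const float *data, unsigned count)
{
    assert(ctx != NULL);
    ctx->constants = data;
    ctx->num_constants = count;
}

/* At draw time, copy the constants directly into the command stream.
 * Returns 1 on success, 0 if nothing is bound or there is no room. */
static int sketch_emit_constants(struct sketch_context *ctx)
{
    assert(ctx != NULL);
    if (ctx->constants == NULL ||
        ctx->cs_used + ctx->num_constants > SKETCH_CS_FLOATS)
        return 0;
    memcpy(&ctx->cs[ctx->cs_used], ctx->constants,
           ctx->num_constants * sizeof(float));
    ctx->cs_used += ctx->num_constants;
    return 1;
}
```

A driver built around u_upload_mgr would replace the memcpy target with its
upload buffer; the setter itself would stay the same.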

I fully understand that you want a robust interface. I would totally agree
with you if I hadn't spent months profiling Mesa. I'd like to have the same,
except that I also want it to be performance-oriented. I am afraid it will
be very hard to have that and the robustness at the same time. Other driver
devs and I really want to compete with proprietary drivers in terms of
performance.

On Tue, Feb 1, 2011 at 6:55 PM, Keith Whitwell <keithw at vmware.com> wrote:

> So the optimization we're really talking about here is saving the
> map/unmap overhead on the upload buffer?
>
> And if the state tracker could do the uploads without incurring the
> map/unmap overhead, would that be sufficient for you to feel comfortable
> moving this functionality up a level?
>

Because one of the keys to performance is to do as little CPU work as
possible, I'd like the upload buffer to stay mapped as long as possible and
I'd like it to be used for drawing when mapped. This is OK for Radeons,
because the GPU can read some part of the buffer while some other part is
being filled by the CPU. However, it wouldn't change the situation with
regard to recording and replaying at all. This is one of the reasons I'd
like user buffers to stay and I'd like them mutable at least for vertices.
The eventual record/replay module could use the information provided by
pipe_draw_info::min_index and max_index to know what regions of user buffers
may have been changed.
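
As a sketch of how a record/replay module might derive the possibly
modified region of a user vertex buffer from min_index and max_index: the
sketch_range struct, the function name and its parameters (buffer_offset,
stride, vertex_size) are assumptions made for illustration, not an existing
interface.

```c
#include <assert.h>
#include <stddef.h>

/* Half-open byte range [start, end) inside a user buffer. */
struct sketch_range {
    size_t start;
    size_t end;
};

/* Compute the bytes a draw call can read from a user vertex buffer,
 * given the lowest and highest vertex index it references. */
static struct sketch_range sketch_dirty_range(size_t buffer_offset,
                                              size_t stride,
                                              size_t vertex_size,
                                              unsigned min_index,
                                              unsigned max_index)
{
    struct sketch_range r;

    assert(max_index >= min_index);
    r.start = buffer_offset + (size_t)min_index * stride;
    /* The last vertex contributes vertex_size bytes, not a full stride. */
    r.end = buffer_offset + (size_t)max_index * stride + vertex_size;
    return r;
}
```

For example, with buffer_offset = 16, stride = 32, vertex_size = 24 and
indices 2..5, only bytes 80 to 200 would need to be captured again.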

> But we still must be calculating that somewhere -- the min_/max_index
> info has to come from somewhere in the statetracker.
>

This info can be obtained from pipe_draw_info and that's sufficient. There
is no reason to set pipe_vertex_buffer::max_index nor pipe_resource::width0
for user buffers. Neither r300g nor r600g uses this additional information.
Computing pipe_vertex_buffer::max_index and recreating user buffers only to
set width0, which I don't need for anything, significantly reduces
performance in draw-call-heavy apps. For example, if I dropped this
computation, there would be a 40% performance increase with r300g in the
Torcs racing game on the Forza track (I've measured it). Even
seemingly-performance-unrelated code can have a huge impact. (My experience
tells me I should replace "even" with "all".)

No matter how I look at this whole issue, I can't see user buffers going
away without losses. I may try to move this functionality up one level in a
branch, minimize the losses as much as possible (possibly keeping some
buffers persistently mapped at the driver level, not at the state tracker
level), see how it performs, and then decide what to do next. But I can't
say now how fast it will be.

Best regards
Marek