No subject

Mon Jan 31 10:13:25 PST 2011

Gallium interface and are well-defined from what I can see. They might be
owned by the application (e.g. set via glVertexPointer etc.), therefore
using the transfer functions on user buffers is invalid per se. Moreover,
the application may change the content of user buffers any time, meaning
that drivers should convert the user buffers to real buffers in the draw_vb=
o
function, then draw vertices, and then forget the real buffers, keeping the
user buffers bound for the next draw operation. Drivers should not upload
user buffers anywhere else, because the application may change the contents
between glDraw* calls. If they are bound as vertex buffers, we don't need t=
o
know their size and sometimes we even can't (again, glVertexPointer etc.).
Instead, we can use pipe_draw_info::min_index and max_index and upload only
that range. This has proved to be a great optimization and it's how r300g
and r600g work now.

In theory, doing user buffer uploads at the state tracker side using inline
transfers might work and should remove some burden from drivers. In
practice, inline transfers may have a very large overhead compared to how
things work now. In transfer_inline_write, you usually want to map the
buffer, do a memcpy, and unmap it. The map/unmap overhead can be really
significant. There are applications that use glDrawElements to draw one
triangle at a time, and they draw hundreds of triangles with user buffers i=
n
this way (yes, applications really do this). We can't afford doing any more
work than is absolutely necessary. When you get 10000 or more draw_vbo call=
s
per second, everything matters.

Currently, the radeon drivers have one upload buffer for vertices and it
stays mapped until the command stream is flushed. When they get a user
buffer, they do one memcpy and that's all. They don't touch winsys unless
the upload buffer is full.

> And user-buffers tend not to stay user-buffers - they can be promoted to
> regular buffers behind the scenes by the driver.  Would that be
> reflected in this interface somehow?
>

I don't think it's needed. The pipe_resource fields can stay immutable and
drivers can internally replace vertex buffers with their private
pipe_resources. The state trackers don't need to know about it.

> From the point of view of recording, replaying, debugging, remoting,
> etc. at the gallium boundary, it's preferable if all actions are
> interposable - ie. all actions are mediated by a function call of some
> sort into the gallium interface.  Giving a component a direct memory
> access into buffer contents would tend to defeat that and make
> record/replay of that action difficult.
>

Indeed, record/replay would be difficult but not impossbie. FWIW I think th=
e
interface shouldn't be specifically designed for record/replay. Instead,
record/replay should be made work with whatever interface there is.

> Is it possible to get a description of what you're doing at a slightly
> higher level to try and understand if there's a solution without these
> drawbacks?
>

I am quite content with the current interface for user buffers, but it will
need a few changes for it to be more... efficient. Below is my plan for
further improving Gallium and its interactions with user buffers in general=
.

1) New vertex buffer manager

This is how I'd like to put the burden of uploading user buffers out of the
drivers. I've made a new vertex buffer manager. It can be found here:
http://cgit.freedesktop.org/~mareko/mesa/commit/?h=3Dvbuf-mgr&id=3D94a53b67=
2dd238e6a50bb6b252614dc2e9f30ddf

And the corresponding branch is here:
http://cgit.freedesktop.org/~mareko/mesa/log/?h=3Dvbuf-mgr

It's a module that drivers can use and it does 2 things:
- uploads user buffers
- takes care of converting unsupported vertex formats and unaligned vertex
layouts to supported ones (there are vertex fetcher capability bits, see
struct u_vbuf_caps)

Besides some typos in a few commits, this work is already done.

With this manager, the drivers don't have to deal with user buffers when
they are bound as vertex buffers. They only get real hardware buffers. The
drivers don't even have to deal with unsupported vertex formats. Moreover,
this code has already proven to be fast when it was maturing in r300g and i=
s
specifically optimized for low overhead. Its integration to a driver is eas=
y
and straithforward. But I need the pipe_resource::user_ptr variable to be
able to access the user buffer memory in util (efficiently).

2) Optimizing vertex array state changes in st/mesa

Because we don't need to know the sizes of user vertex buffers (as I said,
we can use pipe_draw_info::min_index and max_index instead, as is done in
the vertex buffer manager), we can remove the calculation of the buffer
sizes from st_draw_vbo. We can also remove pipe_vertex_buffer::max_index
(again, the information provided by pipe_draw_info is sufficient) and the
related code from st/mesa. Not only will this simplify st_draw_vbo, it will
allow us to bind vertex buffers and a vertex elements state only when
needed, i.e. when either the _NEW_ARRAY or _NEW_PROGRAM flag is set. It
makes this usage case a lot faster:

for (i =3D 0; i < 1000; i++) glDrawElements(...);

This work is *almost* complete and can be found here:
http://cgit.freedesktop.org/~mareko/mesa/log/?h=3Dgallium-varrays-optim

The framerate in the Torcs game goes from 14 to 20 fps with this code (in
one particular scenario I was optimizing for).

Since I no longer compute the sizes of user buffers, I have to put ~0 in th=
e
'size' parameter of user_buffer_create and I expect drivers not to use it.
r300g, r600g, nv50, nvc0, softpipe, and llvmpipe should be ready for this.
Not sure about the other drivers, but it's fixable.

--
So this all is my current plan to simplify hardware drivers a bit and add
some nice optimizations. Another option would be to move the new vertex
buffer manager or something equivalent to the state tracker and remove user
buffers from the Gallium interface, but that would be additional work with
uncertain performance implications, so I decided not to take this path (for
now).

Best regards
Marek

--001636283790385192049b28dcf7
Content-Type: text/html; charset=UTF-8
Content-Transfer-Encoding: quoted-printable

<div class=3D"gmail_quote">On Mon, Jan 31, 2011 at 3:17 PM, Keith Whitwell =
<span dir=3D"ltr">&lt;<a href=3D"mailto:keithw at vmware.com">keithw at vmware.co=
m</a>&gt;</span> wrote:<br><blockquote class=3D"gmail_quote" style=3D"margi=
n: 0pt 0pt 0pt 0.8ex; border-left: 1px solid rgb(204, 204, 204); padding-le=
ft: 1ex;">

<div class=3D"im">On Sat, 2011-01-29 at 15:12 -0800, Marek Ol=C5=A1=C3=A1k =
wrote:<br>
&gt;<br>
&gt;<br>
&gt; Hi,<br>
&gt;<br>
&gt; I am proposing to add a pointer to a user buffer in pipe_resource.<br>
&gt; There are two reasons for this:<br>
&gt;<br>
&gt; 1) I would like to have a way to query outside of a driver whether a<b=
r>
&gt; buffer is a user buffer. Simply comparing the pointer with NULL would<=
br>
&gt; do the trick.<br>
&gt;<br>
&gt; 2) I would like to efficiently obtain a pointer to a user buffer<br>
&gt; outside of a driver without going through the sequence of functions<br=
>
&gt; get_transfer, transfer_map, transfer_unmap, and transfer_destroy.<br>
&gt;<br>
&gt; This will allow to move more driver-specific code to auxiliary/util.<b=
r>
&gt;<br>
&gt;<br>
<br>
<br>
</div>Marek,<br>
<br>
I have to say my preference would have been to see user buffers fade<br>
away in favour of things like inline transfers. =C2=A0That said you&#39;re =
much<br>
more active than I am in looking at this right now, so I don&#39;t want to<=
br>
get in the way of your progress.<br>
<br>
I guess my biggest problem with user buffers is how poorly defined their<br=
>
semantics are. =C2=A0For instance, what does it really mean to get write<br=
>
transfer into a userbuffer? =C2=A0 Will you be updating the original<br>
application-owned memory?<br></blockquote><div><br>Hi Keith,<br>
<br>