[Mesa-dev] a newbie asking newbie questions

Tue Sep 17 10:02:38 PDT 2013

On 17 September 2013 05:13, Rogovin, Kevin <kevin.rogovin at intel.com> wrote:

> Hello,
>
>  Thank you for the very fast answers, some more questions:
>
>
> > It's not a preference question.  The registers are 8 floats wide.
> > Vertex shaders get invoked 2 vertices at a time, with a register
> > containing these values:
> >
> > .   +------+------+------+------+------+------+------+------+
> > .   | v0.x | v0.y | v0.z | v0.w | v1.x | v1.y | v1.z | v1.w |
> > .   +------+------+------+------+------+------+------+------+
>
> This seems best to me: run two vertices in each invocation in the hope
> that the shader compiler will merge (multiple) float, vec2 and maybe even
> vec3 operations into vec4 operations (does it?).
>

>
> > while these 8 pixels in screen space:
> >
> > .   +----+----+----+----+
> > .   | p0 | p1 | p2 | p3 |
> > .   +----+----+----+----+
> > .   | p4 | p5 | p6 | p7 |
> > .   +----+----+----+----+
> >
> > are loaded in fragment shader registers as:
> >
> > .   +------+------+------+------+------+------+------+------+
> > .   | p0.x | p1.x | p4.x | p5.x | p2.x | p3.x | p6.x | p7.x |
> > .   +------+------+------+------+------+------+------+------+
> >
> > Note how one register just holds a single channel ('.x' here) of a
> > vector.  A vec4 would take up 4 registers, and to do value0.xyzw *
> > value1.xyzw, you'd emit 4 MULs.
>
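To make the two layouts quoted above concrete, here is a small Python model. It is only a sketch: the helper names (pack_vertex_register, pack_fragment_channel, mul_vec4_soa) are made up for this example and are not anything in Mesa. Only the lane orderings are taken from the diagrams.

```python
# Toy model of the two register layouts described above.
# All names are hypothetical; only the lane orderings come from the diagrams.

SIMD_WIDTH = 8  # registers are 8 floats wide

# Fragment shader lane order, copied from the quoted diagram.
FRAGMENT_LANE_ORDER = [0, 1, 4, 5, 2, 3, 6, 7]


def pack_vertex_register(v0, v1):
    """Vertex layout: one register holds all four channels of two vertices."""
    assert len(v0) == 4 and len(v1) == 4
    return list(v0) + list(v1)


def pack_fragment_channel(pixels, channel):
    """Fragment layout: one register holds a single channel of 8 pixels."""
    assert len(pixels) == SIMD_WIDTH
    return [pixels[p][channel] for p in FRAGMENT_LANE_ORDER]


def mul_vec4_soa(a_regs, b_regs):
    """value0.xyzw * value1.xyzw in the fragment layout.

    Each operand is a list of 4 registers (x, y, z, w), so the multiply
    takes one 8-wide MUL per channel, i.e. 4 MULs in total.
    """
    assert len(a_regs) == 4 and len(b_regs) == 4
    return [[x * y for x, y in zip(a, b)] for a, b in zip(a_regs, b_regs)]
```

In this model a vec4 in the fragment layout is four separate 8-wide lists, which is why the multiply loops over four register pairs.
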
> This is exactly what I was trying to ask/say about the fragment shader
> running, i.e. n fragments are processed with one n-wide SIMD command (for
> i965, n=8). Sigh, my e-mail communications leave something to be desired.
> Some questions:
>  1) do the fragments need to be in a 4x2 block, or can it be two separate
> 2x2 blocks?
>
>  2) for tiny triangles for fragment shaders that do not require dFdx, dFdy
> or fwidth, can the fragments be totally scattered?
>

> Along further lines, for non-dependent texture lookups, are there code
> paths where the derivatives are computed analytically, so that selecting
> the correct LOD does not require processing fragments in 2x2 (or larger)
> blocks? Or does the i965 hardware sampler interface not allow this kind of
> madness?
>

We don't do any such optimizations in the Mesa/i965 driver, and I suspect
it wouldn't help much if we did (the sampler hardware computes the
gradients from the input coordinates by taking advantage of the 2x2 block
arrangement, so the gradient computation is extremely cheap).
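
As a rough illustration of why the 2x2 arrangement makes gradients cheap, here is a Python sketch of finite differences within one block. It assumes that lanes 0-3 and 4-7 of a register each form one 2x2 block in the order top-left, top-right, bottom-left, bottom-right (as the p0, p1, p4, p5 ordering in the quoted diagram suggests), and that one gradient is taken per block. The function names are made up, and the real sampler hardware may well do this differently.

```python
def block_gradients(block):
    """Finite-difference gradients for one 2x2 block of a scalar coordinate.

    `block` is [top_left, top_right, bottom_left, bottom_right]. Treating
    four consecutive lanes as one block is an assumption of this sketch.
    """
    top_left, top_right, bottom_left, _bottom_right = block
    ddx = top_right - top_left    # neighbour one pixel to the right
    ddy = bottom_left - top_left  # neighbour one pixel down
    return ddx, ddy


def register_gradients(reg):
    """Split an 8-wide register into two 2x2 blocks and differentiate each."""
    assert len(reg) == 8
    return [block_gradients(reg[i:i + 4]) for i in (0, 4)]
```

Each gradient is a single subtraction between lanes that are already sitting next to each other in the register, so no extra data has to be fetched.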

>
> >> On a related note, where are the beans about the dispatch table?
> > I don't know this one (or particularly what you're asking, I guess).
>
> Viewing docs/index.html, on the side panel "Developer Topics --> GL
> Dispatch" there is text (broken into sections "1. Complexity of GL
> Dispatch", "2. Overview of Mesa's Implementation" and "3. Optimizations")
> describing how different GL contexts for the same hardware can do different
> things for the same GL function and that mesa has stubs which in turn call
> the "real"  function. The documents go on to talk about various ways the
> function tables are filled and accessed across separate threads. My
> questions are:
>  0) is that information text still accurate? In particular, the directory
> src/glapi is gone from Mesa (at least in what I git cloned) and I thought that
> was the location of it.
>  1) where/how does the i965 driver fill that table, if it exists?
>

Some of this documentation may be out of date--we often forget that it
exists, so we don't keep it very well updated.  If you find specific
errors, please feel free to submit patches to fix them.

The directory src/glapi is now src/mapi/glapi.

A lot of the code to fill in the dispatch table is in
src/mesa/main/api_exec.c, which is generated at compile time by
src/mapi/glapi/gen/gl_genexec.py from the .xml files in the
src/mapi/glapi/gen directory.  A handful of dispatch table functions aren't
populated by api_exec.c because they change dynamically depending on GL
state.  Functions that specify vertex attributes (e.g. glColor4f()) are set
up by install_vtxfmt() in src/mesa/main/vtxfmt.c.  Functions whose
behaviour needs to be saved in display lists are set up by
_mesa_initialize_save_table() in src/mesa/main/dlist.c.
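
Roughly, the layering described above can be pictured with the following Python sketch. It is only a model: Mesa's real tables are C structs of function pointers, and every name here (the dict-based table, populate_static_entries, install_vertex_format, build_save_table, and the stand-in function bodies) is invented for illustration. In particular, starting the save table as a copy of the exec table is an assumption of the sketch, not a description of what dlist.c does.

```python
# Toy model of a GL dispatch table: a dict from entry point name to the
# function that currently implements it. Every name below is hypothetical.

def impl_clear(mask):
    return ("clear", mask)


def immediate_color4f(r, g, b, a):
    return ("set current colour", (r, g, b, a))


def save_clear(mask):
    return ("record in display list", "glClear", mask)


def populate_static_entries(table):
    """Stand-in for the generated api_exec.c: fixed entry points."""
    table["glClear"] = impl_clear


def install_vertex_format(table, vtxfmt):
    """Stand-in for install_vtxfmt(): entries that change with GL state."""
    table.update(vtxfmt)


def build_save_table(exec_table):
    """Stand-in for _mesa_initialize_save_table(): override the calls
    whose behaviour must be recorded rather than executed."""
    save = dict(exec_table)  # assumption: begin from the exec entries
    save["glClear"] = save_clear
    return save


exec_table = {}
populate_static_entries(exec_table)
install_vertex_format(exec_table, {"glColor4f": immediate_color4f})
save_table = build_save_table(exec_table)
```

The same entry point name resolves to a different function depending on which table is current, which is the whole point of the indirection.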

If you're new to Mesa, I'd recommend shying away from this dispatch code
for now, since it's fairly subtle and most people don't need to understand
it in order to contribute usefully to Mesa.  The rule of thumb is, if
you're looking for the implementation of the function glFoo(), grep the
source code for a function called _mesa_Foo().  If you find it, that's the
function you're looking for.  If you don't, then it's probably one of the
functions whose behaviour changes based on GL state, in which case
congratulations, you're one of the few people who actually need to
understand the dispatch code :)
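
The rule of thumb above is mechanical enough to write down. This is a minimal Python sketch; both helper names are hypothetical, and the search is just a whole-word match over some text you pass in, roughly what `grep -w` would show.

```python
import re


def mesa_impl_name(gl_name):
    """Map an API entry point such as 'glFoo' to the name to grep for."""
    if not gl_name.startswith("gl"):
        raise ValueError("expected a name starting with 'gl': %r" % gl_name)
    return "_mesa_" + gl_name[2:]


def find_mentions(source_text, gl_name):
    """Return the lines of `source_text` that mention the implementation
    name as a whole word."""
    pattern = re.compile(r"\b" + re.escape(mesa_impl_name(gl_name)) + r"\b")
    return [line for line in source_text.splitlines() if pattern.search(line)]
```

An empty result from a search over the whole tree is the hint that the function is one of the dynamically installed ones.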

>
> Along similar lines, I see that some of the code in src/mesa/main performs
> various checks of various API calls and at times has some conditions
> dependent on what context type it is, which kind of contradicts the idea of
> different contexts having different dispatch tables [sort of, since the
> functions might just be the driver magick, whereas the stub is validate
> and then call driver magick].
>

When a function is available in some APIs and not available in others, we
handle that at the time we populate the dispatch table (for example, the
code-generated api_exec.c only populates the glAlphaFuncx() function when
the API is GLES 1.x).  When a function is available in multiple APIs but
has subtle behavioural differences from one API to the next, we handle that
by checking the API in the implementation function (for example, GLES
versions prior to 3.0 require that the "transpose" argument to
glUniformMatrix() be false, so we check this in _mesa_uniform_matrix(),
which is the common function called by all of the glUniformMatrix*()
commands).
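
The two strategies can be put side by side in a small Python sketch. All of it is hypothetical: the string/tuple API tags, the function names and the return values are stand-ins chosen for this example, not Mesa's real enums, signatures or error codes.

```python
# Hypothetical model: an API is a (name, version) pair such as
# ("gles", (1, 1)), ("gles", (2, 0)) or ("gl", (3, 0)).

def alpha_funcx(func, ref):
    return ("alpha func (fixed point)", func, ref)


def uniform_matrix(api, version, transpose):
    """Shared implementation: the API is checked inside the function
    because the entry point exists everywhere but behaves differently."""
    if api == "gles" and version < (3, 0) and transpose:
        return "error: transpose must be false"
    return "ok"


def populate_table(api, version):
    """Entry points that exist only in some APIs are handled here, at
    the time the table is filled in."""
    table = {}
    if api == "gles" and version < (2, 0):
        # GLES 1.x only; it is fixed-function, so no uniforms there.
        table["glAlphaFuncx"] = alpha_funcx
    else:
        table["glUniformMatrix4fv"] = uniform_matrix
    return table
```

A missing table entry costs nothing at call time, while the check inside uniform_matrix() runs on every call, which is one reason to prefer the first strategy whenever a function is simply absent from an API.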