<div dir="ltr">On 17 September 2013 05:13, Rogovin, Kevin <span dir="ltr"><<a href="mailto:kevin.rogovin@intel.com" target="_blank">kevin.rogovin@intel.com</a>></span> wrote:<br><div class="gmail_extra"><div class="gmail_quote"> <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">Hello,<br> <br> Thank you for the very fast answers, some more questions:<br> <div class="im"><br> <br> > It's not a preference question. The registers are 8 floats wide.<br> > Vertex shaders get invoked 2 vertices at a time, with a register containing these values:<br> ><br> > . +------+------+------+------+------+------+------+------+<br> > . | v0.x | v0.y | v0.z | v0.w | v1.x | v1.y | v1.z | v1.w |<br> > . +------+------+------+------+------+------+------+------+<br> <br> </div>This seems best to me: run two vertices in each invocation with the hopes that the<br> shader compiler will merge (multiple) float, vec2 and maybe even vec3 operations into<br> vec4 operations (does it)? <br></blockquote><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"> <div class="im"><br> <br> > while these 8 pixels in screen space:<br> ><br> > . +----+----+----+----+<br> > . | p0 | p1 | p2 | p3 |<br> > . +----+----+----+----+<br> > . | p4 | p5 | p6 | p7 |<br> > . +----+----+----+----+<br> ><br> > are loaded in fragment shader registers as:<br> ><br> > . +------+------+------+------+------+------+------+------+<br> >. | p0.x | p1.x | p4.x | p5.x | p2.x | p3.x | p6.x | p7.x |<br> > . +------+------+------+------+------+------+------+------+<br> ><br> > Note how one register just holds a single channel ('.x' here) of a vector. A vec4 would take up 4 registers, and to do value0.xyzw * value1.xyzw, you'd emit 4 MULs.<br> <br> </div>This is exactly what I was trying to ask/say about the fragment shader running, i.e. 
n-fragments are processed with 1 n-SIMD command (for i965, n=8),<br> sighs my e-mail communications leave something to be desired.<br> Some questions:<br> 1) do the fragments need to be in a 4x2 block, or can it be two separate 2x2 blocks? <br></blockquote><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"> 2) for tiny triangles for fragment shaders that do not require dFdx, dFdy or fwidth, can the fragments be totally scattered? <br></blockquote><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"> <br> Along further lines, for non-dependent texture lookups, are there code lines where the derivatives are computed<br> analytically so that selecting the correct LOD does not require to process fragments in 2x2 (or larger) blocks? Or does<br> the i965 hardware sampler interface does not allow this kind of madness?<br></blockquote><div><br></div><div>We don't do any such optimizations in the Mesa/i965 driver, and I suspect it wouldn't help much if we did (the sampler hardware computes the gradients from the input coordinates by taking advantage of the 2x2 block arrangement, so the gradient computation is extremely cheap).<br> </div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"> <div class="im"><br> >> On a related note, where are the beans about the dispatch table?<br> >I don't know this one (or particularly what you're asking, I guess).<br> <br> </div>Viewing docs/index.html, on the side panel "Developer Topics --> GL Dispatch" there is text (broken into sections "1. Complexity of GL Dispatch", "2. Overview of Mesa's Implementation" and "3. Optimizations " describing how different GL contexts for the same hardware can do different things for the same GL function and that mesa has stubs which in turn call the "real" function. 
The documents go on to talk about various ways the function tables are filled and accessed across separate threads. My questions are:<br> 0) is that information text still accurate? In particular, the directory src/glapi is gone from Mesa (atleast what I git cloned) and I thought that was the location of it.<br> 1) where/how does the i965 driver fill that table, if it exists?<br></blockquote><div><br></div><div>Some of this documentation may be out of date--we often forget that it exists, so we don't keep it very well updated. If you find specific errors, please feel free to submit patches to fix them.<br> <br></div><div>The directory src/glapi is now src/mapi/glapi.<br><br></div><div>A lot of the code to fill in the dispatch table is in src/mesa/main/api_exec.c, which is generated at compile time by src/mapi/glapi/gen/gl_genexec.py from the .xml files in the src/mapi/glapi/gen directory. A handful of dispatch table functions aren't populated by api_exec.c because they change dynamically depending on GL state. Functions that specify vertex attributes (e.g. glColor4f()) are set up by install_vtxfmt() in src/mesa/main/vtxfmt.c. Functions whose behaviour needs to be saved in display lists are set up by _mesa_initialize_save_table() in src/mesa/main/dlist.c.<br> <br></div><div>If you're new to Mesa, I'd recommend shying away from this dispatch code for now, since it's fairly subtle and most people don't need to understand it in order to contribute usefully to Mesa. The rule of thumb is, if you're looking for the implementation of the function glFoo(), grep the source code for a function called _mesa_Foo(). If you find it, that's the function you're looking for. 
If you don't, then it's probably one of the functions whose behaviour changes based on GL state, in which case congratulations, you're one of the few people who actually need to understand the dispatch code :)<br> </div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"> <br> Along similar lines, I see that some of the code in src/mesa/main performs various checks of various API calls and at times has some conditions dependent on what context type it is, which kind of contradicts the idea of different context have different dispatch tables [sort of, since the functions might just be the driver magick, where as the stub is validate and then call driver magick].<br> </blockquote><div><br></div><div>When a function is available in some APIs and not available in others, we handle that at the time we populate the dispatch table (for example, the code-generated api_exec.c only populates the glAlphaFuncx() function when the API is GLES 1.x). When a function is available in multiple APIs but has subtle behavioural differences from one API to the next, we handle that by checking the API in the implementation function (for example, GLES versions prior to 3.0 require that the "transpose" argument to glUniformMatrix() be false, so we check this in _mesa_uniform_matrix(), which is the common function called by all of the glUniformMatrix*() commands).<br> </div></div></div></div>