[Mesa-dev] [RFC] i965: Don't consider uniform value locations in program uploads
Pohjolainen, Topi
topi.pohjolainen at intel.com
Mon Jun 22 03:35:59 PDT 2015
On Mon, Jun 22, 2015 at 01:28:12PM +0300, Pohjolainen, Topi wrote:
> On Thu, Jun 04, 2015 at 05:35:11PM -0700, Ben Widawsky wrote:
> > On Wed, Jun 03, 2015 at 09:32:55PM +0300, Pohjolainen, Topi wrote:
> > > On Wed, Jun 03, 2015 at 09:21:11PM +0300, Topi Pohjolainen wrote:
> > > > Shader programs are cached per stage (FS, VS, GS) using the
> > > > corresponding shader source identifier and compile time choices
> > > > as key. However, one not only stores the program binary but
> > > > a pair consisting of program binary and program data. The latter
> > > > represents the store of constants (such as uniforms) used by
> > > > the program.
> > > >
> > > > However, when programs are searched in the cache for reloading
> > > > only the program key representing the binary is considered
> > > > (see for example, brw_upload_wm_prog() and brw_search_cache()).
> > > > Hence, when programs are re-loaded from cache the first program
> > > > binary, program data pair is extracted without considering if
> > > > the program data matches the currently in use uniform storage
> > > > as well.
> > > >
> > > > My reasoning Why this actually works is because the key
> > > > contains the identifier of the corresponding gl_program that
> > > > represents the source code for the shader program. Hence,
> > > > two programs having identical source code still have unique
> > > > keys.
> > > > And therefore brw_try_upload_using_copy() never encounters
> > > > a case where a matching binary is found but the program data
> > > > doesn't match.
> > >
> > > In fact, thinking some more I think this is possible when the
> > > same, say fragment shader, is used with two different vertex
> > > shaders. This results into there being matching binaries but
> > > program data pointing to different storage. Looking at
> > > brw_upload_cache() I still can't see how failing
> > > brw_try_upload_using_copy() makes a difference. We only upload
> > > the program binary again (even though that is the part that
> > > actually matches). And then proceed the same way regardless
> > > of the result of brw_try_upload_using_copy(). The program data
> > > gets augmented with the key.
> > >
> > > But the point remains that when a program is reloaded through
> > > the brw_search_cache() only the key (and not the program data)
> > > is considered returning the first matching pair.
> > >
> > > I probably need to write a piglit test for this.
> > >
> > > >
> > > > My ultimate goal is to stop storing pointers to the individual
> > > > components of a uniform but to store only a pointer to the
> > > > "struct gl_uniform_storage" instead, and allow
> > > > gen6_upload_push_constants() to iterate over individual
> > > > components and array elements. This is needed to be able to
> > > > convert 32-bits floats to fp16 - otherwise there is only
> > > > pointer to 32-bits without knowing its type (int, float, etc)
> > > > let alone its target precision.
> > > >
> > > > No regression in jenkins. However, we talked about this with
> > > > Ken and this doesn't really tell much as piglit doesn't really
> > > > re-use shader sources during one execution.
> > > >
> > > > Signed-off-by: Topi Pohjolainen <topi.pohjolainen at intel.com>
> > > > CC: Kenneth Graunke <kenneth at whitecape.org>
> > > > CC: Tapani P\344lli <tapani.palli at intel.com>
> > > > ---
> > > > src/mesa/drivers/dri/i965/brw_program.c | 6 ------
> > > > 1 file changed, 6 deletions(-)
> > > >
> > > > diff --git a/src/mesa/drivers/dri/i965/brw_program.c b/src/mesa/drivers/dri/i965/brw_program.c
> > > > index e5c0d3c..7f5fde8 100644
> > > > --- a/src/mesa/drivers/dri/i965/brw_program.c
> > > > +++ b/src/mesa/drivers/dri/i965/brw_program.c
> > > > @@ -576,12 +576,6 @@ brw_stage_prog_data_compare(const struct brw_stage_prog_data *a,
> > > > if (memcmp(a, b, offsetof(struct brw_stage_prog_data, param)))
> > > > return false;
> > > >
> > > > - if (memcmp(a->param, b->param, a->nr_params * sizeof(void *)))
> > > > - return false;
> > > > -
> > > > - if (memcmp(a->pull_param, b->pull_param, a->nr_pull_params * sizeof(void *)))
> > > > - return false;
> > > > -
> > > > return true;
> > > > }
> > > >
> >
> > I am looking at a lot of this code for the first time, and I have a kind of wild
> > guess.
> >
> > The first time you upload a program, the program (kinda annoying that
> > brw_upload_item_data doesn't seem to actually do that). Malloc a pointer (tmp,
> > item->key), store the program and aux there. Set that pointer as the key.
> >
> > The aux data lives at key + key_size.
> >
> > Indeed search_cache() only checks the key. For WM it does contain the
> > urb_entry data that I think would change if number of uniforms differed. So for
> > your example above with 2 VS sharing an FS, if the number of uniforms are the
> > same, then the program should be identical in the FS, right? Similarly for the
> > GS with input_varyings. I think generally this is the behavior you'd want.
> >
> > brw_try_upload_using_copy() seems correct to me as it does do the aux_compare
> > (and falls back to memcmp).
>
> Well, I've been looking this quite a bit now, and I'm still somewhat confused
> what brw_upload_cache() tries to achieve with brw_try_upload_using_copy().
>
> If you check brw_try_upload_using_copy() you see that it does not consider
> the program key. And recall that the key represents the source code of the
> program via the id and the choices made by the compiler based on the gl-state.
> Instead brw_try_upload_using_copy() first considers the auxiliary data. If this
> matches, then it considers the actual program binary (data). If both match
> it is happy.
> Now lets consider what brw_upload_cache() does when a match is found. It
> creates a new hash entry without considering if the given program _key_
> matches the item just found in the cache. It also re-copies the program
> binary to the found entry even though we just verified that the bytes there
> are identical to the data we are about to re-write there.
>
> Overall, it looks to me that brw_upload_cache() tries to somehow address a
> case where aux+data is identical but the key isn't. I don't really get the
> special treatment (the logic described above) nor can I understand how it is
> possible to have matching aux+data with two differing program keys. Two
> different keys could point to the same program binary but the aux-data
> would always be different as it represents the uniform value storage which
> in turn is not specific to the gl_fragment_program but to the gl_shader_program
> binding fragment and vertex programs together.
I toyed around, for example, with a piglit test re-using a shader object with
a program object. This works just fine with my patch as brw_upload_cache()
ends up creating two cache entries both pointing to the same program binary.
Without my patch we end up loading two identical copies of the program binary
to the underlying buffer object.
...
const GLuint fs = piglit_compile_shader_text(GL_FRAGMENT_SHADER,
fs_source);
const GLuint vs = piglit_compile_shader_text(GL_VERTEX_SHADER,
vs_source);
GLint color_loc_prog_1;
GLint color_loc_prog_2;
prog_1 = piglit_link_simple_program(vs, fs);
glUseProgram(prog_1);
color_loc_prog_1 = glGetUniformLocation(prog_1, "color");
glUniform4fv(color_loc_prog_1, 1, red);
prog_2 = piglit_link_simple_program(vs, fs);
glUseProgram(prog_2);
color_loc_prog_2 = glGetUniformLocation(prog_2, "color");
glUniform4fv(color_loc_prog_2, 1, green);
...
More information about the mesa-dev
mailing list