[Mesa-dev] [RFC] i965/dbg: Expose cases hitting a presumably dead optimization

Sun Mar 13 07:58:46 UTC 2016

On Sat, Mar 12, 2016 at 08:44:54AM -0800, Jason Ekstrand wrote:
>    On Mar 11, 2016 11:47 PM, "Pohjolainen, Topi"
>    <[1]topi.pohjolainen at intel.com> wrote:
>    >
>    > On Fri, Mar 11, 2016 at 05:59:37PM -0800, Jason Ekstrand wrote:
>    > >    On Fri, Mar 11, 2016 at 4:40 AM, Topi Pohjolainen
>    > >    <[1][2]topi.pohjolainen at intel.com> wrote:
>    > >
>    > >      The logic iterates over param[] which contains pointers to
>    > >      uniform storage set during linking (see
>    > >      link_assign_uniform_locations()).
>    > >      The pointers are unique and it should be impossible to find
>    > >      matching entries.
>    > >      I couldn't find any regressions with test system. In addition
>    > >      I tried several benchmarks on HSW and none hit this.
>    > >      I'm hoping to remove this optimization attempt. This is the
>    only
>    > >      bit that depends on knowing about the actual storage during
>    > >      compilation. All the rest deal with just relative push and
>    pull
>    > >      locations once the actual filling of pull_param[] is moved
>    > >      outside the compiler just as param[]. (Filling pull_param is
>    > >      based on the pull locations and doesn't need to be inside the
>    > >      compiler).
>    > >      Any thoughts?
>    > >
>    > >    I'm not 100% sure what you're trying to do, but I have a branch
>    that
>    > >    may be of interest:
>    > >
>    [2][3]https://cgit.freedesktop.org/~jekstrand/mesa/log/?h=wip/i965-unif
>    orm
>    > >    s
>    > >    The branch enables support for pushing small uniform arrays.
>    Among
>    > >    other things, it redoes the way we do push constants and gets
>    rid of
>    > >    some of the data tracking in the backend compiler.  The big
>    reason why
>    > >    I haven't tried too hard to get it merged is because it
>    regresses Sandy
>    > >    Bridge just a bit.  I know I've seen and fixed the bug before in
>    an
>    > >    alternate attempt, but I don't remember how.
>    > >    I'm going to be refreshing it soon because we need indirect push
>    > >    constants for the Vulkan driver.  (The branch is already merged
>    into
>    > >    the Vulkan branch.)
>    >
>    > I'd like to stop filling param[] before compilation. This is really
>    not
>    > needed by the compiler as it deals with pull and push constant
>    locations,
>    > i.e., positions in the push and pull files. Actual uniform values and
>    their
>    > location in the uniform storage are not needed until actual pipeline
>    upload.
>    >
>    > My plan is to move the iteration over the core uniform storage to
>    pipeline
>    > upload time. We can fill push and pull buffers directly without the
>    need of
>    > storing pointers to param[] in the middle. Not only makes this things
>    simpler
>    > and more flexible in my mind, does it give us the possibility to
>    upload
>    > floats with 16-bit precision instead of 32-bits. Current upload logic
>    only
>    > gets pointers to 32-bit values without knowing if they should be
>    represented
>    > with 16 bits let alone whether the values are floats or integers to
>    begin
>    > with.
> 
>    Right. Kristian and I have talked about some related things that we
>    need for pipeline caching and the Vulkan driver.  In Vulkan, they
>    aren't actual pointers at all but are, instead, offsets into a push
>    constant block.  Fortunately, the back-end compiler never dereferences
>    them so you can shove whatever you want in there and it's OK. We've
>    talked about turning the pull and push params into just a set of
>    integers that means whatever the api and state setup code want.  One of
>    the problems with pointers is that you can't easily put them into an
>    on-disk shader cache (which we have for Vulkan).
> 
>    When you talk about 16 or 64-bit values, what is your intention?  Are
>    64-bit values still going to take up two slots or are they now one
>    64-bit slot?  Are there two 16-bit values per slot or just one?  Are
>    16-bit uniforms converted before they get uploaded or consumed directly
>    by the shader?  I'm still a little confused as to what problem you're
>    trying to solve.

I'm seeing the 16-bit float as two-fold. First, the uniform storage always
represents them as normal 32-bit floats for the gl-api to work correctly
(even if they are marked as low/mediump I don't think the api for setting and
querying them is allowed operate with reduced precision. On the other hand,
such conversion back and forth in the core gl-api doesn't sound appealing
at all just from implementation point of view).
Therefore after the compiler has chosen to represent a particular uniform
with reduced precision and set the operand types accordingly, the upload logic
has to convert the 32-bit float into equivalent 16-bit value before uplaoding.

Second is the question on how to pack the 16-bit values. I'm seeing this as
second step of the optimization where we allow more individual components in
the push file, two 16-bits components in one 32-bit slot. But as you are
saying yourself this maybe a lot of work/complexity with unknown gains in
performance.

With 64-bit doubles we don't need to anything particular as both the upload
logic and the core uniform storage agree on required space - both take two
32-bit slots.

> 
>    One thing to think about as you work on this is that Vulkan doesn't
>    have individual uniforms but instead has a block of explicit push
>    constants.  In the shader, the push constants are specified with
>    explicit offsets into that block similar to a UBO.  The result is that
>    it's very difficult for the state setup code to know what size anything
>    is.  Just chopping the push constant space into 32-bit hunks that the
>    compiler is free to rearrange is terribly convenient.  We could use 16
>    or 8-bit chunks just as easily but having some be 32-bit, others
>    64-bit, and others 16-bit has the potential to get very painful.
> 
>    Food for thought.  Maybe I'm completely missing what your trying to do.

Thanks Jason, I've been worrying if I'm conflicting with vulkan, so really
good for you to give me a heads up. I've been hoping to keep my changes in
the glsl-specific parts in the backend compiler.