[Mesa-dev] [RFC] i965/dbg: Expose cases hitting a presumably dead optimization
Pohjolainen, Topi
topi.pohjolainen at intel.com
Sun Mar 13 07:58:46 UTC 2016
On Sat, Mar 12, 2016 at 08:44:54AM -0800, Jason Ekstrand wrote:
> On Mar 11, 2016 11:47 PM, "Pohjolainen, Topi"
> <[1]topi.pohjolainen at intel.com> wrote:
> >
> > On Fri, Mar 11, 2016 at 05:59:37PM -0800, Jason Ekstrand wrote:
> > > On Fri, Mar 11, 2016 at 4:40 AM, Topi Pohjolainen
> > > <[1][2]topi.pohjolainen at intel.com> wrote:
> > >
> > > The logic iterates over param[] which contains pointers to
> > > uniform storage set during linking (see
> > > link_assign_uniform_locations()).
> > > The pointers are unique and it should be impossible to find
> > > matching entries.
> > > I couldn't find any regressions with test system. In addition
> > > I tried several benchmarks on HSW and none hit this.
> > > I'm hoping to remove this optimization attempt. This is the
> only
> > > bit that depends on knowing about the actual storage during
> > > compilation. All the rest deal with just relative push and
> pull
> > > locations once the actual filling of pull_param[] is moved
> > > outside the compiler just as param[]. (Filling pull_param is
> > > based on the pull locations and doesn't need to be inside the
> > > compiler).
> > > Any thoughts?
> > >
> > > I'm not 100% sure what you're trying to do, but I have a branch
> that
> > > may be of interest:
> > >
> [2][3]https://cgit.freedesktop.org/~jekstrand/mesa/log/?h=wip/i965-unif
> orm
> > > s
> > > The branch enables support for pushing small uniform arrays.
> Among
> > > other things, it redoes the way we do push constants and gets
> rid of
> > > some of the data tracking in the backend compiler. The big
> reason why
> > > I haven't tried too hard to get it merged is because it
> regresses Sandy
> > > Bridge just a bit. I know I've seen and fixed the bug before in
> an
> > > alternate attempt, but I don't remember how.
> > > I'm going to be refreshing it soon because we need indirect push
> > > constants for the Vulkan driver. (The branch is already merged
> into
> > > the Vulkan branch.)
> >
> > I'd like to stop filling param[] before compilation. This is really
> not
> > needed by the compiler as it deals with pull and push constant
> locations,
> > i.e., positions in the push and pull files. Actual uniform values and
> their
> > location in the uniform storage are not needed until actual pipeline
> upload.
> >
> > My plan is to move the iteration over the core uniform storage to
> pipeline
> > upload time. We can fill push and pull buffers directly without the
> need of
> > storing pointers to param[] in the middle. Not only makes this things
> simpler
> > and more flexible in my mind, does it give us the possibility to
> upload
> > floats with 16-bit precision instead of 32-bits. Current upload logic
> only
> > gets pointers to 32-bit values without knowing if they should be
> represented
> > with 16 bits let alone whether the values are floats or integers to
> begin
> > with.
>
> Right. Kristian and I have talked about some related things that we
> need for pipeline caching and the Vulkan driver. In Vulkan, they
> aren't actual pointers at all but are, instead, offsets into a push
> constant block. Fortunately, the back-end compiler never dereferences
> them so you can shove whatever you want in there and it's OK. We've
> talked about turning the pull and push params into just a set of
> integers that means whatever the api and state setup code want. One of
> the problems with pointers is that you can't easily put them into an
> on-disk shader cache (which we have for Vulkan).
>
> When you talk about 16 or 64-bit values, what is your intention? Are
> 64-bit values still going to take up two slots or are they now one
> 64-bit slot? Are there two 16-bit values per slot or just one? Are
> 16-bit uniforms converted before they get uploaded or consumed directly
> by the shader? I'm still a little confused as to what problem you're
> trying to solve.
I'm seeing the 16-bit float as two-fold. First, the uniform storage always
represents them as normal 32-bit floats for the gl-api to work correctly
(even if they are marked as low/mediump I don't think the api for setting and
querying them is allowed operate with reduced precision. On the other hand,
such conversion back and forth in the core gl-api doesn't sound appealing
at all just from implementation point of view).
Therefore after the compiler has chosen to represent a particular uniform
with reduced precision and set the operand types accordingly, the upload logic
has to convert the 32-bit float into equivalent 16-bit value before uplaoding.
Second is the question on how to pack the 16-bit values. I'm seeing this as
second step of the optimization where we allow more individual components in
the push file, two 16-bits components in one 32-bit slot. But as you are
saying yourself this maybe a lot of work/complexity with unknown gains in
performance.
With 64-bit doubles we don't need to anything particular as both the upload
logic and the core uniform storage agree on required space - both take two
32-bit slots.
>
> One thing to think about as you work on this is that Vulkan doesn't
> have individual uniforms but instead has a block of explicit push
> constants. In the shader, the push constants are specified with
> explicit offsets into that block similar to a UBO. The result is that
> it's very difficult for the state setup code to know what size anything
> is. Just chopping the push constant space into 32-bit hunks that the
> compiler is free to rearrange is terribly convenient. We could use 16
> or 8-bit chunks just as easily but having some be 32-bit, others
> 64-bit, and others 16-bit has the potential to get very painful.
>
> Food for thought. Maybe I'm completely missing what your trying to do.
Thanks Jason, I've been worrying if I'm conflicting with vulkan, so really
good for you to give me a heads up. I've been hoping to keep my changes in
the glsl-specific parts in the backend compiler.
More information about the mesa-dev
mailing list