[Mesa-dev] [RFC PATCH 0/5] Skip automatic execsize for instructions with a width of 4

Iago Toral Quiroga itoral at igalia.com
Wed Dec 9 04:15:44 PST 2015


Right now we rely on the code at the bottom of brw_set_dest to set the correct execution size for anything that does not operate on a full SIMD register (dst.width < BRW_EXECUTE_8). However, this won't work with fp64, where operands are twice as big and we see instructions with a horizontal width of 4 that still require an execution size of 8. We cannot fix this by simply checking the type of the operands involved and skip the automatic execsize adjustment when they are doubles because we can also operate on doubles as integers (for pack and unpack operations for example).

Connor and I have been discussing this and we see two options:

1) We fix all the instructions that do not operate on doubles to set the
correct execution size so they don't depend on the code in brw_set_dest any more, then remove that code from brw_set_dest so it does not break fp64. However, this involves a lot of changes throughout the driver and across all hardware platforms... it looks rather scary, so I think we probably don't want to go down this path.

2) Since we only produce double instructions with a width of 4, we can limit the change to these instructions. This means that we only automatically adjust execsize in brw_set_dest for instructions where dst.width < BRW_EXECUTE_4 and we fix driver code that emits non-double instructions with a width of 4 to set the execsize they need explicitly. For gen7/8 it seems that only a handful of changes are required for this according to piglit.

This RFC series implements 2). As mentioned, piglit seems happy with the changes, but it might be a good idea to run this through Jenkins to make sure that we are not missing anything.

My original idea was to expand the fix to cover other gens as well, but unfortunately, this would involve many more changes. For example, ILK's strip-and-fans or clipping modules are full with code that mixes width8 and width4 instructions (I have ~10 patches that fix this that involve changing the default execution size to 4 in the sf module), gen6 also needs a few changes and I would not be surprised if gen4 required more. Gen9 might require changes too. Another problem with this is that we don't have gen4/9 hardware available, so trying this would also require involvement from other people with access to this hardware.

So considering that we are not targetting fp64 support on gens < 7, at least for now, it seems that implementing 2) but special-case it for gen7 and gen8 only, leaving other generations intact is a resoanble way to go for now.

Opinions?

Iago Toral Quiroga (5):
  i965/eu: set correct execution size in brw_NOP
  i965/fs: set execution size for SEND messages in
    generate_uniform_pull_constant_load_gen7
  i965/eu: set execution size for SEND message in
    brw_send_indirect_message
  i965: set correct execsize for MOVS with a width of 4 in
    brw_find_live_channel
  i965: Skip execution size adjustment for instructions of width 4

 src/mesa/drivers/dri/i965/brw_eu_emit.c        | 21 ++++++++++++++++++++-
 src/mesa/drivers/dri/i965/brw_fs_generator.cpp |  2 ++
 2 files changed, 22 insertions(+), 1 deletion(-)

-- 
2.1.4



More information about the mesa-dev mailing list