<div dir="ltr">On 15 January 2014 14:01, Francisco Jerez <span dir="ltr"><<a href="mailto:currojerez@riseup.net" target="_blank">currojerez@riseup.net</a>></span> wrote:<br><div class="gmail_extra"><div class="gmail_quote">
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div class=""><div class="h5">Paul Berry <<a href="mailto:stereotype441@gmail.com">stereotype441@gmail.com</a>> writes:<br>
<br>
> On 2 December 2013 11:31, Francisco Jerez <<a href="mailto:currojerez@riseup.net">currojerez@riseup.net</a>> wrote:<br>
><br>
>> Until now it was only being taken into account in the VEC4 back-end<br>
>> but not in the FS back-end. Do it in both cases.<br>
>> ---<br>
>> src/mesa/drivers/dri/i965/brw_fs.h | 2 +-<br>
>> src/mesa/drivers/dri/i965/brw_fs_generator.cpp | 10 ++++++----<br>
>> src/mesa/drivers/dri/i965/brw_shader.h | 7 ++++---<br>
>> 3 files changed, 11 insertions(+), 8 deletions(-)<br>
>><br>
>> diff --git a/src/mesa/drivers/dri/i965/brw_fs.h<br>
>> b/src/mesa/drivers/dri/i965/brw_fs.h<br>
>> index 2c36d9f..f918f7e 100644<br>
>> --- a/src/mesa/drivers/dri/i965/brw_fs.h<br>
>> +++ b/src/mesa/drivers/dri/i965/brw_fs.h<br>
>> @@ -615,4 +615,4 @@ bool brw_do_channel_expressions(struct exec_list<br>
>> *instructions);<br>
>> bool brw_do_vector_splitting(struct exec_list *instructions);<br>
>> bool brw_fs_precompile(struct gl_context *ctx, struct gl_shader_program<br>
>> *prog);<br>
>><br>
>> -struct brw_reg brw_reg_from_fs_reg(fs_reg *reg);<br>
>> +struct brw_reg brw_reg_from_fs_reg(fs_reg *reg, unsigned dispatch_width);<br>
>> diff --git a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp<br>
>> b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp<br>
>> index 8d310a1..1de59eb 100644<br>
>> --- a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp<br>
>> +++ b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp<br>
>> @@ -981,8 +981,9 @@ static uint32_t brw_file_from_reg(fs_reg *reg)<br>
>> }<br>
>><br>
>> struct brw_reg<br>
>> -brw_reg_from_fs_reg(fs_reg *reg)<br>
>> +brw_reg_from_fs_reg(fs_reg *reg, unsigned dispatch_width)<br>
>> {<br>
>> + const int reg_size = 4 * dispatch_width;<br>
>><br>
><br>
> What happens when reg.type is UW and dispatch_width is 16? In that case,<br>
> we would compute reg_size == 64, but the correct value seems like it's<br>
> actually 32 in this case.<br>
><br>
> Are we perhaps relying on reg.type being a 32-bit type? If so, maybe we<br>
> should add an assertion:<br>
><br>
> assert(type_sz(reg.type) == 4);<br>
><br>
<br>
</div></div>Nope, "reg_size" is supposed to be the size in bytes of a ::reg_offset<br>
unit, i.e. one hardware register in SIMD8 mode and two hardware<br>
registers in SIMD16 mode as the comment at the definition of<br>
::reg_offset explains. The fixed factor of four is intentional and<br>
correct no matter what the register type is.<br>
<br>
Thanks.<br></blockquote><div><br></div><div>Ok, I see. It appears that both this function *and* the comment above reg_offset are assuming that the data type is 32 bit. The comment above reg_offset says:<br><br> * For pre-register-allocation GRFs and MRFs, this is in units of a<br>
* float per pixel (1 hardware register for SIMD8 or SIMD4x2 mode,<br> * or 2 registers for SIMD16 mode). For uniforms, this is in units<br> * of 1 float.<br><br></div><div>but when we get around to adding support for double-precision floats (a feature of GL 4.0), this will no longer work; for double precision types we'll need reg_offset to be measured in units of at least 4 hardware registers in SIMD16 mode to avoid overlap.<br>
<br></div><div>Similarly, for types that are 16 bits, if we consider reg_offset to be measured in units of 2 hardware registers in SIMD16 mode, we're actually wasting registers, since all 16 values actually fit in a single hardware register. That's not really a big deal right now, since we use 16-bit types so rarely in the FS back-end. In fact, I bet we never use a nonzero reg_offset on a 16-bit type.<br>
<br></div><div>It still seems to me that it would be worth putting an assertion here to help alert us to what needs to change when we add double precision support (or if someday we have hardware that supports half float computation). I'm not 100% sure what the assertion could be. "assert(type_sz(reg->type) == 4);" was an optimistic guess. We might have to do "assert(type_sz(reg->type) == 4 || reg->reg_offset == 0);" in order to avoid tripping on the rare cases where we currently use 16-bit types in fragment shaders.<br>
</div></div></div></div>