[Mesa-dev] [PATCH 05/20] i965/fs: Handle explicit flag destinations in flags_written()
Matt Turner
mattst88 at gmail.com
Mon Jul 17 17:57:03 UTC 2017
On Fri, Jul 14, 2017 at 3:23 PM, Francisco Jerez <currojerez at riseup.net> wrote:
> Matt Turner <mattst88 at gmail.com> writes:
>
>> The implementations of the ARB_shader_group_vote intrinsics will
>> explicitly write the flag as the destination register.
>> ---
>> src/intel/compiler/brw_fs.cpp | 12 ++++++++++--
>> 1 file changed, 10 insertions(+), 2 deletions(-)
>>
>> diff --git a/src/intel/compiler/brw_fs.cpp b/src/intel/compiler/brw_fs.cpp
>> index 43b6e34204..97908a4563 100644
>> --- a/src/intel/compiler/brw_fs.cpp
>> +++ b/src/intel/compiler/brw_fs.cpp
>> @@ -890,9 +890,17 @@ fs_inst::flags_written() const
>> opcode != BRW_OPCODE_WHILE)) ||
>> opcode == FS_OPCODE_MOV_DISPATCH_TO_FLAGS) {
>> return flag_mask(this);
>> - } else {
>> - return 0;
>> + } else if (dst.file == ARF) {
>> + if (dst.nr == BRW_ARF_FLAG + 0 && dst.subnr == 0)
>> + return 0b0001;
>> + if (dst.nr == BRW_ARF_FLAG + 0 && dst.subnr == 1)
>> + return 0b0010;
>> + if (dst.nr == BRW_ARF_FLAG + 1 && dst.subnr == 0)
>> + return 0b0100;
>> + if (dst.nr == BRW_ARF_FLAG + 1 && dst.subnr == 1)
>> + return 0b1000;
>
> This doesn't look right to me, it assumes that flag registers are 16b
> (they're 32b), and that you can only write a single byte of the flag
> register at a time, which is hardly ever the case.
I don't think I read the implementation of flag_mask(), just the
comment, and thought that it returned a bit per flag subregister.
>
> I think you want something along the lines of:
>
> | unsigned
> | bit_mask(unsigned n)
> | {
> | return (n >= CHAR_BIT * sizeof(bit_mask(n)) ? ~0u : (1 << n) - 1);
> | }
> |
> | unsigned
> | flag_mask(const fs_reg &r, unsigned sz)
> | {
> | if (r.file == ARF) {
> | const unsigned start = (r.nr - BRW_ARF_FLAG) * 4 + r.subnr;
> | const unsigned end = start + sz;
> | return bit_mask(end) & ~bit_mask(start);
> | } else {
> | return 0;
> | }
> | }
>
> [Yes, this should give you the right result (i.e. zero) for ARF
> registers outside the range of the (flag) ARF window of the 32-bit
> bitmask.] Notice the similarity between this and the current
> flag_mask() overload. This will allow you to easily re-use the same
> logic for both flags_read() and flags_written(), you just need to pass
> 'inst->src[i], inst->size_read(i)' and 'inst->dst, inst->size_written'
> as arguments respectively, ORing the results for each source in the
> former case.
Wow, thanks. That's fantastic. I've made this change (and also changed
the authorship on 05/20 and 12/20 to you! :)
More information about the mesa-dev
mailing list