[Mesa-dev] [PATCH 1/2] intel/fs: New method for register_byte_use_pattern for fs_inst

Chema Casanova jmcasanova at igalia.com
Mon Aug 20 12:32:25 UTC 2018

On 29/07/18 at 19:47, Chema Casanova wrote:
> On 28/07/18 at 01:45, Francisco Jerez wrote:
>> Chema Casanova <jmcasanova at igalia.com> writes:


>>>>> If we have a partial write/read:
>>>>> I understood that my initial pattern proposal would only be OK for
>>>>> the first GRF of src[i]/dst (reg_offset == 0):
>>>>> periodic_mask(this->exec_size,                           /* count */
>>>>>        this->src[i].stride * type_sz(this->src[i].type), /*step */
>>>>>        type_sz(this->src[i].type),                       /* bits */
>>>>>        this->src[i].offset % REG_SIZE);                  /* offset */
>>>>> If we handle only reg_offset == 0 we already get a huge improvement,
>>>>> removing many of the register pressure problems we now have on all
>>>>> SIMD8 shaders in the 8/16-bit test cases.
>>>>> I understood your objection to be that, when the source/destination
>>>>> uses more than one GRF (reg_offset == 1), we cannot guarantee that the
>>>>> same internal offset (this->src[i].offset % REG_SIZE) as in the base
>>>>> register applies when calculating the pattern. So it would be better
>>>>> to return ~0u on reads or 0u on writes.
>>>> Yes, but you could easily determine whether the mask is going to be
>>>> invariant with respect to reg_offset (where reg_offset is within bounds)
>>>> and in that case return the periodic_mask() expression above, otherwise
>>>> return 0/~0u depending on whether reg_offset is within bounds.
>>> Ok, so we are within bounds, we don't have a predicated write, we are
>>> not a send message. Then we have an ALU opcode and we return the
>>> periodic_mask.
>> Those are all necessary but not sufficient conditions for the
>> periodic_mask() expression above to give you the correct answer for any
>> in-bounds reg_offset > 0. You should also check that
>> byte_offset < type_size * stride.
> That's true. Fixed in v5.
> If we don't satisfy the condition then we return 0 on writes and ~0u on
> reads.
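
To make the rule discussed above concrete, here is a minimal standalone
sketch. It is not the code from the v5 patch: the body of periodic_mask()
and the free function byte_use_pattern() with its flat parameter list are
hypothetical, written only to illustrate the check
byte_offset < type_size * stride and the conservative fallback (0 on
writes, ~0u on reads). It assumes a 32-byte GRF with one mask bit per byte
and only covers the in-bounds ALU case.

```cpp
#include <cassert>
#include <cstdint>

constexpr unsigned REG_SIZE = 32; // bytes per GRF; one mask bit per byte

// Hypothetical helper: set `bits` consecutive bits for each of `count`
// elements, spaced `step` bits apart, starting at bit `offset`.
// Bits that would land beyond the 32-bit mask are dropped.
static uint32_t
periodic_mask(unsigned count, unsigned step, unsigned bits, unsigned offset)
{
   assert(bits >= 1 && bits <= REG_SIZE);
   const uint64_t elem = (uint64_t(1) << bits) - 1;
   uint64_t mask = 0;
   for (unsigned i = 0; i < count; i++) {
      const unsigned pos = offset + i * step;
      if (pos >= REG_SIZE)
         break;
      mask |= elem << pos;
   }
   return uint32_t(mask);
}

// Hypothetical stand-in for the per-instruction method: returns the byte
// use pattern for an in-bounds, unpredicated ALU access, or the
// conservative answer when the pattern could differ between GRFs.
static uint32_t
byte_use_pattern(bool is_write, unsigned exec_size, unsigned stride,
                 unsigned type_size, unsigned offset)
{
   const unsigned byte_offset = offset % REG_SIZE;
   const unsigned period = type_size * stride;

   // Extra condition from the review: the offset inside the GRF must be
   // smaller than one period of the access pattern.
   if (byte_offset < period)
      return periodic_mask(exec_size, period, type_size, byte_offset);

   // Otherwise be conservative: nothing written, everything read.
   return is_write ? 0u : ~0u;
}
```

For example, a SIMD8 access to a 16-bit type with stride 2 and offset 0
yields 0x33333333 under this sketch, i.e. two bytes used out of every four.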

Could you have a look at v5 to check whether I can count on your R-b?


I suppose you didn't have time to look at the other patch of the series:

"[v2,2/2] intel/fs: Improve liveness range calculation for partial writes"

Thanks in advance,

