[Mesa-dev] [PATCH 09/32] i965/fs: Fix fs_inst::regs_written calculation for instructions with scalar dst.
currojerez at riseup.net
Sun Feb 8 04:52:40 PST 2015
Kenneth Graunke <kenneth at whitecape.org> writes:
> On Saturday, February 07, 2015 02:10:19 AM Francisco Jerez wrote:
>> Matt Turner <mattst88 at gmail.com> writes:
>> > On Fri, Feb 6, 2015 at 6:42 AM, Francisco Jerez <currojerez at riseup.net> wrote:
>> >> Scalar registers are required to have zero stride, fix the
>> >> regs_written calculation not to assume that the instruction writes
>> >> zero registers in that case.
>> >> ---
>> >> src/mesa/drivers/dri/i965/brw_fs.cpp | 3 ++-
>> >> 1 file changed, 2 insertions(+), 1 deletion(-)
>> >> diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp b/src/mesa/drivers/dri/i965/brw_fs.cpp
>> >> index 0a82fb7..bafe438 100644
>> >> --- a/src/mesa/drivers/dri/i965/brw_fs.cpp
>> >> +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
>> >> @@ -126,7 +126,8 @@ fs_inst::init(enum opcode opcode, uint8_t exec_size, const fs_reg &dst,
>> >> case HW_REG:
>> >> case MRF:
>> >> case ATTR:
>> >> - this->regs_written = (dst.width * dst.stride * type_sz(dst.type) + 31) / 32;
>> >> + this->regs_written =
>> >> + CEILING(MAX2(dst.width * dst.stride, 1) * type_sz(dst.type), 32);
>> > I sent basically the same patch ("[PATCH 06/10] i965/fs: Consider
>> > subregister writes to be writing a register.")
>> > I don't really care which gets committed, but I did have to think a
>> > lot more about yours (and look up what CEILING did) to be sure it did
>> > the same thing. Maybe you like mine better?
>> I find the helper function easier to understand and less error-prone
>> than the open-coded version. I admit though that the "easier to
>> understand" part may not be the case for someone that is not familiar
>> with CEILING(), as its meaning is far from obvious from its name. Maybe
>> we could come up with a more descriptive name?
> Wow, I don't like the name of that macro at all. I would expect it to
> do the well known mathematical function, ceil(). It doesn't.
It does, well, sort of... I'm not sure who came up with that macro, but
it seems that what the author had in mind was indeed taking the ceiling
of the fraction given by p/q for CEILING(p, q).
> In most places in the driver, we do:
> ALIGN(value, 32) / 32;
> which would be my preference. I definitely dislike the old open coded
> (val + 31) / 32 business.
That only works for powers of two, and it still forces you to duplicate
the denominator -- in this specific case it's okay but it starts to get
annoying when the denominator is longer than two keystrokes. How about
we rename CEILING() to DIV_ROUND_UP() as the kernel calls it?
> Beyond the style...Matt put MAX2 around the whole expression, and you
> put it around the inside. Those might differ - is one more correct than
> the other?
It can only make a difference for type_sz(x) == 0 or for type_sz(x)
larger than the unit regs_written is expressed in. So right now they
are effectively equivalent but they wouldn't be if at some point we
wanted to keep track of sub-register (e.g. scalar) writes, if we ever do
that we'll have to take the max of width * stride only.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Size: 212 bytes
Desc: not available
More information about the mesa-dev