[Mesa-dev] [PATCH 09/10] i965: Add and use is_scheduling_barrier() function.
Francisco Jerez
currojerez at riseup.net
Wed Mar 9 22:11:48 UTC 2016
Ian Romanick <idr at freedesktop.org> writes:
> On 03/08/2016 04:47 PM, Francisco Jerez wrote:
>> Matt Turner <mattst88 at gmail.com> writes:
>>
>>> On Fri, Mar 4, 2016 at 8:49 PM, Francisco Jerez <currojerez at riseup.net> wrote:
>>>> Matt Turner <mattst88 at gmail.com> writes:
>>>>
>>>>> Though there is a lot of overlap with has_side_effects(), these do mean
>>>>> different things.
>>>>
>>>> Can we do it the other way around and implement is_scheduling_barrier()
>>>> in terms of has_side_effects()? has_side_effects() seems like the more
>>>> fundamental of the two and because is_scheduling_barrier() is specific
>>>> to the scheduler it would make more sense to keep it static inline in
>>>> brw_schedule_instructions.cpp for the sake of encapsulation.
>>>>
>>>> AFAIUI is_scheduling_barrier() is merely a makeshift approximation at
>>>> missing memory dependency analysis, and in the long term is the wrong
>>>> question to ask (IMHO the right question is "does this instruction have
>>>> an execution dependency with respect to this other?", which implies that
>>>> either of the two instructions has some sort of side effect, but not the
>>>> converse). has_side_effects() OTOH has a well-defined answer that can
>>>> be answered by looking at the semantics of the instruction alone,
>>>> independent from scheduling heuristics and surrounding compiler
>>>> infrastructure.
>>>>
>>>> I think for the moment I'd make is_scheduling_barrier return true if the
>>>> instruction has side effects, except where you have the guarantee that
>>>> the side-effectful instruction won't have a (non-dataflow related)
>>>> execution dependency with any other instruction of the program, which is
>>>> currently only the case for non-EOT FB_WRITE -- Pretty much has Ken had
>>>> open-coded it in his scheduling changes except for the non-EOT part.
>>>
>>> I think your suggestion is to keep has_side_effects() the same and to
>>> wrap it with a static is_scheduling_barrier(). What we have is
>>>
>>> bool
>>> backend_instruction::has_side_effects() const
>>> {
>>> switch (opcode) {
>>> [list of instructions]
>>> return true;
>>> default:
>>> return false;
>>> }
>>> }
>>>
>>> bool
>>> fs_inst::has_side_effects() const
>>> {
>>> return this->eot || backend_instruction::has_side_effects();
>>> }
>>>
>>> And then in the scheduler,
>>>
>>> if ((inst->opcode == FS_OPCODE_PLACEHOLDER_HALT ||
>>> inst->has_side_effects()) &&
>>> inst->opcode != FS_OPCODE_FB_WRITE)
>>> add_barrier_deps(n);
>>>
>>> where the FS_OPCODE_FB_WRITE check was added recently to avoid
>>> treating it as a barrier. That seems pretty dirty, because
>>> has_side_effects() returns true for that opcode, so we're just hacking
>>> around that. I noted in my revert that it also had the effect of
>>> making an FB_WRITE with EOT not a barrier.
>>>
>>> So, your suggestion is to add another layer on top of that, that
>>> checks inst->eot?
>>>
>>> We'd have something like
>>>
>>> static inline is_scheduling_barrier(fs_inst *inst)
>>> {
>>> return inst->opcode == FS_OPCODE_PLACEHOLDER_HALT ||
>>> (inst->has_side_effects() &&
>>> inst->opcode != FS_OPCODE_FB_WRITE) ||
>>> inst->eot;
>>> }
>>>
>>> where, tracing through the execution, fs_inst::has_side_effects would
>>> return true because inst->eot, but since opcode is FB_WRITE that
>>> expression would evaluate to false. But then because inst->eot, it'd
>>> evaluate to true.
>>>
>>> Doesn't that seem *really* hacky?
>>>
>>> Can I go ahead and get someone else's opinion? I doubt you're going to agree.
>>
>> Oh, I definitely agree, it seems a bit of a hack to me -- It seems hacky
>> because the whole scheduling barrier business IS a hack. Instead of
>> implementing actual memory dependency analysis we want to do the
>> following (at least for the time being) which I think we all agree is a
>> hack:
>>
>> Treat any instruction with side-effects as a scheduling barrier except
>> where we have the guarantee that the side-effects of the instruction
>> won't have any influence on any other instruction in the program.
>>
>> That's currently only the case for non-EOT FB_WRITE (and there's nothing
>> special about FB_WRITE that makes it different from other side-effectful
>> instructions other than the fact that we currently don't implement
>> framebuffer reads, so it's pretty much coincidental that they currently
>> have no influence on anything else).
>
> FB reads will be necessary for GL_KHR_blend_equation_advanced which is
> needed for OpenGL ES 3.2.
>
Yeah, exactly, that's part of the reason I regard this as a temporary
hack -- We will most likely end up reverting the special handling of
FB_WRITEs (and treat it as a regular side-effectful instruction during
scheduling) when framebuffer reads are implemented because the
assumption this is based on will break down.
> Implementing full memory dependency tracking is hard.
It's a pretty much straightforward extension of the use/def chains
analysis pass I've been working on. I don't think it will be that
difficult once that code has landed.
> It seems like we can do a little bit better than we're doing, and we
> will need to do better for GL_KHR_blend_equation_advanced. Most
> shaders won't read the framebuffer, and those that do should only have
> FB reads and writes conflict.
>
> I think having a thing like Matt's is_scheduling_barrier is probably the
> right idea in this sort of world, but I think it's to narrow. I may be
> wrong, but I don't think GLSL requires that memory accessible through
> multiple methods is coherent. That is, a write to an SSBO could be
> reordered after a read from a UBO. If those happen to the be same
> backing storage, that's tough. This is like the strict aliasing rules in C.
>
Yeah, that's right, implementations don't have to provide any coherency
guarantees between memory types (e.g. UBOs vs SSBOs), an FB write can be
assumed to only potentially interfere with FB reads.
> There's really three things that we care about. There are things that
> "significantly" affect program flow (i.e., HALT and EOT),
HALT should probably be dealt with as a regular control flow instruction
-- It doesn't really terminate the program, all it does is jump to
another HALT instruction, only the final EOT message terminates the
thread. I have patches to treat HALT as control flow and remove the
special handling but they caused some shader-db regressions that were
the original motivation of the use/def chains stuff.
> things that are explicitly barriers (e.g., atomics, explicit memory
> barriers),
Atomics don't have barrier semantics, they just need to be atomic.
> and things that write the kind of memory object you are reading. It
> seems like we ought to be able to capture that in a small set of
> functions that make it obvious what's happening... and give us some
> room to improve the infrastructure.
>
I think what we'll want in the long run (assuming we want to make room
for memory dependency analysis which is probably not strictly necessary
for Matt to take care of right now) is a way to iterate over dependent
memory defs and uses of a given instruction, which is exactly what
use/def chains will give you once it's extended to handle memory reads
and writes.
> Maybe the compromise for now is to:
>
> - Leave has_side_effects as is, but maybe change it's name to
> something_memory_something.
>
> - Add a new method that returns true for FS_OPCODE_PLACEHOLDER_HALT or
> eot. No current opinion on what to call it... maybe
> something_thread_end_something.
>
> - Add is_scheduling_barrier that returns true if
> something_memory_something or something_thread_end_something.
>
I have doubts that would help Matt a ton right now ;), it sounds roughly
equivalent to the situation before d0e1d6b7e27bf5f05436e47080d326d7daa6.
> Or maybe I have no idea what I'm talking about and should be ignored. :)
> Always possible...
>
>> Your is_scheduling_barrier() proposal is a fairly clear translation of
>> my sentence explaining what we want to do into code. It seems like a
>> hack in code because it is a hack in principle. AFAICT building
>> is_scheduling_barrier() into the core IR data structures only obscures
>> the matter and involves other compiler infrastructure into this hack
>> while they don't need to know or care.
>>
>> _______________________________________________
>> mesa-dev mailing list
>> mesa-dev at lists.freedesktop.org
>> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 212 bytes
Desc: not available
URL: <https://lists.freedesktop.org/archives/mesa-dev/attachments/20160309/09533404/attachment.sig>
More information about the mesa-dev
mailing list