<div dir="ltr"><div class="gmail_extra"><div class="gmail_quote">On Thu, Jun 15, 2017 at 4:15 AM, Chris Wilson <span dir="ltr"><<a href="mailto:chris@chris-wilson.co.uk" target="_blank">chris@chris-wilson.co.uk</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Quoting Kenneth Graunke (2017-06-14 21:44:45)<br>
<div><div class="h5">> On Tuesday, June 13, 2017 2:53:20 PM PDT Jason Ekstrand wrote:<br>
> > As I've been working on converting more things in the GL driver over to<br>
> > blorp, I've been highly annoyed by all of the hangs on Haswell. About one<br>
> > in 3-5 Jenkins runs would hang somewhere. After looking at about a<br>
> > half-dozen error states, I noticed that all of the hangs seemed to be on<br>
> > fast-clear operations (clear or resolve) that happen at the start of a<br>
> > batch, right after STATE_BASE_ADDRESS.<br>
> ><br>
> > Haswell seems to be a bit more picky than other hardware about having<br>
> > fast-clear operations in flight at the same time as regular rendering and<br>
> > hangs if the two ever overlap. (Other hardware can get rendering<br>
> > corruption but not usually hangs.) Also, Haswell doesn't fully stall if<br>
> > you just do a RT flush and a CS stall. The hardware docs refer to<br>
> > something they call an "end of pipe sync" which is a CS stall with a write<br>
> > to the workaround BO. On Haswell, you also need to read from that same<br>
> > address to create a memory dependency and make sure the system is fully<br>
> > stalled.<br>
> ><br>
> > When you call brw_blorp_resolve_color it calls brw_emit_pipe_control_flush<br>
> > and does the correct flushes and then calls into core blorp to do the<br>
> > actual resolve operation. If the batch doesn't have enough space left in<br>
> > it for the fast-clear operation, the batch will get split and the<br>
> > fast-clear will happen in the next batch. I believe what is happening is<br>
> > that while we're building the second batch that actually contains the<br>
> > fast-clear, some other process completes a batch and inserts it between our<br>
> > PIPE_CONTROL to do the stall and the actual fast-clear. We then end up<br>
> > with more stuff in flight than we can handle and the GPU explodes.<br>
> ><br>
> > I'm not 100% convinced of this explanation because it seems a bit fishy<br>
> > that a context switch wouldn't be enough to fully flush out the GPU.<br>
> > However, what I do know is that, without these patches I get a hang in one<br>
> > out of three to five Jenkins runs on my wip/i965-blorp-ds branch. With the<br>
> > patches (or an older variant that did the same thing), I have done almost 20<br>
> > Jenkins runs and have yet to see a hang. I'd call that success.<br>
> ><br>
> > Jason Ekstrand (6):<br>
> > i965: Flush around state base address<br>
> > i965: Take a uint64_t immediate in emit_pipe_control_write<br>
> > i965: Unify the two emit_pipe_control functions<br>
> > i965: Do an end-of-pipe sync prior to STATE_BASE_ADDRESS<br>
> > i965/blorp: Do an end-of-pipe sync around CCS ops<br>
> > i965: Do an end-of-pipe sync after flushes<br>
> ><br>
> > Topi Pohjolainen (1):<br>
> > i965: Add an end-of-pipe sync helper<br>
> ><br>
> > src/mesa/drivers/dri/i965/brw_blorp.c | 16 +-<br>
> > src/mesa/drivers/dri/i965/brw_context.h | 3 +-<br>
> > src/mesa/drivers/dri/i965/brw_misc_state.c | 38 +++++<br>
> > src/mesa/drivers/dri/i965/brw_pipe_control.c | 243 ++++++++++++++++++---------<br>
> > src/mesa/drivers/dri/i965/brw_queryobj.c | 5 +-<br>
> > src/mesa/drivers/dri/i965/gen6_queryobj.c | 2 +-<br>
> > src/mesa/drivers/dri/i965/genX_blorp_exec.c | 2 +-<br>
> > 7 files changed, 211 insertions(+), 98 deletions(-)<br>
> ><br>
> ><br>
><br>
> The series is:<br>
> Reviewed-by: Kenneth Graunke <<a href="mailto:kenneth@whitecape.org">kenneth@whitecape.org</a>><br>
><br>
> If Chris is right, and what we're really seeing is that MI_SET_CONTEXT<br>
> needs additional flushing, it probably makes sense to fix the kernel.<br>
> If it's really fast clear related, then we should do it in Mesa.<br>
<br>
</div></div>If I'm right, it's more of a userspace problem because you have to<br>
insert a pipeline stall before STATE_BASE_ADDRESS when switching between<br>
blorp/normal and back again, in the same batch. That the MI_SET_CONTEXT<br>
may be restoring the dirty GPU state from the previous batch just means that<br>
you have to think of batches as being one long continuous batch.<br>
-Chris<br></blockquote><div><br></div><div>Given that, I doubt your explanation is correct. Right now, we should already be correct under the "long continuous batch" assumption, and yet we're hanging. So I think either MI_SET_CONTEXT doesn't stall hard enough or we're conflicting with another process somehow.<br></div></div></div></div>