[Intel-gfx] [PATCH] lib/rendercopy_gen9: Setup Push constant pointer before sending BTP commands

Joonas Lahtinen joonas.lahtinen at linux.intel.com
Tue Aug 18 07:25:55 PDT 2015


Hi,

On pe, 2015-08-14 at 10:58 +0200, Daniel Vetter wrote:
> On Thu, Aug 13, 2015 at 03:49:35PM -0700, Ben Widawsky wrote:
> > On Thu, Aug 13, 2015 at 10:33:00AM +0300, Joonas Lahtinen wrote:
> > > Hi,
> > > 
> > > On ke, 2015-08-12 at 18:35 -0700, Ben Widawsky wrote:
> > > > On Wed, Aug 12, 2015 at 03:10:18PM +0300, Joonas Lahtinen 
> > > > wrote:
> > > > > On ke, 2015-08-12 at 12:26 +0100, Arun Siluvery wrote:
> > > > > > From Gen9, by default push constant command is not 
> > > > > > committed to 
> > > > > > the 
> > > > > > shader unit
> > > > > > untill the corresponding shader's BTP_* command is parsed. 
> > > > > > This 
> > > > > > is 
> > > > > > the
> > > > > > behaviour when set shader is enabled. This patch updates 
> > > > > > the 
> > > > > > batch to 
> > > > > > follow
> > > > > > this requirement otherwise it results in gpu hang.
> > > > > > 
> > > > > > Bugzilla: 
> > > > > > https://bugs.freedesktop.org/show_bug.cgi?id=89959
> > > > > > 
> > > > > > Set shader need to be disabled if legacy behaviour is 
> > > > > > required.
> > > > > > 
> > > > > > Cc: Ben Widawsky <benjamin.widawsky at intel.com>
> > > > > > Cc: Joonas Lahtinen <joonas.lahtinen at linux.intel.com>
> > > > > > Cc: Mika Kuoppala <mika.kuoppala at intel.com>
> > > > > > Signed-off-by: Arun Siluvery <arun.siluvery at linux.intel.com
> > > > > > >
> > > > > 
> > > > > Reviewed-by: Joonas Lahtinen <joonas.lahtinen at linux.intel.com
> > > > > >
> > > > > 
> > > > 
> > > > Repeating what I said on the mesa thread:
> > > > Does anyone understand why this actually causes a hang on the 
> > > > IGT 
> > > > test? I
> > > > certainly don't. The docs are pretty clear that the constant 
> > > > command 
> > > > is not
> > > > committed until the BTP command, but I can't make any sense of 
> > > > how it 
> > > > related to
> > > > a GPU hang.
> > > > 
> > > 
> > > Changing the order makes the hang go away and come back for sure, 
> > > we've
> > > all been experiencing that. System validation said it is a 
> > > programming
> > > restriction, so I'm not sure how relevant it is what goes wrong 
> > > if it's
> > > not followed. And the legacy mode bits were added to support the 
> > > old
> > > behavior of having the order like it has been previously, so I do 
> > > not
> > > see why question it without visibility to the actual RTL. And 
> > > enabling
> > > the legacy bits makes the hang go away, too.
> > > 
> > > If I had the RTL sources, then it would be more relevant to take
> > > educated guesses as to why a set of hundreds of thousands of
> > > transistors doesn't work as it should :) Without that, if it gets
> > > stuck, it gets stuck.
> > > 
> > > Regards, Joonas
> > > 
> > 
> > Let me start by saying I do believe that questioning this shouldn't 
> > prevent
> > merging the patch.
> > 
> > <rant>
> > I absolutely disagree with you and think we should question these 
> > kind of things
> > and get out of the mindset of, "well, it fixes a hang, let's move 
> > on."
> > Understanding these kind of things is critical to writing stable 
> > drivers.  If
> > the programming guide/SV team said it can lead to a hang, that's 
> > one thing, but
> > AFAICT, we do not understand why it is hanging nor does any of the 
> > documentation
> > we do have suggest it should hang. Without clarification, next time 
> > we have a
> > similar hang signature we're going to be right back here where we 
> > started.
> > 
> > It was one thing when there were a handful of us working on the 
> > stuff and we
> > didn't have time to get to the bottom of bugs like this. I'm guilty 
> > of patches
> > like this myself. I really do not see any excuse for this any more 
> > though.
> > </rant>
> > 
> > Could you send me the reference for where SV said it was a 
> > "programming
> > restriction"? To me it all sounds very much like an implementation 
> > detail, and
> > I'd like to try to understand what I am missing.
> 
> Fully agree, we can't just ignore hangs but have to bottom out on 
> them.
> And in the past we've had piles of examples where we discovered 
> something
> and didn't chase it correctly, then a few months later massive panic 
> in
> intel. At least this must be reflected in bspec.
> 
> Joonas, can you please work together with Ben to chase this down? If 
> you
> don't get traction please escalate the shit out of this,  we really 
> must
> at least get docs to accurately reflect reality.
> 

The latest reply from SV team to Arun's inquiry is that when the order
is not followed, the previous push constant values get used (which will
be all zeroes, so doesn't really make sense) and would lead to the hang
or corruption. I'm checking them if the same applies for the BTP itself
too, where the PS pointer keeps changing (and it's the only changing
thing), and could explain it better.

Regards, Joonas

> Thanks, Daniel


More information about the Intel-gfx mailing list