[Intel-gfx] [PATCH] lib/rendercopy_gen9: Setup Push constant pointer before sending BTP commands

Ben Widawsky benjamin.widawsky at intel.com
Tue Aug 18 18:44:24 PDT 2015


On Tue, Aug 18, 2015 at 05:25:55PM +0300, Joonas Lahtinen wrote:
> Hi,
> 
> On pe, 2015-08-14 at 10:58 +0200, Daniel Vetter wrote:
> > On Thu, Aug 13, 2015 at 03:49:35PM -0700, Ben Widawsky wrote:
> > > On Thu, Aug 13, 2015 at 10:33:00AM +0300, Joonas Lahtinen wrote:
> > > > Hi,
> > > > 
> > > > On ke, 2015-08-12 at 18:35 -0700, Ben Widawsky wrote:
> > > > > On Wed, Aug 12, 2015 at 03:10:18PM +0300, Joonas Lahtinen 
> > > > > wrote:
> > > > > > On ke, 2015-08-12 at 12:26 +0100, Arun Siluvery wrote:
> > > > > > > From Gen9, by default push constant command is not 
> > > > > > > committed to 
> > > > > > > the 
> > > > > > > shader unit
> > > > > > > untill the corresponding shader's BTP_* command is parsed. 
> > > > > > > This 
> > > > > > > is 
> > > > > > > the
> > > > > > > behaviour when set shader is enabled. This patch updates 
> > > > > > > the 
> > > > > > > batch to 
> > > > > > > follow
> > > > > > > this requirement otherwise it results in gpu hang.
> > > > > > > 
> > > > > > > Bugzilla: 
> > > > > > > https://bugs.freedesktop.org/show_bug.cgi?id=89959
> > > > > > > 
> > > > > > > Set shader need to be disabled if legacy behaviour is 
> > > > > > > required.
> > > > > > > 
> > > > > > > Cc: Ben Widawsky <benjamin.widawsky at intel.com>
> > > > > > > Cc: Joonas Lahtinen <joonas.lahtinen at linux.intel.com>
> > > > > > > Cc: Mika Kuoppala <mika.kuoppala at intel.com>
> > > > > > > Signed-off-by: Arun Siluvery <arun.siluvery at linux.intel.com
> > > > > > > >
> > > > > > 
> > > > > > Reviewed-by: Joonas Lahtinen <joonas.lahtinen at linux.intel.com
> > > > > > >
> > > > > > 
> > > > > 
> > > > > Repeating what I said on the mesa thread:
> > > > > Does anyone understand why this actually causes a hang on the 
> > > > > IGT 
> > > > > test? I
> > > > > certainly don't. The docs are pretty clear that the constant 
> > > > > command 
> > > > > is not
> > > > > committed until the BTP command, but I can't make any sense of 
> > > > > how it 
> > > > > related to
> > > > > a GPU hang.
> > > > > 
> > > > 
> > > > Changing the order makes the hang go away and come back for sure, 
> > > > we've
> > > > all been experiencing that. System validation said it is a 
> > > > programming
> > > > restriction, so I'm not sure how relevant it is what goes wrong 
> > > > if it's
> > > > not followed. And the legacy mode bits were added to support the 
> > > > old
> > > > behavior of having the order like it has been previously, so I do 
> > > > not
> > > > see why question it without visibility to the actual RTL. And 
> > > > enabling
> > > > the legacy bits makes the hang go away, too.
> > > > 
> > > > If I had the RTL sources, then it would be more relevant to take
> > > > educated guesses as to why a set of hundreds of thousands of
> > > > transistors doesn't work as it should :) Without that, if it gets
> > > > stuck, it gets stuck.
> > > > 
> > > > Regards, Joonas
> > > > 
> > > 
> > > Let me start by saying I do believe that questioning this shouldn't 
> > > prevent
> > > merging the patch.
> > > 
> > > <rant>
> > > I absolutely disagree with you and think we should question these 
> > > kind of things
> > > and get out of the mindset of, "well, it fixes a hang, let's move 
> > > on."
> > > Understanding these kind of things is critical to writing stable 
> > > drivers.  If
> > > the programming guide/SV team said it can lead to a hang, that's 
> > > one thing, but
> > > AFAICT, we do not understand why it is hanging nor does any of the 
> > > documentation
> > > we do have suggest it should hang. Without clarification, next time 
> > > we have a
> > > similar hang signature we're going to be right back here where we 
> > > started.
> > > 
> > > It was one thing when there were a handful of us working on the 
> > > stuff and we
> > > didn't have time to get to the bottom of bugs like this. I'm guilty 
> > > of patches
> > > like this myself. I really do not see any excuse for this any more 
> > > though.
> > > </rant>
> > > 
> > > Could you send me the reference for where SV said it was a 
> > > "programming
> > > restriction"? To me it all sounds very much like an implementation 
> > > detail, and
> > > I'd like to try to understand what I am missing.
> > 
> > Fully agree, we can't just ignore hangs but have to bottom out on 
> > them.
> > And in the past we've had piles of examples where we discovered 
> > something
> > and didn't chase it correctly, then a few months later massive panic 
> > in
> > intel. At least this must be reflected in bspec.
> > 
> > Joonas, can you please work together with Ben to chase this down? If 
> > you
> > don't get traction please escalate the shit out of this,  we really 
> > must
> > at least get docs to accurately reflect reality.
> > 
> 
> The latest reply from SV team to Arun's inquiry is that when the order
> is not followed, the previous push constant values get used (which will
> be all zeroes, so doesn't really make sense) and would lead to the hang
> or corruption. I'm checking them if the same applies for the BTP itself
> too, where the PS pointer keeps changing (and it's the only changing
> thing), and could explain it better.
> 
> Regards, Joonas

Cool. It seems like an interesting conversation. I managed to convince myself
corruption can happen, maybe - but not a hang. Now that you've kicked off that
conversation, I'm fine with merging the patch (we've done our "due diligence"
/me shudders).

Reviewed-by: Ben Widawsky <ben at bwidawsk.net>

> 
> > Thanks, Daniel

-- 
Ben Widawsky, Intel Open Source Technology Center


More information about the Intel-gfx mailing list