[Intel-gfx] [PATCH 3/4] drm/i915: Fix random aux transactions failures.

Vivi, Rodrigo rodrigo.vivi at intel.com
Wed Oct 21 12:57:57 PDT 2015


On Wed, 2015-10-21 at 23:47 +0530, Thulasimani, Sivakumar wrote:
> 
> On 10/21/2015 12:53 PM, Daniel Vetter wrote:
> > On Wed, Oct 21, 2015 at 09:18:06AM +0200, Daniel Vetter wrote:
> > > On Wed, Oct 21, 2015 at 10:28:53AM -0700, Rodrigo Vivi wrote:
> > > > Mainly aux communications on sink_crc
> > > > were failing a lot randomly on recent platforms.
> > > > The first solution was to try to use intel_dp_dpcd_read_wake, 
> > > > but then
> > > > it was suggested to move retries to drm level.
> > > > 
> > > > Since drm level was already taking care of retries and didn't 
> > > > want
> > > > to through random retries on that level the second solution was 
> > > > to
> > > > put the retries at aux_transfer layer what was nacked.
> > > > 
> > > > So I realized we had so many retries in different places and
> > > > started to organize that a bit. During this organization I 
> > > > noticed
> > > > that we weren't handing at all the case were the message size 
> > > > was
> > > > zeroed. And this was exactly the case that was affecting 
> > > > sink_crc.
> > > > 
> > > > Also we weren't respect BSPec who says this size message = 0 or 
> > > > > 20
> > > > are forbidden.
> > > > 
> > > > It is a fact that we still have no clue why we are getting this
> > > > forbidden value there. But anyway we need to handle that for 
> > > > now
> > > > so we return -EBUSY and drm level takes care of the retries 
> > > > that
> > > > are already in place.
> > > > 
> > > > Cc: Jani Nikula <jani.nikula at intel.com>
> > > > Cc: Daniel Vetter <daniel.vetter at ffwll.ch>
> > > > Signed-off-by: Rodrigo Vivi <rodrigo.vivi at intel.com>
> > > > ---
> > > >   drivers/gpu/drm/i915/intel_dp.c | 11 +++++++++++
> > > >   1 file changed, 11 insertions(+)
> > > > 
> > > > diff --git a/drivers/gpu/drm/i915/intel_dp.c 
> > > > b/drivers/gpu/drm/i915/intel_dp.c
> > > > index aa3d8f6..80850d6 100644
> > > > --- a/drivers/gpu/drm/i915/intel_dp.c
> > > > +++ b/drivers/gpu/drm/i915/intel_dp.c
> > > > @@ -911,6 +911,17 @@ done:
> > > >   	/* Unload any bytes sent back from the other side */
> > > >   	recv_bytes = ((status & 
> > > > DP_AUX_CH_CTL_MESSAGE_SIZE_MASK) >>
> > > >   		      DP_AUX_CH_CTL_MESSAGE_SIZE_SHIFT);
> > > > +
> > > > +	/*
> > > > +	 * By BSpec: "Message sizes of 0 or >20 are not 
> > > > allowed."
> > > > +	 * We have no idea of what happened so we return 
> > > > -EBUSY so
> > > > +	 * drm layer takes care for the necessary retries.
> > > > +	 */
> > > > +	if (recv_bytes == 0 || recv_bytes > 20) {
> > > > +		ret = -EBUSY;
> > > > +		goto out;
> > > > +	}
> > > Hm, this should be caught be the dp aux helper library. Both 
> > > callers for
> > > ->transfer should check for this and reject with -EINVAL (since 
> > > such a
> > > transaction is simply not allowed by dp aux). In the case of
> > > drm_dp_i2c_do_msg maybe even with a WARN_ON since the i2c logic 
> > > should
> > > split things up correctly.
> > Meh, totally misread what's going on here, this is from the 
> > hardware. How
> > does this even happen? Do you get some kind of garbage value? 
> > Should we
> > maybe clear this register field first? It certainly would explain a 
> > lot of
> > our random dp aux retry fun ...
> > -Daniel
> we are already checking for read size in intel_dp_aux_transfer
> 
>   if (WARN_ON(rxsize > 20))
>                 return -E2BIG;
> 
> and again inside intel_dp_aux_ch
> 
>   843         /* Only 5 data registers! */
>   844         if (WARN_ON(send_bytes > 20 || recv_size > 20)) {
>   845                 ret = -E2BIG;
>   846                 goto out;
>   847         }
> 
> Also isn't it possible that a simple write command will have 0 for 
> receive size ?

no, this is not possible. I'm sure I'm doing the proper read/writes on
sink_crc code.

> can you share bit more details on what scenario this patch is helping 
> ?

automated psr and frontbuffer cases on SKL. sink_crc is failing without
this patch or without retrying reads few times.

> 
> regards,
> Sivakumar
> 


More information about the Intel-gfx mailing list