[Intel-gfx] [PATCH 3/4] drm/i915: Fix random aux transactions failures.
Vivi, Rodrigo
rodrigo.vivi at intel.com
Wed Oct 21 12:57:57 PDT 2015
On Wed, 2015-10-21 at 23:47 +0530, Thulasimani, Sivakumar wrote:
>
> On 10/21/2015 12:53 PM, Daniel Vetter wrote:
> > On Wed, Oct 21, 2015 at 09:18:06AM +0200, Daniel Vetter wrote:
> > > On Wed, Oct 21, 2015 at 10:28:53AM -0700, Rodrigo Vivi wrote:
> > > > > Mainly aux communications on sink_crc were failing a lot randomly
> > > > > on recent platforms. The first solution was to try to use
> > > > > intel_dp_dpcd_read_wake, but then it was suggested to move retries
> > > > > to drm level.
> > > > >
> > > > > Since drm level was already taking care of retries and didn't want
> > > > > to throw random retries on that level, the second solution was to
> > > > > put the retries at the aux_transfer layer, which was nacked.
> > > > >
> > > > > So I realized we had so many retries in different places and
> > > > > started to organize that a bit. During this organization I noticed
> > > > > that we weren't handling at all the case where the message size was
> > > > > zeroed. And this was exactly the case that was affecting sink_crc.
> > > > >
> > > > > Also we weren't respecting BSpec, which says message sizes of 0 or
> > > > > greater than 20 are forbidden.
> > > > >
> > > > > It is a fact that we still have no clue why we are getting this
> > > > > forbidden value there. But anyway we need to handle that for now,
> > > > > so we return -EBUSY and drm level takes care of the retries that
> > > > > are already in place.
> > > >
> > > > Cc: Jani Nikula <jani.nikula at intel.com>
> > > > Cc: Daniel Vetter <daniel.vetter at ffwll.ch>
> > > > Signed-off-by: Rodrigo Vivi <rodrigo.vivi at intel.com>
> > > > ---
> > > > drivers/gpu/drm/i915/intel_dp.c | 11 +++++++++++
> > > > 1 file changed, 11 insertions(+)
> > > >
> > > > > diff --git a/drivers/gpu/drm/i915/intel_dp.c b/drivers/gpu/drm/i915/intel_dp.c
> > > > index aa3d8f6..80850d6 100644
> > > > --- a/drivers/gpu/drm/i915/intel_dp.c
> > > > +++ b/drivers/gpu/drm/i915/intel_dp.c
> > > > @@ -911,6 +911,17 @@ done:
> > > > >  	/* Unload any bytes sent back from the other side */
> > > > >  	recv_bytes = ((status & DP_AUX_CH_CTL_MESSAGE_SIZE_MASK) >>
> > > > >  		      DP_AUX_CH_CTL_MESSAGE_SIZE_SHIFT);
> > > > > +
> > > > > +	/*
> > > > > +	 * By BSpec: "Message sizes of 0 or >20 are not allowed."
> > > > > +	 * We have no idea of what happened so we return -EBUSY so
> > > > > +	 * drm layer takes care for the necessary retries.
> > > > > +	 */
> > > > > +	if (recv_bytes == 0 || recv_bytes > 20) {
> > > > > +		ret = -EBUSY;
> > > > > +		goto out;
> > > > > +	}
> > > > Hm, this should be caught by the dp aux helper library. Both callers
> > > > for ->transfer should check for this and reject with -EINVAL (since
> > > > such a transaction is simply not allowed by dp aux). In the case of
> > > > drm_dp_i2c_do_msg maybe even with a WARN_ON since the i2c logic
> > > > should split things up correctly.
> > > Meh, totally misread what's going on here, this is from the hardware.
> > > How does this even happen? Do you get some kind of garbage value?
> > > Should we maybe clear this register field first? It certainly would
> > > explain a lot of our random dp aux retry fun ...
> > -Daniel
> we are already checking for read size in intel_dp_aux_transfer
>
> if (WARN_ON(rxsize > 20))
> return -E2BIG;
>
> and again inside intel_dp_aux_ch
>
>         /* Only 5 data registers! */
>         if (WARN_ON(send_bytes > 20 || recv_size > 20)) {
>                 ret = -E2BIG;
>                 goto out;
>         }
>
> Also, isn't it possible that a simple write command will have 0 for
> the receive size?
No, this is not possible. I'm sure the sink_crc code is doing the proper
reads/writes.
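
To illustrate why a zero receive size is not expected even for a plain write, here is a rough standalone sketch. The helper name aux_expected_rxsize() is hypothetical, and the sizes only mirror my understanding of how intel_dp_aux_transfer sizes its receive buffer (every AUX reply carries at least one reply header byte); it is not the driver code.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical helper, for illustration only: the receive size we would
 * expect to request for an AUX transaction. The reply header byte is
 * always present, so the result is never 0.
 */
static size_t aux_expected_rxsize(bool is_write, size_t payload_bytes)
{
	if (is_write)
		return 2;		/* reply header + 0 or 1 data bytes */

	return payload_bytes + 1;	/* reply header + requested data */
}
```

Under that assumption a well-formed reply always reports at least one byte, so a size of 0 read back from the hardware is something other than a normal write acknowledgement.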
> Can you share a bit more detail on what scenario this patch is
> helping?
Automated PSR and frontbuffer test cases on SKL. sink_crc fails without
this patch, or without retrying the reads a few times.
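
For reference, the check the patch adds can be sketched outside the driver roughly like this. The mask and shift values (a 5-bit field starting at bit 20) are an assumption about the register layout, and aux_recv_bytes_checked() is a hypothetical name; in the patch itself the check sits inline after recv_bytes is extracted.

```c
#include <errno.h>
#include <stdint.h>

/* Assumed layout: 5-bit message size field starting at bit 20. */
#define MSG_SIZE_SHIFT	20
#define MSG_SIZE_MASK	(0x1fu << MSG_SIZE_SHIFT)

/*
 * Hypothetical helper: extract the received byte count from a status
 * value. Returns -EBUSY when the hardware reports a size that BSpec
 * forbids (0 or more than 20), so the caller can retry.
 */
static int aux_recv_bytes_checked(uint32_t status)
{
	uint32_t recv_bytes = (status & MSG_SIZE_MASK) >> MSG_SIZE_SHIFT;

	if (recv_bytes == 0 || recv_bytes > 20)
		return -EBUSY;

	return (int)recv_bytes;
}
```

Returning -EBUSY instead of a hard error is the point here: it leaves the retry decision to the drm level, which already has retries in place.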
>
> regards,
> Sivakumar
>