[Intel-gfx] [PATCH 3/4] drm/i915: Fix random aux transactions failures.

Thulasimani, Sivakumar sivakumar.thulasimani at intel.com
Wed Oct 21 11:17:15 PDT 2015



On 10/21/2015 12:53 PM, Daniel Vetter wrote:
> On Wed, Oct 21, 2015 at 09:18:06AM +0200, Daniel Vetter wrote:
>> On Wed, Oct 21, 2015 at 10:28:53AM -0700, Rodrigo Vivi wrote:
>>> Mainly aux communications on sink_crc
>>> were failing a lot randomly on recent platforms.
>>> The first solution was to try to use intel_dp_dpcd_read_wake, but then
>>> it was suggested to move retries to drm level.
>>>
>>> Since drm level was already taking care of retries and didn't want
>>> to through random retries on that level the second solution was to
>>> put the retries at aux_transfer layer what was nacked.
>>>
>>> So I realized we had so many retries in different places and
>>> started to organize that a bit. During this organization I noticed
>>> that we weren't handing at all the case were the message size was
>>> zeroed. And this was exactly the case that was affecting sink_crc.
>>>
>>> Also we weren't respect BSPec who says this size message = 0 or > 20
>>> are forbidden.
>>>
>>> It is a fact that we still have no clue why we are getting this
>>> forbidden value there. But anyway we need to handle that for now
>>> so we return -EBUSY and drm level takes care of the retries that
>>> are already in place.
>>>
>>> Cc: Jani Nikula <jani.nikula at intel.com>
>>> Cc: Daniel Vetter <daniel.vetter at ffwll.ch>
>>> Signed-off-by: Rodrigo Vivi <rodrigo.vivi at intel.com>
>>> ---
>>>   drivers/gpu/drm/i915/intel_dp.c | 11 +++++++++++
>>>   1 file changed, 11 insertions(+)
>>>
>>> diff --git a/drivers/gpu/drm/i915/intel_dp.c b/drivers/gpu/drm/i915/intel_dp.c
>>> index aa3d8f6..80850d6 100644
>>> --- a/drivers/gpu/drm/i915/intel_dp.c
>>> +++ b/drivers/gpu/drm/i915/intel_dp.c
>>> @@ -911,6 +911,17 @@ done:
>>>   	/* Unload any bytes sent back from the other side */
>>>   	recv_bytes = ((status & DP_AUX_CH_CTL_MESSAGE_SIZE_MASK) >>
>>>   		      DP_AUX_CH_CTL_MESSAGE_SIZE_SHIFT);
>>> +
>>> +	/*
>>> +	 * By BSpec: "Message sizes of 0 or >20 are not allowed."
>>> +	 * We have no idea of what happened so we return -EBUSY so
>>> +	 * drm layer takes care for the necessary retries.
>>> +	 */
>>> +	if (recv_bytes == 0 || recv_bytes > 20) {
>>> +		ret = -EBUSY;
>>> +		goto out;
>>> +	}
>> Hm, this should be caught be the dp aux helper library. Both callers for
>> ->transfer should check for this and reject with -EINVAL (since such a
>> transaction is simply not allowed by dp aux). In the case of
>> drm_dp_i2c_do_msg maybe even with a WARN_ON since the i2c logic should
>> split things up correctly.
> Meh, totally misread what's going on here, this is from the hardware. How
> does this even happen? Do you get some kind of garbage value? Should we
> maybe clear this register field first? It certainly would explain a lot of
> our random dp aux retry fun ...
> -Daniel
we are already checking for read size in intel_dp_aux_transfer

  if (WARN_ON(rxsize > 20))
                return -E2BIG;

and again inside intel_dp_aux_ch

  843         /* Only 5 data registers! */
  844         if (WARN_ON(send_bytes > 20 || recv_size > 20)) {
  845                 ret = -E2BIG;
  846                 goto out;
  847         }

Also isn't it possible that a simple write command will have 0 for 
receive size ?
can you share bit more details on what scenario this patch is helping ?

regards,
Sivakumar



More information about the Intel-gfx mailing list