[Freedreno] [PATCH] drm/msm/dp: do not complete dp_aux_cmd_fifo_tx() if irq is not for aux transfer

Abhinav Kumar quic_abhinavk at quicinc.com
Thu Dec 15 00:37:44 UTC 2022


Hi Doug

On 12/14/2022 4:14 PM, Doug Anderson wrote:
> Hi,
> 
> On Wed, Dec 14, 2022 at 3:46 PM Abhinav Kumar <quic_abhinavk at quicinc.com> wrote:
>>
>> Hi Doug
>>
>> On 12/14/2022 2:29 PM, Doug Anderson wrote:
>>> Hi,
>>>
>>> On Wed, Dec 14, 2022 at 1:21 PM Kuogee Hsieh <quic_khsieh at quicinc.com> wrote:
>>>>
>>>> There are 3 possible interrupt sources handled by the DP controller:
>>>> HPD status, controller state changes, and aux read/write transactions.
>>>> At every irq, the DP controller has to check the isr status of every
>>>> interrupt source and service the interrupt if its isr status bits show
>>>> that interrupts are pending. A race condition may happen in the current
>>>> aux isr handler implementation, since it always completes
>>>> dp_aux_cmd_fifo_tx() even if the irq is not for an aux read or write
>>>> transaction. This may cause an aux read transaction to return
>>>> prematurely if the host's aux data read is in the middle of waiting for
>>>> the sink to finish transferring data to the host when the irq happens.
>>>> The host's receive buffer then contains unexpected data. This patch
>>>> fixes the problem by checking the aux isr and returning immediately
>>>> from the aux isr handler if no isr status bits are set.
>>>>
>>>> The following signature appears in the kernel logs when the problem happens:
>>>> EDID has corrupt header
>>>> panel-simple-dp-aux aux-aea0000.edp: Couldn't identify panel via EDID
>>>> panel-simple-dp-aux aux-aea0000.edp: error -EIO: Couldn't detect panel nor find a fallback
>>>>
>>>> Signed-off-by: Kuogee Hsieh <quic_khsieh at quicinc.com>
>>>> ---
>>>>    drivers/gpu/drm/msm/dp/dp_aux.c | 7 +++++++
>>>>    1 file changed, 7 insertions(+)
>>>>
>>>> diff --git a/drivers/gpu/drm/msm/dp/dp_aux.c b/drivers/gpu/drm/msm/dp/dp_aux.c
>>>> index d030a93..8f8b12a 100644
>>>> --- a/drivers/gpu/drm/msm/dp/dp_aux.c
>>>> +++ b/drivers/gpu/drm/msm/dp/dp_aux.c
>>>> @@ -423,6 +423,13 @@ void dp_aux_isr(struct drm_dp_aux *dp_aux)
>>>>
>>>>           isr = dp_catalog_aux_get_irq(aux->catalog);
>>>>
>>>> +       /*
>>>> +        * if this irq is not for aux transfer,
>>>> +        * then return immediately
>>>> +        */
>>>
>>> Why do you need 4 lines for a comment that fits on one line?
>> Yes, we can fit this to one line.
>>>
>>>> +       if (!isr)
>>>> +               return;
>>>
>>> I can confirm that this works for me. I could reproduce the EDID
>>> problems in the past and I can't after this patch. ...so I could give
>>> a:
>>>
>>> Tested-by: Douglas Anderson <dianders at chromium.org>
>>>
>>> I'm not an expert on this part of the code, so feel free to ignore my
>>> other comments if everyone else thinks this patch is fine as-is, but
>>> to me something here feels a little fragile. It feels a little weird
>>> that we'll "complete" for _any_ interrupt that comes through now
>>> rather than relying on dp_aux_native_handler() / dp_aux_i2c_handler()
>>> to specifically identify interrupts that caused the end of the
>>> transfer. I guess the idea is that every possible interrupt we get
>>> causes the end of the transfer?
>>>
>>> -Doug
>>
>> So this turned out to be trickier, and it was a good find by Kuogee.
>>
>> In the bad EDID case, the EDID itself was not actually bad.
>>
>> What was happening was that the VIDEO_READY interrupt kept firing
>> continuously. Ideally, it should fire only once, but due to some error
>> condition it kept firing. We don't yet know exactly which error
>> condition made it fire continuously.
>>
>> In the DP ISR, dp_aux_isr() gets called even if it was not an aux
>> interrupt that fired (so the call flow in this case was
>> dp_display_irq_handler() (triggered for VIDEO_READY) ---> dp_aux_isr()).
>> So we should certainly have some protection to return early from this
>> routine if no aux interrupt fired.
>>
>> Which is what this fix is doing.
>>
>> It's not completing any interrupt; it's just returning early if no aux
>> interrupt fired.
> 
> ...but the whole problem was that it was doing the complete() at the
> end, right? Kuogee even mentioned that in the commit message.
> Specifically, I checked dp_aux_native_handler() and
> dp_aux_i2c_handler(), both of which are passed the "isr". Unless I
> messed up, both functions already were no-ops if the ISR was 0, even
> before Kuogee's patch. That means that the only thing Kuogee's patch
> does is to prevent the call to "complete(&aux->comp)" at the end of
> "dp_aux_isr()".
> 
> ...and it makes sense not to call complete() if "isr" is 0.
> ...but what I'm saying is that _any_ non-zero value of ISR will still
> cause the complete() to be called after Kuogee's patch. That means
> that if any of the 32 bits in the "isr" variable are set, we will
> call complete(). I'm asking if you're sure that every single bit of
> the "isr" means that we're ready to call complete(). It feels like it
> would be less fragile if dp_aux_native_handler() and
> dp_aux_i2c_handler() (which both already look at the ISR) returned
> some value saying whether the "isr" contained a bit that meant that
> complete() should be called.
> 

Yes. Other than the "transfer done" bit, the other bits we listen to
are these:

#define DP_INTERRUPT_STATUS1 \
	(DP_INTR_AUX_I2C_DONE| \
	DP_INTR_WRONG_ADDR | DP_INTR_TIMEOUT | \
	DP_INTR_NACK_DEFER | DP_INTR_WRONG_DATA_CNT | \
	DP_INTR_I2C_NACK | DP_INTR_I2C_DEFER | \
	DP_INTR_PLL_UNLOCKED | DP_INTR_AUX_ERROR)
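A minimal, self-contained sketch of the early return being discussed, assuming the raw interrupt status is masked down to the DP_INTERRUPT_STATUS1 bits before the check. The bit positions are placeholders (the real values live in the driver's register headers and are not reproduced here), and HYP_INTR_READY_FOR_VIDEO, hyp_aux_isr() and hyp_complete_calls are hypothetical names standing in for the VIDEO_READY source, dp_aux_isr() and complete(&aux->comp). A status word carrying only the video-ready bit masks to 0, so the handler returns before it can wake the waiter.

```c
#include <assert.h>
#include <stdint.h>

/* Placeholder bit positions: an assumption, not the hardware layout. */
#define DP_INTR_AUX_I2C_DONE     (1u << 0)
#define DP_INTR_WRONG_ADDR       (1u << 1)
#define DP_INTR_TIMEOUT          (1u << 2)
#define DP_INTR_NACK_DEFER       (1u << 3)
#define DP_INTR_WRONG_DATA_CNT   (1u << 4)
#define DP_INTR_I2C_NACK         (1u << 5)
#define DP_INTR_I2C_DEFER        (1u << 6)
#define DP_INTR_PLL_UNLOCKED     (1u << 7)
#define DP_INTR_AUX_ERROR        (1u << 8)
/* Hypothetical non-aux source standing in for VIDEO_READY. */
#define HYP_INTR_READY_FOR_VIDEO (1u << 16)

#define DP_INTERRUPT_STATUS1 \
	(DP_INTR_AUX_I2C_DONE | \
	DP_INTR_WRONG_ADDR | DP_INTR_TIMEOUT | \
	DP_INTR_NACK_DEFER | DP_INTR_WRONG_DATA_CNT | \
	DP_INTR_I2C_NACK | DP_INTR_I2C_DEFER | \
	DP_INTR_PLL_UNLOCKED | DP_INTR_AUX_ERROR)

static_assert((DP_INTERRUPT_STATUS1 & HYP_INTR_READY_FOR_VIDEO) == 0,
	      "the video-ready bit must not be treated as an aux bit");

/* Counts calls that stand in for complete(&aux->comp). */
static unsigned int hyp_complete_calls;

/* Hypothetical model of the aux isr with the patch applied. */
static void hyp_aux_isr(uint32_t raw_status)
{
	uint32_t isr = raw_status & DP_INTERRUPT_STATUS1;

	if (!isr)	/* not an aux interrupt: do not wake the waiter */
		return;

	/* ... the native/i2c handlers would inspect isr here ... */
	hyp_complete_calls++;	/* stands in for complete(&aux->comp) */
}
```

Under this model, a stream of VIDEO_READY-only interrupts leaves hyp_complete_calls unchanged, while any status that includes an aux bit still reaches the completion.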

All of these, if they fire, are handled in dp_aux_i2c_handler(), which
assigns aux_error_num accordingly.
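The bit-to-error bookkeeping can be pictured roughly as follows. This is an illustrative sketch in the spirit of dp_aux_i2c_handler(), not its actual logic: only DP_AUX_ERR_NONE is named in the thread, while the other error codes, the lookup order (an assumed priority), the bit positions and hyp_classify_isr() are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Placeholder bit positions: an assumption, not the hardware layout. */
#define DP_INTR_AUX_I2C_DONE   (1u << 0)
#define DP_INTR_WRONG_ADDR     (1u << 1)
#define DP_INTR_TIMEOUT        (1u << 2)
#define DP_INTR_NACK_DEFER     (1u << 3)
#define DP_INTR_WRONG_DATA_CNT (1u << 4)
#define DP_INTR_I2C_NACK       (1u << 5)
#define DP_INTR_I2C_DEFER      (1u << 6)
#define DP_INTR_PLL_UNLOCKED   (1u << 7)
#define DP_INTR_AUX_ERROR      (1u << 8)
#define DP_INTERRUPT_STATUS1   0x1ffu	/* all nine bits above */

/* DP_AUX_ERR_NONE is named in the thread; the other codes are hypothetical. */
enum hyp_aux_err {
	DP_AUX_ERR_NONE = 0,
	HYP_AUX_ERR_WRONG_ADDR,
	HYP_AUX_ERR_TIMEOUT,
	HYP_AUX_ERR_NACK_DEFER,
	HYP_AUX_ERR_WRONG_DATA_CNT,
	HYP_AUX_ERR_I2C_NACK,
	HYP_AUX_ERR_I2C_DEFER,
	HYP_AUX_ERR_PLL_UNLOCKED,
	HYP_AUX_ERR_AUX_ERROR,
};

/* One entry per error bit; the table order is an assumed priority. */
static const struct {
	uint32_t bit;
	enum hyp_aux_err err;
} hyp_err_map[] = {
	{ DP_INTR_WRONG_ADDR,     HYP_AUX_ERR_WRONG_ADDR },
	{ DP_INTR_TIMEOUT,        HYP_AUX_ERR_TIMEOUT },
	{ DP_INTR_NACK_DEFER,     HYP_AUX_ERR_NACK_DEFER },
	{ DP_INTR_WRONG_DATA_CNT, HYP_AUX_ERR_WRONG_DATA_CNT },
	{ DP_INTR_I2C_NACK,       HYP_AUX_ERR_I2C_NACK },
	{ DP_INTR_I2C_DEFER,      HYP_AUX_ERR_I2C_DEFER },
	{ DP_INTR_PLL_UNLOCKED,   HYP_AUX_ERR_PLL_UNLOCKED },
	{ DP_INTR_AUX_ERROR,      HYP_AUX_ERR_AUX_ERROR },
};

/* Hypothetical stand-in for the aux_error_num assignment. */
static enum hyp_aux_err hyp_classify_isr(uint32_t isr)
{
	/* Precondition: isr has already been masked to the aux bits. */
	assert((isr & ~DP_INTERRUPT_STATUS1) == 0);

	for (size_t i = 0; i < sizeof(hyp_err_map) / sizeof(hyp_err_map[0]); i++)
		if (isr & hyp_err_map[i].bit)
			return hyp_err_map[i].err;

	return DP_AUX_ERR_NONE;	/* no error bit set, e.g. only DONE */
}
```

The point the sketch illustrates is that every error bit in the mask produces some non-NONE aux_error_num, so each of them ends the transfer with a recorded outcome.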

Only if aux_error_num is DP_AUX_ERR_NONE do we go on to read the data
from the FIFO.
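On the waiting side, the gate described above looks roughly like this. It is a hypothetical sketch rather than the driver's dp_aux_cmd_fifo_tx() path: struct hyp_aux_state, hyp_finish_read() and the in-memory array standing in for the hardware FIFO are invented for illustration, and the value 0 for DP_AUX_ERR_NONE is an assumption.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DP_AUX_ERR_NONE 0	/* name from the thread; the value 0 is an assumption */

/* Hypothetical snapshot of what the isr side left behind for the waiter. */
struct hyp_aux_state {
	int aux_error_num;	/* written by the isr's error classification */
	const uint8_t *fifo;	/* stands in for the hardware read FIFO */
	size_t fifo_len;	/* bytes available in that FIFO */
};

/*
 * Hypothetical post-completion step of an aux read: return -EIO if an
 * error was recorded, otherwise copy up to len bytes out of the FIFO
 * and return the number of bytes copied.
 */
static int hyp_finish_read(const struct hyp_aux_state *st, uint8_t *buf, size_t len)
{
	assert(st != NULL && buf != NULL);

	if (st->aux_error_num != DP_AUX_ERR_NONE)
		return -EIO;	/* error bit seen: leave the FIFO untouched */

	if (len > st->fifo_len)
		len = st->fifo_len;
	memcpy(buf, st->fifo, len);
	return (int)len;
}
```

Under this model, a spurious complete() that arrives while aux_error_num still reads DP_AUX_ERR_NONE would send the waiter down the FIFO path before the sink has finished, which is consistent with the corrupt-EDID symptom in the commit message.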

So we should call complete() if any of these bits is set, since the ones
other than "transfer done" are error bits that still need to be handled.
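Doug's fragility concern and the reasoning above can both be captured by naming the bits that end a transfer and checking at build time that the aux mask contains nothing else. This is a hypothetical sketch, not the driver's code: the bit positions are placeholders, and HYP_AUX_DONE_BITS, HYP_AUX_ERROR_BITS and hyp_aux_wants_complete() (a handler that reports whether complete() is due, along the lines Doug suggested) are invented for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Placeholder bit positions: an assumption, not the hardware layout. */
#define DP_INTR_AUX_I2C_DONE   (1u << 0)
#define DP_INTR_WRONG_ADDR     (1u << 1)
#define DP_INTR_TIMEOUT        (1u << 2)
#define DP_INTR_NACK_DEFER     (1u << 3)
#define DP_INTR_WRONG_DATA_CNT (1u << 4)
#define DP_INTR_I2C_NACK       (1u << 5)
#define DP_INTR_I2C_DEFER      (1u << 6)
#define DP_INTR_PLL_UNLOCKED   (1u << 7)
#define DP_INTR_AUX_ERROR      (1u << 8)

#define DP_INTERRUPT_STATUS1 \
	(DP_INTR_AUX_I2C_DONE | \
	DP_INTR_WRONG_ADDR | DP_INTR_TIMEOUT | \
	DP_INTR_NACK_DEFER | DP_INTR_WRONG_DATA_CNT | \
	DP_INTR_I2C_NACK | DP_INTR_I2C_DEFER | \
	DP_INTR_PLL_UNLOCKED | DP_INTR_AUX_ERROR)

/* Hypothetical explicit lists of the bits that end a transfer. */
#define HYP_AUX_DONE_BITS  DP_INTR_AUX_I2C_DONE
#define HYP_AUX_ERROR_BITS \
	(DP_INTR_WRONG_ADDR | DP_INTR_TIMEOUT | \
	DP_INTR_NACK_DEFER | DP_INTR_WRONG_DATA_CNT | \
	DP_INTR_I2C_NACK | DP_INTR_I2C_DEFER | \
	DP_INTR_PLL_UNLOCKED | DP_INTR_AUX_ERROR)

/* Build-time guard: every aux bit we listen to must end the transfer. */
static_assert((DP_INTERRUPT_STATUS1 &
	       ~(HYP_AUX_DONE_BITS | HYP_AUX_ERROR_BITS)) == 0,
	      "DP_INTERRUPT_STATUS1 has a bit that does not end a transfer");

/* Hypothetical handler shape: report whether complete() should be called. */
static bool hyp_aux_wants_complete(uint32_t isr)
{
	return (isr & (HYP_AUX_DONE_BITS | HYP_AUX_ERROR_BITS)) != 0;
}
```

With the mask as listed, the static_assert holds and hyp_aux_wants_complete() gives the same answer as a plain `isr != 0` test. The benefit of spelling it out is that adding a bit to DP_INTERRUPT_STATUS1 that does not end a transfer would fail the build instead of silently waking the waiter early.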

> -Doug

