[PATCH v2] drm/msm/dpu: make "vblank timeout" more useful

Abhinav Kumar quic_abhinavk at quicinc.com
Tue Feb 20 22:40:30 UTC 2024



On 2/19/2024 3:52 AM, Dmitry Baryshkov wrote:
> On Wed, 14 Feb 2024 at 22:36, Abhinav Kumar <quic_abhinavk at quicinc.com> wrote:
>>
>>
>>
>> On 2/14/2024 11:20 AM, Dmitry Baryshkov wrote:
>>> On Wed, 14 Feb 2024 at 20:02, Abhinav Kumar <quic_abhinavk at quicinc.com> wrote:
>>>>
>>>>
>>>>
>>>> On 2/8/2024 6:50 AM, Dmitry Baryshkov wrote:
>>>>> We have several reports of vblank timeout messages. However after some
>>>>> debugging it was found that there might be different causes to that.
>>>>> To allow us to identify the DPU block that gets stuck, include the
>>>>> actual CTL_FLUSH value into the timeout message and trigger the devcore
>>>>> snapshot capture.
>>>>>
>>>>> Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov at linaro.org>
>>>>> ---
>>>>> Changes in v2:
>>>>> - Added a call to msm_disp_snapshot_state() to trigger devcore dump
>>>>>      (Abhinav)
>>>>> - Link to v1: https://lore.kernel.org/r/20240106-fd-dpu-debug-timeout-v1-1-6d9762884641@linaro.org
>>>>> ---
>>>>>     drivers/gpu/drm/msm/disp/dpu1/dpu_encoder_phys_vid.c | 3 ++-
>>>>>     1 file changed, 2 insertions(+), 1 deletion(-)
>>>>>
>>>>> diff --git a/drivers/gpu/drm/msm/disp/dpu1/dpu_encoder_phys_vid.c b/drivers/gpu/drm/msm/disp/dpu1/dpu_encoder_phys_vid.c
>>>>> index d0f56c5c4cce..a8d6165b3c0a 100644
>>>>> --- a/drivers/gpu/drm/msm/disp/dpu1/dpu_encoder_phys_vid.c
>>>>> +++ b/drivers/gpu/drm/msm/disp/dpu1/dpu_encoder_phys_vid.c
>>>>> @@ -489,7 +489,8 @@ static int dpu_encoder_phys_vid_wait_for_commit_done(
>>>>>                 (hw_ctl->ops.get_flush_register(hw_ctl) == 0),
>>>>>                 msecs_to_jiffies(50));
>>>>>         if (ret <= 0) {
>>>>> -             DPU_ERROR("vblank timeout\n");
>>>>> +             DPU_ERROR("vblank timeout: %x\n", hw_ctl->ops.get_flush_register(hw_ctl));
>>>>> +             msm_disp_snapshot_state(phys_enc->parent->dev);
>>>>
>>>>
>>>> There is no rate limiting in this piece of code unfortunately. So this
>>>> will flood the number of snapshots.
>>>
>>> Well... Yes and no. The devcoredump will destroy other snapshots if
>>> there is a pending one. So only the console will be flooded and only
>>> in case when MSM_DISP_SNAPSHOT_DUMP_IN_CONSOLE is enabled.
>>>
>>
>> Yes, true but at the same time this makes it hard to capture a good dump
>> as potentially every vblank you could timeout so this destroy/create
>> cycle wont end.
> 
> Excuse me, maybe I miss something. On the first timeout the snapshot
> is created. It is held by the kernel until it is fully read out from
> the userspace. Other snapshots will not interfere with this snapshot.
> 

For every new snapshot a new devcoredump device will be created which 
should remain till it has been read. But now this will be created every 
blank. IMO, this is really too much data for no reason.

Subsequent vblank timeouts are not going to give any new information 
compared to the existing snapshot of the first vblank timeout thats why 
we should just create the snapshot when the first error happens and stop.

For other frame done timeouts, infact subsequent timeouts without any 
sort of recovery in between are quite misleading because hardware was 
already not able to fetch the previous frame so it will most likely not 
fetch the next one either till it has recovered. Typically thats why 
these vblank timeouts happen in a flurry as the hardware never really 
recovered from the first timeout.

> Or are you worried that snapshotting takes time, so taking a snapshot
> will also interfere with the vblank timings for the next vblank?
> 

Yes this is another point.


More information about the dri-devel mailing list