[PATCH] accel/ivpu: Implement heartbeat-based TDR mechanism

Jeffrey Hugo quic_jhugo at quicinc.com
Fri Apr 18 15:27:55 UTC 2025


On 4/16/2025 4:25 AM, Maciej Falkowski wrote:
> From: Karol Wachowski <karol.wachowski at intel.com>
> 
> Introduce a heartbeat-based Timeout Detection and Recovery (TDR) mechanism.
> The enhancement aims to improve the reliability of device hang detection by
> monitoring heartbeat updates.
> 
> Each progressing inference updates a heartbeat counter, allowing the driver
> to monitor its progress. When the heartbeat indicates progress, the job is
> rescheduled, up to a maximum of 30 times.
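The check described in the commit message can be modeled as follows. This is a minimal userspace sketch, not the ivpu driver's code. It assumes that each TDR timer expiry compares the current heartbeat with the value seen at the previous expiry. If the heartbeat moved and the cap has not been reached, the timer is re-armed. Otherwise, recovery is triggered. The names `tdr_state`, `tdr_check`, `tdr_reset`, and `MAX_HEARTBEAT_RESCHEDULES` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical constant mirroring the "30 reschedules" limit in the patch text. */
#define MAX_HEARTBEAT_RESCHEDULES 30

/* Hypothetical per-device TDR state; not the ivpu driver's actual structure. */
struct tdr_state {
	uint64_t last_heartbeat;  /* heartbeat value seen at the previous expiry */
	unsigned int reschedules; /* reschedules granted to the current job */
};

enum tdr_action {
	TDR_RESCHEDULE, /* firmware made progress: re-arm the timer */
	TDR_RECOVER,    /* no progress, or cap reached: start recovery */
};

/* Called on each TDR timer expiry with the current heartbeat counter. */
enum tdr_action tdr_check(struct tdr_state *s, uint64_t heartbeat)
{
	if (heartbeat != s->last_heartbeat &&
	    s->reschedules < MAX_HEARTBEAT_RESCHEDULES) {
		s->last_heartbeat = heartbeat;
		s->reschedules++;
		return TDR_RESCHEDULE;
	}
	return TDR_RECOVER;
}

/* Called when a job starts, so every job gets a fresh reschedule budget. */
void tdr_reset(struct tdr_state *s, uint64_t heartbeat)
{
	s->last_heartbeat = heartbeat;
	s->reschedules = 0;
}
```

Under this model, a job whose heartbeat advances on every expiry is rescheduled 30 times and then recovered on the 31st expiry. A job whose heartbeat never moves is recovered on the first expiry.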

Code looks good.  However, why 30?  This would artificially limit how 
long a job could run, no?
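The concern can be made concrete with a little arithmetic. Assume the timer is armed once when the job starts and re-armed once per reschedule. A job that keeps making progress is then recovered at expiry number (cap + 1), so its runtime is bounded by `tdr_timeout_ms * (max_reschedules + 1)`. The helper below is a hypothetical illustration of that bound. The timeout value used in the test is only an example and is not taken from the driver.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: the longest a continuously progressing job can run
 * before recovery. The first expiry comes from the initial timer, and each
 * reschedule adds one more timeout period.
 */
uint64_t max_job_runtime_ms(uint64_t tdr_timeout_ms, unsigned int max_reschedules)
{
	return tdr_timeout_ms * ((uint64_t)max_reschedules + 1);
}
```

For example, with an assumed 2000 ms timeout and a cap of 30, even a healthy job would be recovered after about 62 seconds. That is the artificial limit on job length that the question points at.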

-Jeff


More information about the dri-devel mailing list