[PATCH] drm/i915/display: Change ret value type from int to long

Aakash Deep Sarkar aakash.deep.sarkar at intel.com
Fri Jul 4 13:05:15 UTC 2025


> And as far as I can see i915 is using the config option DRM_I915_FENCE_TIMEOUT here. So as long as you don't set this to very very large value this should work as expected.

Sorry I forgot to mention this in the commit message; this issue is happening in the xe driver. DRM_I915_FENCE_TIMEOUT config is used only in the case of i915 driver. For xe it's hardcoded to MAX_SCHEDULE_TIMEOUT in the compat-i915-headers layer.

Regards,
Aakash

-----Original Message-----
From: Christian König <christian.koenig at amd.com> 
Sent: 04 July 2025 17:57
To: Jani Nikula <jani.nikula at linux.intel.com>; Aakash Deep Sarkar <aakash.deep.sarkar at intel.com>; intel-gfx at lists.freedesktop.org
Cc: Badrappan, Jeevaka <jeevaka.badrappan at intel.com>; Ville Syrjälä <ville.syrjala at linux.intel.com>; Maarten Lankhorst <maarten.lankhorst at linux.intel.com>; Hogander, Jouni <jouni.hogander at intel.com>; Alex Deucher <alexander.deucher at amd.com>; Lucas Stach <l.stach at pengutronix.de>; Rob Clark <robin.clark at oss.qualcomm.com>; Thierry Reding <thierry.reding at gmail.com>; Julia Lawall <julia.lawall at inria.fr>; Dan Carpenter <dan.carpenter at linaro.org>
Subject: Re: [PATCH] drm/i915/display: Change ret value type from int to long

On 04.07.25 14:00, Jani Nikula wrote:
> On Fri, 04 Jul 2025, Aakash Deep Sarkar <aakash.deep.sarkar at intel.com> wrote:
>> dma_fence_wait_timeout returns a long type but the driver is only 
>> using the lower 32 bits of the retval and discarding the upper 32 
>> bits.
>>
>> This is particularly problematic if there are already signalled or 
>> stub fences on some of the hw planes. In this case the 
>> dma_fence_wait_timeout function will immediately return with timeout 
>> value MAX_SCHEDULE_TIMEOUT (0x7fffffffffffffff) since the fence is 
>> already signalled.

That is not correct. If the fence is signaled dma_fence_wait_timeout() returns the remaining timeout or at least 1 if the input timeout was 0.

Could be that i915 has a separately implemented fence_ops->wait callback which incorrectly returns MAX_SCHEDULE_TIMEOUT, but i strongly doubt that because that would break tons of stuff.

> If the driver only uses the lower
>> 32 bits of this return value then it'll interpret it as an error code 
>> (0xFFFFFFFF or (-1)) and skip the wait on the remaining fences.
>>
>> This issue was first observed with the Android compositor where the 
>> GPU composited layer was not properly waited on when there were stub 
>> fences in other overlay planes resulting in significant visual 
>> artifacts.
> 
> Thanks for the patch, good catch!

If I'm not completely mistaken this patch is not necessary.

dma_fence_wait_timeout() does *not* return MAX_SCHEDULE_TIMEOUT when the fence is signaled, but rather the remaining timeout which is the input value at maximum.

So when the input timeout fits into 32bits the returned timeout also fits into 32bits as well.

And as far as I can see i915 is using the config option DRM_I915_FENCE_TIMEOUT here. So as long as you don't set this to very very large value this should work as expected.

>> Test: No graphical artifacts with shadertoy
> 
> We've never used this commit trailer before, please let's not start now.
> 
> The subject should plainly state the "what", and the commit message 
> should answer the "why". You do have that here, but I think the 
> subject is still a bit too much nuts and bolts.
> 
> For example,
> 
> 	drm/i915/display: Fix dma_fence_wait_timeout() return value handling
> 
> would state the "what" in *much* more helpful terms for anyone looking 
> at git logs.
> 
> Presumably this has been an error for some time. You should do a 
> little bit of git blame on the offending lines. It'll find commit 
> d59cf7bb73f3
> ("drm/i915/display: Use dma_fence interfaces instead of i915_sw_fence").
> 
> Based on that, we should add:
> 
> Fixes: d59cf7bb73f3 ("drm/i915/display: Use dma_fence interfaces 
> instead of i915_sw_fence")
> Cc: Ville Syrjälä <ville.syrjala at linux.intel.com>
> Cc: Maarten Lankhorst <maarten.lankhorst at linux.intel.com>
> Cc: Jouni Högander <jouni.hogander at intel.com>
> Cc: <stable at vger.kernel.org> # v6.8+
> 
> Then it occurs to me this looks like a common mistake to make. A 
> little bit of git grep on dma_fence_wait_timeout() quickly finds 
> multiple similar mistakes in drm, at least amdgpu, etnaviv, msm, and 
> tegra. Cc some maintainers FYI. This class of bugs could cause issues elsewhere.

I can only guess but I think all those use cases use a fixed small timeout as well. IIRC amdgpu uses a timeout in the millisecond range.

> Let's also Cc Julia and Dan in case they have ideas to improve static 
> analysis to catch this class of bugs. Or maybe one exists already, but 
> we're just not running them!

Yeah, agree. A script which checks if the input timeout could be >32bit and the return value is assigned to something <=32bits is probably a really good idea.

Regards,
Christian.

> 
> Finally, for the actual change,
> 
> Reviewed-by: Jani Nikula <jani.nikula at intel.com>
> 
> 
>> Signed-off-by: Aakash Deep Sarkar <aakash.deep.sarkar at intel.com>
>> ---
>>  drivers/gpu/drm/i915/display/intel_display.c | 3 ++-
>>  1 file changed, 2 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/gpu/drm/i915/display/intel_display.c 
>> b/drivers/gpu/drm/i915/display/intel_display.c
>> index 456fc4b04cda..7035c1fc9033 100644
>> --- a/drivers/gpu/drm/i915/display/intel_display.c
>> +++ b/drivers/gpu/drm/i915/display/intel_display.c
>> @@ -7092,7 +7092,8 @@ static void intel_atomic_commit_fence_wait(struct intel_atomic_state *intel_stat
>>  	struct drm_i915_private *i915 = to_i915(intel_state->base.dev);
>>  	struct drm_plane *plane;
>>  	struct drm_plane_state *new_plane_state;
>> -	int ret, i;
>> +	long ret;
>> +	int i;
>>  
>>  	for_each_new_plane_in_state(&intel_state->base, plane, new_plane_state, i) {
>>  		if (new_plane_state->fence) {
> 



More information about the Intel-gfx mailing list