[PATCH] drm/i915/display: Change ret value type from int to long

Christian König christian.koenig at amd.com
Fri Jul 4 14:22:37 UTC 2025


Ah, yes. That makes a bit more sense, in this case the patch is indeed necessary.

But why is XE hard coding MAX_SCHEDULE_TIMEOUT here while i915 isn't?

Regards,
Christian

On 04.07.25 15:05, Aakash Deep Sarkar wrote:
>> And as far as I can see i915 is using the config option DRM_I915_FENCE_TIMEOUT here. So as long as you don't set this to very very large value this should work as expected.
> 
> Sorry I forgot to mention this in the commit message; this issue is happening in the xe driver. DRM_I915_FENCE_TIMEOUT config is used only in the case of i915 driver. For xe it's hardcoded to MAX_SCHEDULE_TIMEOUT in the compat-i915-headers layer.
> 
> Regards,
> Aakash
> 
> -----Original Message-----
> From: Christian König <christian.koenig at amd.com> 
> Sent: 04 July 2025 17:57
> To: Jani Nikula <jani.nikula at linux.intel.com>; Aakash Deep Sarkar <aakash.deep.sarkar at intel.com>; intel-gfx at lists.freedesktop.org
> Cc: Badrappan, Jeevaka <jeevaka.badrappan at intel.com>; Ville Syrjälä <ville.syrjala at linux.intel.com>; Maarten Lankhorst <maarten.lankhorst at linux.intel.com>; Hogander, Jouni <jouni.hogander at intel.com>; Alex Deucher <alexander.deucher at amd.com>; Lucas Stach <l.stach at pengutronix.de>; Rob Clark <robin.clark at oss.qualcomm.com>; Thierry Reding <thierry.reding at gmail.com>; Julia Lawall <julia.lawall at inria.fr>; Dan Carpenter <dan.carpenter at linaro.org>
> Subject: Re: [PATCH] drm/i915/display: Change ret value type from int to long
> 
> On 04.07.25 14:00, Jani Nikula wrote:
>> On Fri, 04 Jul 2025, Aakash Deep Sarkar <aakash.deep.sarkar at intel.com> wrote:
>>> dma_fence_wait_timeout returns a long type but the driver is only 
>>> using the lower 32 bits of the retval and discarding the upper 32 
>>> bits.
>>>
>>> This is particularly problematic if there are already signalled or 
>>> stub fences on some of the hw planes. In this case the 
>>> dma_fence_wait_timeout function will immediately return with timeout 
>>> value MAX_SCHEDULE_TIMEOUT (0x7fffffffffffffff) since the fence is 
>>> already signalled.
> 
> That is not correct. If the fence is signaled dma_fence_wait_timeout() returns the remaining timeout or at least 1 if the input timeout was 0.
> 
> Could be that i915 has a separately implemented fence_ops->wait callback which incorrectly returns MAX_SCHEDULE_TIMEOUT, but i strongly doubt that because that would break tons of stuff.
> 
>> If the driver only uses the lower
>>> 32 bits of this return value then it'll interpret it as an error code 
>>> (0xFFFFFFFF or (-1)) and skip the wait on the remaining fences.
>>>
>>> This issue was first observed with the Android compositor where the 
>>> GPU composited layer was not properly waited on when there were stub 
>>> fences in other overlay planes resulting in significant visual 
>>> artifacts.
>>
>> Thanks for the patch, good catch!
> 
> If I'm not completely mistaken this patch is not necessary.
> 
> dma_fence_wait_timeout() does *not* return MAX_SCHEDULE_TIMEOUT when the fence is signaled, but rather the remaining timeout which is the input value at maximum.
> 
> So when the input timeout fits into 32bits the returned timeout also fits into 32bits as well.
> 
> And as far as I can see i915 is using the config option DRM_I915_FENCE_TIMEOUT here. So as long as you don't set this to very very large value this should work as expected.
> 
>>> Test: No graphical artifacts with shadertoy
>>
>> We've never used this commit trailer before, please let's not start now.
>>
>> The subject should plainly state the "what", and the commit message 
>> should answer the "why". You do have that here, but I think the 
>> subject is still a bit too much nuts and bolts.
>>
>> For example,
>>
>> 	drm/i915/display: Fix dma_fence_wait_timeout() return value handling
>>
>> would state the "what" in *much* more helpful terms for anyone looking 
>> at git logs.
>>
>> Presumably this has been an error for some time. You should do a 
>> little bit of git blame on the offending lines. It'll find commit 
>> d59cf7bb73f3
>> ("drm/i915/display: Use dma_fence interfaces instead of i915_sw_fence").
>>
>> Based on that, we should add:
>>
>> Fixes: d59cf7bb73f3 ("drm/i915/display: Use dma_fence interfaces 
>> instead of i915_sw_fence")
>> Cc: Ville Syrjälä <ville.syrjala at linux.intel.com>
>> Cc: Maarten Lankhorst <maarten.lankhorst at linux.intel.com>
>> Cc: Jouni Högander <jouni.hogander at intel.com>
>> Cc: <stable at vger.kernel.org> # v6.8+
>>
>> Then it occurs to me this looks like a common mistake to make. A 
>> little bit of git grep on dma_fence_wait_timeout() quickly finds 
>> multiple similar mistakes in drm, at least amdgpu, etnaviv, msm, and 
>> tegra. Cc some maintainers FYI. This class of bugs could cause issues elsewhere.
> 
> I can only guess but I think all those use cases use a fixed small timeout as well. IIRC amdgpu uses a timeout in the millisecond range.
> 
>> Let's also Cc Julia and Dan in case they have ideas to improve static 
>> analysis to catch this class of bugs. Or maybe one exists already, but 
>> we're just not running them!
> 
> Yeah, agree. A script which checks if the input timeout could be >32bit and the return value is assigned to something <=32bits is probably a really good idea.
> 
> Regards,
> Christian.
> 
>>
>> Finally, for the actual change,
>>
>> Reviewed-by: Jani Nikula <jani.nikula at intel.com>
>>
>>
>>> Signed-off-by: Aakash Deep Sarkar <aakash.deep.sarkar at intel.com>
>>> ---
>>>  drivers/gpu/drm/i915/display/intel_display.c | 3 ++-
>>>  1 file changed, 2 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/gpu/drm/i915/display/intel_display.c 
>>> b/drivers/gpu/drm/i915/display/intel_display.c
>>> index 456fc4b04cda..7035c1fc9033 100644
>>> --- a/drivers/gpu/drm/i915/display/intel_display.c
>>> +++ b/drivers/gpu/drm/i915/display/intel_display.c
>>> @@ -7092,7 +7092,8 @@ static void intel_atomic_commit_fence_wait(struct intel_atomic_state *intel_stat
>>>  	struct drm_i915_private *i915 = to_i915(intel_state->base.dev);
>>>  	struct drm_plane *plane;
>>>  	struct drm_plane_state *new_plane_state;
>>> -	int ret, i;
>>> +	long ret;
>>> +	int i;
>>>  
>>>  	for_each_new_plane_in_state(&intel_state->base, plane, new_plane_state, i) {
>>>  		if (new_plane_state->fence) {
>>
> 



More information about the Intel-gfx mailing list