[PATCH 03/12] drm/amd/display: fix dmub access race condition

Mon Aug 18 11:31:08 UTC 2025

Hi,

On 2025-08-05 13:51:41 +0200, Klara Modin wrote:
> Hi,
> 
> On 2025-07-30 14:58:54 -0400, Roman.Li at amd.com wrote:
> > From: Aurabindo Pillai <aurabindo.pillai at amd.com>
> > 
> > Accessing DC from amdgpu_dm is usually preceded by acquisition of
> > dc_lock mutex. Most of the DC API that DM calls are under a DC lock.
> > However, there are a few that are not. Some DC API called from interrupt
> > context end up sending DMUB commands via a DC API, while other threads were
> > using DMUB. This was apparent from a race between calls for setting idle
> > optimization enable/disable and the DC API to set vmin/vmax.
> > 
> > Offload the call to dc_stream_adjust_vmin_vmax() to a thread instead
> > of directly calling them from the interrupt handler such that it waits
> > for dc_lock.
> > 
> > Reviewed-by: Nicholas Kazlauskas <nicholas.kazlauskas at amd.com>
> > Signed-off-by: Aurabindo Pillai <aurabindo.pillai at amd.com>
> > Signed-off-by: Roman Li <roman.li at amd.com>
> 
> With this patch I get sleeping function called from invalid context
> roughly every second. This occurs on at least PREEMPT_LAZY and
> PREEMPT_VOLUNTARY.
> 
...
> ...
> 
> > +static void schedule_dc_vmin_vmax(struct amdgpu_device *adev,
> > +	struct dc_stream_state *stream,
> > +	struct dc_crtc_timing_adjust *adjust)
> > +{
> > +	struct vupdate_offload_work *offload_work = kzalloc(sizeof(*offload_work), GFP_KERNEL);
> > +	if (!offload_work) {
> > +		drm_dbg_driver(adev_to_drm(adev), "Failed to allocate vupdate_offload_work\n");
> > +		return;
> > +	}
> > +
> > +	struct dc_crtc_timing_adjust *adjust_copy = kzalloc(sizeof(*adjust_copy), GFP_KERNEL);
> > +	if (!adjust_copy) {
> > +		drm_dbg_driver(adev_to_drm(adev), "Failed to allocate adjust_copy\n");
> > +		kfree(offload_work);
> > +		return;
> > +	}
> > +
> > +	dc_stream_retain(stream);
> > +	memcpy(adjust_copy, adjust, sizeof(*adjust_copy));
> > +
> > +	INIT_WORK(&offload_work->work, dm_handle_vmin_vmax_update);
> > +	offload_work->adev = adev;
> > +	offload_work->stream = stream;
> > +	offload_work->adjust = adjust_copy;
> > +
> > +	queue_work(system_wq, &offload_work->work);
> > +}
> > +
> 
> The allocations in this function seems to be the culprit. If I change
> them to GFP_ATOMIC the issue disappears:
> 
> diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
> index 31ea57edeb45..afe0fea13bb0 100644
> --- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
> +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
> @@ -550,13 +550,13 @@ static void schedule_dc_vmin_vmax(struct amdgpu_device *adev,
>  	struct dc_stream_state *stream,
>  	struct dc_crtc_timing_adjust *adjust)
>  {
> -	struct vupdate_offload_work *offload_work = kzalloc(sizeof(*offload_work), GFP_KERNEL);
> +	struct vupdate_offload_work *offload_work = kzalloc(sizeof(*offload_work), GFP_ATOMIC);
>  	if (!offload_work) {
>  		drm_dbg_driver(adev_to_drm(adev), "Failed to allocate vupdate_offload_work\n");
>  		return;
>  	}
>  
> -	struct dc_crtc_timing_adjust *adjust_copy = kzalloc(sizeof(*adjust_copy), GFP_KERNEL);
> +	struct dc_crtc_timing_adjust *adjust_copy = kzalloc(sizeof(*adjust_copy), GFP_ATOMIC);
>  	if (!adjust_copy) {
>  		drm_dbg_driver(adev_to_drm(adev), "Failed to allocate adjust_copy\n");
>  		kfree(offload_work);
> 

Any thoughts? This is still an issue in the current next. I have also
tried using GFP_NOWAIT instead and not seen any obvious issues under
memory pressure.

Regards,
Klara Modin