[PATCH 2/2] drm/amdgpu: Wait for pte updates before uq_resume

Sundararaju, Sathishkumar sasundar at amd.com
Sun Mar 16 15:43:14 UTC 2025


Please ignore this, will resend an updated version.

Regards,
Sathish

On 3/16/2025 9:06 PM, Sathishkumar S wrote:
> Wait for vm page table updates to finish before resuming user queues.
> Resume must follow after completion of pte updates to avoid page faults.
>
> amdgpu: [gfxhub] page fault (src_id:0 ring:40 vmid:10 pasid:32771)
> amdgpu:  in process  pid 0 thread  pid 0)
> amdgpu:   in page starting at address 0x0000800105405000 from client 10
> amdgpu: GCVM_L2_PROTECTION_FAULT_STATUS:0x00A41051
> amdgpu:      Faulty UTCL2 client ID: TCP (0x8)
> amdgpu:      MORE_FAULTS: 0x1
> amdgpu:      WALKER_ERROR: 0x0
> amdgpu:      PERMISSION_FAULTS: 0x5
> amdgpu:      MAPPING_ERROR: 0x0
> amdgpu:      RW: 0x1
> amdgpu: [gfxhub] page fault (src_id:0 ring:40 vmid:10 pasid:32771)
> amdgpu:  in process  pid 0 thread  pid 0)
> amdgpu:   in page starting at address 0x0000800105404000 from client 10
>
> Signed-off-by: Sathishkumar S <sathishkumar.sundararaju at amd.com>
> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c | 13 +++++++++++++
>   1 file changed, 13 insertions(+)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
> index f1d4e29772a5..4c3edd988a05 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
> @@ -541,10 +541,23 @@ amdgpu_userqueue_validate_bos(struct amdgpu_userq_mgr *uq_mgr)
>   static void amdgpu_userqueue_resume_worker(struct work_struct *work)
>   {
>   	struct amdgpu_userq_mgr *uq_mgr = work_to_uq_mgr(work, resume_work.work);
> +	struct amdgpu_fpriv *fpriv = uq_mgr_to_fpriv(uq_mgr);
> +	struct amdgpu_eviction_fence_mgr *evf_mgr = &fpriv->evf_mgr;
> +	struct amdgpu_eviction_fence *ev_fence;
>   	int ret;
>   
>   	mutex_lock(&uq_mgr->userq_mutex);
>   
> +	spin_lock(&evf_mgr->ev_fence_lock);
> +	ev_fence = evf_mgr->ev_fence;
> +	spin_unlock(&evf_mgr->ev_fence_lock);
> +	if (ev_fence && dma_fence_is_signaled(&ev_fence->base)) {
> +	/* Wait for the prior vm updates to complete before proceeding with resume */
> +		dma_resv_wait_timeout(fpriv->vm.root.bo->tbo.base.resv,
> +				      DMA_RESV_USAGE_BOOKKEEP,
> +				      true,
> +				      msecs_to_jiffies(AMDGPU_FENCE_JIFFIES_TIMEOUT));
> +	}
>   	ret = amdgpu_userqueue_validate_bos(uq_mgr);
>   	if (ret) {
>   		DRM_ERROR("Failed to validate BOs to restore\n");



More information about the amd-gfx mailing list