[PATCH V2] Revert "drm/amdgpu: remove vm sanity check from amdgpu_vm_make_compute" for Raven

Christian König ckoenig.leichtzumerken at gmail.com
Wed Feb 28 11:34:17 UTC 2024


Hi Jesse,

Am 28.02.24 um 09:43 schrieb jesse.zhang at amd.com:
> From: "Jesse.Zhang" <Jesse.Zhang at amd.com>
>
> Fix the error:
> "amdgpu: Failed to create process VM object".
>
> [Why] When amdgpu is initialized, seq64 maps its buffer and updates the BO mapping in the VM page table.
> But when clinfo runs, it also initializes a VM for the process device through kfd_process_device_init_vm()
> and ensures the root PD is clean through amdgpu_vm_pt_is_root_clean().
> The two conflict, so clinfo always fails.
>
> [HOW]
> Skip the seq64 entry when checking the VM page table.
>
> Signed-off-by: Jesse Zhang <Jesse.Zhang at amd.com>
> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c | 13 +++++++++++++
>   1 file changed, 13 insertions(+)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c
> index a160265ddc07..bdae5381887e 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c
> @@ -746,8 +746,21 @@ bool amdgpu_vm_pt_is_root_clean(struct amdgpu_device *adev,
>   	enum amdgpu_vm_level root = adev->vm_manager.root_level;
>   	unsigned int entries = amdgpu_vm_pt_num_entries(adev, root);
>   	unsigned int i = 0;
> +	u64 seq64_addr = (adev->vm_manager.max_pfn << AMDGPU_GPU_PAGE_SHIFT) - AMDGPU_VA_RESERVED_TOP;
> +
> +	seq64_addr /= AMDGPU_GPU_PAGE_SIZE;
> +	mask = amdgpu_vm_pt_entries_mask(adev, adev->vm_manager.root_level);
> +	shift = amdgpu_vm_pt_level_shift(adev, adev->vm_manager.root_level);
> +	seq64_entry = (seq64_addr >> shift) & mask;
>   
>   	for (i = 0; i < entries; i++) {
> +		/* seq64 reserves 2M of memory at the top of the address space,
> +		 * maps it and updates the VM page table during amdgpu initialization,
> +		 * so skip this known entry.
> +		 */
> +
> +		if (i == seq64_entry)
> +			continue;

Once more, it is intentional that this fails!

Renoir shouldn't be using the ATS setting anymore because that
functionality was removed.

But it looks like the setting is somehow still active, and that is
why you run into this issue.

Regards,
Christian.

>   		if (to_amdgpu_bo_vm(vm->root.bo)->entries[i].bo)
>   			return false;
>   	}


