[PATCH v2] drm/amdkfd: Fix EXT_COHERENT memory allocation crash

Felix Kuehling felix.kuehling at amd.com
Tue Oct 3 21:37:12 UTC 2023


On 2023-10-03 16:50, Philip Yang wrote:
> If there is no VRAM domain, bo_node is NULL and dereferencing it causes
> a crash. Refactor the change, and give the module parameter higher
> priority.
>
> Another patch is needed to support overriding the PTE flag on APUs.
>
> Fixes: 55d7e2001c7e ("drm/amdgpu: Add EXT_COHERENT memory allocation flags")
> Signed-off-by: Philip Yang <Philip.Yang at amd.com>
> ---
>   drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 18 +++++++-----------
>   1 file changed, 7 insertions(+), 11 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
> index 0d88698ae33f..305b2c54edfa 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
> @@ -1248,26 +1248,22 @@ svm_range_get_pte_flags(struct kfd_node *node,
>   		break;
>   	case IP_VERSION(9, 4, 3):
>   		mtype_local = amdgpu_mtype_local == 1 ? AMDGPU_VM_MTYPE_NC :
> -			     (amdgpu_mtype_local == 2 ? AMDGPU_VM_MTYPE_CC : AMDGPU_VM_MTYPE_RW);
> +			      (amdgpu_mtype_local == 2 || ext_coherent ?
> +					AMDGPU_VM_MTYPE_CC : AMDGPU_VM_MTYPE_RW);

We had some offline discussion where I thought that MTYPE_NC should 
become MTYPE_UC when ext_coherent is enabled to get the desired memory 
semantics. With that idea in mind, this would become a bit messier, 
but here it goes, as clean as I can make it:

-		mtype_local = amdgpu_mtype_local == 1 ? AMDGPU_VM_MTYPE_NC :
-			     (amdgpu_mtype_local == 2 ? AMDGPU_VM_MTYPE_CC : AMDGPU_VM_MTYPE_RW);
+		mtype_local = amdgpu_mtype_local == 1 && !ext_coherent ? AMDGPU_VM_MTYPE_NC :
+			     (amdgpu_mtype_local == 1 &&  ext_coherent ? AMDGPU_VM_MTYPE_UC :
+			     (amdgpu_mtype_local == 2 ||  ext_coherent ? AMDGPU_VM_MTYPE_CC :
+									 AMDGPU_VM_MTYPE_RW));

But maybe that could be fixed up in a follow-up patch. Either way, for 
the purpose of fixing the crash, this patch is

Reviewed-by: Felix Kuehling <Felix.Kuehling at amd.com>
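
For readers who find the nested ternary above hard to follow, here is a
minimal sketch of the same selection written as a standalone helper. It is
only an illustration, not part of the patch: the enum values are stand-ins
for the AMDGPU_VM_MTYPE_* flags, and pick_mtype_local() and its parameter
names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-ins for the AMDGPU_VM_MTYPE_* flags (assumption: the real
 * values come from the amdgpu headers and are not reproduced here). */
enum mtype { MTYPE_RW, MTYPE_NC, MTYPE_CC, MTYPE_UC };

/* Hypothetical helper mirroring the suggested nested ternary.
 * mtype_local_param models the amdgpu_mtype_local module parameter. */
enum mtype pick_mtype_local(int mtype_local_param, bool ext_coherent)
{
	/* Parameter 1 asks for NC; with ext_coherent it is upgraded to UC. */
	if (mtype_local_param == 1)
		return ext_coherent ? MTYPE_UC : MTYPE_NC;
	/* Parameter 2 asks for CC; ext_coherent also selects CC. */
	if (mtype_local_param == 2 || ext_coherent)
		return MTYPE_CC;
	/* Any other parameter value without ext_coherent selects RW. */
	return MTYPE_RW;
}

/* Self-check covering every branch of the helper. */
void check_pick_mtype_local(void)
{
	assert(pick_mtype_local(1, false) == MTYPE_NC);
	assert(pick_mtype_local(1, true) == MTYPE_UC);
	assert(pick_mtype_local(2, false) == MTYPE_CC);
	assert(pick_mtype_local(2, true) == MTYPE_CC);
	assert(pick_mtype_local(0, true) == MTYPE_CC);
	assert(pick_mtype_local(0, false) == MTYPE_RW);
}
```

Written this way, the only case that differs from the v2 patch is
amdgpu_mtype_local == 1 together with ext_coherent, which yields UC here
and NC in the patch as posted.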


>   		snoop = true;
>   		if (uncached) {
>   			mapping_flags |= AMDGPU_VM_MTYPE_UC;
> -		} else if (ext_coherent) {
> -			/* local HBM region close to partition */
> -			if (bo_node->adev == node->adev &&
> -			    (!bo_node->xcp || !node->xcp || bo_node->xcp->mem_id == node->xcp->mem_id))
> -				mapping_flags |= AMDGPU_VM_MTYPE_CC;
> -			else
> -				mapping_flags |= AMDGPU_VM_MTYPE_UC;
>   		} else if (domain == SVM_RANGE_VRAM_DOMAIN) {
>   			/* local HBM region close to partition */
>   			if (bo_node->adev == node->adev &&
>   			    (!bo_node->xcp || !node->xcp || bo_node->xcp->mem_id == node->xcp->mem_id))
>   				mapping_flags |= mtype_local;
> -			/* local HBM region far from partition or remote XGMI GPU */
> -			else if (svm_nodes_in_same_hive(bo_node, node))
> +			/* local HBM region far from partition or remote XGMI GPU
> +			 * with regular system scope coherence
> +			 */
> +			else if (svm_nodes_in_same_hive(bo_node, node) && !ext_coherent)
>   				mapping_flags |= AMDGPU_VM_MTYPE_NC;
> -			/* PCIe P2P */
> +			/* PCIe P2P or extended system scope coherence */
>   			else
>   				mapping_flags |= AMDGPU_VM_MTYPE_UC;
>   		/* system memory accessed by the APU */
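
The VRAM-domain branch as it reads after this patch can be modelled
roughly as below. This is a sketch under stated assumptions, not driver
code: struct node_model is a hypothetical flattening of the kfd_node
fields the branch reads, model_same_hive() is a simplified stand-in for
svm_nodes_in_same_hive() that also treats two nodes on the same device as
being in one hive, and the enum values are placeholders for the
AMDGPU_VM_MTYPE_* flags. The assert marks the precondition behind the
crash: bo_node is only valid when the range is in the VRAM domain.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Placeholders for the AMDGPU_VM_MTYPE_* flags (assumption). */
enum mtype { MTYPE_RW, MTYPE_NC, MTYPE_CC, MTYPE_UC };

/* Hypothetical model of a kfd_node: only the fields this branch reads.
 * hive_id == 0 is taken to mean "not in an XGMI hive" (assumption). */
struct node_model {
	int adev_id;	/* stands in for node->adev */
	int hive_id;	/* stands in for XGMI hive membership */
	bool has_xcp;	/* stands in for node->xcp != NULL */
	int mem_id;	/* stands in for node->xcp->mem_id */
};

/* Simplified stand-in for svm_nodes_in_same_hive(). */
bool model_same_hive(const struct node_model *a, const struct node_model *b)
{
	if (a->adev_id == b->adev_id)
		return true;
	return a->hive_id != 0 && a->hive_id == b->hive_id;
}

/* Models only the "domain == SVM_RANGE_VRAM_DOMAIN" branch. */
enum mtype pick_vram_mtype(const struct node_model *bo_node,
			   const struct node_model *node,
			   enum mtype mtype_local, bool ext_coherent)
{
	/* Callers must not reach this branch outside the VRAM domain. */
	assert(bo_node != NULL && node != NULL);

	/* local HBM region close to partition */
	if (bo_node->adev_id == node->adev_id &&
	    (!bo_node->has_xcp || !node->has_xcp ||
	     bo_node->mem_id == node->mem_id))
		return mtype_local;

	/* far local HBM or remote XGMI GPU, regular system scope coherence */
	if (model_same_hive(bo_node, node) && !ext_coherent)
		return MTYPE_NC;

	/* PCIe P2P or extended system scope coherence */
	return MTYPE_UC;
}
```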

