[PATCH 4/4] drm/amdgpu: use amdgpu_device_vram_access in amdgpu_ttm_access_memory

Thu Feb 6 02:39:48 UTC 2020

On 2020-02-05 10:22 a.m., Christian König wrote:
> Make use of the better performance here as well.
>
> This patch is only compile tested!
>
> Signed-off-by: Christian König <christian.koenig at amd.com>
> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 38 +++++++++++++++----------
>   1 file changed, 23 insertions(+), 15 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> index 58d143b24ba0..538c3b52b712 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> @@ -1565,7 +1565,7 @@ static int amdgpu_ttm_access_memory(struct ttm_buffer_object *bo,
>   
>   	while (len && pos < adev->gmc.mc_vram_size) {
>   		uint64_t aligned_pos = pos & ~(uint64_t)3;
> -		uint32_t bytes = 4 - (pos & 3);
> +		uint64_t bytes = 4 - (pos & 3);
>   		uint32_t shift = (pos & 3) * 8;
>   		uint32_t mask = 0xffffffff << shift;
>   
> @@ -1574,20 +1574,28 @@ static int amdgpu_ttm_access_memory(struct ttm_buffer_object *bo,
>   			bytes = len;
>   		}
>   
> -		spin_lock_irqsave(&adev->mmio_idx_lock, flags);
> -		WREG32_NO_KIQ(mmMM_INDEX, ((uint32_t)aligned_pos) | 0x80000000);
> -		WREG32_NO_KIQ(mmMM_INDEX_HI, aligned_pos >> 31);
> -		if (!write || mask != 0xffffffff)
> -			value = RREG32_NO_KIQ(mmMM_DATA);
> -		if (write) {
> -			value &= ~mask;
> -			value |= (*(uint32_t *)buf << shift) & mask;
> -			WREG32_NO_KIQ(mmMM_DATA, value);
> -		}
> -		spin_unlock_irqrestore(&adev->mmio_idx_lock, flags);
> -		if (!write) {
> -			value = (value & mask) >> shift;
> -			memcpy(buf, &value, bytes);
> +		if (mask != 0xffffffff) {
> +			spin_lock_irqsave(&adev->mmio_idx_lock, flags);
> +			WREG32_NO_KIQ(mmMM_INDEX, ((uint32_t)aligned_pos) | 0x80000000);
> +			WREG32_NO_KIQ(mmMM_INDEX_HI, aligned_pos >> 31);
> +			if (!write || mask != 0xffffffff)
> +				value = RREG32_NO_KIQ(mmMM_DATA);
> +			if (write) {
> +				value &= ~mask;
> +				value |= (*(uint32_t *)buf << shift) & mask;
> +				WREG32_NO_KIQ(mmMM_DATA, value);
> +			}
> +			spin_unlock_irqrestore(&adev->mmio_idx_lock, flags);
> +			if (!write) {
> +				value = (value & mask) >> shift;
> +				memcpy(buf, &value, bytes);
> +			}
> +		} else {
> +			bytes = (nodes->start + nodes->size) << PAGE_SHIFT;
> +			bytes = min(pos - bytes, (uint64_t)len & ~0x3ull);

I think this is incorrect. The following should be true: pos < 
((nodes->start + nodes->size) << PAGE_SHIFT). Consequently pos - bytes 
is always negative here. But because you're doing unsigned math it will 
underflow to a big positive number, which is never the minimum. 
Therefore the min will always be len & ~0x3ull.

I believe this should be min(bytes - pos, (uint64_t)len & ~0x3ull).

Jon, to catch this bug, you'd need a test that first fragments VRAM 
(allocates lots of 2MB buffers and frees every other buffer), then 
allocates a large non-contiguous buffer. Then you need one 4KB or 
smaller access that crosses a boundary between 2MB VRAM buffer chunks.

Christian, your optimized VRAM allocator that tries to get large 
contiguous chunks is nice for performance, but it probably has a 
tendency to hide this kind of bug. I wonder if we should have a debug 
mode that forces non-contiguous buffers to be actually non-contiguous.

Regards,
   Felix

> +
> +			amdgpu_device_vram_access(adev, pos, (uint32_t *)buf,
> +						  bytes, write);
>   		}
>   
>   		ret += bytes;