[PATCH 2/2] drm/ttm: Fix COW check

Felix Kuehling felix.kuehling at amd.com
Sat Jul 10 00:29:31 UTC 2021


On 2021-07-09 3:37 p.m., Christian König wrote:
> Am 09.07.21 um 21:31 schrieb Felix Kuehling:
>> On 2021-07-09 2:38 a.m., Christian König wrote:
>>>
>>>
>>> Am 08.07.21 um 21:36 schrieb Alex Deucher:
>>>> From: Felix Kuehling <Felix.Kuehling at amd.com>
>>>>
>>>> KFD Thunk maps invisible VRAM BOs with PROT_NONE, MAP_PRIVATE.
>>>> is_cow_mapping returns true for these mappings. Add a check for
>>>> vm_flags & VM_WRITE to avoid mmap failures on private read-only or
>>>> PROT_NONE mappings.
>>>
>>> I'm pretty sure that this is not working as expected.
>>
>> Not sure what you mean. Debugger access to the memory through the 
>> PROT_NONE VMAs is definitely working, with both ptrace and 
>> /proc/<pid>/mem.
>>
>>
>>>
>>>>
>>>> Fixes: f91142c62161 ("drm/ttm: nuke VM_MIXEDMAP on BO mappings v3")
>>>> Signed-off-by: Felix Kuehling <Felix.Kuehling at amd.com>
>>>> Signed-off-by: Alex Deucher <alexander.deucher at amd.com>
>>>> ---
>>>> drivers/gpu/drm/ttm/ttm_bo_vm.c | 2 +-
>>>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>>>
>>>> diff --git a/drivers/gpu/drm/ttm/ttm_bo_vm.c 
>>>> b/drivers/gpu/drm/ttm/ttm_bo_vm.c
>>>> index f56be5bc0861..a75e90c7d4aa 100644
>>>> --- a/drivers/gpu/drm/ttm/ttm_bo_vm.c
>>>> +++ b/drivers/gpu/drm/ttm/ttm_bo_vm.c
>>>> @@ -552,7 +552,7 @@ static const struct vm_operations_struct 
>>>> ttm_bo_vm_ops = {
>>>>   int ttm_bo_mmap_obj(struct vm_area_struct *vma, struct 
>>>> ttm_buffer_object *bo)
>>>>   {
>>>>       /* Enforce no COW since would have really strange behavior 
>>>> with it. */
>>>> -    if (is_cow_mapping(vma->vm_flags))
>>>> +    if (is_cow_mapping(vma->vm_flags) && (vma->vm_flags & VM_WRITE))
>>>
>>> is_cow_mapping() already checks for VM_MAYWRITE, so this here 
>>> shouldn't be necessary.
>>
>> AFAICT, VM_MAYWRITE is not based on the PROT_... bits used to create 
>> the VMA, but based on the permissions of the file. So as long as the 
>> render node is writable, VM_MAYWRITE is set for all VMAs that map it.
>>
>> I would agree that it's probably a bad idea for the Thunk to map 
>> these VMAs with MAP_PRIVATE. We can try changing the Thunk to use 
>> MAP_SHARED for these PROT_NONE mappings. But that doesn't change the 
>> fact that this kernel patch broke existing usermode.

For the record, changing the Thunk to use MAP_SHARED works.


>
> Yeah, but see the discussion around MAP_PRIVATE and COW regarding 
> this. What Thunk did here was a really bad idea.

Hindsight ... Which part was a bad idea?

  * Creating a PROT_NONE VMA? That's necessary to let ptrace or
    /proc/<pid>/mem access the memory. It's a brilliant idea, IMHO
  * Making that VMA MAP_PRIVATE? Probably a bad idea in hindsight. The
    process itself can't access this memory, let alone write to it. So I
    didn't think too much about whether to make it shared or private.
    Either way, we are not causing any actual COW on these mappings
    because they are not writable, and we never make them writable with
    mprotect either
  * Introducing a COW check after letting it slide for 15 years? It's a
    reasonable check. In this case it catches a false positive. Had this
    check been in place 4 or 5 years ago, it would have easily prevented
    this mess


>
> I think we have only two options here:
> 1. Accept the breakage of thunk.

Really?


> 2. Add a workaround in amdgpu/amdkfd to force MAP_PRIVATE into 
> MAP_SHARED in the kernel.

I tried to do this in amdgpu_gem_object_mmap and spent 4 hours debugging 
why it doesn't work. It breaks because the mapping->i_mmap_writable gets 
unbalanced and causes subsequent mappings to fail when that atomic 
counter becomes negative (indicating DENYWRITE). I could fix it by 
calling mapping_map_writable. But I don't think this is intended as 
driver API. I see no precedent of any driver calling this. For the 
record this works, but it feels fragile and ugly:

--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
@@ -255,6 +255,20 @@ static int amdgpu_gem_object_mmap(struct drm_gem_object *obj, struct vm_area_str
  	if (bo->flags & AMDGPU_GEM_CREATE_NO_CPU_ACCESS)
  		return -EPERM;
  
+	/* Workaround for Thunk bug creating PROT_NONE,MAP_PRIVATE mappings
+	 * for debugger access to invisible VRAM. Should have used MAP_SHARED
+	 * instead.
+	 */
+	if (is_cow_mapping(vma->vm_flags) &&
+	    !(vma->vm_flags & (VM_READ | VM_WRITE | VM_EXEC))) {
+		int ret;
+
+		ret = mapping_map_writable(vma->vm_file->f_mapping);
+		if (ret)
+			return ret;
+		vma->vm_flags |= VM_SHARED | VM_MAYSHARE;
+	}
+
  	return drm_gem_ttm_mmap(obj, vma);
  }

3. Improve my previous workaround by adding a similar check for COW in 
ttm_bo_vm_ops.mprotect. I'll send an updated patch.

Regards,
   Felix


>
> Regards,
> Christian.
>
>>
>> Regards,
>>   Felix
>>
>>
>>>
>>> Christian.
>>>
>>>>           return -EINVAL;
>>>>         ttm_bo_get(bo);
>>>
>


More information about the amd-gfx mailing list