[PATCH 2/2] drm/ttm: Fix COW check

Christian König ckoenig.leichtzumerken at gmail.com
Mon Jul 12 09:27:07 UTC 2021


Am 10.07.21 um 02:29 schrieb Felix Kuehling:
>
> On 2021-07-09 3:37 p.m., Christian König wrote:
>> Am 09.07.21 um 21:31 schrieb Felix Kuehling:
>>> On 2021-07-09 2:38 a.m., Christian König wrote:
>>>>
>>>>
>>>> Am 08.07.21 um 21:36 schrieb Alex Deucher:
>>>>> From: Felix Kuehling <Felix.Kuehling at amd.com>
>>>>>
>>>>> KFD Thunk maps invisible VRAM BOs with PROT_NONE, MAP_PRIVATE.
>>>>> is_cow_mapping returns true for these mappings. Add a check for
>>>>> vm_flags & VM_WRITE to avoid mmap failures on private read-only or
>>>>> PROT_NONE mappings.
>>>>
>>>> I'm pretty sure that this is not working as expected.
>>>
>>> Not sure what you mean. Debugger access to the memory through the 
>>> PROT_NONE VMAs is definitely working, with both ptrace and 
>>> /proc/<pid>/mem.
>>>
>>>
>>>>
>>>>>
>>>>> Fixes: f91142c62161 ("drm/ttm: nuke VM_MIXEDMAP on BO mappings v3")
>>>>> Signed-off-by: Felix Kuehling <Felix.Kuehling at amd.com>
>>>>> Signed-off-by: Alex Deucher <alexander.deucher at amd.com>
>>>>> ---
>>>>> drivers/gpu/drm/ttm/ttm_bo_vm.c | 2 +-
>>>>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>>>>
>>>>> diff --git a/drivers/gpu/drm/ttm/ttm_bo_vm.c 
>>>>> b/drivers/gpu/drm/ttm/ttm_bo_vm.c
>>>>> index f56be5bc0861..a75e90c7d4aa 100644
>>>>> --- a/drivers/gpu/drm/ttm/ttm_bo_vm.c
>>>>> +++ b/drivers/gpu/drm/ttm/ttm_bo_vm.c
>>>>> @@ -552,7 +552,7 @@ static const struct vm_operations_struct 
>>>>> ttm_bo_vm_ops = {
>>>>>   int ttm_bo_mmap_obj(struct vm_area_struct *vma, struct 
>>>>> ttm_buffer_object *bo)
>>>>>   {
>>>>>       /* Enforce no COW since would have really strange behavior 
>>>>> with it. */
>>>>> -    if (is_cow_mapping(vma->vm_flags))
>>>>> +    if (is_cow_mapping(vma->vm_flags) && (vma->vm_flags & VM_WRITE))
>>>>
>>>> is_cow_mapping() already checks for VM_MAYWRITE, so this here 
>>>> shouldn't be necessary.
>>>
>>> AFAICT, VM_MAYWRITE is not based on the PROT_... bits used to create 
>>> the VMA, but based on the permissions of the file. So as long as the 
>>> render node is writable, VM_MAYWRITE is set for all VMAs that map it.
>>>
>>> I would agree that it's probably a bad idea for the Thunk to map 
>>> these VMAs with MAP_PRIVATE. We can try changing the Thunk to use 
>>> MAP_SHARED for these PROT_NONE mappings. But that doesn't change the 
>>> fact that this kernel patch broke existing usermode.
>
> For the record, changing the Thunk to use MAP_SHARED works.
>
>
>>
>> Yeah, but see the discussion around MAP_PRIVATE and COW regarding 
>> this. What Thunk did here was a really bad idea.
>
> Hindsight ... Which part was a bad idea?
>
>  * Creating a PROT_NONE VMA? That's necessary to let ptrace or
>    /proc/<pid>/mem access the memory. It's a brilliant idea, IMHO
>  * Making that VMA MAP_PRIVATE? Probably a bad idea in hindsight. The
>    process itself can't access this memory, let alone write to it. So I
>    didn't think too much about whether to make it shared or private.
>    Either way, we are not causing any actual COW on these mappings
>    because they are not writable, and we never make them writable with
>    mprotect either
>  * Introducing a COW check after letting it slide for 15 years? It's a
>    reasonable check. In this case it catches a false positive. Had this
>    check been in place 4 or 5 years ago, it would have easily prevented
>    this mess

A combination of everything.

MAP_PRIVATE would have previously caused very odd behavior and/or a 
BUG_ON()/WARN_ON() in the VMA code (depending on platform).

MAP_PRIVATE + PROT_NONE kind of worked because it never faulted a page 
so we never actually triggered a COW.

>
>>
>> I think we have only two options here:
>> 1. Accept the breakage of thunk.
>
> Really?

Only as last resort, I'm running out of ideas how to fix this cleanly.

>
>
>> 2. Add a workaround in amdgpu/amdkfd to force MAP_PRIVATE into 
>> MAP_SHARED in the kernel.
>
> I tried to do this in amdgpu_gem_object_mmap and spent 4 hours 
> debugging why it doesn't work. It breaks because the 
> mapping->i_mmap_writable gets unbalanced and causes subsequent 
> mappings to fail when that atomic counter becomes negative (indicating 
> DENYWRITE). I could fix it by calling mapping_map_writable. But I 
> don't think this is intended as driver API. I see no precedent of any 
> driver calling this. For the record this works, but it feels fragile 
> and ugly:
>
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
> @@ -255,6 +255,20 @@ static int amdgpu_gem_object_mmap(struct 
> drm_gem_object *obj, struct vm_area_str
>      if (bo->flags & AMDGPU_GEM_CREATE_NO_CPU_ACCESS)
>          return -EPERM;
>
> +    /* Workaround for Thunk bug creating PROT_NONE,MAP_PRIVATE mappings
> +     * for debugger access to invisible VRAM. Should have used 
> MAP_SHARED
> +     * instead.
> +     */
> +    if (is_cow_mapping(vma->vm_flags) &&
> +        !(vma->vm_flags & (VM_READ | VM_WRITE | VM_EXEC))) {
> +        int ret;
> +
> +        ret = mapping_map_writable(vma->vm_file->f_mapping);
> +        if (ret)
> +            return ret;
> +        vma->vm_flags |= VM_SHARED | VM_MAYSHARE;
> +    }
> +

Yeah, I'm on another thread where we are discussing 
mapping_map_writable() for vma_set_file() as well.

> return drm_gem_ttm_mmap(obj, vma);
>  }
>
> 3. Improve my previous workaround by adding a similar check for COW in 
> ttm_bo_vm_ops.mprotect. I'll send an updated patch.

That has it's own issues, but going to reply there in detail.

Regards,
Christian.

>
> Regards,
>   Felix
>
>
>>
>> Regards,
>> Christian.
>>
>>>
>>> Regards,
>>>   Felix
>>>
>>>
>>>>
>>>> Christian.
>>>>
>>>>>           return -EINVAL;
>>>>>         ttm_bo_get(bo);
>>>>
>>



More information about the amd-gfx mailing list