[PATCH 2/2] drm/ttm: Fix COW check
Christian König
ckoenig.leichtzumerken at gmail.com
Mon Jul 12 09:27:07 UTC 2021
Am 10.07.21 um 02:29 schrieb Felix Kuehling:
>
> On 2021-07-09 3:37 p.m., Christian König wrote:
>> Am 09.07.21 um 21:31 schrieb Felix Kuehling:
>>> On 2021-07-09 2:38 a.m., Christian König wrote:
>>>>
>>>>
>>>> Am 08.07.21 um 21:36 schrieb Alex Deucher:
>>>>> From: Felix Kuehling <Felix.Kuehling at amd.com>
>>>>>
>>>>> KFD Thunk maps invisible VRAM BOs with PROT_NONE, MAP_PRIVATE.
>>>>> is_cow_mapping returns true for these mappings. Add a check for
>>>>> vm_flags & VM_WRITE to avoid mmap failures on private read-only or
>>>>> PROT_NONE mappings.
>>>>
>>>> I'm pretty sure that this is not working as expected.
>>>
>>> Not sure what you mean. Debugger access to the memory through the
>>> PROT_NONE VMAs is definitely working, with both ptrace and
>>> /proc/<pid>/mem.
>>>
>>>
>>>>
>>>>>
>>>>> Fixes: f91142c62161 ("drm/ttm: nuke VM_MIXEDMAP on BO mappings v3")
>>>>> Signed-off-by: Felix Kuehling <Felix.Kuehling at amd.com>
>>>>> Signed-off-by: Alex Deucher <alexander.deucher at amd.com>
>>>>> ---
>>>>> drivers/gpu/drm/ttm/ttm_bo_vm.c | 2 +-
>>>>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>>>>
>>>>> diff --git a/drivers/gpu/drm/ttm/ttm_bo_vm.c
>>>>> b/drivers/gpu/drm/ttm/ttm_bo_vm.c
>>>>> index f56be5bc0861..a75e90c7d4aa 100644
>>>>> --- a/drivers/gpu/drm/ttm/ttm_bo_vm.c
>>>>> +++ b/drivers/gpu/drm/ttm/ttm_bo_vm.c
>>>>> @@ -552,7 +552,7 @@ static const struct vm_operations_struct
>>>>> ttm_bo_vm_ops = {
>>>>> int ttm_bo_mmap_obj(struct vm_area_struct *vma, struct
>>>>> ttm_buffer_object *bo)
>>>>> {
>>>>> /* Enforce no COW since would have really strange behavior
>>>>> with it. */
>>>>> - if (is_cow_mapping(vma->vm_flags))
>>>>> + if (is_cow_mapping(vma->vm_flags) && (vma->vm_flags & VM_WRITE))
>>>>
>>>> is_cow_mapping() already checks for VM_MAYWRITE, so this here
>>>> shouldn't be necessary.
>>>
>>> AFAICT, VM_MAYWRITE is not based on the PROT_... bits used to create
>>> the VMA, but based on the permissions of the file. So as long as the
>>> render node is writable, VM_MAYWRITE is set for all VMAs that map it.
>>>
>>> I would agree that it's probably a bad idea for the Thunk to map
>>> these VMAs with MAP_PRIVATE. We can try changing the Thunk to use
>>> MAP_SHARED for these PROT_NONE mappings. But that doesn't change the
>>> fact that this kernel patch broke existing usermode.
>
> For the record, changing the Thunk to use MAP_SHARED works.
>
>
>>
>> Yeah, but see the discussion around MAP_PRIVATE and COW regarding
>> this. What Thunk did here was a really bad idea.
>
> Hindsight ... Which part was a bad idea?
>
> * Creating a PROT_NONE VMA? That's necessary to let ptrace or
> /proc/<pid>/mem access the memory. It's a brilliant idea, IMHO
> * Making that VMA MAP_PRIVATE? Probably a bad idea in hindsight. The
> process itself can't access this memory, let alone write to it. So I
> didn't think too much about whether to make it shared or private.
> Either way, we are not causing any actual COW on these mappings
> because they are not writable, and we never make them writable with
> mprotect either
> * Introducing a COW check after letting it slide for 15 years? It's a
> reasonable check. In this case it catches a false positive. Had this
> check been in place 4 or 5 years ago, it would have easily prevented
> this mess
A combination of everything.
MAP_PRIVATE would have previously caused very odd behavior and/or a
BUG_ON()/WARN_ON() in the VMA code (depending on platform).
MAP_PRIVATE + PROT_NONE kind of worked because it never faulted a page
so we never actually triggered a COW.
>
>>
>> I think we have only two options here:
>> 1. Accept the breakage of thunk.
>
> Really?
Only as last resort, I'm running out of ideas how to fix this cleanly.
>
>
>> 2. Add a workaround in amdgpu/amdkfd to force MAP_PRIVATE into
>> MAP_SHARED in the kernel.
>
> I tried to do this in amdgpu_gem_object_mmap and spent 4 hours
> debugging why it doesn't work. It breaks because the
> mapping->i_mmap_writable gets unbalanced and causes subsequent
> mappings to fail when that atomic counter becomes negative (indicating
> DENYWRITE). I could fix it by calling mapping_map_writable. But I
> don't think this is intended as driver API. I see no precedent of any
> driver calling this. For the record this works, but it feels fragile
> and ugly:
>
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
> @@ -255,6 +255,20 @@ static int amdgpu_gem_object_mmap(struct
> drm_gem_object *obj, struct vm_area_str
> if (bo->flags & AMDGPU_GEM_CREATE_NO_CPU_ACCESS)
> return -EPERM;
>
> + /* Workaround for Thunk bug creating PROT_NONE,MAP_PRIVATE mappings
> + * for debugger access to invisible VRAM. Should have used
> MAP_SHARED
> + * instead.
> + */
> + if (is_cow_mapping(vma->vm_flags) &&
> + !(vma->vm_flags & (VM_READ | VM_WRITE | VM_EXEC))) {
> + int ret;
> +
> + ret = mapping_map_writable(vma->vm_file->f_mapping);
> + if (ret)
> + return ret;
> + vma->vm_flags |= VM_SHARED | VM_MAYSHARE;
> + }
> +
Yeah, I'm on another thread where we are discussing
mapping_map_writable() for vma_set_file() as well.
> return drm_gem_ttm_mmap(obj, vma);
> }
>
> 3. Improve my previous workaround by adding a similar check for COW in
> ttm_bo_vm_ops.mprotect. I'll send an updated patch.
That has it's own issues, but going to reply there in detail.
Regards,
Christian.
>
> Regards,
> Felix
>
>
>>
>> Regards,
>> Christian.
>>
>>>
>>> Regards,
>>> Felix
>>>
>>>
>>>>
>>>> Christian.
>>>>
>>>>> return -EINVAL;
>>>>> ttm_bo_get(bo);
>>>>
>>
More information about the amd-gfx
mailing list