[PATCH v2 1/8] drm: Add dummy page per device or GEM object

Fri Nov 13 20:52:46 UTC 2020

On 6/22/20 1:50 PM, Daniel Vetter wrote:
> On Mon, Jun 22, 2020 at 7:45 PM Christian König
> <christian.koenig at amd.com> wrote:
>> Am 22.06.20 um 16:32 schrieb Andrey Grodzovsky:
>>> On 6/22/20 9:18 AM, Christian König wrote:
>>>> Am 21.06.20 um 08:03 schrieb Andrey Grodzovsky:
>>>>> Will be used to reroute CPU mapped BO's page faults once
>>>>> device is removed.
>>>>>
>>>>> Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky at amd.com>
>>>>> ---
>>>>>    drivers/gpu/drm/drm_file.c  |  8 ++++++++
>>>>>    drivers/gpu/drm/drm_prime.c | 10 ++++++++++
>>>>>    include/drm/drm_file.h      |  2 ++
>>>>>    include/drm/drm_gem.h       |  2 ++
>>>>>    4 files changed, 22 insertions(+)
>>>>>
>>>>> diff --git a/drivers/gpu/drm/drm_file.c b/drivers/gpu/drm/drm_file.c
>>>>> index c4c704e..67c0770 100644
>>>>> --- a/drivers/gpu/drm/drm_file.c
>>>>> +++ b/drivers/gpu/drm/drm_file.c
>>>>> @@ -188,6 +188,12 @@ struct drm_file *drm_file_alloc(struct
>>>>> drm_minor *minor)
>>>>>                goto out_prime_destroy;
>>>>>        }
>>>>>    +    file->dummy_page = alloc_page(GFP_KERNEL | __GFP_ZERO);
>>>>> +    if (!file->dummy_page) {
>>>>> +        ret = -ENOMEM;
>>>>> +        goto out_prime_destroy;
>>>>> +    }
>>>>> +
>>>>>        return file;
>>>>>      out_prime_destroy:
>>>>> @@ -284,6 +290,8 @@ void drm_file_free(struct drm_file *file)
>>>>>        if (dev->driver->postclose)
>>>>>            dev->driver->postclose(dev, file);
>>>>>    +    __free_page(file->dummy_page);
>>>>> +
>>>>>        drm_prime_destroy_file_private(&file->prime);
>>>>>          WARN_ON(!list_empty(&file->event_list));
>>>>> diff --git a/drivers/gpu/drm/drm_prime.c b/drivers/gpu/drm/drm_prime.c
>>>>> index 1de2cde..c482e9c 100644
>>>>> --- a/drivers/gpu/drm/drm_prime.c
>>>>> +++ b/drivers/gpu/drm/drm_prime.c
>>>>> @@ -335,6 +335,13 @@ int drm_gem_prime_fd_to_handle(struct
>>>>> drm_device *dev,
>>>>>          ret = drm_prime_add_buf_handle(&file_priv->prime,
>>>>>                dma_buf, *handle);
>>>>> +
>>>>> +    if (!ret) {
>>>>> +        obj->dummy_page = alloc_page(GFP_KERNEL | __GFP_ZERO);
>>>>> +        if (!obj->dummy_page)
>>>>> +            ret = -ENOMEM;
>>>>> +    }
>>>>> +
>>>> While the per file case still looks acceptable this is a clear NAK
>>>> since it will massively increase the memory needed for a prime
>>>> exported object.
>>>>
>>>> I think that this is quite overkill in the first place and for the
>>>> hot unplug case we can just use the global dummy page as well.
>>>>
>>>> Christian.
>>>
>>> Global dummy page is good for read access, what do you do on write
>>> access ? My first approach was indeed to map at first global dummy
>>> page as read only and mark the vma->vm_flags as !VM_SHARED assuming
>>> that this would trigger Copy On Write flow in core mm
>>> (https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Felixir.bootlin.com%2Flinux%2Fv5.7-rc7%2Fsource%2Fmm%2Fmemory.c%23L3977&data=02%7C01%7CAndrey.Grodzovsky%40amd.com%7C3753451d037544e7495408d816d4c4ee%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637284450384586120&sdata=ZpRaQgqA5K4jRfidOiedey0AleeYQ97WNUkGA29ERA0%3D&reserved=0)
>>> on the next page fault to same address triggered by a write access but
>>> then i realized a new COW page will be allocated for each such mapping
>>> and this is much more wasteful then having a dedicated page per GEM
>>> object.
>> Yeah, but this is only for a very very small corner cases. What we need
>> to prevent is increasing the memory usage during normal operation to much.
>>
>> Using memory during the unplug is completely unproblematic because we
>> just released quite a bunch of it by releasing all those system memory
>> buffers.
>>
>> And I'm pretty sure that COWed pages are correctly accounted towards the
>> used memory of a process.
>>
>> So I think if that approach works as intended and the COW pages are
>> released again on unmapping it would be the perfect solution to the problem.
>>
>> Daniel what do you think?
> If COW works, sure sounds reasonable. And if we can make sure we
> managed to drop all the system allocations (otherwise suddenly 2x
> memory usage, worst case). But I have no idea whether we can
> retroshoehorn that into an established vma, you might have fun stuff
> like a mkwrite handler there (which I thought is the COW handler
> thing, but really no idea).
>
> If we need to massively change stuff then I think rw dummy page,
> allocated on first fault after hotunplug (maybe just make it one per
> object, that's simplest) seems like the much safer option. Much less
> code that can go wrong.
> -Daniel

Regarding COW, i was looking into how to properly implement it from within the 
fault handler (i.e. ttm_bo_vm_fault)
and the main obstacle I hit is that of exclusive access to the vm_area_struct, i 
need to be able to modify
vma->vm_flags (and vm_page_prot)  to remove VM_SHARED bit so COW can be 
triggered on subsequent write access
fault (here https://elixir.bootlin.com/linux/latest/source/mm/memory.c#L4128)
but core mm takes only read side mm_sem (here for example 
https://elixir.bootlin.com/linux/latest/source/drivers/iommu/amd/iommu_v2.c#L488)
and so I am not supposed to modify vm_area_struct in this case. I am not sure if 
it's legit to write lock tthe mm_sem from this point.
I found some discussions about this here 
http://lkml.iu.edu/hypermail/linux/kernel/1909.1/02754.html but it wasn't really 
clear to me
what's the solution.

In any case, seems to me that easier and more memory saving solution would be to 
just switch to per ttm bo dumy rw page that
would be allocated on demand as you suggested here.  This should also take care 
of imported BOs and flink cases.
Then i can drop the per device FD and per GEM object FD dummy BO and the ugly 
loop i am using in patch 2 to match faulting BO to the right dummy page.

Does this makes sense ?

Andrey

>
>> Regards,
>> Christian.
>>
>>> We can indeed optimize by allocating this dummy page on the first page
>>> fault after device disconnect instead on GEM object creation.
>>>
>>> Andrey
>>>
>>>
>>>>> mutex_unlock(&file_priv->prime.lock);
>>>>>        if (ret)
>>>>>            goto fail;
>>>>> @@ -1006,6 +1013,9 @@ void drm_prime_gem_destroy(struct
>>>>> drm_gem_object *obj, struct sg_table *sg)
>>>>>            dma_buf_unmap_attachment(attach, sg, DMA_BIDIRECTIONAL);
>>>>>        dma_buf = attach->dmabuf;
>>>>>        dma_buf_detach(attach->dmabuf, attach);
>>>>> +
>>>>> +    __free_page(obj->dummy_page);
>>>>> +
>>>>>        /* remove the reference */
>>>>>        dma_buf_put(dma_buf);
>>>>>    }
>>>>> diff --git a/include/drm/drm_file.h b/include/drm/drm_file.h
>>>>> index 19df802..349a658 100644
>>>>> --- a/include/drm/drm_file.h
>>>>> +++ b/include/drm/drm_file.h
>>>>> @@ -335,6 +335,8 @@ struct drm_file {
>>>>>         */
>>>>>        struct drm_prime_file_private prime;
>>>>>    +    struct page *dummy_page;
>>>>> +
>>>>>        /* private: */
>>>>>    #if IS_ENABLED(CONFIG_DRM_LEGACY)
>>>>>        unsigned long lock_count; /* DRI1 legacy lock count */
>>>>> diff --git a/include/drm/drm_gem.h b/include/drm/drm_gem.h
>>>>> index 0b37506..47460d1 100644
>>>>> --- a/include/drm/drm_gem.h
>>>>> +++ b/include/drm/drm_gem.h
>>>>> @@ -310,6 +310,8 @@ struct drm_gem_object {
>>>>>         *
>>>>>         */
>>>>>        const struct drm_gem_object_funcs *funcs;
>>>>> +
>>>>> +    struct page *dummy_page;
>>>>>    };
>>>>>      /**
>