[PATCH v3 01/12] drm: Add dummy page per device or GEM object

Christian König christian.koenig at amd.com
Fri Jan 8 14:33:34 UTC 2021


On 08.01.21 at 15:26, Andrey Grodzovsky wrote:
> Hey Christian, just a ping.

Was there any question for me here?

As far as I can see, the best approach would still be to fill the VMA 
with a single dummy page and avoid adding a pointer to the GEM object.
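
Roughly something like the following is what I have in mind; just a 
minimal sketch, where the dev->dummy_page field and the handler name 
are made-up names, not existing code:

static vm_fault_t drm_gem_dummy_fault(struct vm_fault *vmf)
{
        struct vm_area_struct *vma = vmf->vma;
        struct drm_gem_object *obj = vma->vm_private_data;
        struct drm_device *dev = obj->dev;

        /* One dummy page for the whole device, allocated on the first
         * fault (locking around the lazy allocation omitted here). */
        if (!dev->dummy_page) {
                dev->dummy_page = alloc_page(GFP_KERNEL | __GFP_ZERO);
                if (!dev->dummy_page)
                        return VM_FAULT_OOM;
        }

        /* Back the faulting address with the shared dummy page. */
        return vmf_insert_pfn(vma, vmf->address,
                              page_to_pfn(dev->dummy_page));
}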

Christian.

>
> Andrey
>
> On 1/7/21 11:37 AM, Andrey Grodzovsky wrote:
>>
>> On 1/7/21 11:30 AM, Daniel Vetter wrote:
>>> On Thu, Jan 07, 2021 at 11:26:52AM -0500, Andrey Grodzovsky wrote:
>>>> On 1/7/21 11:21 AM, Daniel Vetter wrote:
>>>>> On Tue, Jan 05, 2021 at 04:04:16PM -0500, Andrey Grodzovsky wrote:
>>>>>> On 11/23/20 3:01 AM, Christian König wrote:
>>>>>>> On 23.11.20 at 05:54, Andrey Grodzovsky wrote:
>>>>>>>> On 11/21/20 9:15 AM, Christian König wrote:
>>>>>>>>> On 21.11.20 at 06:21, Andrey Grodzovsky wrote:
>>>>>>>>>> Will be used to reroute CPU-mapped BOs' page faults once the
>>>>>>>>>> device is removed.
>>>>>>>>> Uff, one page for each exported DMA-buf? That's not something 
>>>>>>>>> we can do.
>>>>>>>>>
>>>>>>>>> We need to find a different approach here.
>>>>>>>>>
>>>>>>>>> Can't we call alloc_page() on each fault and link them together
>>>>>>>>> so they are freed when the device is finally reaped?
>>>>>>>> For sure better to optimize and allocate on demand when we reach
>>>>>>>> this corner case, but why the linking ?
>>>>>>>> Shouldn't drm_prime_gem_destroy be good enough place to free ?
>>>>>>> I want to avoid keeping the page in the GEM object.
>>>>>>>
>>>>>>> What we can do is to allocate a page on demand for each fault and
>>>>>>> link them together in the bdev instead.
>>>>>>>
>>>>>>> And when the bdev is then finally destroyed after the last
>>>>>>> application has closed, we can release all of them.
>>>>>>>
>>>>>>> Christian.
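
For illustration, a rough sketch of that per-fault allocation; the 
dummy_pages list and dummy_lock on the bdev are made-up fields here, 
and pages are chained through page->lru so teardown can reclaim them:

/* In the fault handler: allocate a fresh page and chain it on the
 * bdev so it stays reachable until device teardown. */
static struct page *ttm_bdev_alloc_dummy_page(struct ttm_bo_device *bdev)
{
        struct page *page = alloc_page(GFP_KERNEL | __GFP_ZERO);

        if (!page)
                return NULL;

        spin_lock(&bdev->dummy_lock);
        list_add(&page->lru, &bdev->dummy_pages);
        spin_unlock(&bdev->dummy_lock);

        return page;
}

/* At bdev teardown: release everything that was handed out. */
static void ttm_bdev_free_dummy_pages(struct ttm_bo_device *bdev)
{
        struct page *page, *tmp;

        list_for_each_entry_safe(page, tmp, &bdev->dummy_pages, lru) {
                list_del(&page->lru);
                __free_page(page);
        }
}
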
>>>>>> Hey, I started to implement this and then realized that by
>>>>>> allocating a page for each fault indiscriminately we will be
>>>>>> allocating a new page for each faulting virtual address within a
>>>>>> VA range belonging to the same BO, and this is obviously too much
>>>>>> and not the intention. Should I instead use, let's say, a hashtable
>>>>>> with the faulting BO address as the hash key, to keep allocating
>>>>>> and reusing the same dummy zero page per GEM BO (or, for that
>>>>>> matter, the DRM file object address for non-imported BOs)?
>>>>> Why do we need a hashtable? All the sw structures to track this
>>>>> should still be around:
>>>>> - if gem_bo->dma_buf is set the buffer is currently exported as a
>>>>>   dma-buf, so defensively allocate a per-bo page
>>>>> - otherwise allocate a per-file page
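
As a sketch, that decision could look like this; the dummy_page fields 
on the BO and on the file are made up, only obj->dma_buf exists today:

static struct page *drm_gem_get_dummy_page(struct drm_gem_object *obj,
                                           struct drm_file *file)
{
        /*
         * An exported BO can outlive the file it came from, so it
         * gets its own page; everything else shares one per file.
         */
        if (obj->dma_buf)
                return obj->dummy_page;

        return file->dummy_page;
}
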
>>>>
>>>> That's exactly what we have in the current implementation.
>>>>
>>>>
>>>>> Or is the idea to save the struct page * pointer? That feels a bit 
>>>>> like
>>>>> over-optimizing stuff. Better to have a simple implementation 
>>>>> first and
>>>>> then tune it if (and only if) any part of it becomes a problem for 
>>>>> normal
>>>>> usage.
>>>>
>>>> Exactly - the idea is to avoid adding an extra pointer to
>>>> drm_gem_object; Christian suggested to instead keep a linked list of
>>>> dummy pages allocated on demand once we hit a vm_fault. I will then
>>>> also prefault the entire VA range from vma->vm_start to vma->vm_end
>>>> and map it to that single dummy page.
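
A minimal sketch of that prefault loop, assuming a dummy page already 
allocated as above (helper name made up):

/* Prefault: map the whole VMA to the one dummy page so only the
 * first fault in the range ever reaches this handler. */
static vm_fault_t drm_gem_prefault_dummy(struct vm_area_struct *vma,
                                         struct page *dummy_page)
{
        unsigned long addr;
        vm_fault_t ret = VM_FAULT_NOPAGE;

        for (addr = vma->vm_start; addr < vma->vm_end; addr += PAGE_SIZE) {
                ret = vmf_insert_pfn(vma, addr, page_to_pfn(dummy_page));
                if (ret != VM_FAULT_NOPAGE)
                        break;
        }

        return ret;
}
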
>>> This strongly feels like premature optimization. If you're worried 
>>> about
>>> the overhead on amdgpu, pay down the debt by removing one of the 
>>> redundant
>>> pointers between gem and ttm bo structs (I think we still have some) 
>>> :-)
>>>
>>> Until we've nuked these easy&obvious ones we shouldn't play "avoid 1
>>> pointer just because" games with hashtables.
>>> -Daniel
>>
>>
>> Well, if you and Christian can agree on this approach and maybe
>> suggest which pointer is redundant and can be removed from the GEM
>> struct, so we can use the 'credit' to add the dummy page to GEM, I
>> will be happy to follow through.
>>
>> P.S. The hashtable is off the table anyway and we are talking only
>> about a linked list here, since by prefaulting the entire VA range for
>> a vmf->vma I will be avoiding redundant page faults to the same VMA VA
>> range, and so I don't need to search for and reuse an existing dummy
>> page but can simply create a new one for each next fault.
>>
>> Andrey


