[Linaro-mm-sig] [PATCH 1/2] dma-buf: Require VM_PFNMAP vma for mmap

Mon Mar 1 10:17:35 UTC 2021

On 01.03.21 at 10:21, Thomas Hellström (Intel) wrote:
>
> On 3/1/21 10:05 AM, Daniel Vetter wrote:
>> On Mon, Mar 01, 2021 at 09:39:53AM +0100, Thomas Hellström (Intel) wrote:
>>> Hi,
>>>
>>> On 3/1/21 9:28 AM, Daniel Vetter wrote:
>>>> On Sat, Feb 27, 2021 at 9:06 AM Thomas Hellström (Intel)
>>>> <thomas_os at shipmail.org> wrote:
>>>>> On 2/26/21 2:28 PM, Daniel Vetter wrote:
>>>>>> So I think it stops gup. But I haven't verified at all. Would be good
>>>>>> if Christian can check this with some direct io to a buffer in system
>>>>>> memory.
>>>>> Hmm,
>>>>>
>>>>> Docs (again vm_normal_page()) say:
>>>>>
>>>>>     * VM_MIXEDMAP mappings can likewise contain memory with or without "struct
>>>>>     * page" backing, however the difference is that _all_ pages with a struct
>>>>>     * page (that is, those where pfn_valid is true) are refcounted and considered
>>>>>     * normal pages by the VM. The disadvantage is that pages are refcounted
>>>>>     * (which can be slower and simply not an option for some PFNMAP users). The
>>>>>     * advantage is that we don't have to follow the strict linearity rule of
>>>>>     * PFNMAP mappings in order to support COWable mappings.
>>>>>
>>>>> but it's true __vm_insert_mixed() ends up in the insert_pfn() path, so
>>>>> the above isn't really true, which makes me wonder if and in that case
>>>>> why there could any longer ever be a significant performance difference
>>>>> between MIXEDMAP and PFNMAP.
>>>> Yeah it's definitely confusing. I guess I'll hack up a patch and see
>>>> what sticks.
>>>>
>>>>> BTW regarding the TTM huge PTEs, I don't think we ever landed that devmap
>>>>> hack, so they are (for the non-gup case) relying on
>>>>> vma_is_special_huge(). For the gup case, I think the bug is still there.
>>>> Maybe there's another devmap hack, but the ttm_vm_insert functions do
>>>> use PFN_DEV and all that. And I think that stops gup_fast from trying
>>>> to find the underlying page.
>>>> -Daniel
>>> Hmm perhaps it might, but I don't think so. The fix I tried out was to set
>>>
>>> PFN_DEV | PFN_MAP for huge PTEs which causes pfn_devmap() to be true, and
>>> then
>>>
>>> follow_devmap_pmd()->get_dev_pagemap() which returns NULL and gup_fast()
>>> backs off,
>>>
>>> in the end that would mean setting in stone that "if there is a huge devmap
>>> page table entry for which we haven't registered any devmap struct pages
>>> (get_dev_pagemap returns NULL), we should treat that as a "special" huge
>>> page table entry".
>>>
>>> From what I can tell, all code calling get_dev_pagemap() already does that,
>>> it's just a question of getting it accepted and formalizing it.
>> Oh I thought that's already how it works, since I didn't spot anything
>> else that would block gup_fast from falling over. I guess really would
>> need some testcases to make sure direct i/o (that's the easiest to test)
>> fails like we expect.
>
> Yeah, IIRC the "| PFN_MAP" is the missing piece for TTM huge ptes. 
> Otherwise pmd_devmap() will not return true and since there is no 
> pmd_special() things break.

Is that maybe the issue we have seen with amdgpu and huge pages?

Apart from that I'm lost, guys. That devmap and gup stuff is not
something I know well beyond a mile-high view.

Christian.

>
> /Thomas
>
>
>
>> -Daniel
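
For readers following the quoted exchange, the flag logic under discussion can be sketched as a small userspace model. This is not kernel code. The `model_` names, the enum, and the bit positions are assumptions made for illustration, and the decision function only encodes the behaviour as described in the thread (both PFN_DEV and PFN_MAP must be set for the devmap check to pass, and a missing pagemap makes gup_fast back off). It has not been checked against any particular kernel version.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag bits for this model; the real values live in the
 * kernel headers and may differ. */
#define MODEL_PFN_DEV (1ULL << 61)
#define MODEL_PFN_MAP (1ULL << 60)
#define MODEL_PFN_FLAGS (MODEL_PFN_DEV | MODEL_PFN_MAP)

/* A pfn with flag bits folded into the high bits of the value. */
typedef struct { uint64_t val; } model_pfn_t;

/* Possible outcomes for a huge entry in this model. */
enum model_gup_result {
    MODEL_GUP_UNHANDLED,     /* neither devmap nor special: the problem case */
    MODEL_GUP_BACK_OFF,      /* devmap entry without registered struct pages */
    MODEL_GUP_TAKE_PAGE_REF  /* devmap entry with a registered pagemap */
};

static model_pfn_t model_pfn_to_pfn_t(uint64_t pfn, uint64_t flags)
{
    model_pfn_t p;

    /* The frame number must not collide with the flag bits. */
    assert((pfn & MODEL_PFN_FLAGS) == 0);
    p.val = pfn | flags;
    return p;
}

/* True only when BOTH flags are set, which is why PFN_DEV alone
 * is described as insufficient in the thread. */
static bool model_pfn_t_devmap(model_pfn_t pfn)
{
    return (pfn.val & MODEL_PFN_FLAGS) == MODEL_PFN_FLAGS;
}

/* has_pagemap stands in for get_dev_pagemap() returning non-NULL. */
static enum model_gup_result model_gup_huge_entry(model_pfn_t pfn,
                                                  bool has_pagemap)
{
    if (!model_pfn_t_devmap(pfn))
        return MODEL_GUP_UNHANDLED;
    if (!has_pagemap)
        return MODEL_GUP_BACK_OFF;
    return MODEL_GUP_TAKE_PAGE_REF;
}
```

In this model, an entry created with only `MODEL_PFN_DEV` lands in the unhandled case, while one created with `MODEL_PFN_DEV | MODEL_PFN_MAP` and no pagemap ends in the back-off case, which is the outcome the proposed fix aims for.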