mm: fix cache mode tracking in vm_insert_mixed() breaks AMDGPU [was: Re: Latest testing with drm-next-4.9-wip and latest LLVM/mesa stack - Regression in PowerPlay/DPM on CIK?]

Dave Airlie airlied at gmail.com
Mon Oct 17 22:01:01 UTC 2016


On 18 October 2016 at 07:25, Dan Williams <dan.j.williams at intel.com> wrote:
> On Sun, Oct 16, 2016 at 1:53 PM, Dave Airlie <airlied at gmail.com> wrote:
>> On 17 October 2016 at 04:41, Marek Olšák <maraeo at gmail.com> wrote:
>>> On Fri, Oct 14, 2016 at 3:33 AM, Michel Dänzer <michel at daenzer.net> wrote:
>>>>
>>>> [ Adding Dan Williams and dri-devel ]
>>>>
>>>> On 14/10/16 03:28 AM, Shawn Starr wrote:
>>>>> Hello AMD folks,
>>>>>
>>>>> I have discovered a problem in Linus' master that affects AMDGPU; nobody would
>>>>> notice this in drm-next-4.9-wip since it's not in this repo.
>>>>
>>>> [...]
>>>>
>>>>> 87744ab3832b83ba71b931f86f9cfdb000d07da5 is the first bad commit
>>>>> commit 87744ab3832b83ba71b931f86f9cfdb000d07da5
>>>>> Author: Dan Williams <dan.j.williams at intel.com>
>>>>> Date:   Fri Oct 7 17:00:18 2016 -0700
>>>>>
>>>>>     mm: fix cache mode tracking in vm_insert_mixed()
>>>>>
>>>>>     vm_insert_mixed() unlike vm_insert_pfn_prot() and vmf_insert_pfn_pmd(),
>>>>>     fails to check the pgprot_t it uses for the mapping against the one
>>>>>     recorded in the memtype tracking tree.  Add the missing call to
>>>>>     track_pfn_insert() to preclude cases where incompatible aliased mappings
>>>>>     are established for a given physical address range.
>>>>>
>>>>>     Link: http://lkml.kernel.org/r/147328717909.35069.14256589123570653697.stgit at dwillia2-desk3.amr.corp.intel.com
>>>>>     Signed-off-by: Dan Williams <dan.j.williams at intel.com>
>>>>>     Cc: David Airlie <airlied at linux.ie>
>>>>>     Cc: Matthew Wilcox <mawilcox at microsoft.com>
>>>>>     Cc: Ross Zwisler <ross.zwisler at linux.intel.com>
>>>>>     Signed-off-by: Andrew Morton <akpm at linux-foundation.org>
>>>>>     Signed-off-by: Linus Torvalds <torvalds at linux-foundation.org>
>>>>>
>>>>> :040000 040000 7517c0019fe49c1830b5a1d81f1dc099c5aab98a fd497a604a2af5995db2b8ed1e9c640bede6adf3 M      mm
>>>>>
>>>>>
>>>>> Removal of this patch stops graphics stalls.
>>>>
>>>> Thanks for bisecting this Shawn.
>>>>
>>>>
>>>>> A friend of mine mentions,
>>>>>
>>>>> "looks like a graphics thingy you depend on is requesting a mapping with a
>>>>> not-allowed cache mode, and now you are (rightfully) getting errors?"
>>>>
>>>> It would be nice to get some more specific pointers to what amdgpu (or
>>>> maybe ttm, since that calls vm_insert_mixed in ttm_bo_vm_fault) might be
>>>> doing wrong.
>>
>>         /*
>>          * We'd like to use VM_PFNMAP on shared mappings, where
>>          * (vma->vm_flags & VM_SHARED) != 0, for performance reasons,
>>          * but for some reason VM_PFNMAP + x86 PAT + write-combine is very
>>          * bad for performance. Until that has been sorted out, use
>>          * VM_MIXEDMAP on all mappings. See freedesktop.org bug #75719
>>          */
>>         vma->vm_flags |= VM_MIXEDMAP;
>>
>> We have that comment in the ttm code, which to me implies that mixed is
>> now doing the right thing, but that this makes it as slow as the
>> interface we should be using (VM_PFNMAP).
>>
>
> Aren't there only 2 possibilities for this regression?
>
> 1/ a memtype entry was never made so track_pfn_insert() returns an
> uncached mapping
>
> 2/ a conflicting memtype entry exists and undefined behavior due to
> mixed mapping types is avoided with the change.

3/ The CPU usage through this path goes up and slows things down,
though I suspect it's more likely an uncached mapping showing up
when we don't expect it.

Dave.
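Dan's possibilities 1/ and 2/ both come down to how the tracked memtype overrides the cache mode the driver asked for. The following is a minimal user-space model, not kernel code. It assumes, as Dan describes for case 1/, that a physical address with no memtype entry resolves to uncached (UC-), and that the tracked type replaces the requested one. All names here (struct memtype_entry, model_lookup_memtype(), model_track_pfn_insert(), the CM_* values) are hypothetical. The real x86 PAT code uses a tree rather than a linear scan and handles more cases, such as system RAM pages.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical cache modes, loosely named after x86 PAT cache modes. */
enum cache_mode { CM_WB, CM_WC, CM_UC_MINUS };

/* Hypothetical memtype record: one reserved physical range and its mode. */
struct memtype_entry {
	uint64_t start;       /* inclusive physical start address */
	uint64_t end;         /* exclusive physical end address */
	enum cache_mode mode; /* cache mode recorded when the range was reserved */
};

/*
 * Model of a memtype lookup: linear scan over the table.
 * If no entry covers paddr, report UC- (uncached), i.e. case 1/.
 */
enum cache_mode model_lookup_memtype(const struct memtype_entry *tbl,
				     size_t n, uint64_t paddr)
{
	assert(tbl != NULL || n == 0);
	for (size_t i = 0; i < n; i++)
		if (paddr >= tbl[i].start && paddr < tbl[i].end)
			return tbl[i].mode;
	return CM_UC_MINUS;
}

/*
 * Model of the check the commit adds to vm_insert_mixed(): the caller's
 * requested mode is discarded and the tracked mode is used instead, so
 * no mapping can alias an existing entry with a different type (case 2/).
 */
enum cache_mode model_track_pfn_insert(const struct memtype_entry *tbl,
				       size_t n, uint64_t paddr,
				       enum cache_mode requested)
{
	(void)requested; /* in this model the tracked type always wins */
	return model_lookup_memtype(tbl, n, paddr);
}
```

With a table that records the VRAM aperture as CM_WC, a write-combined request keeps CM_WC. With an empty table (no reservation was ever made), the same request silently comes back as CM_UC_MINUS, which would fit the stalls Shawn bisected. A table entry that already records CM_UC_MINUS for the range overrides the request the same way. Dave's case 3/ is a CPU-cost question and is not captured by this model.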


More information about the amd-gfx mailing list