mm: fix cache mode tracking in vm_insert_mixed() breaks AMDGPU [was: Re: Latest testing with drm-next-4.9-wip and latest LLVM/mesa stack - Regression in PowerPlay/DPM on CIK?]
Dave Airlie
airlied at gmail.com
Mon Oct 17 22:01:01 UTC 2016
On 18 October 2016 at 07:25, Dan Williams <dan.j.williams at intel.com> wrote:
> On Sun, Oct 16, 2016 at 1:53 PM, Dave Airlie <airlied at gmail.com> wrote:
>> On 17 October 2016 at 04:41, Marek Olšák <maraeo at gmail.com> wrote:
>>> On Fri, Oct 14, 2016 at 3:33 AM, Michel Dänzer <michel at daenzer.net> wrote:
>>>>
>>>> [ Adding Dan Williams and dri-devel ]
>>>>
>>>> On 14/10/16 03:28 AM, Shawn Starr wrote:
>>>>> Hello AMD folks,
>>>>>
>>>>> I have discovered a problem in Linus master that affects AMDGPU; nobody would
>>>>> notice this in drm-next-4.9-wip since it's not in this repo.
>>>>
>>>> [...]
>>>>
>>>>> 87744ab3832b83ba71b931f86f9cfdb000d07da5 is the first bad commit
>>>>> commit 87744ab3832b83ba71b931f86f9cfdb000d07da5
>>>>> Author: Dan Williams <dan.j.williams at intel.com>
>>>>> Date: Fri Oct 7 17:00:18 2016 -0700
>>>>>
>>>>> mm: fix cache mode tracking in vm_insert_mixed()
>>>>>
>>>>> vm_insert_mixed() unlike vm_insert_pfn_prot() and vmf_insert_pfn_pmd(),
>>>>> fails to check the pgprot_t it uses for the mapping against the one
>>>>> recorded in the memtype tracking tree. Add the missing call to
>>>>> track_pfn_insert() to preclude cases where incompatible aliased mappings
>>>>> are established for a given physical address range.
>>>>>
>>>>> Link: http://lkml.kernel.org/r/147328717909.35069.14256589123570653697.stgit at dwillia2-desk3.amr.corp.intel.com
>>>>> Signed-off-by: Dan Williams <dan.j.williams at intel.com>
>>>>> Cc: David Airlie <airlied at linux.ie>
>>>>> Cc: Matthew Wilcox <mawilcox at microsoft.com>
>>>>> Cc: Ross Zwisler <ross.zwisler at linux.intel.com>
>>>>> Signed-off-by: Andrew Morton <akpm at linux-foundation.org>
>>>>> Signed-off-by: Linus Torvalds <torvalds at linux-foundation.org>
>>>>>
>>>>> :040000 040000 7517c0019fe49c1830b5a1d81f1dc099c5aab98a fd497a604a2af5995db2b8ed1e9c640bede6adf3 M mm
>>>>>
>>>>>
>>>>> Removal of this patch stops graphics stalls.
>>>>
>>>> Thanks for bisecting this Shawn.
>>>>
>>>>
>>>>> A friend of mine mentions,
>>>>>
>>>>> "looks like a graphics thingy you depend on is requesting a mapping with a
>>>>> not-allowed cache mode, and now you are (rightfully) getting errors?"
>>>>
>>>> It would be nice to get some more specific pointers what amdgpu (or
>>>> maybe ttm, since that calls vm_insert_mixed in ttm_bo_vm_fault) might be
>>>> doing wrong.
>>
>> /*
>>  * We'd like to use VM_PFNMAP on shared mappings, where
>>  * (vma->vm_flags & VM_SHARED) != 0, for performance reasons,
>>  * but for some reason VM_PFNMAP + x86 PAT + write-combine is very
>>  * bad for performance. Until that has been sorted out, use
>>  * VM_MIXEDMAP on all mappings. See freedesktop.org bug #75719
>>  */
>> vma->vm_flags |= VM_MIXEDMAP;
>>
>> We have that comment in the ttm code, which to me implies that mixed is
>> doing the right thing now, but that it is slow, just like the interface
>> we should be using.
>>
>
> Aren't there only 2 possibilities for this regression?
>
> 1/ a memtype entry was never made so track_pfn_insert() returns an
> uncached mapping
>
> 2/ a conflicting memtype entry exists and undefined behavior due to
> mixed mapping types is avoided with the change.

3/ The CPU usage through this path goes up and slows things down,
though I suspect it's more that an uncached mapping is showing up
when we don't expect it.

Dave.
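
To make cases 1/ and 2/ above concrete, here is a toy userspace sketch. It is not the kernel code: the table layout, the `toy_` names and the fall-back-to-uncached rule are assumptions made for illustration, based only on the description in this thread. Case 3/ is about lookup cost and is not modelled.

```c
#include <assert.h>
#include <stddef.h>

/* Toy cache modes; illustrative names, not the kernel's definitions. */
enum toy_cache_mode { TOY_WB, TOY_WC, TOY_UC };

/* One recorded range [start, end) in a hypothetical memtype table. */
struct toy_memtype {
    unsigned long start;
    unsigned long end;
    enum toy_cache_mode mode;
};

/*
 * Return the cache mode a mapping of pfn would end up with.
 * The caller's requested mode is deliberately not a parameter: in this
 * model the recorded entry always wins (case 2), and a pfn with no
 * entry falls back to uncached (case 1).
 */
static enum toy_cache_mode
toy_effective_mode(const struct toy_memtype *tab, size_t n, unsigned long pfn)
{
    assert(tab != NULL || n == 0);

    for (size_t i = 0; i < n; i++) {
        if (pfn >= tab[i].start && pfn < tab[i].end)
            return tab[i].mode;
    }
    return TOY_UC;
}
```

With a single TOY_WC entry covering [0x1000, 0x2000), a lookup of 0x1800 gives TOY_WC whatever the caller wanted, while 0x3000 has no entry and comes back as TOY_UC. If the driver expected write-combine for that second address, the mapping would silently end up uncached, which is the kind of surprise suspected here.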