mm: fix cache mode tracking in vm_insert_mixed() breaks AMDGPU [was: Re: Latest testing with drm-next-4.9-wip and latest LLVM/mesa stack - Regression in PowerPlay/DPM on CIK?]

Dan Williams dan.j.williams at intel.com
Mon Oct 17 21:25:04 UTC 2016


On Sun, Oct 16, 2016 at 1:53 PM, Dave Airlie <airlied at gmail.com> wrote:
> On 17 October 2016 at 04:41, Marek Olšák <maraeo at gmail.com> wrote:
>> On Fri, Oct 14, 2016 at 3:33 AM, Michel Dänzer <michel at daenzer.net> wrote:
>>>
>>> [ Adding Dan Williams and dri-devel ]
>>>
>>> On 14/10/16 03:28 AM, Shawn Starr wrote:
>>>> Hello AMD folks,
>>>>
>>>> I have discovered a problem in Linus master that affects AMDGPU. Nobody would
>>>> notice this in drm-next-4.9-wip since it's not in this repo.
>>>
>>> [...]
>>>
>>>> 87744ab3832b83ba71b931f86f9cfdb000d07da5 is the first bad commit
>>>> commit 87744ab3832b83ba71b931f86f9cfdb000d07da5
>>>> Author: Dan Williams <dan.j.williams at intel.com>
>>>> Date:   Fri Oct 7 17:00:18 2016 -0700
>>>>
>>>>     mm: fix cache mode tracking in vm_insert_mixed()
>>>>
>>>>     vm_insert_mixed() unlike vm_insert_pfn_prot() and vmf_insert_pfn_pmd(),
>>>>     fails to check the pgprot_t it uses for the mapping against the one
>>>>     recorded in the memtype tracking tree.  Add the missing call to
>>>>     track_pfn_insert() to preclude cases where incompatible aliased mappings
>>>>     are established for a given physical address range.
>>>>
>>>>     Link: http://lkml.kernel.org/r/147328717909.35069.14256589123570653697.stgit at dwillia2-desk3.amr.corp.intel.com
>>>>     Signed-off-by: Dan Williams <dan.j.williams at intel.com>
>>>>     Cc: David Airlie <airlied at linux.ie>
>>>>     Cc: Matthew Wilcox <mawilcox at microsoft.com>
>>>>     Cc: Ross Zwisler <ross.zwisler at linux.intel.com>
>>>>     Signed-off-by: Andrew Morton <akpm at linux-foundation.org>
>>>>     Signed-off-by: Linus Torvalds <torvalds at linux-foundation.org>
>>>>
>>>> :040000 040000 7517c0019fe49c1830b5a1d81f1dc099c5aab98a fd497a604a2af5995db2b8ed1e9c640bede6adf3 M      mm
>>>>
>>>>
>>>> Removal of this patch stops graphics stalls.
>>>
>>> Thanks for bisecting this Shawn.
>>>
>>>
>>>> A friend of mine mentions,
>>>>
>>>> "looks like a graphics thingy you depend on is requesting a mapping with a
>>>> not-allowed cache mode, and now you are (rightfully) getting errors?"
>>>
>>> It would be nice to get some more specific pointers to what amdgpu (or
>>> maybe ttm, since that calls vm_insert_mixed in ttm_bo_vm_fault) might be
>>> doing wrong.
>
>         /*
>          * We'd like to use VM_PFNMAP on shared mappings, where
>          * (vma->vm_flags & VM_SHARED) != 0, for performance reasons,
>          * but for some reason VM_PFNMAP + x86 PAT + write-combine is very
>          * bad for performance. Until that has been sorted out, use
>          * VM_MIXEDMAP on all mappings. See freedesktop.org bug #75719
>          */
>         vma->vm_flags |= VM_MIXEDMAP;
>
> We have that comment in the ttm code, which to me implies that mixed is
> doing the right thing now, but that is slow, as the interface we
> should be using.
>

Aren't there only 2 possibilities for this regression?

1/ a memtype entry was never made, so track_pfn_insert() returns an
uncached mapping

2/ a conflicting memtype entry exists, and the change avoids the
undefined behavior that mixed mapping types would otherwise cause.
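
To make the two cases concrete, here is a toy userspace model of the
lookup. It is only a sketch under my own assumptions: the toy_* names and
the flat table are hypothetical stand-ins, not the kernel's memtype tree
or its real API. It assumes that an untracked address falls back to an
uncached-minus mode and that a tracked address always takes the recorded
mode, whatever the caller asked for.

```c
#include <stddef.h>
#include <stdint.h>

/* Toy cache modes; illustrative names, not the kernel's enum. */
enum toy_cache_mode { TOY_WB, TOY_WC, TOY_UC_MINUS };

/* One tracked physical range [start, end) and its recorded cache mode. */
struct toy_memtype {
	uint64_t start, end;
	enum toy_cache_mode mode;
};

/*
 * Return the recorded mode for paddr, or TOY_UC_MINUS when no entry
 * covers it (case 1/: a memtype entry was never made).
 */
static enum toy_cache_mode toy_lookup_memtype(const struct toy_memtype *tbl,
					      size_t n, uint64_t paddr)
{
	for (size_t i = 0; i < n; i++)
		if (paddr >= tbl[i].start && paddr < tbl[i].end)
			return tbl[i].mode;
	return TOY_UC_MINUS;
}

/*
 * Model of the override: the requested mode is ignored and replaced by
 * the lookup result, so a conflicting entry wins (case 2/).
 */
static enum toy_cache_mode toy_track_pfn_insert(const struct toy_memtype *tbl,
						size_t n, uint64_t paddr,
						enum toy_cache_mode requested)
{
	(void)requested; /* deliberately unused in this toy model */
	return toy_lookup_memtype(tbl, n, paddr);
}
```

In this model a write-combine request for an address with no entry comes
back as TOY_UC_MINUS, which would be consistent with the slowdown, while
a request that conflicts with a recorded entry comes back as the recorded
mode.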

