mm: fix cache mode tracking in vm_insert_mixed() breaks AMDGPU [was: Re: Latest testing with drm-next-4.9-wip and latest LLVM/mesa stack - Regression in PowerPlay/DPM on CIK?]

Marek Olšák maraeo at gmail.com
Sun Oct 16 18:41:00 UTC 2016


On Fri, Oct 14, 2016 at 3:33 AM, Michel Dänzer <michel at daenzer.net> wrote:
>
> [ Adding Dan Williams and dri-devel ]
>
> On 14/10/16 03:28 AM, Shawn Starr wrote:
>> Hello AMD folks,
>>
>> I have discovered a problem in Linus' master that affects AMDGPU; nobody would
>> notice this in drm-next-4.9-wip since it's not in this repo.
>
> [...]
>
>> 87744ab3832b83ba71b931f86f9cfdb000d07da5 is the first bad commit
>> commit 87744ab3832b83ba71b931f86f9cfdb000d07da5
>> Author: Dan Williams <dan.j.williams at intel.com>
>> Date:   Fri Oct 7 17:00:18 2016 -0700
>>
>>     mm: fix cache mode tracking in vm_insert_mixed()
>>
>>     vm_insert_mixed() unlike vm_insert_pfn_prot() and vmf_insert_pfn_pmd(),
>>     fails to check the pgprot_t it uses for the mapping against the one
>>     recorded in the memtype tracking tree.  Add the missing call to
>>     track_pfn_insert() to preclude cases where incompatible aliased mappings
>>     are established for a given physical address range.
>>
>>     Link: http://lkml.kernel.org/r/147328717909.35069.14256589123570653697.stgit at dwillia2-desk3.amr.corp.intel.com
>>     Signed-off-by: Dan Williams <dan.j.williams at intel.com>
>>     Cc: David Airlie <airlied at linux.ie>
>>     Cc: Matthew Wilcox <mawilcox at microsoft.com>
>>     Cc: Ross Zwisler <ross.zwisler at linux.intel.com>
>>     Signed-off-by: Andrew Morton <akpm at linux-foundation.org>
>>     Signed-off-by: Linus Torvalds <torvalds at linux-foundation.org>
>>
>> :040000 040000 7517c0019fe49c1830b5a1d81f1dc099c5aab98a fd497a604a2af5995db2b8ed1e9c640bede6adf3 M      mm
>>
>>
>> Reverting this patch stops the graphics stalls.
>
> Thanks for bisecting this Shawn.
>
>
>> A friend of mine mentions,
>>
>> "looks like a graphics thingy you depend on is requesting a mapping with a
>> not-allowed cache mode, and now you are (rightfully) getting errors?"
>
> It would be nice to get some more specific pointers to what amdgpu (or
> maybe ttm, since that calls vm_insert_mixed in ttm_bo_vm_fault) might be
> doing wrong.

BTW, people have reported that rendering stalls every time TTM tries
to move a buffer, even if the move is only a few MB.

See FPS and num_bytes_moved here:
https://i.imgur.com/kNj2vqF.png

There are 5 big stalls. 4 of them are due to the mm commit.

Marek

