two KASANs in TTM logic

Christian König christian.koenig at amd.com
Sat Sep 8 10:57:56 UTC 2018


Am 08.09.2018 um 12:40 schrieb Tom St Denis:
>
>
> On 09/08/2018 05:23 AM, Huang Rui wrote:
>> On Fri, Sep 07, 2018 at 04:59:11PM +0800, Christian König wrote:
>>> Hi Ray,
>>>
>>> in the meantime can we disable the feature once more in the kernel until
>>> we have hammered out all possible corner cases?
>>
>> That's fine. For now, we have to disable it again. I will do more testing
>> and first try to reproduce Tom's issue.
>>
>>>
>>> As Tom figured out, commenting out the assignment that sets
>>> "bulk_moveable" to true should be enough.
>>
>> I saw you already removed the "bulk_moveable = true" in amdgpu_vm_init().
>> Do you mean we should also comment out the one in
>> amdgpu_vm_move_to_lru_tail() to disable bulk_move completely for the moment?
>
> Hi Ray,
>
> I just commented out the assignment of true.

Yeah, I think if we haven't figured out what is going wrong here by 
Monday, we need to do this to prevent further bug reports.

Christian.
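
To illustrate why dropping only the assignment of true is enough to switch the feature off, here is a minimal toy model in C. It is not the amdgpu code: struct toy_vm, toy_vm_init(), toy_vm_move_to_lru_tail() and the TOY_DISABLE_BULK_MOVE macro are hypothetical names invented for this sketch, and the counters stand in for the real LRU list manipulation. The assumption is simply that the flag gates a fast path, so if it is never set the slow per-BO path is always taken.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the per-VM state; not the real struct amdgpu_vm. */
struct toy_vm {
    bool bulk_moveable;   /* true once the cached bulk-move state may be reused */
    size_t idle_bos;      /* number of buffer objects on the idle list */
    size_t bulk_moves;    /* how many times the fast (bulk) path ran */
    size_t single_moves;  /* how many BOs were moved one by one */
};

/* 1 models "commenting out the assignment of true"; 0 leaves it enabled. */
#ifndef TOY_DISABLE_BULK_MOVE
#define TOY_DISABLE_BULK_MOVE 1
#endif

void toy_vm_init(struct toy_vm *vm, size_t idle_bos)
{
    assert(vm != NULL);
    vm->bulk_moveable = false;  /* nothing cached yet */
    vm->idle_bos = idle_bos;
    vm->bulk_moves = 0;
    vm->single_moves = 0;
}

void toy_vm_move_to_lru_tail(struct toy_vm *vm)
{
    assert(vm != NULL);

    if (vm->bulk_moveable) {
        /* Fast path: reuse the cached bulk-move state. */
        vm->bulk_moves++;
        return;
    }

    /* Slow path: walk the idle list and move every BO individually. */
    vm->single_moves += vm->idle_bos;

#if !TOY_DISABLE_BULK_MOVE
    /* The one assignment that would enable the fast path next time. */
    vm->bulk_moveable = true;
#endif
}
```

With TOY_DISABLE_BULK_MOVE left at 1, repeated calls to toy_vm_move_to_lru_tail() never reach the fast path, which is the effect described above for the real flag.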

>
> Tom
>
>
>>
>> Thanks,
>> Ray
>>
>>>
>>> Thanks,
>>> Christian.
>>>
>>> Am 07.09.2018 um 08:51 schrieb Huang, Ray:
>>>> Hi Tom,
>>>>
>>>> Thanks for tracing this issue. I am trying to reproduce it on 
>>>> amd-staging-drm-next with piglit.
>>>> May I know the steps/configurations to reproduce it?
>>>>
>>>> Thanks,
>>>> Ray
>>>>
>>>> -----Original Message-----
>>>> From: amd-gfx <amd-gfx-bounces at lists.freedesktop.org> On Behalf Of 
>>>> Tom St Denis
>>>> Sent: Wednesday, September 5, 2018 9:27 PM
>>>> To: Koenig, Christian <Christian.Koenig at amd.com>; Daenzer, Michel 
>>>> <Michel.Daenzer at amd.com>; amd-gfx at lists.freedesktop.org; Deucher, 
>>>> Alexander <Alexander.Deucher at amd.com>
>>>> Subject: Re: two KASANs in TTM logic
>>>>
>>>> Logs attached.
>>>>
>>>> Tom
>>>>
>>>>
>>>>
>>>> On 09/05/2018 08:02 AM, Christian König wrote:
>>>>> Still not the slightest idea what is causing this and the patch
>>>>> definitely fixes things a lot.
>>>>>
>>>>> Can you try to enable list debugging in your kernel?
>>>>>
>>>>> Thanks,
>>>>> Christian.
>>>>>
>>>>> Am 04.09.2018 um 19:18 schrieb Tom St Denis:
>>>>>> Sure:
>>>>>>
>>>>>> d2917f399e0b250f47d07da551a335843a24f835 is the first bad commit
>>>>>> commit d2917f399e0b250f47d07da551a335843a24f835
>>>>>> Author: Christian König <christian.koenig at amd.com>
>>>>>> Date:   Thu Aug 30 10:04:53 2018 +0200
>>>>>>
>>>>>>       drm/amdgpu: fix "use bulk moves for efficient VM LRU handling" v2
>>>>>>
>>>>>>       First step to fix the LRU corruption, we accidentially tried to
>>>>>>       move things on the LRU after dropping the lock.
>>>>>>
>>>>>>       Signed-off-by: Christian König <christian.koenig at amd.com>
>>>>>>       Tested-by: Michel Dänzer <michel.daenzer at amd.com>
>>>>>>
>>>>>> :040000 040000 ed5be1ad4da129c4154b2b43acf7ef349a470700
>>>>>> 0008c4e2fb56512f41559618dd474c916fc09a37 M      drivers
>>>>>>
>>>>>>
>>>>>> With the commit before that I can run xonotic-glx and piglit on my
>>>>>> Carrizo without a KASAN report.
>>>>>>
>>>>>> Tom
>>>>>>
>>>>>> On 09/04/2018 10:05 AM, Christian König wrote:
>>>>>>> The first one should already be fixed.
>>>>>>>
>>>>>>> Not sure where the second comes from. Can you narrow that down further?
>>>>>>>
>>>>>>> Christian.
>>>>>>>
>>>>>>> Am 04.09.2018 um 15:46 schrieb Tom St Denis:
>>>>>>>> First is caused by this commit while running a GL heavy application.
>>>>>>>>
>>>>>>>> d78c1fa0c9f815fe951fd57001acca3d35262a17 is the first bad commit
>>>>>>>> commit d78c1fa0c9f815fe951fd57001acca3d35262a17
>>>>>>>> Author: Michel Dänzer <michel.daenzer at amd.com>
>>>>>>>> Date:   Wed Aug 29 11:59:38 2018 +0200
>>>>>>>>
>>>>>>>>       Revert "drm/amdgpu: move PD/PT bos on LRU again"
>>>>>>>>
>>>>>>>>       This reverts commit 31625ccae4464b61ec8cdb9740df848bbc857a5b.
>>>>>>>>
>>>>>>>>       It triggered various badness on my development machine when
>>>>>>>>       running the piglit gpu profile with radeonsi on Bonaire, looks
>>>>>>>>       like memory corruption due to insufficiently protected list
>>>>>>>>       manipulations.
>>>>>>>>
>>>>>>>>       Signed-off-by: Michel Dänzer <michel.daenzer at amd.com>
>>>>>>>>       Signed-off-by: Alex Deucher <alexander.deucher at amd.com>
>>>>>>>>
>>>>>>>> :040000 040000 b7169f0cf0c7decec631751a9896a92badb67f9d
>>>>>>>> 42ea58f43199d26fc0c7ddcc655e6d0964b81817 M drivers
>>>>>>>>
>>>>>>>> The second is caused by something between that and the tip of the
>>>>>>>> 4.19-rc1 amd-staging-drm-next (I haven't pinned it down yet) while
>>>>>>>> loading GNOME.
>>>>>>>>
>>>>>>>> Tom
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> _______________________________________________
>>>>>>>> amd-gfx mailing list
>>>>>>>> amd-gfx at lists.freedesktop.org
>>>>>>>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
>>>


