[PATCH 1/3] drm/radeon: stop poisoning the GART TLB

Christian König deathsimple at vodafone.de
Sun Jun 15 05:48:01 PDT 2014


On 13.06.2014 23:31, Alex Deucher wrote:
> On Fri, Jun 13, 2014 at 11:45 AM, Christian König
> <deathsimple at vodafone.de> wrote:
>> Hi Marek,
>>
>> ah, yes! Piglit in combination with that patch can indeed crash the box.
>>
>> Going to investigate now that I can reproduce it.
> I wonder if it's a clockgating issue with the MC or BIF?  You might
> try adjusting the rdev->cg_flags (try setting it to 0) in
> radeon_asic.c or disabling dpm.

Unfortunately that was just a false alarm.

I was just on a branch which didn't have the "stop poisoning the GART
TLB" patch. After applying this patch, I can again let piglit run for the
whole night without a lockup.

No idea what goes wrong when Marek runs piglit, but 3.15.0 with the "stop
poisoning the GART TLB" and "force_gtt" patches applied is rock solid here.

Christian.

>
> Alex
>
>> Thanks,
>> Christian.
>>
>> On 13.06.2014 15:19, Marek Olšák wrote:
>>
>>> Hi,
>>>
>>> With my "force_gtt" patch, Cape Verde is unstable too, so all GCN
>>> chips are affected.
>>>
>>> I recommend applying that patch, because it will reproduce the problem
>>> faster. Without it, the hangs are very rare and it may take a while
>>> before they occur.
>>>
>>> Marek
>>>
>>> On Thu, Jun 12, 2014 at 1:23 PM, Christian König
>>> <deathsimple at vodafone.de> wrote:
>>>> Please do so, and you might want to try 3.15.0 as well.
>>>>
>>>> I've tested multiple piglit runs overnight with my Bonaire and 3.15.0,
>>>> and that seemed to work perfectly fine.
>>>>
>>>> Going to test Alex's drm-next-3.16 branch a bit more as well.
>>>>
>>>> Christian.
>>>>
>>>> On 11.06.2014 12:56, Marek Olšák wrote:
>>>>
>>>>> I only tested Bonaire. I can test Cape Verde if needed.
>>>>>
>>>>> Marek
>>>>>
>>>>> On Wed, Jun 11, 2014 at 11:29 AM, Christian König
>>>>> <deathsimple at vodafone.de> wrote:
>>>>>> Crap, I was already meaning to check back with you on whether that
>>>>>> really fixes your problems.
>>>>>>
>>>>>> Thanks for the info. This crash also only happens on CIK, doesn't it?
>>>>>>
>>>>>> Christian.
>>>>>>
>>>>>> On 11.06.2014 01:30, Marek Olšák wrote:
>>>>>>
>>>>>>> Sorry to tell you the bad news. This patch doesn't fix the hangs on my
>>>>>>> machine.
>>>>>>>
>>>>>>> I tested drm-next-3.16 from Alex's tree. I also switched copying from
>>>>>>> SDMA to CP DMA, which hung too.
>>>>>>>
>>>>>>> I also tried this:
>>>>>>>
>>>>>>> git checkout (the problematic commit):
>>>>>>> 6d2f294 - drm/radeon: use normal BOs for the page tables v4
>>>>>>>
>>>>>>> git cherry-pick (fixes):
>>>>>>> 0e97703c - drm/radeon: add define for flags used in R600+ GTT
>>>>>>> 0986c1a5 - drm/radeon: stop poisoning the GART TLB
>>>>>>> 4906f689 - drm/radeon: fix page directory update size estimation
>>>>>>> 4b095566 - drm/radeon: fix buffer placement under memory pressure v2
>>>>>>>
>>>>>>> Then I tested both SDMA and CP DMA copying. Both were unstable.
>>>>>>>
>>>>>>> Testing was done with piglit / quick.tests.
>>>>>>>
>>>>>>> Marek
>>>>>>>
>>>>>>>
>>>>>>> On Wed, Jun 4, 2014 at 3:29 PM, Christian König
>>>>>>> <deathsimple at vodafone.de>
>>>>>>> wrote:
>>>>>>>> From: Christian König <christian.koenig at amd.com>
>>>>>>>>
>>>>>>>> When we set the valid bit on invalid GART entries, they are
>>>>>>>> loaded into the TLB when an adjacent entry is loaded. This
>>>>>>>> poisons the TLB with invalid entries which are sometimes
>>>>>>>> not correctly removed on TLB flush.
>>>>>>>>
>>>>>>>> For stable inclusion, the patch probably needs to be modified a bit.
>>>>>>>>
>>>>>>>> Signed-off-by: Christian König <christian.koenig at amd.com>
>>>>>>>> Cc: stable at vger.kernel.org
>>>>>>>> ---
>>>>>>>>      drivers/gpu/drm/radeon/rs600.c | 5 ++++-
>>>>>>>>      1 file changed, 4 insertions(+), 1 deletion(-)
>>>>>>>>
>>>>>>>> diff --git a/drivers/gpu/drm/radeon/rs600.c b/drivers/gpu/drm/radeon/rs600.c
>>>>>>>> index 0a8be63..e0465b2 100644
>>>>>>>> --- a/drivers/gpu/drm/radeon/rs600.c
>>>>>>>> +++ b/drivers/gpu/drm/radeon/rs600.c
>>>>>>>> @@ -634,7 +634,10 @@ int rs600_gart_set_page(struct radeon_device *rdev, int i, uint64_t addr)
>>>>>>>>                     return -EINVAL;
>>>>>>>>             }
>>>>>>>>             addr = addr & 0xFFFFFFFFFFFFF000ULL;
>>>>>>>> -       addr |= R600_PTE_GART;
>>>>>>>> +       if (addr == rdev->dummy_page.addr)
>>>>>>>> +               addr |= R600_PTE_SYSTEM | R600_PTE_SNOOPED;
>>>>>>>> +       else
>>>>>>>> +               addr |= R600_PTE_GART;
>>>>>>>>             writeq(addr, ptr + (i * 8));
>>>>>>>>             return 0;
>>>>>>>>      }
>>>>>>>> --
>>>>>>>> 1.9.1
>>>>>>>>
>>>>>>>> _______________________________________________
>>>>>>>> dri-devel mailing list
>>>>>>>> dri-devel at lists.freedesktop.org
>>>>>>>> http://lists.freedesktop.org/mailman/listinfo/dri-devel
>>>>>>
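The commit message quoted above argues that GART entries pointing at the dummy page must not carry the valid bit. Its reason is that the TLB also loads valid entries adjacent to the one actually being looked up. The C sketch below is a toy model of that argument, not driver code. Several values in it are assumptions:

- The flag values are assumed to mirror the R600_PTE_* defines in radeon.h.
- NUM_PAGES and DUMMY_ADDR are hypothetical.
- The TLB_GROUP size of 8 adjacent entries is also hypothetical. The thread does not say how many entries the hardware actually fetches per fill.

encode_pte() follows the branch the patch adds to rs600_gart_set_page(). dummy_entries_cached() counts how many dummy-page entries a single lookup would pull into the modeled TLB.

```c
#include <assert.h>
#include <stdint.h>

/* ASSUMED flag values, modeled on the R600_PTE_* defines of the radeon
 * driver; verify against drivers/gpu/drm/radeon/radeon.h. */
#define PTE_VALID     (1ULL << 0)
#define PTE_SYSTEM    (1ULL << 1)
#define PTE_SNOOPED   (1ULL << 2)
#define PTE_READABLE  (1ULL << 5)
#define PTE_WRITEABLE (1ULL << 6)
#define PTE_GART (PTE_VALID | PTE_SYSTEM | PTE_SNOOPED | \
                  PTE_READABLE | PTE_WRITEABLE)

#define NUM_PAGES  16          /* hypothetical GART size for this model */
#define TLB_GROUP  8           /* HYPOTHETICAL: entries loaded per TLB fill */
#define DUMMY_ADDR 0x100000ULL /* hypothetical stand-in for rdev->dummy_page.addr */

/* Encode one GART entry. patched == 0: every entry gets PTE_GART (old code).
 * patched != 0: dummy-page entries get only SYSTEM | SNOOPED, i.e. no VALID
 * bit, mirroring the branch added to rs600_gart_set_page(). */
static uint64_t encode_pte(uint64_t addr, int patched)
{
    addr &= 0xFFFFFFFFFFFFF000ULL; /* keep only the page-aligned address */
    if (patched && addr == DUMMY_ADDR)
        return addr | PTE_SYSTEM | PTE_SNOOPED;
    return addr | PTE_GART;
}

/* Fill the table: every entry points at the dummy page except index `bound`,
 * which maps a (hypothetical) real page at `real_addr`. */
static void build_gart(uint64_t *gart, int bound, uint64_t real_addr,
                       int patched)
{
    for (int j = 0; j < NUM_PAGES; j++)
        gart[j] = encode_pte(DUMMY_ADDR, patched);
    gart[bound] = encode_pte(real_addr, patched);
}

/* Toy TLB model: a lookup of entry i also loads every VALID entry in the
 * same TLB_GROUP-aligned group. Returns how many of the loaded entries
 * point at the dummy page. */
static int dummy_entries_cached(const uint64_t *gart, int i)
{
    assert(i >= 0 && i < NUM_PAGES);
    int start = (i / TLB_GROUP) * TLB_GROUP;
    int cached = 0;
    for (int j = start; j < start + TLB_GROUP; j++) {
        if ((gart[j] & PTE_VALID) && (gart[j] & ~0xFFFULL) == DUMMY_ADDR)
            cached++;
    }
    return cached;
}
```

For example, after build_gart(gart, 3, 0x200000ULL, 0) (old encoding), dummy_entries_cached(gart, 3) returns 7, because the other seven entries of the group are valid dummy mappings. With the last argument set to 1, it returns 0. The model leaves out the flush problem mentioned in the commit message. It only shows why dummy entries stop reaching the TLB once the valid bit is cleared.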


