[PATCH 1/3] drm/radeon: stop poisoning the GART TLB

Christian König christian.koenig at amd.com
Sun Jun 29 03:34:50 PDT 2014


On 27.06.2014 10:59, Michel Dänzer wrote:
> On 27.06.2014 17:26, Christian König wrote:
>> On 27.06.2014 04:31, Michel Dänzer wrote:
>>> On 25.06.2014 12:59, Michel Dänzer wrote:
>>>> With these patches, 3.15 just survived two piglit runs on my Bonaire,
>>>> one with the GART poisoning fix and one without. It never survived a
>>>> single run before.
>>>>
>>>> Acked-and-Tested-by: Michel Dänzer <michel.daenzer at amd.com>
>>> So, are these patches going to 3.16 and 3.15?
>> We could send them in for 3.15,
> What's the alternative for 3.15?

Well, figuring out the real reason behind those lockups would be a
good start :)

> Looks like e.g. https://bugs.freedesktop.org/show_bug.cgi?id=80141 is
> confirmed to be this.
>
>
>> but for 3.16 we have some new features that depend on the new code.
>>
>> We could backport them to the old code, but I really want to work on
>> figuring out what's wrong with the new approach instead.
>>
>> I'm going to prepare a branch for you to test over the weekend; it would
>> be nice if you could give it a try on Monday and see whether it fixes the
>> issues as well.
> Sure, will do.

I've just pushed the branch testing-3.15 to
git://people.freedesktop.org/~deathsimple/linux. It's based on 3.15.2
and contains the "stop poisoning the GART TLB" patch backported to 3.15,
along with a couple of other changes I would like to try.

For now I've disabled the redirection of page faults to the dummy page,
so the system should lock up on the first page fault it encounters.
Apart from that, the page directory and page tables are now all
over-allocated and over-aligned.

Setting the READABLE bit on invalid entries shouldn't have any effect
other than making those entries non-zero. So please try to lock up your
Bonaire with this branch, and as soon as you hit the first page fault,
look at VM_CONTEXT1_PROTECTION_FAULT_STATUS to figure out which VMID
caused the lockup.

Then use the attached script to dump the complete page directory and
page tables of the VMID in question, e.g. "./dump_vm.sh 1" if the
lockup was caused by VMID 1. Make sure you have a radeontool that
supports CIK; otherwise it will only return zeros as the page directory
address.

Since even the invalid page table entries should now have at least the
READABLE bit set, there shouldn't be any zero entries in this dump.
Look out for anything else suspicious as well (0xdeadbeef etc.).

Thanks for the help,
Christian.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: dump_vm.sh
Type: application/x-shellscript
Size: 562 bytes
Desc: not available
URL: <http://lists.freedesktop.org/archives/dri-devel/attachments/20140629/45d28bc6/attachment-0001.bin>


More information about the dri-devel mailing list