[Mesa-dev] [PATCH 1/2] radeonsi: set a per-buffer flag that disables inter-process sharing (v4)
Michel Dänzer
michel at daenzer.net
Fri Sep 8 08:28:23 UTC 2017
On 07/09/17 07:24 PM, Christian König wrote:
> Am 07.09.2017 um 12:14 schrieb Marek Olšák:
>> On Sep 7, 2017 12:08 PM, "Christian König" <deathsimple at vodafone.de> wrote:
>> Am 07.09.2017 um 11:23 schrieb Michel Dänzer:
>> On 01/09/17 07:40 PM, Christian König wrote:
>> Am 01.09.2017 um 12:28 schrieb Michel Dänzer:
>> On 01/09/17 07:23 PM, Nicolai Hähnle wrote:
>> On 01.09.2017 11:58, Michel Dänzer wrote:
>> On 29/08/17 11:47 PM, Christian König wrote:
>>
>> From: Marek Olšák <marek.olsak at amd.com>
>>
>> For lower overhead in the CS ioctl.
>> Winsys allocators are not used with interprocess-sharable resources.
>>
>> v2: It shouldn't crash anymore, but the kernel will reject the new flag.
>> v3 (christian): Rename the flag, avoid sending those buffers in the BO list.
>> v4 (christian): Remove setting the kernel flag for now
>>
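(For reference, the mechanism the commit message describes works roughly like
the sketch below. The names -- NO_INTERPROCESS_SHARING, slab_alloc,
kernel_alloc, SLAB_MAX_SIZE -- are made up for illustration and are not the
actual symbols in the patch.)

/* Illustrative sketch only -- not the actual Mesa winsys code.  A per-buffer
 * "no inter-process sharing" flag lets the winsys route the buffer through a
 * CPU-side slab suballocator instead of creating a dedicated kernel BO,
 * because such a buffer can never be exported via dmabuf/flink. */
#include <stdint.h>

#define NO_INTERPROCESS_SHARING  (1u << 0)    /* hypothetical flag bit */
#define SLAB_MAX_SIZE            (256 * 1024) /* hypothetical size limit */

struct bo;                               /* opaque buffer handle */
struct bo *slab_alloc(uint64_t size);    /* suballocate from a slab */
struct bo *kernel_alloc(uint64_t size);  /* allocate a real GEM BO */

struct bo *bo_create(uint64_t size, unsigned flags)
{
   /* Suballocation is only safe for buffers that will never be shared
    * with another process; everything else still gets its own kernel BO. */
   if ((flags & NO_INTERPROCESS_SHARING) && size <= SLAB_MAX_SIZE)
      return slab_alloc(size);

   return kernel_alloc(size);
}

The v3 note about not sending those buffers in the BO list is presumably where
the CS ioctl overhead saving comes from: suballocated buffers don't need their
own per-submission bookkeeping.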
>> This change seems to have caused a GPU hang when running piglit on my
>> Kaveri with the radeon kernel driver.
>>
>> I think we can remove "seems to have". I'm still reliably getting the
>> GPUVM fault and hang with current master, but not if I revert this
>> commit (and the one after it).
>>
>> I haven't been able to isolate it to a specific test; it seems to
>> happen only when running multiple tests concurrently.
>>
>> I reproduced the problem with piglit process separation enabled as
>> well; all four tests running when it hung were textureGather tests.
>> Earlier, when I reproduced the problem twice with process separation
>> disabled, three textureGather tests were running both times as well.
>> I've been unable to reproduce the problem by manually running the
>> same textureGather tests in parallel, though.
>>
>>
>> There's a GPUVM fault before the hang; I suspect it's related:
>>
>>  radeon 0000:00:01.0: GPU fault detected: 146 0x0ae6760c
>>  radeon 0000:00:01.0:   VM_CONTEXT1_PROTECTION_FAULT_ADDR   0x000001D7
>>  radeon 0000:00:01.0:   VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x0607600C
>>  VM fault (0x0c, vmid 3) at page 471, read from 'CPF' (0x43504600) (118)
>>
>>
>> Any ideas?
>>
>> Not the slightest, but I'm still investigating problems with that on
>> amdgpu.
>>
>> If we can't find the root cause by Monday, it might be a good idea to
>> revert the patches for now.
>>
>> What's the status on that?
>>
>>
>>
>> I've found and fixed the remaining kernel bugs over the last weekend
>> and the beginning of this week.
>>
>> I still need to commit the fix for UVD/VCE, but that one shouldn't
>> affect GFX at all.
>>
>>
>> Michel is seeing hangs on the radeon KMD, which should be unaffected
>> by your kernel work, I think.
>>
>> We could revert this to unbreak Michel's Kaveri,
FWIW, there's no need to do anything for my Kaveri development system in
particular; it's going out of service soon, and in the meantime I can
revert these changes locally.
My concern is that the underlying issue might cause other problems in
real-world scenarios.
>> but I think it shouldn't be so difficult to find the culprit in this
>> patch if there is one.
>
> The only catch is that the userspace patch shouldn't affect radeon at
> all. So the real question is: what the heck is going on here?
Maybe some buffers that were previously allocated directly are now
sub-allocated or re-used from the BO cache, or vice versa, or something
like that?
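Purely as illustration of what I mean (made-up names, not the actual winsys
code): a reusable-BO cache typically only hands a cached buffer back when the
allocation flags match the request, so a new per-buffer flag changes which
requests can recycle which buffers:

#include <stdbool.h>
#include <stdint.h>

struct cached_bo {
   uint64_t size;    /* size of the cached buffer */
   unsigned domain;  /* VRAM/GTT placement it was created with */
   unsigned flags;   /* allocation flags it was created with */
};

/* Hypothetical reuse check: a buffer created without the new flag no
 * longer matches a request that sets it, and vice versa, so the set of
 * buffers coming from the cache shifts once the flag is introduced. */
bool cache_can_reuse(const struct cached_bo *bo,
                     uint64_t size, unsigned domain, unsigned flags)
{
   return bo->size >= size &&
          bo->domain == domain &&
          bo->flags == flags;
}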
--
Earthling Michel Dänzer | http://www.amd.com
Libre software enthusiast | Mesa and X developer