SDMA out-of-bounds write access of tiled surface (was: Re: [amd-gfx] AMD Carrizo - GPU fault detected: 146 0x0842b714)
Nicolai Hähnle
nhaehnle at gmail.com
Wed Jun 22 07:50:10 UTC 2016
Hi Mads,
Setting R600_DEBUG=nodma in the X server's environment should work around
your problem for now.
Marek, perhaps an out-of-bounds check for tiled texture memory access
similar to the linear access check is necessary? I wonder if you've seen
something about that in the docs.
I've annotated the sDMA IB dump. It's a linear-to-display-tiled copy on
Carrizo. I tried to reproduce with the attached patch, but failed to do
so even with amdgpu.vm_debug=1. With the patch, I get DMA copies that
are identical to the one that causes the VM fault except for a different
bank_height and macro_tile_aspect, so the issue is likely related to those.
Nicolai
On 21.06.2016 19:32, Nicolai Hähnle wrote:
> On 21.06.2016 19:16, Mads wrote:
>> I sent this 1.5 hours ago, but since it hasn't arrived on the
>> mailing list yet, I'm trying again...
>
> It arrived, no worries :)
>
> I'll take a look later.
>
> Nicolai
>
>>
>> On 2016-06-21 17:48, Mads wrote:
>>
>>> On 2016-06-21 10:12, Mads wrote:
>>>
>>> On 2016-06-21 09:39, Nicolai Hähnle wrote:
>>>
>>> Thanks. However, I still don't think this is going to help. Your
>>> earlier trace experiments showed that the problematic SDMA commands
>>> came from the X server, _not_ from plasmashell.
>>>
>>> So what we see here is likely just the first set of GPU commands sent
>>> by plasmashell after the VM fault occurred. Since the plasmashell
>>> process is unable to tell who caused the VM fault, it takes the blame
>>> incorrectly. Are you sure the X server is using your self-compiled
>>> radeonsi_dri.so and has the environment variable set? If it creates a
>>> ddebug_dump, it might be somewhere else (it's based off the HOME
>>> environment variable, which may be different).
>>> I'll take a second look to see if there's an X dump there too, but
>>> unfortunately it'll be about 8 hours before I have the machine at
>>> hand again.
>>>
>>> And yes, I'm sure, everything is built through portage, so there is no
>>> "self-compiled" on the system per se. There's always just one lib
>>> available at any time :)
>>
>> You were right! X didn't have R600_DEBUG=check_vm in environment (no
>> login shell/sourcing of /etc/profile).
>>
>> Here's what I ran:
>>
>>> $ XAUTHORITY=.Xauthority DISPLAY=:0 LIBGL_DEBUG=verbose dolphin
>>> libGL: pci id for fd 9: 1002:9874, driver radeonsi
>>> libGL: OpenDriver: trying /usr/lib64/dri/tls/radeonsi_dri.so
>>> libGL: OpenDriver: trying /usr/lib64/dri/radeonsi_dri.so
>>> si_vm_fault_occured: failed to parse line ' Either
>>> enable ECC checking or force module loading by setting
>>> 'ecc_enable_override'.
>>> '
>>> libGL: Using DRI3 for screen 0
>>> Trying to convert empty KLocalizedString to QString.
>>> Cannot creat accessible child interface for object:
>>> PlacesView(0x118d670) index: 5
>>> QPixmap::scaled: Pixmap is a null pixmap
>>> QPixmap::scaled: Pixmap is a null pixmap
>>> (... etc ...)
>>> The X11 connection broke (error 1). Did the X11 server die?
>>
>> Attaching dmesg and ddebug_dump.
>>
>> - Mads
-------------- next part --------------
VM fault report.
Driver vendor: X.Org
Device vendor: AMD
Device name: AMD CARRIZO (DRM 3.1.0 / 4.6.2-gentoo, LLVM 3.9.0)
Failing VM page: 0x00101508
Buffer list (in units of pages = 4kB):
      Size    VM start page      VM end page  Usage
         8  0x0000000100035  0x000000010003d  IB1
       843  -- hole --
       975  0x0000000100388  0x0000000100757  SDMA_BUFFER
      2473  -- hole --
      1032  0x0000000101100  0x0000000101508  SDMA_BUFFER
Note: The holes represent memory not used by the IB.
Other buffers can still be allocated there.
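
The buffer list can be cross-checked against the failing page with a few lines of Python. This is only a sketch: the page numbers are copied from the report above, the end page is treated as exclusive (which is what the Size column implies, since size = end - start), and the helper name classify is made up for illustration.

```python
# Page ranges copied from the buffer list (units of 4 kB pages).
# Assumption: "VM end page" is exclusive, because Size == end - start for every entry.
buffers = [
    ("IB1",         0x100035, 0x10003d),
    ("SDMA_BUFFER", 0x100388, 0x100757),
    ("SDMA_BUFFER", 0x101100, 0x101508),
]
failing_page = 0x101508

def classify(page, buffers):
    """Describe where `page` lies relative to the listed buffers."""
    for name, start, end in buffers:
        if start <= page < end:
            return f"inside {name} at page offset {page - start}"
        if page == end:
            return f"first page past the end of {name} ({end - start} pages)"
    return "not adjacent to any listed buffer"

print(classify(failing_page, buffers))
```

Under that reading, the failing page is the first page immediately after the second SDMA_BUFFER, not a page inside any buffer used by the IB.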
------------------ sDMA IB begin ------------------
00000501 COPY, TILED_SUB_WINDOW
01100000 tiled_address_lo
00000001 tiled_address_hi
001d0000 tiled_x = 0, tiled_y = 29
00ab0000 tiled_z = 0, pitch_tile_max = 0xab = 171
0000407f slice_tile_max = 0x407f = 16511
02481822 tile bits (fields decoded after the IB dump)
00388000 linear_address_lo
00000001 linear_address_hi
00000000 linear_x = 0, linear_y = 0
057f0000 linear_z = 0, linear_pitch = 0x580 = 1408
000f3b7f linear_slice_pitch = 0xf3b80 = 998272
02c40555 copy_width_aligned = 0x556 = 1366, copy_height = 709
00000000 copy_depth = 1
00000000 NOP
------------------- sDMA IB end -------------------
linear_height = 709
log(bpe) = 2, bpe = 4
array_mode = 4 (ARRAY_2D_TILED_THIN1)
micro_tile_mode = 0 (DISPLAY_MICRO_TILING)
log(tile_split) = 3
bank_width = 0
bank_height = 2
num_banks = 2
macro_tile_aspect = 2
pipe_config = 0
tiled_pitch = 172 * 8 = 1376
tiled_slice_pitch = 16512 * 64 = 1056768
-> tiled_height = 768
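
The derived values above can be recomputed from the raw IB fields. The sketch below assumes what the annotations already imply: the "*_max" and copy size fields store (value - 1), a tile is 8 x 8 pixels, and pages are 4 kB. It does not attempt to model the tiling itself, so it says nothing about which addresses the engine actually touches.

```python
PAGE_SIZE = 4096  # the buffer list is in units of 4 kB pages

# Raw fields copied from the annotated IB; "*_max" and copy size fields store (value - 1).
tiled_base = (0x00000001 << 32) | 0x01100000   # tiled_address_hi : tiled_address_lo
tiled_x, tiled_y = 0, 29
pitch_tile_max = 0xAB
slice_tile_max = 0x407F
copy_width = 0x555 + 1
copy_height = 0x2C4 + 1
bpe = 4

tiled_pitch = (pitch_tile_max + 1) * 8          # pixels; tiles are 8 pixels wide
tiled_slice_pitch = (slice_tile_max + 1) * 64   # pixels; 8 x 8 = 64 pixels per tile
tiled_height = tiled_slice_pitch // tiled_pitch

surface_bytes = tiled_slice_pitch * bpe
surface_pages = surface_bytes // PAGE_SIZE
end_page = (tiled_base + surface_bytes) // PAGE_SIZE

# Pixel-space check only: does the copy rectangle fit inside pitch x height?
rect_in_bounds = (tiled_x + copy_width <= tiled_pitch
                  and tiled_y + copy_height <= tiled_height)

print(f"tiled surface: {tiled_pitch} x {tiled_height}, {surface_pages} pages")
print(f"first page past the surface: {end_page:#x}")
print(f"copy rectangle within pitch/height: {rect_in_bounds}")
```

With these inputs the surface occupies exactly the 1032 pages of the second SDMA_BUFFER, and the first page past it is the failing VM page. The copy rectangle itself (rows 29 to 737 of 768) fits within the surface in pixel terms.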
My Carrizo: tile bits 01401822
bank_height = 0
num_banks = 2
macro_tile_aspect = 1
SDMA Dump Done.
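
To compare the two tile-bits dwords field by field, a small decoder helps. The bit positions below are an assumption: they were chosen so that both dwords decode to the values annotated above, and were not taken from hardware documentation. The names FIELDS and decode are hypothetical.

```python
# Assumed field layout as (name, lowest bit, width). Inferred from the annotated
# values in this dump, not from any official register description.
FIELDS = [
    ("log_bpe",           0, 3),
    ("array_mode",        3, 4),
    ("micro_tile_mode",   8, 3),
    ("log_tile_split",   11, 3),
    ("bank_width",       15, 2),
    ("bank_height",      18, 2),
    ("num_banks",        21, 2),
    ("macro_tile_aspect", 24, 2),
    ("pipe_config",      26, 5),
]

def decode(dword):
    """Split a tile-bits dword into named fields using the assumed layout."""
    return {name: (dword >> shift) & ((1 << width) - 1)
            for name, shift, width in FIELDS}

faulting = decode(0x02481822)   # from the IB that triggered the VM fault
local = decode(0x01401822)      # from the reproduction attempt

# Keep only the fields whose values differ: name -> (faulting, local).
diff = {name: (faulting[name], local[name])
        for name in faulting if faulting[name] != local[name]}

for name, (a, b) in diff.items():
    print(f"{name}: faulting={a} local={b}")
```

Under this layout the only fields that differ are bank_height (2 vs. 0) and macro_tile_aspect (2 vs. 1), which agrees with the annotations.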
-------------- next part --------------
A non-text attachment was scrubbed...
Name: reproduction-attempt.patch
Type: text/x-patch
Size: 3783 bytes
Desc: not available
URL: <https://lists.freedesktop.org/archives/amd-gfx/attachments/20160622/c49aae43/attachment.bin>