<html>
<head>
<base href="https://bugs.freedesktop.org/">
</head>
<body><table border="1" cellspacing="0" cellpadding="8">
<tr>
<th>Bug ID</th>
<td><a class="bz_bug_link
bz_status_NEW "
title="NEW - radeonsi cache format changed, causes mesa crash on startup"
href="https://bugs.freedesktop.org/show_bug.cgi?id=109007">109007</a>
</td>
</tr>
<tr>
<th>Summary</th>
<td>radeonsi cache format changed, causes mesa crash on startup
</td>
</tr>
<tr>
<th>Product</th>
<td>Mesa
</td>
</tr>
<tr>
<th>Version</th>
<td>git
</td>
</tr>
<tr>
<th>Hardware</th>
<td>Other
</td>
</tr>
<tr>
<th>OS</th>
<td>All
</td>
</tr>
<tr>
<th>Status</th>
<td>NEW
</td>
</tr>
<tr>
<th>Severity</th>
<td>normal
</td>
</tr>
<tr>
<th>Priority</th>
<td>medium
</td>
</tr>
<tr>
<th>Component</th>
<td>Drivers/Gallium/radeonsi
</td>
</tr>
<tr>
<th>Assignee</th>
<td>dri-devel@lists.freedesktop.org
</td>
</tr>
<tr>
<th>Reporter</th>
<td>dan@reactivated.net
</td>
</tr>
<tr>
<th>QA Contact</th>
<td>dri-devel@lists.freedesktop.org
</td>
</tr></table>
<p>
<div>
<pre>Having upgraded from Mesa-17.3 to Mesa-18.1 in Endless OS, many users on
AMD-based platforms are now reporting that the system fails to boot into the
UI. I've reproduced and confirm that Xorg is crashing very early on.

Thread 4 "si_shader:0" received signal SIGSEGV, Segmentation fault.
__memcpy_ssse3_back () at ../sysdeps/x86_64/multiarch/memcpy-ssse3-back.S:1533

Backtrace:

#0 __memcpy_ssse3_back ()
    at ../sysdeps/x86_64/multiarch/memcpy-ssse3-back.S:1533
#1 0x00007fffeeba2038 in memcpy (__len=3221880836, __src=0x7fffe4000e70,
    __dest=<optimized out>) at /usr/include/x86_64-linux-gnu/bits/string3.h:53
#2 read_data (size=3221880836, data=<optimized out>, ptr=0x7fffe4000e70)
    at ../../../../../src/gallium/drivers/radeonsi/si_state_shaders.c:95
#3 read_chunk (ptr=0x7fffe4000e70, ptr@entry=0x7fffe4000e6c,
    data=data@entry=0x7fffe4000998, size=size@entry=0x7fffe4000980)
    at ../../../../../src/gallium/drivers/radeonsi/si_state_shaders.c:121
#4 0x00007fffeeba21b3 in si_load_shader_binary (
    shader=shader@entry=0x7fffe40008c0, binary=binary@entry=0x7fffe4000e00)
    at ../../../../../src/gallium/drivers/radeonsi/si_state_shaders.c:187
#5 0x00007fffeeba4810 in si_shader_cache_load_shader (shader=0x7fffe40008c0,
    ir_binary=0x7fffe4000a50, sscreen=0x555555a393a0)
    at ../../../../../src/gallium/drivers/radeonsi/si_state_shaders.c:275
#6 si_init_shader_selector_async (job=job@entry=0x555555b8dfa0,
    thread_index=thread_index@entry=0)
    at ../../../../../src/gallium/drivers/radeonsi/si_state_shaders.c:1875
#7 0x00007fffee747a55 in util_queue_thread_func (
    input=input@entry=0x555555a39fb0) at ../../../src/util/u_queue.c:271
#8 0x00007fffee7476c7 in impl_thrd_routine (p=<optimized out>)
    at ../../../include/c11/threads_posix.h:87
#9 0x00007ffff574d494 in start_thread (arg=0x7fffebe06700)

The problem is that the on-disk radeonsi cache format changed, but the code
that loads cached shaders does not account for this. The affected code path is
si_load_shader_binary(), which does:

    uint32_t size = *ptr++;
    uint32_t crc32 = *ptr++;
    [...]
    ptr = read_data(ptr, &shader->config, sizeof(shader->config));
    ptr = read_data(ptr, &shader->info, sizeof(shader->info));
    ptr = read_chunk(ptr, (void**)&shader->binary.code,
                     &shader->binary.code_size);

So the blob format is: 4 bytes of size, 4 bytes of CRC, the shader config, the
shader info, and the code.

In Mesa 17.3, si_shader_config was 48 bytes in size, but in Mesa 18.1 and
current master it is 52 bytes, because the max_simd_wave field was added.
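
To illustrate how the 4-byte growth shifts what the loader reads, here is a
minimal sketch. Only the 48- and 52-byte config sizes and the field order come
from the description above; the struct names, the filler fields and INFO_BYTES
are hypothetical stand-ins, and the sketch assumes the size of the shader info
did not change between the two versions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins for si_shader_config. Only the sizes are taken
 * from the report; the real field names and types are not shown there. */
struct config_v17 { uint32_t fields[12]; };                         /* 48 bytes */
struct config_v18 { uint32_t fields[12]; uint32_t max_simd_wave; }; /* 52 bytes */

static_assert(sizeof(struct config_v17) == 48, "Mesa 17.3 size per the report");
static_assert(sizeof(struct config_v18) == 52, "Mesa 18.1 size per the report");

/* Hypothetical size of the shader info; the real value is not given. */
enum { INFO_BYTES = 16 };

/* Byte offset of the code-size word: size word, CRC word, config, info. */
static size_t code_size_offset(size_t config_bytes)
{
    return 2 * sizeof(uint32_t) + config_bytes + INFO_BYTES;
}

/* Read the 32-bit word stored at a byte offset within the blob. */
static uint32_t word_at(const uint8_t *blob, size_t off)
{
    uint32_t v;
    memcpy(&v, blob + off, sizeof(v));
    return v;
}
```

For a blob written with the 48-byte layout,
word_at(blob, code_size_offset(sizeof(struct config_v18))) returns the word
4 bytes past the stored code size, i.e. whatever follows it in the blob rather
than the real size.
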
After upgrading Mesa to 18.1, with shaders compiled and cached by Mesa 17.3,
the above code no longer behaves as intended: it reads the old blob using the
new, larger config size, so the fields that follow are read from shifted
offsets. We enter read_chunk() with the offsets slightly wrong:

    *size = *ptr++;
    assert(*data == NULL);
    if (!*size)
        return ptr;
    *data = malloc(*size);
    return read_data(ptr, *data, *size);

When this code executes, *size has the value 3221880836 for a shader that was
only 884 bytes uncompressed. read_data() then tries to memcpy() that much data,
which causes the crash.
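
A bounds check along the following lines would turn this crash into a failed
cache load. This is only a sketch, not the Mesa code: read_data_checked(),
read_chunk_checked() and the `end` parameter are hypothetical, and rounding the
copied size up to whole dwords is an assumption about how read_data() advances
ptr.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Copy `size` bytes out of the blob, refusing to read past `end`.
 * Returns the pointer just after the data (assumed to be padded to a
 * whole number of dwords), or NULL if the blob is too short. */
static const uint32_t *read_data_checked(const uint32_t *ptr,
                                         const uint32_t *end,
                                         void *data, uint32_t size)
{
    if (ptr == NULL || ptr > end)
        return NULL;
    size_t avail = (size_t)(end - ptr) * sizeof(uint32_t);
    if (size > avail)
        return NULL;
    memcpy(data, ptr, size);
    return ptr + ((size_t)size + 3) / 4;
}

/* Read a size word followed by that many bytes of data. The size is
 * validated against the remaining blob before anything is allocated. */
static const uint32_t *read_chunk_checked(const uint32_t *ptr,
                                          const uint32_t *end,
                                          void **data, uint32_t *size)
{
    assert(*data == NULL);
    if (ptr == NULL || ptr >= end)
        return NULL;                 /* no room for the size word */
    *size = *ptr++;
    if (!*size)
        return ptr;
    if (*size > (size_t)(end - ptr) * sizeof(uint32_t))
        return NULL;                 /* claimed size exceeds the blob */
    *data = malloc(*size);
    if (!*data)
        return NULL;
    return read_data_checked(ptr, end, *data, *size);
}
```

A caller would pass the end of the decompressed blob as `end` and treat a NULL
return as a cache miss, falling back to compiling the shader.
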
Besides failing to invalidate existing disk caches when the on-disk format
changed, this code is also suspect because it never verifies that it stays
within the end of the shader. An attacker could rewrite the size field read by
the read_chunk() code above to be very large, fix up the CRC and recompress,
and thereby cause other apps to crash in the same way.</pre>
</div>
</p>
</body>
</html>