<div dir="ltr"><div dir="ltr"><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Thu, Feb 28, 2019 at 11:13 AM Marc-André Lureau <<a href="mailto:marcandre.lureau@gmail.com" target="_blank">marcandre.lureau@gmail.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">Hi Eero!<br>
<br>
(ex-colleagues, long time ago!)<br>
<br>
On Thu, Feb 28, 2019 at 1:37 PM Eero Tamminen <<a href="mailto:eero.t.tamminen@intel.com" target="_blank">eero.t.tamminen@intel.com</a>> wrote:<br>
><br>
> Hi,<br>
><br>
> On 28.2.2019 11.57, Marc-André Lureau wrote:<br>
> > On Thu, Feb 28, 2019 at 1:17 AM Marek Olšák <<a href="mailto:maraeo@gmail.com" target="_blank">maraeo@gmail.com</a>> wrote:<br>
> >> I'd rather have something more robust than an env var, like catching SIGSYS.<br>
><br>
> SIGSYS is info for the invoking parent, not the (Mesa) process doing the<br>
> syscall.<br>
><br>
> From "man 2 seccomp":<br>
><br>
> The process terminates as though killed by a SIGSYS signal. Even if a<br>
> signal handler has been registered for SIGSYS, the handler will be<br>
> ignored in this case and the process always terminates. To a parent<br>
> process that is waiting on this process (using waitpid(2) or similar),<br>
> the returned wstatus will indicate that its child was terminated as<br>
> though by a SIGSYS signal.<br>
><br>
><br>
> > With current qemu in most distros, it defaults to SIGSYS (we switched<br>
> > away from SCMP_ACT_KILL, which had other problems). With more recent<br>
> > qemu/libseccomp, it will default to SCMP_ACT_KILL_PROCESS. In those<br>
> > KILL action cases, mesa will not be able to catch the failing<br>
> > syscalls.<br>
><br>
> Qemu / libvirt isn't the only thing using seccomp.<br>
><br>
> For example Docker enables seccomp filters (along with capability<br>
> restrictions) for the invoked containers unless that is explicitly<br>
> disabled:<br>
> <a href="https://docs.docker.com/engine/security/seccomp/" rel="noreferrer" target="_blank">https://docs.docker.com/engine/security/seccomp/</a><br>
><br>
> What actually gets filtered is trivially changeable on the Docker command<br>
> line by giving a JSON file that specifies the syscall filtering.<br>
><br>
> The default policy seems to whitelist the affinity syscall:<br>
> <a href="https://github.com/moby/moby/blob/master/profiles/seccomp/default.json" rel="noreferrer" target="_blank">https://github.com/moby/moby/blob/master/profiles/seccomp/default.json</a><br>
><br>
><br>
> Why do distro versions of Qemu filter the sched_setaffinity() syscall?<br>
><br>
><br>
<br>
(<a href="https://bugs.launchpad.net/ubuntu/+source/qemu/+bug/1815889" rel="noreferrer" target="_blank">https://bugs.launchpad.net/ubuntu/+source/qemu/+bug/1815889</a>)<br>
<br>
Daniel Berrange (berrange) wrote on 2019-02-27: #19<br>
<br>
"IMHO that mesa change is not valid. It is setting its affinity to<br>
run on all threads which is definitely *NOT* something we want to be<br>
allowed. Management applications want to control which CPUs QEMU runs<br>
on, and as such Mesa should honour the CPU placement that the QEMU<br>
process has.<br>
<br>
This is a great example of why QEMU wants to use seccomp to block<br>
affinity changes to prevent something silently trying to use more CPUs<br>
than are assigned to this QEMU."<br></blockquote><div><br></div>Mesa uses thread affinity to optimize memory access performance on some CPUs (see util_pin_thread_to_L3). Other places in Mesa need to restore the original thread affinity for some child threads. Additionally, if games limit the thread affinity, Mesa needs to restore the full thread affinity for some of its child threads.<br></div><div class="gmail_quote"><br></div><div class="gmail_quote">In essence, thread affinity should only be considered a hint to the kernel for optimal performance. There is no reason to kill the process if the call is disallowed. Just ignore the call or modify the affinity mask to make it legal.<br></div><div class="gmail_quote"><br></div><div class="gmail_quote">Marek<br></div></div></div>