[Mesa-dev] [PATCH] RFC: Workaround for pthread_setaffinity_np() seccomp filtering
Eero Tamminen
eero.t.tamminen at intel.com
Thu Feb 28 12:50:37 UTC 2019
Hi,
On 28.2.2019 11.57, Marc-André Lureau wrote:
> On Thu, Feb 28, 2019 at 1:17 AM Marek Olšák <maraeo at gmail.com> wrote:
>> I'd rather have something more robust than an env var, like catching SIGSYS.
With a KILL action, SIGSYS is info for the invoking parent, not for the
(Mesa) process doing the syscall.
From "man 2 seccomp":
The process terminates as though killed by a SIGSYS signal. Even if a
signal handler has been registered for SIGSYS, the handler will be
ignored in this case and the process always terminates. To a parent
process that is waiting on this process (using waitpid(2) or similar),
the returned wstatus will indicate that its child was terminated as
though by a SIGSYS signal.
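To make the parent-side view in that excerpt concrete, here is a minimal C sketch of trying the syscall in a child and decoding the wait status in the parent. The names probe_result, classify_wstatus() and probe_setaffinity() are made up for this illustration and are not Mesa or QEMU code. It only works where fork() itself is permitted (the patch description below notes that qemu's sandbox prohibits it), and the result can go stale if the filter changes afterwards.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical result codes for this sketch only. */
enum probe_result {
    PROBE_OK,            /* child exited 0: the syscall was permitted */
    PROBE_FAILED,        /* child exited non-zero: the syscall returned an error */
    PROBE_KILLED_SIGSYS, /* child was terminated by SIGSYS */
    PROBE_OTHER          /* fork/waitpid failure or some other termination */
};

/* Decode a wait status the way a parent of a filtered process would. */
enum probe_result classify_wstatus(int wstatus)
{
    if (WIFSIGNALED(wstatus))
        return WTERMSIG(wstatus) == SIGSYS ? PROBE_KILLED_SIGSYS : PROBE_OTHER;
    if (WIFEXITED(wstatus))
        return WEXITSTATUS(wstatus) == 0 ? PROBE_OK : PROBE_FAILED;
    return PROBE_OTHER;
}

/* Try sched_setaffinity() in a child so a KILL action cannot take down
 * the caller. The child re-applies its current mask, so nothing changes. */
enum probe_result probe_setaffinity(void)
{
    pid_t pid = fork();
    if (pid < 0)
        return PROBE_OTHER;

    if (pid == 0) {
        cpu_set_t set;
        if (sched_getaffinity(0, sizeof(set), &set) != 0)
            _exit(1);
        _exit(sched_setaffinity(0, sizeof(set), &set) == 0 ? 0 : 1);
    }

    int wstatus;
    if (waitpid(pid, &wstatus, 0) != pid)
        return PROBE_OTHER;
    return classify_wstatus(wstatus);
}
```

A caller would skip the affinity call unless probe_setaffinity() returns PROBE_OK.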
> With current qemu in most distros, it defaults to SIGSYS (we switched
> away from SCMP_ACT_KILL, which had other problems). With more recent
> qemu/libseccomp, it will default to SCMP_ACT_KILL_PROCESS. In those
> KILL action cases, mesa will not be able to catch the failing
> syscalls.
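For the non-KILL case, where SIGSYS is actually delivered to the process, catching it could look roughly like the C sketch below. The names sigsys_seen, on_sigsys() and install_sigsys_probe() are hypothetical and not part of Mesa. It only records that the signal arrived. As far as I understand seccomp(2), after a trap the return-value register holds an architecture-dependent value, so a real handler would also have to set it through the ucontext argument before resuming.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

/* Set to 1 by the handler; sig_atomic_t is safe to write from a signal handler. */
static volatile sig_atomic_t sigsys_seen = 0;

/* Hypothetical handler: only notes that SIGSYS was delivered. */
static void on_sigsys(int signo, siginfo_t *info, void *ucontext)
{
    (void)signo;
    (void)info;
    (void)ucontext;
    sigsys_seen = 1;
}

/* Install the handler and hand back the previous disposition in *old,
 * so the caller can restore it with sigaction(SIGSYS, old, NULL).
 * Returns 0 on success, -1 on failure (errno set by sigaction). */
int install_sigsys_probe(struct sigaction *old)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof(sa));
    sa.sa_sigaction = on_sigsys;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);

    return sigaction(SIGSYS, &sa, old);
}
```

Signal dispositions are process-wide, so a library doing this could override a handler the application depends on. Restoring the old disposition right after the probed call limits that, but does not make it thread-safe.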
Qemu / libvirt isn't the only thing using seccomp.
For example Docker enables seccomp filters (along with capability
restrictions) for the invoked containers unless that is explicitly
disabled:
https://docs.docker.com/engine/security/seccomp/
What actually gets filtered can be changed trivially on the Docker command
line by passing a JSON file that specifies the syscall filter.
The default policy seems to whitelist the affinity syscall:
https://github.com/moby/moby/blob/master/profiles/seccomp/default.json
Why do distro versions of Qemu filter the sched_setaffinity() syscall?
- Eero
>> Marek
>>
>> On Wed, Feb 27, 2019 at 6:13 PM <marcandre.lureau at redhat.com> wrote:
>>>
>>> From: Marc-André Lureau <marcandre.lureau at redhat.com>
>>>
>>> Since commit d877451b48a59ab0f9a4210fc736f51da5851c9a ("util/u_queue:
>>> add UTIL_QUEUE_INIT_SET_FULL_THREAD_AFFINITY"), mesa calls
>>> sched_setaffinity syscall. Unfortunately, qemu crashes with SIGSYS
>>> when sandboxing is enabled (by default with libvirt), as this syscall
>>> is filtered.
>>>
>>> There doesn't seem to be a way to check for the seccomp rule other
>>> than doing a call, which may result in various behaviour depending on
>>> seccomp actions. There is a PTRACE_SECCOMP_GET_FILTER, but it is
>>> low-level and a privileged operation (but there might be a way to use
>>> it?). A safe way would be to try the call in a subprocess,
>>> unfortunately, qemu also prohibits fork(). Also this could be subject
>>> to TOCTOU.
>>>
>>> There seem to be few solutions, but the issue can be considered a
>>> regression for various libvirt/Boxes users.
>>>
>>> Introduce MESA_NO_THREAD_AFFINITY environment variable to prevent the
>>> offending call. Wrap pthread_setaffinity_np() in a utility function
>>> u_pthread_setaffinity_np(), returning an EACCES error if the variable
>>> is set.
>>>
>>> Note: one call is left with a FIXME, as I didn't investigate how to
>>> build and test it, help welcome!
>>>
>>> See also:
>>> https://bugs.freedesktop.org/show_bug.cgi?id=109695
>>>
>>> Signed-off-by: Marc-André Lureau <marcandre.lureau at redhat.com>
>>> ---
>>> .../drivers/swr/rasterizer/core/threads.cpp | 1 +
>>> src/util/u_queue.c | 2 +-
>>> src/util/u_thread.h | 15 ++++++++++++++-
>>> 3 files changed, 16 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/src/gallium/drivers/swr/rasterizer/core/threads.cpp b/src/gallium/drivers/swr/rasterizer/core/threads.cpp
>>> index e30c1170568..d10c79512a1 100644
>>> --- a/src/gallium/drivers/swr/rasterizer/core/threads.cpp
>>> +++ b/src/gallium/drivers/swr/rasterizer/core/threads.cpp
>>> @@ -364,6 +364,7 @@ void bindThread(SWR_CONTEXT* pContext,
>>> CPU_ZERO(&cpuset);
>>> CPU_SET(threadId, &cpuset);
>>>
>>> + /* FIXME: use u_pthread_setaffinity_np() if possible */
>>> int err = pthread_setaffinity_np(thread, sizeof(cpu_set_t), &cpuset);
>>> if (err != 0)
>>> {
>>> diff --git a/src/util/u_queue.c b/src/util/u_queue.c
>>> index 3812c824b6d..dea8d2bb4ae 100644
>>> --- a/src/util/u_queue.c
>>> +++ b/src/util/u_queue.c
>>> @@ -249,7 +249,7 @@ util_queue_thread_func(void *input)
>>> for (unsigned i = 0; i < CPU_SETSIZE; i++)
>>> CPU_SET(i, &cpuset);
>>>
>>> - pthread_setaffinity_np(pthread_self(), sizeof(cpuset), &cpuset);
>>> + u_pthread_setaffinity_np(pthread_self(), sizeof(cpuset), &cpuset);
>>> }
>>> #endif
>>>
>>> diff --git a/src/util/u_thread.h b/src/util/u_thread.h
>>> index a46c18d3db2..a4e6dbae5d7 100644
>>> --- a/src/util/u_thread.h
>>> +++ b/src/util/u_thread.h
>>> @@ -70,6 +70,19 @@ static inline void u_thread_setname( const char *name )
>>> (void)name;
>>> }
>>>
>>> +#if defined(HAVE_PTHREAD_SETAFFINITY)
>>> +static inline int u_pthread_setaffinity_np(pthread_t thread, size_t cpusetsize,
>>> + const cpu_set_t *cpuset)
>>> +{
>>> + if (getenv("MESA_NO_THREAD_AFFINITY")) {
>>> + /* pthread functions return the error number instead of setting errno */
>>> + return EACCES;
>>> + }
>>> +
>>> + return pthread_setaffinity_np(thread, cpusetsize, cpuset);
>>> +}
>>> +#endif
>>> +
>>> /**
>>> * An AMD Zen CPU consists of multiple modules where each module has its own L3
>>> * cache. Inter-thread communication such as locks and atomics between modules
>>> @@ -89,7 +102,7 @@ util_pin_thread_to_L3(thrd_t thread, unsigned L3_index, unsigned cores_per_L3)
>>> CPU_ZERO(&cpuset);
>>> for (unsigned i = 0; i < cores_per_L3; i++)
>>> CPU_SET(L3_index * cores_per_L3 + i, &cpuset);
>>> - pthread_setaffinity_np(thread, sizeof(cpuset), &cpuset);
>>> + u_pthread_setaffinity_np(thread, sizeof(cpuset), &cpuset);
>>> #endif
>>> }
>>>
>>> --
>>> 2.21.0
>>>
>>> _______________________________________________
>>> mesa-dev mailing list
>>> mesa-dev at lists.freedesktop.org
>>> https://lists.freedesktop.org/mailman/listinfo/mesa-dev