[pulseaudio-discuss] [PATCH 09/11] pulsecore: memexport/memimport: Introduce memfd blocks support

Ahmed S. Darwish darwish.07 at gmail.com
Sun Sep 27 13:56:49 PDT 2015


Hi Tanu!

> >  /* Should be called locked */
> > -static pa_memimport_segment* segment_attach(pa_memimport *i, uint32_t shm_id, bool writable) {
> > +static pa_memimport_segment* segment_shm_attach(pa_memimport *i, uint32_t shm_id, bool writable) {
> >      pa_memimport_segment* seg;
> >  
> > -    if (pa_hashmap_size(i->segments) >= PA_MEMIMPORT_SEGMENTS_MAX)
> > +    if ((seg = pa_hashmap_get(i->shm_segments, PA_UINT32_TO_PTR(shm_id))))
> > +        return seg;
> > +
> > +    if (pa_hashmap_size(i->shm_segments) >= PA_MEMIMPORT_SEGMENTS_MAX)
> >          return NULL;
> >  
> >      seg = pa_xnew0(pa_memimport_segment, 1);
> >  
> > -    if (pa_shm_attach(&seg->memory, shm_id, writable) < 0) {
> > +    if (pa_shm_attach(&seg->per_type.shm, shm_id, writable) < 0) {
> >          pa_xfree(seg);
> >          return NULL;
> >      }
> >  
> > -    seg->writable = writable;
> > -    seg->import = i;
> > -    seg->trap = pa_memtrap_add(seg->memory.ptr, seg->memory.size);
> > +    segment_late_init(seg, i, PA_MEMORY_SHARED_POSIX, writable);
> > +
> > +    pa_hashmap_put(i->shm_segments, PA_UINT32_TO_PTR(seg->per_type.shm.id), seg);
> > +    return seg;
> > +}
> > +
> > +/* Should be called locked */
> > +static pa_memimport_segment* segment_memfd_attach(pa_memimport *i, int fd, bool writable) {
> > +    pa_memimport_segment* seg;
> >  
> > -    pa_hashmap_put(i->segments, PA_UINT32_TO_PTR(seg->memory.id), seg);
> > +    /* FIXME: Introduce a proper memfd-tracking mechanism. We receive
> > +     * different fd numbers even for the very same fds passed by the
> > +     * other endpoint. This can lead to an _unbounded_ increase in
> > +     * the number of `pa_memimport_segment' allocations created here. */
> > +/*  if (i->n_memfd_segments >= PA_MEMIMPORT_SEGMENTS_MAX)
> > +        return NULL; */
> 
> It's not very clear from the comment why the code is commented out.
> 
> After looking at pstream.c, I think the situation is this: when
> receiving memblocks over regular shm, the other side tells us the block
> id and the shm segment id. Normally there will be only one shm segment
> per client (and hence one segment per memimport object), right? I
> wonder then why we bother with segment ids at all in each memblock
> transmission, if there's only one segment anyway.
> PA_MEMIMPORT_SEGMENTS_MAX is 16, though, so maybe I'm wrong with the
> one segment per client assumption...
>

The other PA endpoint usually sends us a large number of different
memblocks residing inside the same shared memory area (file).

The receiving part thus opens and maps this SHM area only once, using
the SHM ID (the XXX in /dev/shm/pulse-shm-XXX files) as a key. A single
"memimport_segment" is then created for all of the received blocks
originating from the same 64-MiB SHM area.

AFAIK, this puts a necessary cap on the receiving end's virtual and
heap memory usage. The other endpoint can send us as many blocks as
it likes -- as long as they come from the same already open and mapped
SHM area.

But if it's going to "bother us" and require a new SHM area every
time it sends us a memblock, we'll open() and mmap() these areas
too, but only up to a limit of 16 unique 64-MiB SHM areas.
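
To make this concrete, here is a minimal standalone sketch -- not the
actual pa_shm_attach() code; error handling, writable mappings and size
bookkeeping are trimmed, and attach_segment() is just an illustrative
name -- of what attaching to such an area by its numeric ID boils down
to on the receiving end:

    #include <fcntl.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <sys/mman.h>
    #include <unistd.h>

    #define SEGMENT_SIZE (64u * 1024u * 1024u)   /* one 64-MiB pool */

    /* Map the whole SHM area once; every later block from the same
     * area is then resolved as (shm_id, offset) against this single
     * mapping. */
    static void *attach_segment(uint32_t shm_id) {
        char name[32];
        int fd;
        void *ptr;

        /* "/pulse-shm-XXX" is what shows up as /dev/shm/pulse-shm-XXX */
        snprintf(name, sizeof(name), "/pulse-shm-%u", (unsigned) shm_id);

        if ((fd = shm_open(name, O_RDONLY, 0)) < 0)
            return NULL;

        ptr = mmap(NULL, SEGMENT_SIZE, PROT_READ, MAP_SHARED, fd, 0);
        close(fd);            /* the mapping stays valid after close() */

        return ptr == MAP_FAILED ? NULL : ptr;
    }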

> Now the problem is that you tried to implement a similar scheme for
> memfds, with the fd as the segment id, which didn't work out so well.
> Even when there's only one segment, we actually receive a new fd for
> every memblock transmission, and attach to the same segment multiple
> times, easily exceeding PA_MEMIMPORT_SEGMENTS_MAX even with non
> -malicious clients. Is that correct?
>

Yes, file descriptors are recyclable by definition and cannot be used
as a reliable ID mechanism. That's why I've added the following comment
on top of pa_memimport::n_memfd_segments:

    /* Unlike what is done with Posix SHM segments above, we cannot
     * track memfd-based memimport segments using our ID (file
     * descriptor) as key. File descriptors are recyclable by nature;
     * the same fd number could map different memory regions at
     * different points of time.
     *
     * Moreover, even if the other endpoint sent us the _very same_
     * fd twice over a unix domain socket, the kernel will pass them
     * to us as different fd numbers.
     *
     * Thus we only count the number of memfd segments allocated to this
     * memimport and ensure that this counter is zero upon release.
     */
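
A tiny standalone demo of that second point (roughly following the
cmsg(3) example; not PA code, and send_fd()/recv_fd() are just
illustrative helpers): pass the very same descriptor twice over an
AF_UNIX socketpair and the receiver gets two distinct fd numbers for it.

    #include <stdio.h>
    #include <string.h>
    #include <sys/socket.h>
    #include <sys/uio.h>
    #include <unistd.h>

    static void send_fd(int sock, int fd) {
        char dummy = 'x';
        struct iovec iov = { .iov_base = &dummy, .iov_len = 1 };
        union {
            char buf[CMSG_SPACE(sizeof(int))];
            struct cmsghdr align;           /* proper cmsg alignment */
        } u;
        struct msghdr msg = { .msg_iov = &iov, .msg_iovlen = 1,
                              .msg_control = u.buf,
                              .msg_controllen = sizeof(u.buf) };
        struct cmsghdr *cm = CMSG_FIRSTHDR(&msg);

        cm->cmsg_level = SOL_SOCKET;
        cm->cmsg_type = SCM_RIGHTS;
        cm->cmsg_len = CMSG_LEN(sizeof(int));
        memcpy(CMSG_DATA(cm), &fd, sizeof(int));
        sendmsg(sock, &msg, 0);
    }

    static int recv_fd(int sock) {
        char dummy;
        struct iovec iov = { .iov_base = &dummy, .iov_len = 1 };
        union {
            char buf[CMSG_SPACE(sizeof(int))];
            struct cmsghdr align;
        } u;
        struct msghdr msg = { .msg_iov = &iov, .msg_iovlen = 1,
                              .msg_control = u.buf,
                              .msg_controllen = sizeof(u.buf) };
        int fd;

        recvmsg(sock, &msg, 0);
        memcpy(&fd, CMSG_DATA(CMSG_FIRSTHDR(&msg)), sizeof(int));
        return fd;
    }

    int main(void) {
        int sv[2], a, b;

        socketpair(AF_UNIX, SOCK_STREAM, 0, sv);

        /* Send the very same descriptor (stdout) twice... */
        send_fd(sv[0], STDOUT_FILENO);
        send_fd(sv[0], STDOUT_FILENO);

        /* ...and the kernel hands us two *different* numbers for it. */
        a = recv_fd(sv[1]);
        b = recv_fd(sv[1]);
        printf("received fds: %d and %d\n", a, b);

        return 0;
    }

So the fd number itself carries no identity at all; only an explicit
ID sent alongside it can provide that (see below).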

> This looks like something that really should be fixed before merging
> the patch set.

I agree ;-) The patch series was just getting big enough that I thought
it would be nice to get some feedback and input at this stage.

> Do you already have an idea how to fix it? Could we send
> the fd just once during the connection set-up phase, and always use
> that fd from then on?
>

Yup.

We will emulate what the POSIX SHM code is currently doing, which
resembles what you're proposing above.

When a PA endpoint creates a memfd region, it attaches a randomly
generated ID to that region. When a memblock inside this memfd area
is sent to the other end over pstreams, we also send this random ID
alongside the memfd file descriptor.

This way, the receiving end (as is done with POSIX SHM) will use a
hash table keyed by these IDs. If the ID matches an existing entry,
we won't create a new memfd "memimport_segment"; we'll just reuse the
existing one and its already open and mapped file descriptor. If
nothing matches, we'll use the sent fds for opening and mapping, up
to a maximum of 16.

This would also mirror what the SHM code does on the receiving end
(memblock.c segment_attach() in current stable/next, now renamed to
segment_shm_attach() in this patch series).
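
Roughly, the receiving-end attach could then look like the sketch
below. This is only a proposal, not code from this series: the
random_id parameter, the memfd_segments hashmap, the per_type.memfd
field, PA_MEMORY_SHARED_MEMFD, and the pa_memfd_attach() helper are
all assumed names.

    /* Should be called locked */
    static pa_memimport_segment* segment_memfd_attach(pa_memimport *i,
                                                      uint32_t random_id,
                                                      int fd, bool writable) {
        pa_memimport_segment *seg;

        /* Same memfd area as an earlier block?  Reuse its already
         * mapped segment and drop the duplicate fd we just received. */
        if ((seg = pa_hashmap_get(i->memfd_segments, PA_UINT32_TO_PTR(random_id)))) {
            pa_assert_se(close(fd) == 0);
            return seg;
        }

        if (pa_hashmap_size(i->memfd_segments) >= PA_MEMIMPORT_SEGMENTS_MAX)
            return NULL;

        seg = pa_xnew0(pa_memimport_segment, 1);

        if (pa_memfd_attach(&seg->per_type.memfd, fd, writable) < 0) {
            pa_xfree(seg);
            return NULL;
        }

        segment_late_init(seg, i, PA_MEMORY_SHARED_MEMFD, writable);

        pa_hashmap_put(i->memfd_segments, PA_UINT32_TO_PTR(random_id), seg);
        return seg;
    }

With that in place, the number of mappings is bounded again, and
pa_memimport_segment allocations can no longer grow without bound just
because the kernel keeps handing us fresh fd numbers for the same
region.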

> ...
> 
> Ok, now that I look at the memexport changes, it looks like we can re
> -export an imported memblock. That explains why the segment id is
> needed.
>

Yup, the original code is quite generic.

We are not limited, per connection, to a single SHM area. Thus the
need for unique IDs for such shared memory areas :-)

That is, we have no state: the segment IDs (basically SHM IDs) are
our HTTP-like cookies.

> 
> Phew, finally I'm done with this patch! Trying to understand the
> relationship with memimports, segments and pstream and their
> interactions took some effort. That makes me pretty impressed with your
> patches, which didn't seem to take that much time nor help to write,
> and I haven't found many issues beyond trivial coding style stuff :)
>

Thanks a lot for your reviews and kind words :-)

Honestly, the PA code, and PA itself, were quite foreign to me at
first. So, to build a coherent mental picture, I worked through the
code and wrote these documents:

    https://a-darwish.gitbooks.io/diary-of-pulseaudio-developement/content/what_are_mempools.html
    https://a-darwish.gitbooks.io/diary-of-pulseaudio-developement/content/memory_chunks_and_memblockq.html
    https://a-darwish.gitbooks.io/diary-of-pulseaudio-developement/content/shm_file_sharing.html
    ..
    https://a-darwish.gitbooks.io/diary-of-pulseaudio-developement/content/logind_sessions_supportv2.html

Maybe one day we can polish these and transform them into a coherent
introductory guide for PulseAudio development ;-)

Cheers,

-- 
Darwish
http://darwish.chasingpointers.com
