[pulseaudio-discuss] [PATCH 5/5] pactl: Implement list message-handlers

Thu Jan 18 22:20:13 UTC 2018

On Wed, 2018-01-17 at 16:12 +0100, Georg Chini wrote:
> On 14.01.2018 21:47, Tanu Kaskinen wrote:ry from
> > the heap are quite heavy operations, and your approach will do a lot of
> > those operations. Even reading a simple integer will, I suppose,
> > involve first allocating a new string. I can't say for sure that the
> > performance penalty makes any practical difference, but my intuition
> > says that we should use the approach that I proposed that requires far
> > fewer heap memory allocations.
> 
> I can accept the argument that my solution unnecessarily allocates
> strings in some situations. Though I do not think it has much practical
> impact, most of it can be avoided by using special functions to retrieve
> numbers or booleans. I am also quite sure that a function similar to the
> *_split_in_place() functions can be implemented to completely avoid
> the allocation of unnecessary strings.

An in-place split function seems feasible indeed.

> > 
> > > > I already provided an example about how I'd like the parsing to work.
> > > > Apparently you didn't like that for some reason? I'll copy the example
> > > > here again:
> > > > 
> > > > const char *state = message_parameters;
> > > > 
> > > > if (pa_message_read_start_list(&state) < 0)
> > > >       fail("Expected a list.");
> > > > 
> > > > while (pa_message_read_end_list(&state) < 0) {
> > > >       char *path;
> > > >       char *description;
> > > > 
> > > >       if (pa_message_read_path(&state, &path) < 0)
> > > >           fail("Expected a path.");
> > > >       if (pa_message_read_string(&state, &description) < 0)
> > > >           fail("Expected a string.")
> > > > 
> > > >       /* Do something with path and description. */
> > > > 
> > > >       pa_xfree(description);
> > > >       pa_xfree(path);
> > > > }
> > > 
> > > As said above, I would prefer the single parsing function until
> > > the need for more complex functions arises. We should not be
> > > too fine grained, this will lead to confusion which function must
> > > be used when.
> > 
> > Can you elaborate, what risk of confusion do you see?
> 
>  From my point of view, your solution has several issues.
> 
> 1) The example above is not extensible. What is pa_message_read_end_list()
> supposed to do? Skip to the next element in the list indicated by { and 
> return
> zero if } is encountered instead? If yes, this would not work if you add 
> an item
> to the inner list, because at the next iteration it would go to the 
> additional
> element instead of the next list. On the other hand, If the function is 
> supposed
> to skip to the end of a list, you cannot use it to iterate over list 
> elements.

pa_message_read_end_list() is supposed to consume the closing brace
(and any random characters, since those are now allowed), nothing else.
If it encounters an opening brace or end-of-string, it's supposed to
fail.

My example does not work with the current list-handlers message, that's
true.

> To
> make your solution extensible you would at least need three functions:
> 
> pa_message_find_list_start() - find the beginning of a list
> pa_message_find_list_end() - skip to the end of a list
> pa_message_find_list_element() - find the next element in a list
> 
> All three functions need to handle escaping, nesting and error checking.

I don't know why you added "find" to the function names. You're
probably guessing wrong the semantics I intended to have with
pa_message_list_start() and pa_message_list_end().

pa_message_read_list_start() should only consume the next opening brace
and any whitespace preceding it (by "whitespace" I mean any ignorable
characters). If it encounters a closing brace or end-of-string, it
should fail. There's no need for escaping or nesting handling.

pa_message_read_list_end() is explained above.

A "skip-to-end" function is indeed needed, and it will have to handle
escaping and nesting.

Here's an updated example that should work with the current list-
handlers (with an "implicit" top-level list):

while (pa_message_read_list_start(&state) > 0) {
    read path
    read description
    pa_message_skip_rest_of_list(&state) /* Consumes the list closing brace and everything preceding it. */
}

if (!pa_message_eof(&state)) {
    /* This can happen if an unexpected closing brace is encountered. */
    fail("Malformed data");
}

As you can see, the updated example is exactly the same length as your
proposal, except for the pa_message_eof() check that you didn't have,
but you should have a similar check, because otherwise you don't know
if the parsing loop ended because of end-of-string or malformed input.

> Your
> loop would look something like
> 
> pa_message_find_list_start(&state);
> while (pa_message_find_list_element(&state)) {
>       pa_message_find_list_start(&state);
>       read path
>       read description
>       pa_message_find_list_end(&state);
> }
> 
> This looks rather complicated compared to
> 
> while ((element = pa_split_message_response(response, &state))) {
>      read path
>      read description
>      pa_xfree(element);
> }
> 
> and I wonder if the additional overhead does not outweigh the allocation
> and de-allocation of a string. Additionally, your approach might easily lead
> to problems, because, if the find_list_end() call within the loop is 
> forgotten,
> it will work with the original message response but will break on extension.

My updated example won't work with the original message response,
because pa_message_skip_rest_of_list() is required to consume the list
closing brace. If that is forgotten, then pa_message_read_list_start()
will fail, the loop will terminate, and the !pa_message_eof() will
trigger failure.

> 
> 2) You are forced to process the elements in the order they are delivered,
> taking care of all the nesting. This has two disadvantages:
> 
> a) Because you do not know if the nesting brackets are closed correctly, you
> may continue processing elements after a missing bracket. Consider the
> following example: {{x1} {x2} {x3}} {y1} {y2}. If you forget to close the
> list like {{x1} {x2} {x3} {y1} {y2}, your approach would continue processing
> y1 and y2 as part of the list before the error is encountered.
> 
> split_message_response() however would complain before any of the elements
> is processed because it searches for the closing bracket and thereby does a
> consistency check of each element that is returned.

You're right, but I don't see why this is such a bad thing. The error
will be caught eventually.

> b) If you have nested lists, you need to handle them in nested loops 
> like this:
> 
> pa_message_find_list_start(&state);
> while (pa_message_find_list_element(&state)) {
> 
>       pa_message_find_list_start(&state);
> while (pa_message_find_list_element(&state)) {
>             ...
>      }
> }
> 
> With my solution it would look like:
> 
> outer_list = pa_split_message_response(response, &state);
> 
> while ((inner_list = pa_split_message_response(outer_list, &state2))) {
>      ...
> }
> 
> which again is somewhat simpler and better to read especially if more
> complex structures are involved. As said above it has the additional
> advantage that outer_list is already checked for consistency before the
> while-loop is entered.

I don't understand. What data are we parsing? Why does the first
example need nested loops but the second doesn't? If the inner list is
an array that needs iteration, then you should loop in the code that
you replaced with "..." in the second example. If the inner list is a
structure like what we have to deal with list-handlers, then the first
example doesn't need the inner loop, it can read the struct members in
the outer loop.

> c) For your solution to work correctly, outermost brackets would be required
> while they are optional with my code. This is not important, but I thought I
> mention it anyway. If my approach will be followed, the big advantage of
> adding outer brackets is that the initial split_message_response() call 
> would
> do a consistency check of the whole string before anything is processed.

My approach can work just fine with an implicit top-level list, since
that's what the code example that I gave in this mail does.

> 3) To answer your original question, in your example it is a bit confusing
> that there is a read_path() and a read_description() function. Both are
> strings, so why have different functions? The functions we are adding
> should not be too specialized.

The idea behind read_path() is that if object paths have constraints on
what is a valid value, then read_path() can do validation that plain
read_string() can't.

I hope you understand my approach better now. I don't konw if that
makes it any easier to accept it, though.

There's another topic we should discuss: how should the writing API
look like? This can have some effect on the parsing API as well. When
writing data to the parameter string, the whole string can't be easily
allocated all in one go, so we should have something like pa_strbuf
available for clients. I don't think we should add pa_strbuf at least
directly to the client API, because the message parameter writing has
some specific conventions (i.e. wrapping everything in curly braces),
and while it would be possible to add the specialized functions to the
pa_strbuf API, I think that would be somewhat ugly. So I propose we
introduce a new API similar to pa_strbuf.

My suggestion would be to introduce a new pa_message struct that has
both reading and writing functions (e.g. pa_message_read_string(),
pa_message_write_string()). It would be analogous to DBusMessage, so it
would do a bit more than just string manipulation (it would at least
carry the recipient/sender and message name information). I'm not sure
you like that, because I've got the impression that you want to call
only the "method calls" messages, and in that case replies and signals
can not be called messages. Nevertheless, my preference is to call all
of those three things messages, and in that case it makes sense to have
pa_message that represents all three things, so that's what I'm going
to propose here.

pa_context_send_message() would take a pa_message as an argument. The
reply callback would get a pa_message as an argument. Signal callbacks
would get a pa_message as an argument. So things would work very
similarly to how D-Bus works. One major difference to D-Bus would be
that errors wouldn't be messages, at least for now (earlier there was
some talk about making errors more informative, and at that point using
pa_message for representing errors as well might make sense).

If you're much against this altered terminology, then I'm open to other
naming suggestions for the message parameter writing API. Note that
while clients only need to write parameters for "method call" messages,
the server should be able to use the API for writing reply and signal
parameters as well, which makes it more difficult to come up with a
good naming scheme, if "message" can only mean method call messages.

-- 
Tanu

https://liberapay.com/tanuk
https://www.patreon.com/tanuk