[pulseaudio-discuss] [PATCH 5/5] pactl: Implement list message-handlers

Wed Jan 17 15:12:21 UTC 2018

On 14.01.2018 21:47, Tanu Kaskinen wrote:
> On Fri, 2018-01-12 at 21:47 +0100, Georg Chini wrote:
>> On 12.01.2018 16:40, Tanu Kaskinen wrote:
>>> On Sun, 2017-10-29 at 20:51 +0100, Georg Chini wrote:
>>>> For better readability, "pactl list message-handlers" is introduced which
>>>> prints a formatted output of "pactl send-message /core list-handlers".
>>>>
>>>> The patch also adds the function pa_split_message_response() for easy
>>>> parsing of the message response string.
>>>> ---
>>>>
>>>>
>>> I would also prefer more fine-grained functions than just one single
>>> split function. When parsing lists, this split function will
>>> unnecessarily malloc the list before mallocing the individual list
>>> elements.
>> The top level is an implicit list (see my other mail), so a new string
>> will only be allocated if a list is passed as an element of another list.
>>
>> I agree with you that there should be functions for more complex
>> structures, but in this special case it does not make sense to me
>> because parsing is simple.
>> I also think that the low level function does exactly what it should
>> do. It allows you to recursively resolve the string and the additional
>> few bytes of memory should not be an issue. Consider a more
>> complex case. A message returning some top level variables, a
>> simple list and a list of structures which themselves contain simple
>> lists like that:
>>
>> {top_var1} {top_var2} {{l1}{l2} ...} {{{struct1_var1}{struct1_var2}
>> {{struct1_l1}{struct1_l2} ...} {struct1_var3}}{{struct2_var1}{struct2_var2}
>> {{struct2_l1}{struct2_l2} ...} {struct2_var3}} ...} {top_var3}
>>
>> The first pass would return the following strings:
>> top_var1
>>
>> top_var2
>>
>> {l1}{l2} ...
>>
>> {{struct1_var1}{struct1_var2}{{struct1_l1}{struct1_l2} ...} {struct1_var3}}
>> {{struct2_var1}{struct2_var2}{{struct2_l1}{struct2_l2} ...}
>> {struct2_var3}} ...
>>
>> top_var3
>>
>> So top level is resolved. The simple list can be resolved in the 2nd pass.
>> The field of structures will be split in single structures during the 2nd
>> pass like that:
>>
>> {struct1_var1}{struct1_var2}{{struct1_l1}{struct1_l2} ...} {struct1_var3}
>>
>> {struct2_var1}{struct2_var2}{{struct2_l1}{struct2_l2} ...} {struct2_var3}
>>
>> ...
>>
>> Now we can have a function which parses the struct and uses the
>> low level function again twice to do so.
>>
>> I think we should introduce those high level parsing functions whenever
>> they are first needed. For me, functions like
>> string_to_array_of_{int|float|...}()
>> or string_to_struct_xyz() would make sense.
> We should have specialized parsing functions also for the simple cases,
> like reading a single int or float. We need to specify exactly how e.g.
> floats are serialized, and we shouldn't burden every client with the
> responsibility of implementing our float serialization rules. Not only
> is it more work for them, the likelihood of not following our spec
> exactly is very high.

I am not against having such parsing functions. You are right,
they could avoid allocating and de-allocating a string when a
number or a boolean is retrieved. But they are not needed for
this patch because it only deals with strings.

>
> Returning to the topic of mallocs: allocating and freeing memory from
> the heap are quite heavy operations, and your approach will do a lot of
> those operations. Even reading a simple integer will, I suppose,
> involve first allocating a new string. I can't say for sure that the
> performance penalty makes any practical difference, but my intuition
> says that we should use the approach that I proposed that requires far
> fewer heap memory allocations.

I can accept the argument that my solution unnecessarily allocates
strings in some situations. Though I do not think it has much practical
impact, most of it can be avoided by using special functions to retrieve
numbers or booleans. I am also quite sure that a function similar to the
*_split_in_place() functions can be implemented to completely avoid
the allocation of unnecessary strings.

>
>>> I already provided an example about how I'd like the parsing to work.
>>> Apparently you didn't like that for some reason? I'll copy the example
>>> here again:
>>>
>>> const char *state = message_parameters;
>>>
>>> if (pa_message_read_start_list(&state) < 0)
>>>       fail("Expected a list.");
>>>
>>> while (pa_message_read_end_list(&state) < 0) {
>>>       char *path;
>>>       char *description;
>>>
>>>       if (pa_message_read_path(&state, &path) < 0)
>>>           fail("Expected a path.");
>>>       if (pa_message_read_string(&state, &description) < 0)
>>>           fail("Expected a string.")
>>>
>>>       /* Do something with path and description. */
>>>
>>>       pa_xfree(description);
>>>       pa_xfree(path);
>>> }
>> As said above, I would prefer the single parsing function until
>> the need for more complex functions arises. We should not be
>> too fine grained, this will lead to confusion which function must
>> be used when.
> Can you elaborate, what risk of confusion do you see?

 From my point of view, your solution has several issues.

1) The example above is not extensible. What is pa_message_read_end_list()
supposed to do? Skip to the next element in the list indicated by { and 
return
zero if } is encountered instead? If yes, this would not work if you add 
an item
to the inner list, because at the next iteration it would go to the 
additional
element instead of the next list. On the other hand, If the function is 
supposed
to skip to the end of a list, you cannot use it to iterate over list 
elements. To
make your solution extensible you would at least need three functions:

pa_message_find_list_start() - find the beginning of a list
pa_message_find_list_end() - skip to the end of a list
pa_message_find_list_element() - find the next element in a list

All three functions need to handle escaping, nesting and error checking. 
Your
loop would look something like

pa_message_find_list_start(&state);
while (pa_message_find_list_element(&state)) {
      pa_message_find_list_start(&state);
      read path
      read description
      pa_message_find_list_end(&state);
}

This looks rather complicated compared to

while ((element = pa_split_message_response(response, &state))) {
     read path
     read description
     pa_xfree(element);
}

and I wonder if the additional overhead does not outweigh the allocation
and de-allocation of a string. Additionally, your approach might easily lead
to problems, because, if the find_list_end() call within the loop is 
forgotten,
it will work with the original message response but will break on extension.

2) You are forced to process the elements in the order they are delivered,
taking care of all the nesting. This has two disadvantages:

a) Because you do not know if the nesting brackets are closed correctly, you
may continue processing elements after a missing bracket. Consider the
following example: {{x1} {x2} {x3}} {y1} {y2}. If you forget to close the
list like {{x1} {x2} {x3} {y1} {y2}, your approach would continue processing
y1 and y2 as part of the list before the error is encountered.

split_message_response() however would complain before any of the elements
is processed because it searches for the closing bracket and thereby does a
consistency check of each element that is returned.

b) If you have nested lists, you need to handle them in nested loops 
like this:

pa_message_find_list_start(&state);
while (pa_message_find_list_element(&state)) {

      pa_message_find_list_start(&state);
while (pa_message_find_list_element(&state)) {
            ...
     }
}

With my solution it would look like:

outer_list = pa_split_message_response(response, &state);

while ((inner_list = pa_split_message_response(outer_list, &state2))) {
     ...
}

which again is somewhat simpler and better to read especially if more
complex structures are involved. As said above it has the additional
advantage that outer_list is already checked for consistency before the
while-loop is entered.

c) For your solution to work correctly, outermost brackets would be required
while they are optional with my code. This is not important, but I thought I
mention it anyway. If my approach will be followed, the big advantage of
adding outer brackets is that the initial split_message_response() call 
would
do a consistency check of the whole string before anything is processed.

3) To answer your original question, in your example it is a bit confusing
that there is a read_path() and a read_description() function. Both are
strings, so why have different functions? The functions we are adding
should not be too specialized.

>
> At this point I'm not convinced by your reasoning, and I feel pretty
> strongly about the parsing design. If you want to continue with the
> "single split function" approach, getting Arun on your side is likely
> going to be needed for me to give up on this.
>
I also feel pretty strongly about my approach, so if the arguments
above do not convince you, we should involve Arun. As said above,
I don't mind having more specialized split functions, but I think they
need not be part of this patch.