[pulseaudio-discuss] [PATCH 6/8] message-params: Allow parameter strings to contain escaped curly braces

Georg Chini georg at chini.tk
Sun Jul 22 14:02:54 UTC 2018


On 21.07.2018 20:17, Tanu Kaskinen wrote:
> On Mon, 2018-04-09 at 19:35 +0200, Georg Chini wrote:
>> The patch adds the possibility to escape curly braces within parameter strings
>> and introduces several new functions that can be used for writing parameters.
>>
>> For writing, the structure pa_message_param, which is a wrapper for pa_strbuf
>> has been created. Following new write functions are available:
>>
>> pa_message_param_new() - creates a new pa_message_param structure
>> pa_message_param_to_string() - converts a pa_message_param to string and frees
>> the structure
> I'd like to have the _free suffix in the function name, because for the
> uninitiated the current name looks like just another "to_string"
> function with no surprising side effects.
>
> I think we need pa_message_params_free() as well in case an application
> runs into an error while constructing the parameters and it wants to
> just get rid of the pa_message_params struct without converting it to a
> string first.
>
>> pa_message_param_begin_list() - starts a list
>> pa_message_param_end_list() - ends a list
>> pa_message_param_write_string() - writes a string to a pa_message_param structure
>>
>> For string parameters that contain curly braces, those braces must be escaped
>> by adding a "\" before them. This however means that a trailing backslash would
>> falsely escape the closing bracket. To avoid this, single quotes must be added
>> at start and end of the string.
> Why not solve this by the usual method: require escaping of "\" in
> input with "\\" in output?

You are right, if I modify pa_unescape() as mentioned further below and then
use pa_escape()/pa_unescape() the quotes won't be necessary.

>
>> The function pa_message_param_write_string()
>> has a parameter do_escape.
> Why not do escaping always?

Because what you are writing as a string may be a list that you have 
prepared
previously. Then you will not want escaping. You may for example create 
a list
from an array and then insert this list as one string into the final 
parameter list
or have a function that converts a certain structure to a parameter 
string and
then write the result of this function as one element of the final list.

>
>> If true, the necessary escaping is added. Escaping
>> is only needed if a string might fulfill one of the following conditions:
>>
>> - It contains curly braces
>> - It contains a trailing "\"
>> - It is enclosed in single quotes
>>
>> Other strings can be passed without modification.
>>
>> For reading, pa_message_param_read_string() reverts the changes that
>> pa_message_param_write_string() might have introduced.
>>
>> The patch also adds more restrictions on the object path name. Now only
>> alphanumeric characters and one of "_", ".", "-" and "/" are allowed.
>> The path name may not end with a /. If the user specifies a trailing / when
>> sending a message, it will be removed.
>> ---
>>   doc/messaging_api.txt           |  34 +++++++-
>>   src/Makefile.am                 |   3 +-
>>   src/map-file                    |   5 ++
>>   src/pulse/message-params.c      | 181 ++++++++++++++++++++++++++++++++++++++--
>>   src/pulse/message-params.h      |  23 ++++-
>>   src/pulsecore/core.c            |  20 ++---
>>   src/pulsecore/message-handler.c |  53 +++++++-----
>>   src/utils/pactl.c               |   4 +-
>>   8 files changed, 279 insertions(+), 44 deletions(-)
>>
>> diff --git a/doc/messaging_api.txt b/doc/messaging_api.txt
>> index 431a5df2..0e6be53f 100644
>> --- a/doc/messaging_api.txt
>> +++ b/doc/messaging_api.txt
>> @@ -14,10 +14,42 @@ look like that:
>>   {{Integer} {{1st float} {2nd float} ...}}{...}
>>   Any characters that are not enclosed in curly braces are ignored (all characters
>>   between { and {, between } and } and between } and {). The same syntax is used
>> -to specify message parameters. The following reference lists available messages,
>> +to specify message parameters. The reference further down lists available messages,
>>   their parameters and return values. If a return value is enclosed in {}, this
>>   means that multiple elements of the same type may be returned.
>>   
>> +There are several functions that simplify reading and writing message parameter
>> +strings. For writing, the structure pa_message_param can be used. Following
>> +functions are available:
>> +pa_message_param_new() - creates a new pa_message_param structure
>> +pa_message_param_to_string() - converts a pa_message_param to string and frees
>> +the structure
>> +pa_message_param_begin_list() - starts a list
>> +pa_message_param_end_list() - ends a list
>> +pa_message_param_write_string() - writes a string to a pa_message_param structure
>> +
>> +For string parameters that contain curly braces, those braces must be escaped
>> +by adding a "\" before them. This however means that a trailing backslash would
>> +falsely escape the closing bracket. To avoid this, single quotes must be added
>> +at start and end of the string. The function pa_message_param_write_string()
>> +has a parameter do_escape. If true, the necessary escaping is added. Escaping
>> +is only needed if a string might fulfill one of the following conditions:
>> +
>> +- It contains curly braces
>> +- It contains a trailing "\"
>> +- It is enclosed in single quotes
>> +
>> +Other strings can be passed without modification.
>> +
>> +For reading, the following functions are available:
>> +pa_message_param_split_list() - parse message parameter string
>> +pa_message_param_read_string() - read a string from a parameter list
>> +
>> +pa_message_param_read_string() also reverts the changes that
>> +pa_message_param_write_string() might have introduced.
> I think the doxygen documentation is better suited for the details of
> the C API. messaging_api.txt should contain information that is common
> to both libpulse and pactl users.

Somehow it is not really clear to me, which documentation you want
to have in messaging_api.txt.

>
>> +/* Remove escaping from a string */
>> +static char *unescape(char *value) {
>> +    char *tmp;
>> +
>> +    if (!value)
>> +        return NULL;
>> +
>> +    /* If the string is enclosed in single quotes, remove them */
>> +    if (*value == '\'' && value[strlen(value) - 1] == '\'') {
>> +
>> +        memmove(value, value + 1, strlen(value));
>> +        value[strlen(value) - 1] = 0;
>> +    }
>> +
>> +    /* Remove escape character from curly braces if present. */
>> +    while ((tmp = strstr(value, "\\{")))
>> +        memmove(tmp, tmp + 1, strlen(value) - (size_t)(tmp  - value));
>> +    while ((tmp = strstr(value, "\\}")))
>> +        memmove(tmp, tmp + 1, strlen(value) - (size_t)(tmp  - value));
>> +
>> +    /* Return the pointer that was passed in. */
>> +    return value;
>> +}
> core-util already contains pa_unescape() that does the same thing more
> efficiently (if you drop the single quote thing).

pa_unescape() currently does not do the same thing. It removes all
escape characters, while I only want to remove the characters
I actually introduced (those before { or }).
I can however modify pa_unescape() to take the same arguments
as pa_escape().

>
>> +
>> +/* Read functions */
>> +
>>   /* Split the specified string into elements. An element is defined as
>>    * a sub-string between curly braces. The function is needed to parse
>>    * the parameters of messages. Each time it is called returns the position
>>    * of the current element in result and the state pointer is advanced to
>> - * the next list element.
>> + * the next list element. On return, the parameter *is_unpacked indicates
>> + * if the string is plain text or contains a sub-list. is_unpacked may
>> + * be NULL.
> is_unpacked looks like unnecessary complexity.
> pa_message_params_read_string() should always unescape the value.

It may be possible, that the string you read is a list. Consider the 
following
parameter list: {string1}{some nested structure}{string2}. You can now
read this list as three strings and then continue to read the elements of
the nested structure from the second string. You might even create a 
function
that takes a string and outputs a structure. So you are not forced to go
to the full depth of nesting on the first pass. This makes it much easier
to handle deeply nested parameter lists. For me this behavior is an 
important
feature and I do not want to drop it. See also my comment on why I do
not always want escaping.

>
>>    
>>   
>>       /* Parse error, closing brace missing */
>> @@ -96,21 +154,126 @@ int pa_message_param_split_list(char *c, char **result, void **state) {
>>   /* Read a string from the parameter list. The state pointer is
>>    * advanced to the next element of the list. If allocate is
>>    * true, a newly allocated string will be returned, else a
>> - * pointer to a sub-string within c. */
>> + * pointer to a sub-string within c. Escaping will be removed
>> + * if the string does not contain another list. */
> A string is a string, it's not a list. I think this function should
> always do unescaping. If a string value is read that is missing
> escaping for curly braces, that could be reported as an error (I'm not
> saying "should", but I think it would be good to do the validation in
> this function).

See my comments above why a string may be a list.

>
>> +
>> +/* Writes a string to a message_param structure, adding curly braces
>> + * around the string. If do_escape is true, leading and trailing single
>> + * quotes and also a "\" before curly braces are added to the input string.
>> + * Escaping only needs to be used if the original string might fulfill any
>> + * of the following conditions:
>> + * - It contains curly braces
>> + * - It is enclosed in single quotes
>> + * - It ends with a "\" */
>> +void pa_message_param_write_string(pa_message_param *param, const char *value, bool do_escape) {
> I would drop the do_escape flag, and if there's need for appending
> unescaped raw data to pa_message_params, there could be a separate
> function for that.

See above. A function for writing raw data should nevertheless be provided
for the case where a string can only be written as several parts.

>
>> +    const char *s;
>> +    char *output, *out_char;
>> +    size_t brace_count;
>> +
>> +    pa_assert(param);
>> +
>> +    /* Null value is written as empty element */
>> +    if (!value)
>> +        value = "";
>> +
>> +    if (!do_escape) {
>> +        pa_strbuf_printf(param->buffer, "{%s}", value);
>> +        return;
>> +    }
>> +
>> +    /* Using pa_strbuf_putc() to write to the strbuf while iterating over
>> +     * the input string would cause the allocation of a linked list element
>> +     * for each character of the input string. Therefore the output string
>> +     * is constructed locally before writing it to the buffer */
> This is an interesting point. I think you should improve pa_escape(),
> which currently uses the naive pa_strbuf_putc() method. With my
> proposed escaping rules, you could then replace this code with a call
> to pa_escape().

OK, I'll do that and send a patch for tweaking pa_escape()/pa_unescape()
to my needs separately.

>
>> +
>> +    /* Count number of characters to escape */
>> +    brace_count = 0;
>> +    for (s = value; *s; ++s) {
>> +        if (*s == '{' || *s == '}')
>> +            brace_count++;
>> +    }
> You could at the same time count all characters to save the strlen()
> call.

Mh, do I really save something? If the string is x characters
long, I would increment a variable this often. Is that less work
than a strlen() call? But probably strlen() does something similar ...

>
>> +/* Check if a path string starts with a / and only contains valid characters */
>> +static bool object_path_is_valid(const char *test_string) {
>>       uint32_t i;
>>   
>> +    if (!test_string)
>> +        return false;
>> +
>> +    /* Make sure the string starts with / and does not end with a / */
>> +    if (test_string[0] != '/' || test_string[strlen(test_string) - 1] == '/')
>> +        return false;
>> +
>>       for (i = 0; test_string[i]; i++) {
>> -        if (test_string[i] == '{' ||
>> -            test_string[i] == '}')
>> -            return false;
>> +
>> +        if ((test_string[i] >= 'a' && test_string[i] <= 'z') ||
>> +            (test_string[i] >= 'A' && test_string[i] <= 'Z') ||
>> +            (test_string[i] >= '0' && test_string[i] <= '9') ||
>> +            test_string[i] == '.' ||
>> +            test_string[i] == '_' ||
>> +            test_string[i] == '-' ||
>> +            test_string[i] == '/')
>> +            continue;
>> +
>> +        return false;
>>       }
> You could save the strlen() call if you checked the trailing slash
> after this for loop when you're at the end of the string.
> 	
> By the way, do you perhaps want to disallow two subsequent slashes in
> object paths? (I don't care myself, because I doubt that invalid paths
> will ever be encountered anyway when registering message handlers.)

If I don't disallow it, then a double slash in the path will be valid.
At least up to now we do not care about the individual components
that the path is constructed from.
So should I disallow it or not?

>
>>   
>>       return true;
>> @@ -56,13 +70,8 @@ void pa_message_handler_register(pa_core *c, const char *object_path, const char
>>       pa_assert(cb);
>>       pa_assert(userdata);
>>   
>> -    /* Ensure that the object path is not empty and starts with "/". */
>> -    pa_assert(object_path[0] == '/');
>> -
>> -    /* Ensure that object path and description are valid strings */
>> -    pa_assert(string_is_valid(object_path));
>> -    if (description)
>> -        pa_assert(string_is_valid(description));
>> +    /* Ensure that object path is valid */
>> +    pa_assert(object_path_is_valid(object_path));
>>   
>>       handler = pa_xnew0(struct pa_message_handler, 1);
>>       handler->userdata = userdata;
>> @@ -91,7 +100,7 @@ void pa_message_handler_unregister(pa_core *c, const char *object_path) {
>>   int pa_message_handler_send_message(pa_core *c, const char *object_path, const char *message, const char *message_parameters, char **response) {
>>       struct pa_message_handler *handler;
>>       int ret;
>> -    char *parameter_copy;
>> +    char *parameter_copy, *path_copy;
>>   
>>       pa_assert(c);
>>       pa_assert(object_path);
>> @@ -100,8 +109,16 @@ int pa_message_handler_send_message(pa_core *c, const char *object_path, const c
>>   
>>       *response = NULL;
>>   
>> -    if (!(handler = pa_hashmap_get(c->message_handlers, object_path)))
>> +    path_copy = pa_xstrdup(object_path);
>> +
>> +    /* Remove trailing / from path name if present */
>> +    if (path_copy[strlen(path_copy) - 1] == '/')
>> +        path_copy[strlen(path_copy) - 1] = 0;
> If you want to allow some slack for applications, would it be better to
> do this extra work in libpulse rather than in the daemon?
>
> I would personally not bother with this at all, but I don't mind it
> that much (even if it's done in the daemon).
>
If you don't mind, I would leave it as it is.


Unfortunately I am still heavily loaded, so it may take a while before
I can send new patches.



More information about the pulseaudio-discuss mailing list