[pulseaudio-discuss] [PATCH 6/8] message-params: Allow parameter strings to contain escaped curly braces

Sun Jul 22 15:48:41 UTC 2018

On Sun, 2018-07-22 at 16:02 +0200, Georg Chini wrote:
> On 21.07.2018 20:17, Tanu Kaskinen wrote:
> > On Mon, 2018-04-09 at 19:35 +0200, Georg Chini wrote:
> > > The patch adds the possibility to escape curly braces within parameter strings
> > > and introduces several new functions that can be used for writing parameters.
> > > 
> > > For writing, the structure pa_message_param, which is a wrapper for pa_strbuf
> > > has been created. Following new write functions are available:
> > > 
> > > pa_message_param_new() - creates a new pa_message_param structure
> > > pa_message_param_to_string() - converts a pa_message_param to string and frees
> > > the structure
> > 
> > I'd like to have the _free suffix in the function name, because for the
> > uninitiated the current name looks like just another "to_string"
> > function with no surprising side effects.
> > 
> > I think we need pa_message_params_free() as well in case an application
> > runs into an error while constructing the parameters and it wants to
> > just get rid of the pa_message_params struct without converting it to a
> > string first.
> > 
> > > pa_message_param_begin_list() - starts a list
> > > pa_message_param_end_list() - ends a list
> > > pa_message_param_write_string() - writes a string to a pa_message_param structure
> > > 
> > > For string parameters that contain curly braces, those braces must be escaped
> > > by adding a "\" before them. This however means that a trailing backslash would
> > > falsely escape the closing bracket. To avoid this, single quotes must be added
> > > at start and end of the string.
> > 
> > Why not solve this by the usual method: require escaping of "\" in
> > input with "\\" in output?
> 
> You are right, if I modify pa_unescape() as mentioned further below and then
> use pa_escape()/pa_unescape() the quotes won't be necessary.
> 
> > 
> > > The function pa_message_param_write_string()
> > > has a parameter do_escape.
> > 
> > Why not do escaping always?
> 
> Because what you are writing as a string may be a list that you have 
> prepared
> previously. Then you will not want escaping. You may for example create 
> a list
> from an array and then insert this list as one string into the final 
> parameter list
> or have a function that converts a certain structure to a parameter 
> string and
> then write the result of this function as one element of the final list.

My mental model is that parameters have types, list type is different
than string type, and write_string() is only meant for writing values
of the string type.

Can you add a write_raw() function?

> > 
> > > If true, the necessary escaping is added. Escaping
> > > is only needed if a string might fulfill one of the following conditions:
> > > 
> > > - It contains curly braces
> > > - It contains a trailing "\"
> > > - It is enclosed in single quotes
> > > 
> > > Other strings can be passed without modification.
> > > 
> > > For reading, pa_message_param_read_string() reverts the changes that
> > > pa_message_param_write_string() might have introduced.
> > > 
> > > The patch also adds more restrictions on the object path name. Now only
> > > alphanumeric characters and one of "_", ".", "-" and "/" are allowed.
> > > The path name may not end with a /. If the user specifies a trailing / when
> > > sending a message, it will be removed.
> > > ---
> > >   doc/messaging_api.txt           |  34 +++++++-
> > >   src/Makefile.am                 |   3 +-
> > >   src/map-file                    |   5 ++
> > >   src/pulse/message-params.c      | 181 ++++++++++++++++++++++++++++++++++++++--
> > >   src/pulse/message-params.h      |  23 ++++-
> > >   src/pulsecore/core.c            |  20 ++---
> > >   src/pulsecore/message-handler.c |  53 +++++++-----
> > >   src/utils/pactl.c               |   4 +-
> > >   8 files changed, 279 insertions(+), 44 deletions(-)
> > > 
> > > diff --git a/doc/messaging_api.txt b/doc/messaging_api.txt
> > > index 431a5df2..0e6be53f 100644
> > > --- a/doc/messaging_api.txt
> > > +++ b/doc/messaging_api.txt
> > > @@ -14,10 +14,42 @@ look like that:
> > >   {{Integer} {{1st float} {2nd float} ...}}{...}
> > >   Any characters that are not enclosed in curly braces are ignored (all characters
> > >   between { and {, between } and } and between } and {). The same syntax is used
> > > -to specify message parameters. The following reference lists available messages,
> > > +to specify message parameters. The reference further down lists available messages,
> > >   their parameters and return values. If a return value is enclosed in {}, this
> > >   means that multiple elements of the same type may be returned.
> > >   
> > > +There are several functions that simplify reading and writing message parameter
> > > +strings. For writing, the structure pa_message_param can be used. Following
> > > +functions are available:
> > > +pa_message_param_new() - creates a new pa_message_param structure
> > > +pa_message_param_to_string() - converts a pa_message_param to string and frees
> > > +the structure
> > > +pa_message_param_begin_list() - starts a list
> > > +pa_message_param_end_list() - ends a list
> > > +pa_message_param_write_string() - writes a string to a pa_message_param structure
> > > +
> > > +For string parameters that contain curly braces, those braces must be escaped
> > > +by adding a "\" before them. This however means that a trailing backslash would
> > > +falsely escape the closing bracket. To avoid this, single quotes must be added
> > > +at start and end of the string. The function pa_message_param_write_string()
> > > +has a parameter do_escape. If true, the necessary escaping is added. Escaping
> > > +is only needed if a string might fulfill one of the following conditions:
> > > +
> > > +- It contains curly braces
> > > +- It contains a trailing "\"
> > > +- It is enclosed in single quotes
> > > +
> > > +Other strings can be passed without modification.
> > > +
> > > +For reading, the following functions are available:
> > > +pa_message_param_split_list() - parse message parameter string
> > > +pa_message_param_read_string() - read a string from a parameter list
> > > +
> > > +pa_message_param_read_string() also reverts the changes that
> > > +pa_message_param_write_string() might have introduced.
> > 
> > I think the doxygen documentation is better suited for the details of
> > the C API. messaging_api.txt should contain information that is common
> > to both libpulse and pactl users.
> 
> Somehow it is not really clear to me, which documentation you want
> to have in messaging_api.txt.

I want documentation for two things: the serialization format
specification and the documentation for the supported messages (plus a
short introduction explaining why the messaging API exists). When
writing C programs, the doxygen documentation is the reference for the
C API details.

> > 
> > > +/* Remove escaping from a string */
> > > +static char *unescape(char *value) {
> > > +    char *tmp;
> > > +
> > > +    if (!value)
> > > +        return NULL;
> > > +
> > > +    /* If the string is enclosed in single quotes, remove them */
> > > +    if (*value == '\'' && value[strlen(value) - 1] == '\'') {
> > > +
> > > +        memmove(value, value + 1, strlen(value));
> > > +        value[strlen(value) - 1] = 0;
> > > +    }
> > > +
> > > +    /* Remove escape character from curly braces if present. */
> > > +    while ((tmp = strstr(value, "\\{")))
> > > +        memmove(tmp, tmp + 1, strlen(value) - (size_t)(tmp  - value));
> > > +    while ((tmp = strstr(value, "\\}")))
> > > +        memmove(tmp, tmp + 1, strlen(value) - (size_t)(tmp  - value));
> > > +
> > > +    /* Return the pointer that was passed in. */
> > > +    return value;
> > > +}
> > 
> > core-util already contains pa_unescape() that does the same thing more
> > efficiently (if you drop the single quote thing).
> 
> pa_unescape() currently does not do the same thing. It removes all
> escape characters, while I only want to remove the characters
> I actually introduced (those before { or }).
> I can however modify pa_unescape() to take the same arguments
> as pa_escape().

I don't see the need for being selective when unescaping. Nothing
breaks if all (except escaped) backslashes are stripped.

> > 
> > > +
> > > +/* Read functions */
> > > +
> > >   /* Split the specified string into elements. An element is defined as
> > >    * a sub-string between curly braces. The function is needed to parse
> > >    * the parameters of messages. Each time it is called returns the position
> > >    * of the current element in result and the state pointer is advanced to
> > > - * the next list element.
> > > + * the next list element. On return, the parameter *is_unpacked indicates
> > > + * if the string is plain text or contains a sub-list. is_unpacked may
> > > + * be NULL.
> > 
> > is_unpacked looks like unnecessary complexity.
> > pa_message_params_read_string() should always unescape the value.
> 
> It may be possible, that the string you read is a list. Consider the 
> following
> parameter list: {string1}{some nested structure}{string2}. You can now
> read this list as three strings and then continue to read the elements of
> the nested structure from the second string. You might even create a 
> function
> that takes a string and outputs a structure. So you are not forced to go
> to the full depth of nesting on the first pass. This makes it much easier
> to handle deeply nested parameter lists. For me this behavior is an 
> important
> feature and I do not want to drop it. See also my comment on why I do
> not always want escaping.

Doesn't split_list() already allow this, why do you want to use
read_string() to do the same thing as split_list()?

> > 
> > >    
> > >   
> > >       /* Parse error, closing brace missing */
> > > @@ -96,21 +154,126 @@ int pa_message_param_split_list(char *c, char **result, void **state) {
> > >   /* Read a string from the parameter list. The state pointer is
> > >    * advanced to the next element of the list. If allocate is
> > >    * true, a newly allocated string will be returned, else a
> > > - * pointer to a sub-string within c. */
> > > + * pointer to a sub-string within c. Escaping will be removed
> > > + * if the string does not contain another list. */
> > 
> > A string is a string, it's not a list. I think this function should
> > always do unescaping. If a string value is read that is missing
> > escaping for curly braces, that could be reported as an error (I'm not
> > saying "should", but I think it would be good to do the validation in
> > this function).
> 
> See my comments above why a string may be a list.
> 
> > 
> > > +
> > > +/* Writes a string to a message_param structure, adding curly braces
> > > + * around the string. If do_escape is true, leading and trailing single
> > > + * quotes and also a "\" before curly braces are added to the input string.
> > > + * Escaping only needs to be used if the original string might fulfill any
> > > + * of the following conditions:
> > > + * - It contains curly braces
> > > + * - It is enclosed in single quotes
> > > + * - It ends with a "\" */
> > > +void pa_message_param_write_string(pa_message_param *param, const char *value, bool do_escape) {
> > 
> > I would drop the do_escape flag, and if there's need for appending
> > unescaped raw data to pa_message_params, there could be a separate
> > function for that.
> 
> See above. A function for writing raw data should nevertheless be provided
> for the case where a string can only be written as several parts.
> 
> > 
> > > +    const char *s;
> > > +    char *output, *out_char;
> > > +    size_t brace_count;
> > > +
> > > +    pa_assert(param);
> > > +
> > > +    /* Null value is written as empty element */
> > > +    if (!value)
> > > +        value = "";
> > > +
> > > +    if (!do_escape) {
> > > +        pa_strbuf_printf(param->buffer, "{%s}", value);
> > > +        return;
> > > +    }
> > > +
> > > +    /* Using pa_strbuf_putc() to write to the strbuf while iterating over
> > > +     * the input string would cause the allocation of a linked list element
> > > +     * for each character of the input string. Therefore the output string
> > > +     * is constructed locally before writing it to the buffer */
> > 
> > This is an interesting point. I think you should improve pa_escape(),
> > which currently uses the naive pa_strbuf_putc() method. With my
> > proposed escaping rules, you could then replace this code with a call
> > to pa_escape().
> 
> OK, I'll do that and send a patch for tweaking pa_escape()/pa_unescape()
> to my needs separately.
> 
> > 
> > > +
> > > +    /* Count number of characters to escape */
> > > +    brace_count = 0;
> > > +    for (s = value; *s; ++s) {
> > > +        if (*s == '{' || *s == '}')
> > > +            brace_count++;
> > > +    }
> > 
> > You could at the same time count all characters to save the strlen()
> > call.
> 
> Mh, do I really save something? If the string is x characters
> long, I would increment a variable this often. Is that less work
> than a strlen() call? But probably strlen() does something similar ...

strlen() loops over all characters. One loop is better than two. Not
that this is likely to cause any noticeable slowdown, unnecessary
strlen() calls just tend to catch my attention...

> > 
> > > +/* Check if a path string starts with a / and only contains valid characters */
> > > +static bool object_path_is_valid(const char *test_string) {
> > >       uint32_t i;
> > >   
> > > +    if (!test_string)
> > > +        return false;
> > > +
> > > +    /* Make sure the string starts with / and does not end with a / */
> > > +    if (test_string[0] != '/' || test_string[strlen(test_string) - 1] == '/')
> > > +        return false;
> > > +
> > >       for (i = 0; test_string[i]; i++) {
> > > -        if (test_string[i] == '{' ||
> > > -            test_string[i] == '}')
> > > -            return false;
> > > +
> > > +        if ((test_string[i] >= 'a' && test_string[i] <= 'z') ||
> > > +            (test_string[i] >= 'A' && test_string[i] <= 'Z') ||
> > > +            (test_string[i] >= '0' && test_string[i] <= '9') ||
> > > +            test_string[i] == '.' ||
> > > +            test_string[i] == '_' ||
> > > +            test_string[i] == '-' ||
> > > +            test_string[i] == '/')
> > > +            continue;
> > > +
> > > +        return false;
> > >       }
> > 
> > You could save the strlen() call if you checked the trailing slash
> > after this for loop when you're at the end of the string.
> > 	
> > By the way, do you perhaps want to disallow two subsequent slashes in
> > object paths? (I don't care myself, because I doubt that invalid paths
> > will ever be encountered anyway when registering message handlers.)
> 
> If I don't disallow it, then a double slash in the path will be valid.
> At least up to now we do not care about the individual components
> that the path is constructed from.
> So should I disallow it or not?

As I said, I don't care. Clients don't register message handlers, so
we're dealing only with server code, and I find it unlikely that code
that generates weird paths would get past review.

If I have to have an opinion, then this is it: if we're anyway
validating the paths, disallowing double slashes would be logical,
becacuse I can't imagine such paths ever getting created on purpose.

-- 
Tanu

https://www.patreon.com/tanuk
https://liberapay.com/tanuk