A few questions about escaping in desktop files

Mon Aug 22 19:40:58 UTC 2022

Thanks for your quick response!

On 8/22/22 17:49, Simon McVittie wrote:
> I didn't write this specification, but I believe the intention is that you
> can either do the word-splitting and unquoting from first principles, or
> construct a string to pass to a shell and let the shell do the unquoting.
> Reserving characters like $ is a way to be nice to implementations that
> want to delegate the word-splitting and unquoting to a shell.

Let the shell do the unquoting?! To quote (pun intended) the standard:

 > Implementations must undo quoting before expanding field codes and 
 > before passing the argument to the executable program.

Isn't this forbidden? I guess the usual as-if rule means that it's 
allowed as long as the behavior doesn't change, but this is strange.

When I first read this, I completely ruled out using a shell with Exec, 
but now I see that the "before passing the argument to the executable 
program" part doesn't strictly forbid a shell from doing its own 
processing before the argument reaches the program.

But then there are field codes. As I said, if it behaves correctly then 
it can probably do this in a different order, but I think this conflicts 
with the two implementations you have described.

> GLib does almost the same as you are doing, but reversing your second
> and third phases: first it decodes the .desktop file (using GKeyFile
> and g_key_file_get_string() to expand the escape sequences \s, \n, \t,
> \r, \\), then it replaces field codes like %f with a shell-style-quoted
> version of their expansion (so for example %f expands to the result of
> g_shell_quote(filename), similar to Python shlex.quote(filename)), then it
> does the equivalent of shell word-splitting and unquoting to get an array
> of arguments (g_shell_parse_argv(), similar to Python shlex.split()), and
> finally it passes that array to an argv-based API similar to execve()
> or posix_spawn(). This is considered to be a valid implementation.

A little disclaimer: I am working on a program that does this from the 
ground up. It's in C++ and it uses neither toolkits nor GLib. I am not 
very familiar with them or with Python, so my questions might be trivial.

Why is it quoting the expanded field code? The filenames and URLs are 
pretty unambiguous and could be copied verbatim into the argument 
without fear of something misinterpreting them. If I understand it 
correctly, the result is going to be unquoted again by the 
g_shell_parse_argv() call you mentioned. Is this done just to make the 
parsing with g_shell_parse_argv() simpler?
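
To make the question concrete, here is a minimal Python sketch of the 
order described in the quoted text. It relies on the analogy given there 
(shlex.quote() standing in for g_shell_quote(), shlex.split() for 
g_shell_parse_argv()); it is not GLib's actual code. The function 
build_argv() and the sample file name are hypothetical, and only %f is 
handled. It suggests the quoting matters once a file name contains a 
space:

```python
import shlex


def build_argv(exec_value, filename, quote=True):
    """Expand %f first, then word-split, as in the GLib-style order.

    Assumptions: exec_value already had its key-file escapes decoded,
    and only %f is handled (a literal %% is not treated specially here).
    """
    expansion = shlex.quote(filename) if quote else filename
    command_line = exec_value.replace("%f", expansion)
    # Word-splitting and unquoting happen after the expansion, so an
    # unquoted expansion is split like any other part of the line.
    return shlex.split(command_line)


exec_value = "my-utility --input-from-file=%f"

print(build_argv(exec_value, "/tmp/my file.txt"))
# ['my-utility', '--input-from-file=/tmp/my file.txt']

print(build_argv(exec_value, "/tmp/my file.txt", quote=False))
# ['my-utility', '--input-from-file=/tmp/my', 'file.txt']
```

With this order, the quoting keeps the later word-splitting step from 
breaking the expansion apart. An implementation that splits first and 
expands afterwards would not need it.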

> Or, if you have to use your own implementation, then if in doubt, I would
> suggest checking what GNOME and KDE would do with a particular .desktop
> file. If they both parse it without error and get the same result, then
> that is probably the result to be aiming for. Other GLib-based desktops
> will do the same as GNOME, and other Qt-based desktops are likely to do
> the same as KDE, so looking at GNOME and KDE will cover most users of
> .desktop files.

This is probably the best way to test this. I'll have to either set up 
GNOME and KDE on my system or find the specific programs they use for 
this, but it should work.

> I believe the intention is that separating arguments with anything
> other than a single 0x20 has unspecified behaviour: applications should
> not install a .desktop file that does this, and if they do, different
> implementations are not guaranteed to parse it the same way.

Ok.

> However, if an implementation wants to be able to support something like
> 
>      Exec=my-utility --input-from-file=%f
> 
> then it is forced to have a parsing model where that works; and if its
> parsing model makes that work, then that probably implies that
> prefix%isuffix must be expanded to two arguments, "prefix--icon" and
> "my-icon-namesuffix". Certainly it looks like this is what GLib would do.

I didn't think of --arg=%f. This makes sense. Field codes would be nicer 
to parse if each field code had to be its own argument, but this would 
break your example.
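
As a rough illustration of the other order (split and unquote first, 
then expand inside each argument), here is a small Python sketch. The 
function expand_arg() is hypothetical, handles only %f and %%, and 
drops every other code, which is a simplification rather than what the 
specification requires:

```python
def expand_arg(arg, filename):
    """Expand field codes inside one already-unquoted argument.

    Sketch only: %f becomes the file name, %% becomes a literal %,
    and any other %x code is dropped.
    """
    out = []
    i = 0
    while i < len(arg):
        if arg[i] == "%" and i + 1 < len(arg):
            code = arg[i + 1]
            if code == "%":
                out.append("%")
            elif code == "f":
                out.append(filename)
            i += 2  # skip the '%' and the code character
        else:
            out.append(arg[i])
            i += 1
    return "".join(out)


argv = ["my-utility", "--input-from-file=%f"]
print([expand_arg(a, "/tmp/my file.txt") for a in argv])
# ['my-utility', '--input-from-file=/tmp/my file.txt']
```

Because the expansion happens inside an argument that has already been 
split, the file name stays in one piece without any extra quoting.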

> Honestly, put your "difficult" strings in a script somewhere else and
> then put the path to the script in the .desktop file - that'll be a lot
> more reliable and also easier to read.

I'm trying to come up with the most difficult, weird and ambiguous 
strings so that my implementation won't segfault on them. If I 
actually wanted to execute a program with such peculiar arguments, 
then I would try to come up with something more reliable.