alx-0008 - Standardize strtoi(3) and strtou(3) from NetBSD
Alejandro Colomar
une at alejandro-colomar.es
Tue Mar 18 21:16:05 UTC 2025
Hi,
Here's v2 after Joseph's feedback.
The C Committee mailing list is a mess. Please include the following
header in your response if you're reading this email from the C Committee
mailing list:
In-Reply-To: <ovyhifkfxvrulde33vara5qb3zerletmxrtfiur4z3c2xnlksz at k4m7xt5kd62l>
Cheers,
Alex
---
Name
alx-0008r1 - Standardize strtoi(3) and strtou(3) from NetBSD
Principles
- Codify existing practice to address evident deficiencies.
- Enable secure programming
Category
Standardize existing libc APIs
Author
Alejandro Colomar <alx at kernel.org>
Cc: <liba2i at lists.linux.dev>
Cc: <libbsd at lists.freedesktop.org>
Cc: <sc22wg14 at open-std.org>
Cc: <tech-misc at netbsd.org>
Cc: Bruno Haible <bruno at clisp.org>
Cc: christos <christos at netbsd.org>
Cc: Đoàn Trần Công Danh <congdanhqx at gmail.com>
Cc: Paul Eggert <eggert at cs.ucla.edu>
Cc: Eli Schwartz <eschwartz93 at gmail.com>
Cc: Guillem Jover <guillem at hadrons.org>
Cc: Iker Pedrosa <ipedrosa at redhat.com>
Cc: Joseph Myers <josmyers at redhat.com>
Cc: Michael Vetter <jubalh at iodoru.org>
Cc: Robert Elz <kre at netbsd.org>
Cc: <riastradh at NetBSD.org>
Cc: Sam James <sam at gentoo.org>
Cc: "Serge E. Hallyn" <serge at hallyn.com>
History
<https://www.alejandro-colomar.es/src/alx/alx/wg14/alx-0008.git/>
r0 (2025-03-18):
- Initial draft.
r1 (2025-03-18):
- Add 'Future directions' section.
- Fix typos.
- Move to <inttypes.h> (7.8 instead of 7.24).
- Add links to more NetBSD bug reports in 'See also'.
- Add link to n3183 (discussed in Strasbourg) in 'See also'.
- Specify the possible implementation-defined behaviors when
the base is a value not specified here.
- Specify that the range coercion is done with saturation.
- Specify that if min>max, these functions return an
unspecified value.
- Add ECANCELED, EINVAL, ENOTSUP to <errno.h> (7.5).
- Note that in the future we'll want to make this
const-generic.
- Add example.
- Add implementation.
Description
The strtol(3) family of functions is so damn hard to use
correctly. Only a handful of programmers in the world really
know how to use it correctly in all the corner cases, and even
those need to be really careful to not make mistakes.
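To make those corner cases concrete, here is a minimal sketch of what
parsing a whole string as a base-10 int takes with strtol(3) alone.
The helper name parse_int_strtol and its return convention are
hypothetical, chosen for illustration, and ECANCELED and ENOTSUP are
assumed to be available from a POSIX <errno.h>.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/*
 * Hypothetical helper (not part of the proposal): parse all of s as a
 * base-10 int.  Returns 0 on success or an errno-style code on failure;
 * *out is written only on success.
 */
int
parse_int_strtol(const char *s, int *out)
{
    char  *end;
    long  n;
    int   saved, err;

    saved = errno;
    errno = 0;                  /* overflow is only reported through errno */
    n = strtol(s, &end, 10);
    err = errno;
    errno = saved;              /* do not clobber the caller's errno */

    if (end == s)
        return ECANCELED;       /* nothing was converted */
    if (err == ERANGE || n < INT_MIN || n > INT_MAX)
        return ERANGE;          /* overflowed long, or does not fit in int */
    if (*end != '\0')
        return ENOTSUP;         /* trailing characters after the number */

    *out = (int) n;
    return 0;
}
```

Each check is easy to forget: resetting errno before the call, comparing
end against s, narrowing from long to int, and looking at what follows
the number.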
Several projects have tried to develop successor APIs, of
which the only one generic enough to supersede them is
strtoi/u(3) from NetBSD.
Other APIs include OpenBSD's strtonum(3), but that API isn't
generic, and cannot replace every use of strtol(3). gnulib also
has some attempts to improve the situation, but they're also
not suitable for standardization.
strtoi/u(3) originally had a bug, which shows how difficult it
is to correctly wrap strto{i,u}max(3) (from the strtol(3)
family). That bug has been fixed, and after two years of
research into string-to-numeric APIs, I can conclude that it is
a net improvement over the existing APIs, and doesn't have any
significant flaws.
It is still not the ideal API in terms of type safety, and I'm
working on a library that provides safer wrappers. However,
such a library would still benefit from having strtoi/u(3) in
the standard library, by being able to wrap around it. And user
programs would immediately benefit from being able to replace
strtol(3) et al. by strtoi/u(3).
I have audited several projects which use strtol(3) et al., and
they're full of bugs. It's an API that we should really
deprecate some day.
Prior art
NetBSD provides strto{i,u}(3), which were introduced in
NetBSD 7.
libbsd ports these APIs to other POSIX systems.
shadow-utils has its own implementation for internal use.
Here's a possible implementation of strtoi(3):
intmax_t
strtoi(const char *s, char **restrict endp, int base,
    intmax_t min, intmax_t max, int *restrict status)
{
    int       e, st;
    char      *end;
    intmax_t  n;

    if (endp == NULL)
        endp = &end;
    if (status == NULL)
        status = &st;

    if (base != 0 && (base < 2 || base > 36)) {
        *endp = (char *) s;
        *status = EINVAL;
        return MAX(min, MIN(max, 0));
    }

    e = errno;
    errno = 0;

    n = strtoimax(s, endp, base);

    if (*endp == s)
        *status = ECANCELED;
    else if (errno == ERANGE || n < min || n > max)
        *status = ERANGE;
    else if (**endp != '\0')
        *status = ENOTSUP;
    else
        *status = 0;

    errno = e;

    return MAX(min, MIN(max, n));
}
strtou(3) can be implemented with the same exact code, replacing
s/intmax_t/uintmax_t/, and s/strtoimax/strtoumax/.
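Applying those substitutions gives roughly the following. This is only
a sketch, not NetBSD's code: the names sketch_strtou and clamp_u are
hypothetical (clamp_u stands in for the MAX(min, MIN(max, n))
expression), and ECANCELED, EINVAL, and ENOTSUP are assumed to come
from a POSIX <errno.h>.

```c
#include <errno.h>
#include <inttypes.h>
#include <stddef.h>

/* Saturate n into [min, max]; not meaningful when min > max. */
static uintmax_t
clamp_u(uintmax_t n, uintmax_t min, uintmax_t max)
{
    if (n > max)
        n = max;
    if (n < min)
        n = min;
    return n;
}

/* Hypothetical name, to avoid clashing with a libc that has strtou(). */
uintmax_t
sketch_strtou(const char *s, char **restrict endp, int base,
    uintmax_t min, uintmax_t max, int *restrict status)
{
    int        e, st;
    char       *end;
    uintmax_t  n;

    if (endp == NULL)
        endp = &end;
    if (status == NULL)
        status = &st;

    /* Reject unsupported bases before calling strtoumax(). */
    if (base != 0 && (base < 2 || base > 36)) {
        *endp = (char *) s;
        *status = EINVAL;
        return clamp_u(0, min, max);
    }

    e = errno;
    errno = 0;
    n = strtoumax(s, endp, base);

    if (*endp == s)
        *status = ECANCELED;    /* nothing was converted */
    else if (errno == ERANGE || n < min || n > max)
        *status = ERANGE;       /* overflow, or outside [min, max] */
    else if (**endp != '\0')
        *status = ENOTSUP;      /* trailing characters remain */
    else
        *status = 0;

    errno = e;                  /* the caller's errno is left untouched */
    return clamp_u(n, min, max);
}
```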
Future directions
atoi(3), scanf(3)
The atoi(3) family of functions has unnecessary UB. It could be
removed by redefining it in terms of this API:
int
atoi(const char *s)
{
    int  n, e;

    n = strtoi(s, NULL, 10, _Minof(n), _Maxof(n), &e);
    errno = e ?: errno;
    return n;
}
Which would make atoi(3) behave just like one would expect.
Then we could define scanf(3)'s %d et al. in terms of atoi(3).
wchar_t
It could be interesting to add a wchar-based variant of these
APIs.
locale_t
It could be interesting to add a variant of these APIs that
accepts a locale_t parameter instead of using the current
locale. Those APIs exist in NetBSD as strtoi_l(), strtou_l().
_Generic
Once something like Chris's n3510 (2025-02-27, "Enhanced type
variance (v2)") is accepted into C2y, we could transform these
functions to use QChar, thus transforming them into
const-generic functions, as with the strtol(3) family of
functions.
See also
<https://gnats.netbsd.org/cgi-bin/query-pr-single.pl?number=57828>
<https://gnats.netbsd.org/cgi-bin/query-pr-single.pl?number=58453>
<https://gnats.netbsd.org/cgi-bin/query-pr-single.pl?number=58461>
<https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3183.pdf>
Proposed wording
Based on N3467.
7.5 Errors <errno.h>
@@ p2
The macros are
+ ECANCELED
EDOM
+ EINVAL
EILSEQ
+ ENOTSUP
ERANGE
7.8.3 Functions for greatest-width integer types
New section _before_ 7.8.3.3 (The strtoimax and strtoumax functions).
While this whole section is new, some text is pasted verbatim from
7.24.2.8. In the diff below, that text is shown as if it already
existed.
I also renamed the parameters of strtol(3):
nptr => s Because it's a string, not a pointer to a number.
endptr => endp It's shorter and just as readable (if not more).
@@
+7.8.2.* The <b>strtoi</b> and <b>strtou</b> functions
+
+Synopsis
+1 #include <inttypes.h>
+ intmax_t strtoi(const char *restrict s, char **restrict endp, int base,
+ intmax_t min, intmax_t max, int *rstatus);
+ uintmax_t strtou(const char *restrict s, char **restrict endp, int base,
+ uintmax_t min, uintmax_t max, int *rstatus);
+
+Description
+2 The <b>strtoi</b> and <b>strtou</b> functions
convert the initial portion of
the string pointed to by <tt>s</tt>
+ to <b>intmax_t</b> and <b>uintmax_t</b>,
respectively.
First,
they decompose the input string into three parts:
an initial, possibly empty, sequence of white-space characters,
a subject sequence resembling an integer
represented in some radix determined by the value of <tt>base</tt>,
and a final string of one or more unrecognized characters,
including the terminating null character of the input string.
+ Then,
they attempt to convert the subject sequence to an integer.
+ Then,
+ they coerce with saturation
+ the integer into the range [min, max].
+ Finally,
they return the result.
Paste p3, p4, p5, p6 from 7.24.2.8, replacing the function and
type names as appropriate.
@@
+7 If the value of <tt>base</tt> is different from
+ the values specified in the preceding paragraphs,
+ it is implementation-defined
+ whether these functions successfully convert the value
+ and in which manner.
The above paragraph ensures that this function has no
input-controlled UB. strtol(s, NULL, base) with a
user-controlled base can result in UB, and thus vulnerabilities.
It is trivial to report an error, so let's do it. This function
is heavy enough that optimizing this is not worth it. Even POSIX
does this for strtol(3).
@@
8 If the subject sequence is empty
or does not have the expected form,
+ or the value of <tt>base</tt> is not supported,
no conversion is performed;
the value of <tt>s</tt>
is stored in the object pointed to by <tt>endp</tt>,
provided that <tt>endp</tt> is not a null pointer.
The above paragraph ensures that *endp can be read after a call
to these functions. strtol(3) doesn't provide enough guarantees
to be able to reliably read it, even in POSIX, and it's hard to
portably write code that calls it and can inspect *endp after
the call without UB.
@@
Returns
+10 The <b>strtoi</b> and <b>strtou</b> functions
return the converted value, if any.
If no conversion could be performed,
+ zero is coerced with saturation into the range,
+ and then returned.
The paragraph above doesn't mention the range of representable
values (unlike 7.24.2.8) because that's already covered by the
range coercion specified in p2 above.
@@
+11 If <tt>min > max</tt>,
+ these functions return an unspecified value.
The above paragraph covers the case where min>max, where the
conversion with saturation into the range cannot do anything
meaningful. The error is still specified as ERANGE.
@@
+Errors
+12 These functions do not set <b>errno</b>.
+ Instead, they set the object pointed to by <tt>rstatus</tt>
+ to an error code,
+ or to zero on success.
+
+13 -- EINVAL The value in <tt>base</tt> is not supported.
+ -- ECANCELED The given string did not contain
+ any characters that were converted.
+ -- ERANGE The converted value was out of range
+ and has been coerced,
+ or the range was invalid (e.g., min > max).
+ -- ENOTSUP The given string contained characters
+ that did not get converted.
+
+14 If several errors occur in the same call,
+ the first one listed here is reported.
The paragraph above is important to differentiate the following:
strtoi("7z", &end, 0, 3, 7, &status);
strtoi("42z", &end, 0, 3, 7, &status);
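Both calls return 7 and leave end pointing at "z"; only the status
tells them apart. A minimal sketch, assuming the implementation shown
under 'Prior art' (the names sketch_strtoi, clamp_i, and status_in_3_7
are hypothetical, and the error codes are assumed to come from a POSIX
<errno.h>):

```c
#include <errno.h>
#include <inttypes.h>
#include <stddef.h>

/* Saturate n into [min, max]; not meaningful when min > max. */
static intmax_t
clamp_i(intmax_t n, intmax_t min, intmax_t max)
{
    if (n > max)
        n = max;
    if (n < min)
        n = min;
    return n;
}

/* Hypothetical stand-in for the proposed strtoi(). */
intmax_t
sketch_strtoi(const char *s, char **restrict endp, int base,
    intmax_t min, intmax_t max, int *restrict status)
{
    int       e, st;
    char      *end;
    intmax_t  n;

    if (endp == NULL)
        endp = &end;
    if (status == NULL)
        status = &st;

    if (base != 0 && (base < 2 || base > 36)) {
        *endp = (char *) s;
        *status = EINVAL;
        return clamp_i(0, min, max);
    }

    e = errno;
    errno = 0;
    n = strtoimax(s, endp, base);

    /* The checks run in the order the errors are listed in p13. */
    if (*endp == s)
        *status = ECANCELED;
    else if (errno == ERANGE || n < min || n > max)
        *status = ERANGE;
    else if (**endp != '\0')
        *status = ENOTSUP;
    else
        *status = 0;

    errno = e;
    return clamp_i(n, min, max);
}

/* Status reported for s when parsed with base 0 into the range [3, 7]. */
int
status_in_3_7(const char *s)
{
    int  st;

    (void) sketch_strtoi(s, NULL, 0, 3, 7, &st);
    return st;
}
```

With this sketch, "7z" yields ENOTSUP (7 is in range, but "z" is left
over), while "42z" yields ERANGE (42 was saturated to 7), because
ERANGE is listed before ENOTSUP.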
@@
+15 EXAMPLE 1
+ The following is an example of
+ using these functions to parse a number
+ and the string that follows.
+
+    int       err;
+    char      *end;
+    intmax_t  n, min = 5, max = 50;
+
+    n = strtoi(" 42 kg", &end, 10, min, max, &err);
+    if (err != 0) {
+        if (err == EINVAL || err == ECANCELED) {
+            fprintf(stderr, "%s\n", strerror(err));
+            exit(EXIT_FAILURE);
+        }
+        if (err == ERANGE && n == min)
+            puts("Too light");
+        if (err == ERANGE && n == max)
+            puts("Too heavy");
+    }
+    printf("Quantity: %jd\n", n);
+    if (err == ENOTSUP)
+        printf("Units: %s\n", end + strspn(end, " "));
+    else
+        puts("Unitless?");
--
<https://www.alejandro-colomar.es/>