[SC22WG14.29900] alx-0008 - Standardize strtoi(3) and strtou(3) from NetBSD
Joseph Myers
josmyers at redhat.com
Tue Mar 18 17:20:19 UTC 2025
On Tue, 18 Mar 2025, Alejandro Colomar wrote:
> 7.24.2 Numeric conversion functions
> New section _before_ 7.24.2.2 (The atof function).
You're missing corresponding <wchar.h> functions.
Maybe there should also be a reference to N3183 (discussed in Strasbourg),
which dealt with UB for numeric conversions in scanf rather than strto*,
but still seems related to this proposal.
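For context on the scanf side: with fscanf and friends, a %d conversion whose value cannot be represented in the target object has undefined behavior. strtol, by contrast, reports overflow through ERANGE. Here is a minimal sketch of the boilerplate a caller needs today to parse a whole string as an int safely. parse_int is a hypothetical helper, not something from the proposal or the standard.

```c
#include <errno.h>
#include <limits.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical helper: parse all of s as a base-10 int.
 * Unlike sscanf(s, "%d", &n), which has undefined behavior when the
 * value does not fit in int, strtol reports overflow via ERANGE. */
static bool parse_int(const char *s, int *out)
{
    char *end;
    long v;

    errno = 0;
    v = strtol(s, &end, 10);
    if (end == s)                  /* no digits were converted */
        return false;
    if (*end != '\0')              /* trailing unconverted characters */
        return false;
    if (errno == ERANGE || v < INT_MIN || v > INT_MAX)
        return false;              /* does not fit in long or int */
    *out = (int)v;
    return true;
}
```

Each of the four checks maps onto one of the status codes the proposal introduces, which is the repetition strtoi/strtou aim to remove.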
> While all this section is new, some text is pasted verbatim from
> 7.24.2.8. I'll write that text as if it was already existing
> in the diff below.
>
> I also renamed the parameters of strtol(3):
> nptr => s Because it's a string, not a pointer to a number.
> endptr => endp It's shorter and just as readable (if not more).
>
> @@
> +7.24.2.* The <b>strtoi</b> and <b>strtou</b> functions
> +
> +Synopsis
> +1 #include <stdlib.h>
> + intmax_t strtoi(const char *restrict s, char **restrict endp, int base,
> + intmax_t min, intmax_t max, int *rstatus);
> + uintmax_t strtou(const char *restrict s, char **restrict endp, int base,
> + uintmax_t min, uintmax_t max, int *rstatus);
intmax_t and uintmax_t are not declared in <stdlib.h>. Either the
synopsis should mention <stdint.h> as well, or those types should be added
to the ones declared by that header.
I'm also concerned that the names sound like int / unsigned int analogues
of strtol, but aren't.
> +Description
> +2 The <b>strtoi</b> and <b>strtou</b> functions
> convert the initial portion of
> the string pointed to by <tt>s</tt>
> + to <b>intmax_t</b> and <b>uintmax_t</b>,
> respectively.
> First,
> they decompose the input string into three parts:
> an initial, possibly empty, sequence of white-space characters,
> a subject sequence resembling an integer
> represented in some radix determined by the value of <tt>base</tt>,
> and a final string of one or more unrecognized characters,
> including the terminating null character of the input string.
> + Then,
> they attempt to convert the subject sequence to an integer.
> + Then,
> + they coerce the integer into the range [min, max].
> + Finally,
> they return the result.
>
> Paste p3, p4, p5, p6 from 7.24.2.8, replacing the function and
> type names as appropriate.
So the conversion is still locale-specific (p6). One thing that can be
useful for numeric conversions, and isn't covered well by the standard at
present, is ones that are guaranteed to be in the C locale. (That would
require a flags argument or similar to configure the functions.)
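For comparison, the closest thing available today comes from POSIX, not ISO C. A caller can switch the calling thread to the "C" locale around strtoimax using newlocale, uselocale and freelocale. The sketch below assumes a POSIX.1-2008 system. strtoimax_c and its err out-parameter are hypothetical names, and the code is only meant to show how much machinery a flags argument would save.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <inttypes.h>
#include <locale.h>
#include <stdint.h>

/* Hypothetical helper: strtoimax, but always in the "C" locale,
 * whatever the program's setlocale() state. newlocale/uselocale/
 * freelocale are POSIX.1-2008, not ISO C. uselocale only affects
 * the calling thread. */
static intmax_t strtoimax_c(const char *s, char **endp, int base, int *err)
{
    locale_t c = newlocale(LC_ALL_MASK, "C", (locale_t)0);
    if (c == (locale_t)0) {        /* could not create the locale */
        *err = errno;
        if (endp)
            *endp = (char *)s;
        return 0;
    }
    locale_t old = uselocale(c);   /* may be LC_GLOBAL_LOCALE */
    errno = 0;
    intmax_t v = strtoimax(s, endp, base);
    *err = errno;                  /* capture before restoring state */
    uselocale(old);
    freelocale(c);
    return v;
}
```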
> @@
> +7 If the value of <tt>base</tt> is different from
> + the values specified in the preceding paragraphs,
> + the behavior is implementation defined.
It's "implementation-defined", with a hyphen. And for that to be useful,
you need clear bounds on what is permitted (that is, an
implementation-defined set of sequences is accepted, and interpreted as
having implementation-defined numeric values).
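For reference, the base values that the pasted strtol paragraphs specify are presumably 0 (infer the radix from the prefix) and 2 through 36 (digits, then letters a/A through z/Z). Any other value is the case that p7 would make implementation-defined, and the case for which the proposal's Errors clause reports EINVAL. The check can be sketched as follows; base_is_specified is a hypothetical name.

```c
#include <stdbool.h>

/* Bases for which the strtol family's expected form is specified:
 * 0 (radix from the prefix) or 2..36 (digits 0-9, then a-z / A-Z). */
static bool base_is_specified(int base)
{
    return base == 0 || (base >= 2 && base <= 36);
}
```

Bounding the implementation-defined behavior would then amount to saying what an implementation may do when this returns false.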
> @@
> Returns
> +10 The <b>strtoi</b> and <b>strtou</b> functions
> return the converted and coerced value, if any.
> If no conversion could be performed,
> + zero is coerced into the range,
> + and then returned.
>
> The paragraph above doesn't mention the range of representable
> values (unlike 7.24.2.8) because that's already covered by the
> range coercion specified in p2 above.
You don't seem to define how the coercion works. Modulo? Saturation?
Something else? ("Coerce" is not a term defined in the C standard, nor in
ISO 2382. So it has no semantics without them being explicitly defined
for these functions.)
What happens if min > max? You say below that there is an ERANGE error
for this case, but don't say what the return value is when it can't be in
the range.
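To make the ambiguity concrete, here is a sketch of the two readings mentioned above for the unsigned case. Both functions are hypothetical and assume min <= max, since the proposal leaves min > max open. The readings give different results as soon as the value is out of range. For example, coercing 300 into [0, 255] gives 255 by saturation and 44 (300 mod 256) by modulo.

```c
#include <stdint.h>

/* Saturation: clamp v to the nearest bound. Assumes min <= max. */
static uintmax_t coerce_saturate(uintmax_t v, uintmax_t min, uintmax_t max)
{
    if (v < min)
        return min;
    if (v > max)
        return max;
    return v;
}

/* Modulo: wrap v into [min, max], like conversion to a narrower
 * unsigned type when min == 0 and max == 2^N - 1. Assumes min <= max. */
static uintmax_t coerce_modulo(uintmax_t v, uintmax_t min, uintmax_t max)
{
    uintmax_t span = max - min + 1;   /* wraps to 0 for the full range */
    if (span == 0)
        return v;
    if (v >= min)
        return min + (v - min) % span;
    uintmax_t r = (min - v) % span;   /* distance below min, reduced */
    return r == 0 ? min : max + 1 - r;
}
```

The signed (strtoi) case is harder still, because a modulo reading has to avoid signed overflow.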
> +Returns
> +10 The <b>strtoi</b> and <b>strtou</b> functions
> + return the converted value, if any.
> + If no conversion is returned,
> + these functions return the value in the range [min, max]
> + that is closer to 0.
What if both are equally close to 0?
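The wording supports two readings. Read as "the value in [min, max] closest to 0", the answer is 0 clamped into the range. That value is unique whenever min <= max: [-5, 5] gives 0, not a tie between -5 and 5. Read as "whichever of min and max is closer to 0", the tie is real. Here is a sketch of the first reading, with closest_to_zero as a hypothetical name:

```c
#include <stdint.h>

/* Value in [min, max] closest to 0, assuming min <= max.
 * If 0 lies outside the range, exactly one bound is nearest. */
static intmax_t closest_to_zero(intmax_t min, intmax_t max)
{
    if (min > 0)
        return min;   /* whole range is positive */
    if (max < 0)
        return max;   /* whole range is negative */
    return 0;         /* range contains 0 */
}
```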
> +Errors
> +11 These functions don't set <b>errno</b>.
The standard does not use the abbreviation "don't", but says "do not".
> + Instead, they set the object pointed to by <tt>rstatus</tt>
> + to an error code,
> + or to zero on success.
> +
> +12 -- EINVAL The value in <tt>base</tt> is not supported.
> + -- ECANCELED The given string did not contain
> + any characters that were converted.
> + -- ERANGE The converted value was out of range
> + and has been coerced,
> + or the range was invalid (e.g., min > max).
> + -- ENOTSUP The given string contained characters
> + that did not get converted.
Of these names, only ERANGE is actually defined in the C standard. You
don't have any updates to <errno.h> to add the others.
These functions would clearly also need several examples added to the
standard to illustrate their functionality, which are missing from this
proposal.
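As a starting point for such examples, here is a sketch of one possible reading of the proposed strtoi, layered on strtoimax. sketch_strtoi and clamp_imax are hypothetical names. Several choices are assumptions made here, not taken from the proposal: coercion saturates; min > max returns min with ERANGE; ERANGE takes precedence over ENOTSUP; the error codes are the POSIX values.

```c
#include <errno.h>
#include <inttypes.h>
#include <stdint.h>

/* Saturate v into [min, max]; assumes min <= max. */
static intmax_t clamp_imax(intmax_t v, intmax_t min, intmax_t max)
{
    return v < min ? min : v > max ? max : v;
}

/* Hypothetical strtoi-like function. Reports through *rstatus
 * and leaves the caller's errno unchanged. */
static intmax_t sketch_strtoi(const char *restrict s, char **restrict endp,
                              int base, intmax_t min, intmax_t max,
                              int *rstatus)
{
    int saved_errno = errno;
    char *end;
    intmax_t v;

    if (min > max) {                        /* invalid range */
        *rstatus = ERANGE;
        if (endp)
            *endp = (char *)s;
        return min;
    }
    if (base != 0 && (base < 2 || base > 36)) {
        *rstatus = EINVAL;                  /* unsupported base */
        if (endp)
            *endp = (char *)s;
        return clamp_imax(0, min, max);
    }

    errno = 0;
    v = strtoimax(s, &end, base);           /* 0 if nothing converted */
    if (endp)
        *endp = end;

    if (end == s)
        *rstatus = ECANCELED;               /* no characters converted */
    else if (errno == ERANGE || v < min || v > max)
        *rstatus = ERANGE;                  /* value saturated below */
    else if (*end != '\0')
        *rstatus = ENOTSUP;                 /* trailing characters */
    else
        *rstatus = 0;

    errno = saved_errno;
    return clamp_imax(v, min, max);
}
```

Under these assumptions, sketch_strtoi("300", &e, 10, 0, 255, &st) returns 255 with st == ERANGE. "12abc" in [0, 255] returns 12 with ENOTSUP, and "xyz" in [5, 9] returns 5 with ECANCELED. A modulo reading of "coerce" would give 44 for the first call instead.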
--
Joseph S. Myers
josmyers at redhat.com
More information about the libbsd mailing list