[SC22WG14.29900] alx-0008 - Standardize strtoi(3) and strtou(3) from NetBSD

Joseph Myers josmyers at redhat.com
Tue Mar 18 17:20:19 UTC 2025


On Tue, 18 Mar 2025, Alejandro Colomar wrote:

>     7.24.2  Numeric conversion functions
> 	New section _before_ 7.24.2.2 (The atof function).

You're missing corresponding <wchar.h> functions.

Maybe there should also be a reference to N3183 (discussed in Strasbourg), 
which dealt with UB for numeric conversions in scanf rather than strto*, 
but still seems related to this proposal.

> 	While all this section is new, some text is pasted verbatim from
> 	7.24.2.8.  I'll write that text as if it was already existing
> 	in the diff below.
> 
> 	I also renamed the parameters of strtol(3):
> 	nptr => s	Because it's a string, not a pointer to a number.
> 	endptr => endp	It's shorter and just as readable (if not more).
> 
> 	@@
> 	+7.24.2.*  The <b>strtoi</b> and <b>strtou</b> functions
> 	+
> 	+Synopsis
> 	+1	#include <stdlib.h>
> 	+	intmax_t strtoi(const char *restrict s, char **restrict endp, int base,
> 	+	    intmax_t min, intmax_t max, int *rstatus);
> 	+	uintmax_t strtou(const char *restrict s, char **restrict endp, int base,
> 	+	    uintmax_t min, uintmax_t max, int *rstatus);

intmax_t and uintmax_t are not declared in <stdlib.h>.  Either the 
synopsis should mention <stdint.h> as well, or those types should be added 
to the ones declared by <stdlib.h>.

I'm also concerned that the names sound like int / unsigned int analogues 
of strtol, but aren't.

> 	+Description
> 	+2	The <b>strtoi</b> and <b>strtou</b> functions
> 		convert the initial portion of
> 		the string pointed to by <tt>s</tt>
> 	+	to <b>intmax_t</b> and <b>uintmax_t</b>,
> 		respectively.
> 		First,
> 		they decompose the input string into three parts:
> 		an initial, possibly empty, sequence of white-space characters,
> 		a subject sequence resembling an integer
> 		represented in some radix determined by the value of <tt>base</tt>,
> 		and a final string of one or more unrecognized characters,
> 		including the terminating null character of the input string.
> 	+	Then,
> 		they attempt to convert the subject sequence to an integer.
> 	+	Then,
> 	+	they coerce the integer into the range [min, max].
> 	+	Finally,
> 		they return the result.
> 
> 	Paste p3, p4, p5, p6 from 7.24.2.8, replacing the function and
> 	type names as appropriate.

So the conversion is still locale-specific (p6).  One thing that can be 
useful for numeric conversions, and isn't covered well by the standard at 
present, is conversions that are guaranteed to be performed in the C 
locale.  (That would require a flags argument or similar to configure the 
functions.)

> 	@@
> 	+7	If the value of <tt>base</tt> is different from
> 	+	the values specified in the preceding paragraphs,
> 	+	the behavior is implementation defined.

It's "implementation-defined", with a hyphen.  And for that to be useful, 
you need clear bounds on what is permitted (that is, an 
implementation-defined set of sequences is accepted, and interpreted as 
having implementation-defined numeric values).

> 	@@
> 	 Returns
> 	+10	The <b>strtoi</b> and <b>strtou</b> functions
> 		return the converted and coerced value, if any.
> 		If no conversion could be performed,
> 	+	zero is coerced into the range,
> 	+	and then returned.
> 
> 	The paragraph above doesn't mention the range of representable
> 	values (unlike 7.24.2.8) because that's already covered by the
> 	range coercion specified in p2 above.

You don't seem to define how the coercion works.  Modulo?  Saturation?  
Something else?  ("Coerce" is not a term defined in the C standard, nor in 
ISO 2382.  So it has no semantics without them being explicitly defined 
for these functions.)
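
To make the question concrete, here is a minimal sketch of two possible 
readings of "coerce".  The function names are hypothetical and nothing here 
is taken from the proposal or from NetBSD; the modular reading is shown 
only for the fixed range [0, UINT8_MAX], and the saturating reading assumes 
min <= max.

```c
#include <assert.h>
#include <stdint.h>

/* Saturating reading: an out-of-range value snaps to the nearest bound.
 * Assumption of this sketch: the caller passes min <= max. */
static intmax_t coerce_saturate(intmax_t v, intmax_t min, intmax_t max)
{
    assert(min <= max);
    if (v < min)
        return min;
    if (v > max)
        return max;
    return v;
}

/* Modular reading, restricted to the range [0, UINT8_MAX]:
 * conversion to an unsigned type reduces the value modulo 2^8. */
static intmax_t coerce_wrap_u8(intmax_t v)
{
    return (uint8_t)v;
}
```

For the input 300 and the range [0, 255], the first reading yields 255 and 
the second yields 44, so the wording needs to pick one.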

What happens if min > max?  You say below that there is an ERANGE error 
for this case, but don't say what the return value is when it can't be in 
the range.

> 	+Returns
> 	+10	The <b>strtoi</b> and <b>strtou</b> functions
> 	+	return the converted value, if any.
> 	+	If no conversion is returned,
> 	+	these functions return the value in the range [min, max]
> 	+	that is closer to 0.

What if both are equally close to 0?
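
As an illustration of why the wording matters, the sketch below contrasts 
two hypothetical readings of "the value in the range [min, max] that is 
closer to 0".  Both function names are made up for this example.  Under 
reading A (zero clamped into the range) the answer is always unique when 
min <= max; under reading B (whichever bound is nearer zero) a range such 
as [-5, 5] has no unique answer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Reading A: the element of [min, max] nearest zero, i.e. zero clamped.
 * Assumption of this sketch: min <= max. */
static intmax_t nearest_to_zero_in_range(intmax_t min, intmax_t max)
{
    assert(min <= max);
    if (min > 0)
        return min;
    if (max < 0)
        return max;
    return 0;
}

/* Magnitude computed in uintmax_t so that INTMAX_MIN does not overflow. */
static uintmax_t magnitude(intmax_t x)
{
    return x < 0 ? -(uintmax_t)x : (uintmax_t)x;
}

/* Reading B: true when neither bound is strictly closer to zero. */
static bool bounds_tie(intmax_t min, intmax_t max)
{
    return min != max && magnitude(min) == magnitude(max);
}
```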

> 	+Errors
> 	+11	These functions don't set <b>errno</b>.

The standard does not use the abbreviation "don't", but says "do not".

> 	+	Instead, they set the object pointed to by <tt>rstatus</tt>
> 	+	to an error code,
> 	+	or to zero on success.
> 	+
> 	+12	-- EINVAL	The value in <tt>base</tt> is not supported.
> 	+	-- ECANCELED	The given string did not contain
> 	+			any characters that were converted.
> 	+	-- ERANGE	The converted value was out of range
> 	+			and has been coerced,
> 	+			or the range was invalid (e.g., min > max).
> 	+	-- ENOTSUP	The given string contained characters
> 	+			that did not get converted.

Of these names, only ERANGE is actually defined in the C standard.  You 
don't have any updates to <errno.h> to add the others.
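
A portable program can only rely on the ISO C names.  The following sketch 
(the function name is hypothetical) counts how many of the proposal's other 
status names a given <errno.h> lacks; on a POSIX system the count would be 
expected to be zero, but ISO C alone guarantees nothing about them.

```c
#include <assert.h>
#include <errno.h>

/* Counts how many of EINVAL, ECANCELED and ENOTSUP this implementation's
 * <errno.h> does not define.  ERANGE is not counted because ISO C already
 * requires it to be defined as a positive int constant. */
static int missing_status_macros(void)
{
    int missing = 0;

    assert(ERANGE > 0);
#ifndef EINVAL
    missing++;
#endif
#ifndef ECANCELED
    missing++;
#endif
#ifndef ENOTSUP
    missing++;
#endif
    return missing;
}
```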

These functions would clearly also need several examples added to the 
standard to illustrate their functionality, which are missing from this 
proposal.
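
As a rough idea of what such examples might exercise, here is a sketch of 
a stand-in built on strtoimax.  It is not NetBSD's implementation and not 
proposed wording.  Everything the proposal leaves open is an assumption 
labeled in the comments: saturation as the meaning of "coerce", the 
precedence between status codes, the return value when min > max, and the 
availability of the POSIX names EINVAL, ECANCELED and ENOTSUP.

```c
#include <assert.h>
#include <errno.h>
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: "coerce" means saturate.  Requires min <= max. */
static intmax_t clamp_imax(intmax_t v, intmax_t min, intmax_t max)
{
    return v < min ? min : v > max ? max : v;
}

/* Hypothetical stand-in for the proposed strtoi. */
static intmax_t sketch_strtoi(const char *restrict s, char **restrict endp,
    int base, intmax_t min, intmax_t max, int *rstatus)
{
    char *end;
    intmax_t v;
    int saved, err;

    assert(s != NULL && rstatus != NULL);

    if (min > max) {
        /* Assumption: the proposal does not say what to return here. */
        if (endp != NULL)
            *endp = (char *)s;
        *rstatus = ERANGE;
        return 0;
    }
    if (base != 0 && (base < 2 || base > 36)) {
        /* Assumption: no other bases are supported by this sketch. */
        if (endp != NULL)
            *endp = (char *)s;
        *rstatus = EINVAL;
        return clamp_imax(0, min, max);
    }

    saved = errno;              /* the proposal leaves errno untouched */
    errno = 0;
    v = strtoimax(s, &end, base);
    err = errno;
    errno = saved;
    if (endp != NULL)
        *endp = end;

    if (end == s) {             /* nothing was converted */
        *rstatus = ECANCELED;
        return clamp_imax(0, min, max);
    }
    if (err == ERANGE || v < min || v > max) {
        /* Assumption: ERANGE takes precedence over ENOTSUP. */
        *rstatus = ERANGE;
        return clamp_imax(v, min, max);
    }
    *rstatus = (*end != '\0') ? ENOTSUP : 0;
    return v;
}
```

With these assumptions, "300" in the range [0, 255] gives 255 with ERANGE, 
"12xy" in [0, 100] gives 12 with ENOTSUP, and "abc" in [5, 10] gives 5 with 
ECANCELED.  Different choices for the open questions would change each of 
those results, which is why the standard text needs its own examples.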

-- 
Joseph S. Myers
josmyers at redhat.com


