alx-0008 - Standardize strtoi(3) and strtou(3) from NetBSD

Bruno Haible bruno at clisp.org
Tue Mar 18 21:53:09 UTC 2025


Hi Alejandro,

> Below is a draft of a proposal for standardization of strtoi/u(3) from
> NetBSD in ISO C2y.

First of all: I like your initiative, and I moderately like this proposal.

> 	The strtol(3) family of functions is so damn hard to use
> 	correctly.  Only a handful of programmers in the world really
> 	know how to use it correctly in all the corner cases, and even
> 	those need to be really careful to not make mistakes.

It would be useful to list the mistakes that are made most frequently,
so as to verify that the proposed strtoi / strtou functions don't tend
to provoke the same mistakes. (I'd guess that one frequent mistake occurs
when the number is not expected to occupy the entire string. The success
test after (errno = 0, strtol (...)) is then
    endptr > nptr && errno == 0
and programmers tend to forget one of the two conditions.)
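
For illustration, here is a minimal sketch of that two-condition test
for plain strtol. The helper name parse_long_prefix is hypothetical and
not part of any library; base 10 is assumed.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical helper: parse a long from the initial portion of s.
   Succeeds only if at least one character was converted AND strtol
   reported no error (in particular, no clamping to LONG_MIN/LONG_MAX). */
bool parse_long_prefix(const char *s, long *out, char **endp)
{
    assert(s != NULL && out != NULL && endp != NULL);

    errno = 0;                      /* strtol never resets errno itself */
    long v = strtol(s, endp, 10);

    if (*endp == s)                 /* condition 1: endptr > nptr */
        return false;
    if (errno != 0)                 /* condition 2: errno == 0 */
        return false;

    *out = v;
    return true;
}
```

Dropping either check is enough to accept "abc" as 0 or a 30-digit
number as LONG_MAX.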

> 	+Synopsis
> 	+1	#include <stdlib.h>
> 	+	intmax_t strtoi(const char *restrict s, char **restrict endp, int base,
> 	+	    intmax_t min, intmax_t max, int *rstatus);
> 	+	uintmax_t strtou(const char *restrict s, char **restrict endp, int base,
> 	+	    uintmax_t min, uintmax_t max, int *rstatus);

It will probably be an impediment to adoption that these functions work
on [u]intmax_t, which is a 64-bit or 128-bit integer type. That seems
overkill when people want to parse, say, a port number in the range 0..65535.

To address this adoption problem, how about making these functions
type-generic (in the sense of <tgmath.h>)? In such a way that
    strtoi (n, &end, base, LONG_MIN, LONG_MAX, &status)
is known to return a 'long' rather than 'intmax_t', and
    strtoi (n, &end, base, INT_MIN, INT_MAX, &status)
is known to return an 'int' rather than 'intmax_t'.

If the standard does NOT say that these functions are generic, it would
be harder for an implementation to optimize invocations of these
functions for narrower types: I don't see how it could be done without
explicit compiler support.
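
To make the idea concrete, here is a rough sketch of how such a
type-generic front end could look with C11 _Generic, dispatching on the
type of the bounds. Everything named sketch_* or STRTOI_G is hypothetical.
sketch_strtoi is only a simplified stand-in built on strtoimax so that
the example is self-contained; it omits the base and min > max checks of
the draft and is not the NetBSD implementation.

```c
#include <assert.h>
#include <errno.h>
#include <inttypes.h>
#include <limits.h>
#include <stddef.h>

/* Simplified stand-in for the proposed strtoi (assumption, see above). */
intmax_t sketch_strtoi(const char *s, char **endp, int base,
                       intmax_t min, intmax_t max, int *rstatus)
{
    assert(s != NULL && rstatus != NULL);

    char *e;
    int saved_errno = errno;

    errno = 0;
    intmax_t v = strtoimax(s, &e, base);
    int overflowed = (errno == ERANGE);
    errno = saved_errno;            /* report via *rstatus, not errno */

    if (endp != NULL)
        *endp = e;

    if (e == s)
        *rstatus = ECANCELED;       /* nothing converted */
    else if (overflowed || v < min || v > max)
        *rstatus = ERANGE;          /* out of range, value coerced */
    else if (*e != '\0')
        *rstatus = ENOTSUP;         /* trailing unconverted characters */
    else
        *rstatus = 0;

    if (v < min) v = min;           /* coerce into [min, max] */
    if (v > max) v = max;
    return v;
}

/* Narrow wrappers: the result fits because it was clamped to the bounds. */
int sketch_strtoi_int(const char *s, char **endp, int base,
                      int min, int max, int *rstatus)
{
    return (int) sketch_strtoi(s, endp, base, min, max, rstatus);
}

long sketch_strtoi_long(const char *s, char **endp, int base,
                        long min, long max, int *rstatus)
{
    return (long) sketch_strtoi(s, endp, base, min, max, rstatus);
}

long long sketch_strtoi_llong(const char *s, char **endp, int base,
                              long long min, long long max, int *rstatus)
{
    return (long long) sketch_strtoi(s, endp, base, min, max, rstatus);
}

/* Select the wrapper, and thereby the return type, from the type of
   (min) + (max). The controlling expression is not evaluated. */
#define STRTOI_G(s, endp, base, min, max, rstatus)      \
    _Generic((min) + (max),                             \
        int:       sketch_strtoi_int,                   \
        long:      sketch_strtoi_long,                  \
        long long: sketch_strtoi_llong)                 \
    ((s), (endp), (base), (min), (max), (rstatus))
```

With this, STRTOI_G (n, &end, 10, 0, 65535, &status) has type 'int', and
the same call with LONG_MIN, LONG_MAX has type 'long'. A real
implementation could go further and parse directly in the narrow type.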

> 	+	Instead, they set the object pointed to by <tt>rstatus</tt>
> 	+	to an error code,
> 	+	or to zero on success.
> 	+
> 	+12	-- EINVAL	The value in <tt>base</tt> is not supported.
> 	+	-- ECANCELED	The given string did not contain
> 	+			any characters that were converted.
> 	+	-- ERANGE	The converted value was out of range
> 	+			and has been coerced,
> 	+			or the range was invalid (e.g., min > max).
> 	+	-- ENOTSUP	The given string contained characters
> 	+			that did not get converted.
> 	+
> 	+13	If various errors happen in the same call,
> 	+	the first one listed here is reported.

It would be useful to show what a success test looks like, after
    strtoi (s, &end, base, min, max, &status)
for each of the four frequent use-cases:
  -a. expect to parse the initial portion of the string, no coercion,
  -b. expect to parse the initial portion of the string, silent coercion,
  -c. expect to parse the entire string, no coercion,
  -d. expect to parse the entire string, silent coercion.

AFAICS, the success tests are:
  -a. status == 0 || status == ENOTSUP
  -b. status == 0 || status == ENOTSUP || status == ERANGE
  -c. status == 0
  -d. status == 0 || (status == ERANGE && end > s && *end == '\0')

The success test in case d. is so complicated that, in my opinion, the goal
of avoiding programmer mistakes is not being met.
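
Written out as C predicates, the four tests above would look roughly as
follows. The function names are hypothetical; status, s and end are the
values left behind by a preceding strtoi (s, &end, base, min, max, &status)
call under the draft wording.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* a. initial portion of the string, no coercion */
bool ok_prefix_strict(int status)
{
    return status == 0 || status == ENOTSUP;
}

/* b. initial portion of the string, silent coercion */
bool ok_prefix_coerced(int status)
{
    return status == 0 || status == ENOTSUP || status == ERANGE;
}

/* c. entire string, no coercion */
bool ok_whole_strict(int status)
{
    return status == 0;
}

/* d. entire string, silent coercion: ERANGE hides whether trailing
   characters were left over, so the end pointer must be inspected too. */
bool ok_whole_coerced(int status, const char *s, const char *end)
{
    assert(s != NULL && end != NULL);
    return status == 0
           || (status == ERANGE && end > s && *end == '\0');
}
```

Only case d. needs anything besides the status value, which is exactly
where the draft's "first error listed wins" rule loses information.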

I would therefore propose to change the status value to a bit mask, so that
the error conditions "The converted value was out of range and has been
coerced" and "The given string contains characters that did not get converted"
can both be reported together, without conflicting.
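
A sketch of what the success tests could become with such a bit mask.
The flag names and values below are invented for illustration only;
nothing like them exists in the draft or in NetBSD.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical status bits, one per error condition of the draft. */
enum {
    SKETCH_BAD_ARGS  = 1 << 0,  /* unsupported base */
    SKETCH_NO_DIGITS = 1 << 1,  /* no characters were converted */
    SKETCH_COERCED   = 1 << 2,  /* value out of range and coerced */
    SKETCH_TRAILING  = 1 << 3   /* trailing unconverted characters */
};

static_assert((SKETCH_COERCED & SKETCH_TRAILING) == 0,
              "conditions that can co-occur need distinct bits");

/* a. initial portion, no coercion: only trailing characters tolerated */
bool mask_ok_prefix_strict(unsigned mask)
{
    return (mask & ~(unsigned) SKETCH_TRAILING) == 0;
}

/* b. initial portion, silent coercion */
bool mask_ok_prefix_coerced(unsigned mask)
{
    return (mask & ~(unsigned) (SKETCH_TRAILING | SKETCH_COERCED)) == 0;
}

/* c. entire string, no coercion */
bool mask_ok_whole_strict(unsigned mask)
{
    return mask == 0;
}

/* d. entire string, silent coercion: no pointer inspection needed */
bool mask_ok_whole_coerced(unsigned mask)
{
    return (mask & ~(unsigned) SKETCH_COERCED) == 0;
}
```

Each test is then a single mask comparison, and case d. is no harder
than the other three.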

And, while at it: the error condition "min > max" is independent of the
given string contents; I would rather see it mapped to EINVAL than to
ERANGE.

Bruno




