alx-0008 - Standardize strtoi(3) and strtou(3) from NetBSD
Alejandro Colomar
alx at kernel.org
Wed Mar 19 18:48:59 UTC 2025
On Wed, Mar 19, 2025 at 04:26:11PM +0100, Alejandro Colomar wrote:
> Hi Bruno,
>
> On Wed, Mar 19, 2025 at 01:15:30AM +0100, Bruno Haible wrote:
> > Alejandro Colomar wrote:
> > > > It would be useful to show how a success test looks like, after
> > > > strtoi (s, &end, base, min, max, &status)
> > > > for each of the four frequent use-cases:
> > > > -a. expect to parse the initial portion of the string, no coercion,
> > > > -b. expect to parse the initial portion of the string, silent coercion,
> > > > -c. expect to parse the entire string, no coercion,
> > > > -d. expect to parse the entire string, silent coercion.
> > > >
> > > > AFAICS, the success tests are:
> > > > -a. status == 0 || status == ENOTSUP
> > >
> > > Correct.
> > >
> > > > -b. status == 0 || status == ENOTSUP || status == ERANGE
> > >
> > > Correct (but most likely a bug).
>
> Actually, now I remember that status can be NULL, in which case it's not
> reported. This is a case where you could check for errors with a
> simpler expression:
>
> end != str
>
> but (status == 0 || status == ENOTSUP || status == ERANGE) is still a
> reasnoable one.
>
> like you can do with strtol(3), but with portability guarantess
> regarding EINVAL, because strtoi(3bsd) always writes *endp (if nonnull).
>
> I need to update the specification to mention that status can be NULL.
>
> > >
> > > > -c. status == 0
> > >
> > > Correct.
> > >
> > > > -d. status == 0 || (status == ERANGE && end > s && *end == '\0')
> > >
> > > You don't need end>s, because that would preclude ERANGE.
> > >
> > > status == 0 || (status == ERANGE && end == '\0')
> > >
> > > Aaand, most likely a bug.
> >
> > Cases b. and d. are not bugs. Often, the programmer knows that treating
> > a value > ULONG_MAX is equivalent to treating the value ULONG_MAX. These
> > are *normal* uses of strto[u]l[l]. Often it is the programmer's intent
> > that the values "4294967297" and "4294967295" produce the same behaviour
> > (the same error message, for example).
>
> If you want ULONG_MAX + 1 to be treated like ULONG_MAX, and both
> result in an error, then you should probably clamp at ULONG_MAX - 1,
> and consider anything above an error.
>
> > It is for these cases that your specification contains the clamping /
> > coercion behaviour.
> >
> > Now, when you look at the table of success tests:
> >
> > -a. status == 0 || status == ENOTSUP
> > -b. status == 0 || status == ENOTSUP || status == ERANGE
> > -c. status == 0
> > -d. status == 0 || (status == ERANGE && *end == '\0')
> >
> > it is immediately clear that the status return convention is ill-designed,
> > because the returned 'status' is not the only thing a programmer has to test
> > after calling the function.
> >
> > > Cases b and d are not real, IMO. I have never seen code where that is
> > > wanted, AFAIR, and I analyzed the entire Debian and NetBSD code bases
> > > looking precisely for that usage.
> >
> > I disagree.
>
> I didn't find any occurence of 'd' in calls to strtoi(3)/strtou(3).
> I didn't analyze calls to strtol(3) et al.
>
> > Any use of strtoul that does not test errno wants overflow
> > to be mapped to ULONG_MAX, that is, is in case b. or d.
> > Just looking in gnulib and gettext, I find already 6 occurrences:
> > gnulib/lib/getaddrinfo.c:299
>
> lib/getaddrinfo.c-297- if (!(*servname >= '0' && *servname <= '9'))
> lib/getaddrinfo.c-298- return EAI_NONAME;
> lib/getaddrinfo.c:299: port = strtoul (servname, &c, 10);
> lib/getaddrinfo.c-300- if (*c || port > 0xffff)
> lib/getaddrinfo.c-301- return EAI_NONAME;
> lib/getaddrinfo.c-302- port = htons (port);
>
> You could remove the preceding conditional if you don't want to avoid
> leading whitespace. You could merge that into the strtou(3) call, which
> would report ECANCELED for non-numeric input). Except that a negative
> number is silently converted to a positive large value. This is why I
> use a wrapper function strtou_noneg() that rejects negative numbers.
>
> You could rewrite it as:
>
> port = strtou_noneg(servname, NULL, 10, 0, UINT16_MAX, &status);
> if (status != 0)
> return EAI_NONAME;
> port = htons(port);
>
> where strtou_noneg() is:
>
> uintmax_t
> strtou_noneg(const char *s, char **restrict endp, int base,
> uintmax_t min, uintmax_t max, int *restrict status)
> {
> int st;
>
> if (status == NULL)
> status = &st;
> if (strtoi(s, endp, base, 0, 1, status) == 0 && *status == ERANGE)
> return min;
>
> return strtou(s, endp, base, min, max, status);
> }
>
> I think this is not one case where you want silent saturation. You're
> indeed doing range checks [0, UINT16_MAX].
>
> > gnulib/lib/nproc.c:402
>
> lib/nproc.c-383-/* Parse OMP environment variables without dependence on OMP.
> lib/nproc.c-384- Return 0 for invalid values. */
> lib/nproc.c-385-static unsigned long int
> lib/nproc.c:386:parse_omp_threads (char const* threads)
> lib/nproc.c-387-{
>
> ...
>
> lib/nproc.c-398- /* Convert it from positive decimal to 'unsigned long'. */
> lib/nproc.c-399- if (c_isdigit (*threads))
> lib/nproc.c-400- {
> lib/nproc.c-401- char *endptr = NULL;
> lib/nproc.c:402: unsigned long int value = strtoul (threads, &endptr, 10);
> lib/nproc.c-403-
> lib/nproc.c-404- if (endptr != NULL)
> lib/nproc.c-405- {
> lib/nproc.c-406- while (*endptr != '\0' && c_isspace (*endptr))
> lib/nproc.c-407- endptr++;
> lib/nproc.c-408- if (*endptr == '\0')
> lib/nproc.c-409- return value;
> lib/nproc.c-410- /* Also accept the first value in a nesting level,
> lib/nproc.c-411- since we can't determine the nesting level from env vars. */
> lib/nproc.c-412- else if (*endptr == ',')
> lib/nproc.c-413- return value;
> lib/nproc.c-414- }
> lib/nproc.c-415- }
>
> First of all, the endptr!=NULL test seems misplaced. The only way that
> could be true is if the base is unsupported, and 10 is necessarily
s/true/equal/
> supported. You should remove the initialization '= NULL', and the
> check, since both are dead code, IIRC. That's one of the things you
> don't need to care with strtoi(3), because it _always_ sets *endp.
>
> And you could probably remove the isdigit test by calling
> strtou_noneg().
>
> This could be something like this (fixing the bugs reported above):
>
> char *end;
> u_long value;
>
> value = strtou_noneg(threads, &end, 10, 0, ULONG_MAX, NULL);
> if (end != threads) {
> end += strspn(end, " \t\n");
> if (streq(end, "")
> return value;
> if (strprefix(end, ","))
> return value;
> }
>
>
> This is one case where you seem to silently ignore saturation. Why
> don't you have any diagnostic message?
>
> > gnulib/lib/omp-init.c:48
>
> lib/omp-init.c-47- char *endptr = NULL;
> lib/omp-init.c:48: unsigned long int value = strtoul (threads, &endptr, 10);
> lib/omp-init.c-49-
> lib/omp-init.c-50- if (endptr != NULL)
> lib/omp-init.c-51- {
> lib/omp-init.c-52- while (*endptr != '\0' && c_isspace (*endptr))
> lib/omp-init.c-53- endptr++;
> lib/omp-init.c-54- if (*endptr == '\0')
> lib/omp-init.c-55- return value;
> lib/omp-init.c-56- /* Also accept the first value in a nesting level,
> lib/omp-init.c-57- since we can't determine the nesting level from env vars. */
> lib/omp-init.c-58- else if (*endptr == ',')
> lib/omp-init.c-59- return value;
> lib/omp-init.c-60- }
>
> This seems identical to the previous case.
>
> > gettext/gettext-tools/src/msgfmt.c:287
>
> gettext-tools/src/msgfmt.c-286- char *endp;
> gettext-tools/src/msgfmt.c:287: size_t new_align = strtoul (optarg, &endp, 0);
> gettext-tools/src/msgfmt.c-288-
> gettext-tools/src/msgfmt.c-289- if (endp != optarg)
> gettext-tools/src/msgfmt.c-290- alignment = new_align;
>
> This code will misbehave badly on platforms where size_t is narrower
> than u_long. Consider the case where you parse a high u_long, let's say
> SIZE_MAX + 1ul. It will be converted to 1. There's a bug due to a
> missing range check (but orthogonal to saturation).
>
> You also don't reject negative numbers, which I expect to be a bug,
> connected with the one from above.
>
> This could be rewritten to (fixing the bugs reported above):
>
> char *end;
> size_t new_align;
>
> new_align = strtou_noneg(optarg, &end, 0, 0, SIZE_MAX, NULL);
> if (optarg != end)
> alignment = new_align;
>
> This seems another case where you silently saturate. Why don't you have
> a diagnostic message for invalid input?
>
> > gettext/gettext-tools/src/msgl-check.c:379
>
> gettext-tools/src/msgl-check.c-374- while (*nplurals != '\0' && c_isspace ((unsigned char) *nplurals))
> gettext-tools/src/msgl-check.c-375- ++nplurals;
> gettext-tools/src/msgl-check.c-376- endp = nplurals;
> gettext-tools/src/msgl-check.c-377- nplurals_value = 0;
> gettext-tools/src/msgl-check.c-378- if (*nplurals >= '0' && *nplurals <= '9')
> gettext-tools/src/msgl-check.c:379: nplurals_value = strtoul (nplurals, (char **) &endp, 10);
> gettext-tools/src/msgl-check.c-380- if (nplurals == endp)
> gettext-tools/src/msgl-check.c-381- {
> gettext-tools/src/msgl-check.c-382- const char *msg = _("invalid nplurals value");
> gettext-tools/src/msgl-check.c-383- char *help = plural_help (nullentry);
> gettext-tools/src/msgl-check.c-384-
> gettext-tools/src/msgl-check.c-385- if (help != NULL)
> gettext-tools/src/msgl-check.c-386- {
> gettext-tools/src/msgl-check.c-387- char *msgext = xasprintf ("%s\n%s", msg, help);
> gettext-tools/src/msgl-check.c-388- xeh->xerror (CAT_SEVERITY_ERROR, header, NULL, 0, 0, true,
> gettext-tools/src/msgl-check.c-389- msgext);
> gettext-tools/src/msgl-check.c-390- free (msgext);
> gettext-tools/src/msgl-check.c-391- free (help);
> gettext-tools/src/msgl-check.c-392- }
> gettext-tools/src/msgl-check.c-393- else
> gettext-tools/src/msgl-check.c-394- xeh->xerror (CAT_SEVERITY_ERROR, header, NULL, 0, 0, false,
> gettext-tools/src/msgl-check.c-395- msg);
> gettext-tools/src/msgl-check.c-396-
> gettext-tools/src/msgl-check.c-397- seen_errors++;
> gettext-tools/src/msgl-check.c-398- }
>
> You could get rid of a lot of code preceding the strtoul(3) call by
> calling strtou_noneg() instead. And strspn(3) would also help.
>
> nplurals += strspn(nplurals, " \t\n");
> nplurals_value = strtou_noneg(nplurals, (char **) &end, 10, 0, ULONG_MAX);
> if (nplurals == end)
> ...
>
> On the other hand, I also wonder why you don't diagnose invalid input.
> Why is -10 an "invalid nplurals value", but ULONG_MAX+10 is a valid
> (albeit clamped) one?
>
> All of these cases look like missing error handling, IMO.
>
> > gettext/gettext-tools/src/read-stringtable.c:561
>
> gettext-tools/src/read-stringtable.c-553- {
> gettext-tools/src/read-stringtable.c-554- char *last_colon;
> gettext-tools/src/read-stringtable.c-555- unsigned long number;
> gettext-tools/src/read-stringtable.c-556- char *endp;
> gettext-tools/src/read-stringtable.c-557-
> gettext-tools/src/read-stringtable.c-558- if (strlen (line) >= 6 && memcmp (line, "File: ", 6) == 0
> gettext-tools/src/read-stringtable.c-559- && (last_colon = strrchr (line + 6, ':')) != NULL
> gettext-tools/src/read-stringtable.c-560- && *(last_colon + 1) != '\0'
> gettext-tools/src/read-stringtable.c:561: && (number = strtoul (last_colon + 1, &endp, 10), *endp == '\0'))
> gettext-tools/src/read-stringtable.c-562- {
> gettext-tools/src/read-stringtable.c-563- /* A "File: <filename>:<number>" type comment. */
> gettext-tools/src/read-stringtable.c-564- *last_colon = '\0';
> gettext-tools/src/read-stringtable.c-565- catalog_reader_seen_comment_filepos (catr, line + 6, number);
> gettext-tools/src/read-stringtable.c-566- }
> gettext-tools/src/read-stringtable.c-567- else
> gettext-tools/src/read-stringtable.c-568- catalog_reader_seen_comment (catr, line);
> gettext-tools/src/read-stringtable.c-569- }
>
> Let me try to rewrite it for readability first.
>
> char *filepos, *last_colon;
> u_long n;
> const char *numstr, *end;
>
> filepos = strprefix(line, "File: ") ?: (char []){""};
> last_colon = strrchr(filepos, ':') ?: (char []){":"};
> numstr = strprefix(last_colon, ":");
>
> n = strtoul(numstr, (char **) &end, 10);
> if (numstr != end && streq(end, "")) {
> /* A "File: <filename>:<number>" type comment. */
> strcpy(last_colon, "");
> catalog_reader_seen_comment_filepos(catr, filepos, n);
>
> } else {
> catalog_reader_seen_comment(catr, line);
> }
>
> You're forgetting about negative numbers? Or are you certain that they
> can't happen? How about huge values? Assuming you want compatible code
> calling strtou():
>
> n = strtou(numstr, (char **) &end, 10, 0, ULONG_MAX, NULL);
> if (status == 0 || status == ENOTSUP || status == ERANGE) {
> ...
> } else {
> ...
> }
>
> But again, I wonder why you don't do range checks.
>
> > > > I would therefore propose to change the status value to a bit mask, so that
> > > > the error conditions "The converted value was out of range and has been
> > > > coerced" and "The given string contains characters that did not get converted"
> > > > can be both returned together, without conflicting.
> > >
> > > Because it is theoretical conditions that a real program never wants,
> > > let's not do that.
> >
> > If you don't want to do that, I can only repeat what I said in the previous
> > mail: The proposal *does not achieve the goal* of avoiding the most common
> > programmer mistakes. For a robust API, the success test should *only* involve
> > testing the returned 'status', nothing else.
>
> Let's discuss this after your responses to the above.
>
> >
> > Bruno
> >
> >
> >
> >
>
> --
> <https://www.alejandro-colomar.es/>
--
<https://www.alejandro-colomar.es/>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 833 bytes
Desc: not available
URL: <https://lists.freedesktop.org/archives/libbsd/attachments/20250319/01a35a2b/attachment.sig>
More information about the libbsd
mailing list