Questionable behavior of strtoi(3bsd) / strtou(3bsd)

Alejandro Colomar alx at kernel.org
Sun Jan 7 20:23:51 UTC 2024


Hi!

Guillem wrote:
> Hi!
> 
> On Sun, 2024-01-07 at 03:44:23 +0100, Alejandro Colomar wrote:
>> While implementing my own strtoi/u() for shadow, and testing it against
>> strtoi/u(3bsd), I found that the behavior differed slightly for one
>> specific case: when both ERANGE and ENOTSUP happen at the same time, my
>> implementation reports ERANGE, while strtoi/u(3bsd) reports ENOTSUP.
>> 
>> 	strtou("5z", NULL, 0, 0, 4, &status);
>> 
>> I don't know what's the behavior of the original NetBSD implementation,
>> as I couldn't find the file where it's implemented.  If you can give a
>> pointer to where it's implemented in NetBSD, it would help.  Maybe you
>> could also CC some NetBSD maintainers or list to get their opinion.
> 
> It should be the exact same behavior as on NetBSD, yes. The source
> file can be seen from the CVS Id in the source. On my local NetBSD
> tree the function is implemented via macros, can be found in:
> 
>   common/lib/libc/stdlib/strtoi.c
>   common/lib/libc/stdlib/strtou.c
>   common/lib/libc/stdlib/_strtoi.h

Thanks!

>> I think that design is wrong, and would
>> like to justify:
>> 
>> If a value is out-of-range, that's a more problematic error than just
>> having trailing text.  Trailing text is often expected, and is not
>> treated as an error, but out-of-range is often a hard error.  Silencing
>> that error is problematic, and there's no way to work around it.  If
>> instead strtoi(3bsd) reported ERANGE when both ERANGE + ENOTSUP happen,
>> it would be trivial for the caller to check it there's trailing text via
>> 'endptr'.
> 
> I didn't design the functions, so not sure whether that was an
> intentional choice or not, from the code PoV I can see the structure
> making sense (in first making sure the string parsed is correct, before
> evaluating what got parsed from it),

That's not correct.  strtol(3) and relatives have (ignoring EINVAL) 3
stages, not 2.  Firstly we make sure we parsed a number, secondly we
make sure the number is valid (in range), and thirdly we check if
there's any trailing text.  That's what strtol(3) does (but it reorders
steps 1 and 2):

         errno = 0;    /* To distinguish success/failure after call */
         val = strtol(str, &endptr, base);

         /* Check for various possible errors. */

         if (errno == ERANGE) {
             perror("strtol");
             exit(EXIT_FAILURE);
         }

         if (endptr == str) {
             fprintf(stderr, "No digits were found\n");
             exit(EXIT_FAILURE);
         }

         /* If we got here, strtol() successfully parsed a number. */

         printf("strtol() returned %ld\n", val);

         if (*endptr != '\0')        /* Not necessarily an error... */
             printf("Further characters after number: \"%s\"\n", endptr);

You can reorder the first 2 checks, because they are exclusive (ERANGE
and ECANCELED are impossible together).  The check for trailing text is
independent from ERANGE, and needs to be performed afterwards.

> and while I see where you are coming
> from, I wouldn't say the design is necessarily wrong, it's probably a
> convenience function that might not always be suitable as convenient? :)

The idea behind the function is great; it's strtol(3) but with simpler
error checking.  The only problem is this small bug.  If it didn't have
this bug, I'd say it's superior to strtol/imax(3) in every way.

> Because in the same way you can duplicate the endptr check, you could
> as well do the same for the range checks.

Nope.  strtoi(3) is discarding that information, which cannot be
recovered in any way (except parsing again with strtoimax(3)).  ENOTSUP
comes from strtoimax(3)'s endptr, so one can check `(*end != '\0')` if
status == ERANGE.  But the only way strtoimax(3) reports ERANGE is
precisely by setting errno (which strtoi(3bsd) clears, so you can't
check errno afterwards; that ERANGE is simply lost).  The following two
calls are indistinguishable, which is bad:

	assert(strtoi("42 z", &end, 0, 0, 7, &status) == 7);
	assert(status == ENOTSUP);
	assert(strcmp(end, " z") == 0);

	assert(strtoi("7 z", &end, 0, 0, 7, &status) == 7);
	assert(status == ENOTSUP);
	assert(strcmp(end, " z") == 0);

And this is not a corner case; it's a central case when parsing text.  A
better design would have been to behave this way:

	assert(strtoi("42 z", &end, 0, 0, 7, &status) == 7);
	assert(status == ERANGE);
	assert(strcmp(end, " z") == 0);

	assert(strtoi("7 z", &end, 0, 0, 7, &status) == 7);
	assert(status == ENOTSUP);
	assert(strcmp(end, " z") == 0);

> I'm also not sure whether
> changing the semantics now could cause other issues though.

I'm also concerned about it.  I'll check NetBSD's code in case I find
something that breaks there.  There's also the possibility that fixing
this design problem would silently fix some bugs.  I'll check if that's
the case, which will help get the fix applied.  :)

> 
> If the function changes in NetBSD, I'm happy to import those changes,
> but I don't think I'd be comfortable doing that unilaterally in libbsd.
> I'd suggest you file a bug report in the NetBSD bug tracker.

Sure; will do.  I'll send a few more emails to this thread with the
results of my search in NetBSD's code, and when I have all the info I
want, I'll file a bug there.

> Thanks,
> Guillem

Bon any,
Alex

-- 
<https://www.alejandro-colomar.es/>
Looking for a remote C programming job at the moment.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 833 bytes
Desc: not available
URL: <https://lists.freedesktop.org/archives/libbsd/attachments/20240107/814777a5/attachment.sig>


More information about the libbsd mailing list