restrictness of strtoi(3bsd) and strtol(3)
Amol Surati
suratiamol at gmail.com
Sun Dec 3 10:59:07 UTC 2023
On Sat, 2 Dec 2023 at 18:05, Alejandro Colomar via Gcc-help
<gcc-help at gcc.gnu.org> wrote:
>
> On Sat, Dec 02, 2023 at 01:29:01PM +0100, Alejandro Colomar wrote:
> > On Sat, Dec 02, 2023 at 12:50:28PM +0100, Alejandro Colomar wrote:
> > > Hi,
> > >
> > > I've been implementing my own copy of strto[iu](3bsd), to avoid the
> > > complexity of calling strtol(3) et al. In the process, I've noticed
> > > that all of these functions use restrict for their parameters.
> > >
> > > Why do these functions use restrict? While the second parameter is not
> > > used for accessing nptr memory (**endptr is not accessed), it can point
> > > to the same memory. Here is an example of how these functions can have
> > > pointers to the same memory in the two arguments.
> > >
> > > l = strtol(p, &p, 0);
> > >
> > > The use of restrict in the prototype of the function could result in
> > > compiler warnings, no? Currently, I don't see any warnings, but I
> > > suspect the compiler could complain, since the same memory is available
> > > to the function via two different arguments (albeit with a different
> > > number of references).
> > >
> > > The use of restrict in the definition of the function doesn't help the
> > > optimizer, since it already knows that the second parameter is out-only,
> > > so even if it weren't restrict, the only way to access memory is via the
> > > first parameter.
> >
> > In the case of strto[iu](3bsd), I have even more doubts.
> >
> > Here's libbsd's version of it (omitting unimportant parts):
> >
> > $ grepc -tfd strtoi .
> > ./src/strtoi.c:intmax_t
> > strtoi(const char *__restrict nptr,
> >        char **__restrict endptr, int base,
> >        intmax_t lo, intmax_t hi, int *rstatus)
> > {
> >         ...
> >
> >         im = strtoimax(nptr, endptr, base);
> >
> >         *rstatus = errno;
> >         errno = serrno;
> >
> >         if (*rstatus == 0) {
> >                 /* No digits were found */
> >                 if (nptr == *endptr)
> >                         *rstatus = ECANCELED;
> >                 /* There are further characters after number */
> >                 else if (**endptr != '\0')
> >                         *rstatus = ENOTSUP;
> >         }
> >
> >         ...
> >
> >         return im;
> > }
> >
> > Let's say the base is unsupported (e.g., -42), and endptr initially
> > points to nptr-1. Imagine this call:
> >
> > i = strtoimax(p + 1, &p, -42);
> >
> > ISO C doesn't specify what happens if the base is not between 0 and 36,
> > so the behavior is probably undefined in ISO C.
> >
> > POSIX says it returns 0 and sets errno to EINVAL, but doesn't say what
> > happens to endptr. I expect two possible implementations:
> >
> > - Leave endptr untouched.
> > - Set *endptr = nptr.
> >
> > Let's suppose it leaves endptr untouched (otherwise, it would be
> > impossible to portably differentiate an EINVAL due to unsupported base
> > from an EINVAL due to no digits in the string).
> >
> > So, the test (nptr == *endptr) would be false (because p+1 != p), and
> > the code would jump into accessing **endptr without having derived
> > that pointer from nptr, which is a violation of restrict.
>
> Oops, it's within an (errno == 0) path, so *endptr is guaranteed to be
> derived from nptr here.
>
> So no bug, but still unclear to me what's the benefit of using restrict,
Section "7. Library" of [1] has some information about the 'restrict'
keyword.

I think the restrict qualifiers require the caller to keep the string
(or at least the portion of it that strtol actually accesses) and the
pointer object that endptr points to in non-overlapping memory regions.
In strtol(p, &p, 0), nptr points to the characters while endptr points
to the variable p itself, so the two do not overlap and the call should
be well-defined.
-------------------
[1] https://www.open-std.org/jtc1/sc22/wg14/www/docs/n881.pdf
-Amol
> and also unclear why GCC doesn't warn about it at call site.
>
> > I made many assumptions here, where the standards are not clear, so I
> > may be wrong in some of them. But it looks to me like a bug.
> >
> > CCing libbsd.
> >
> > Cheers,
> > Alex
> >
> > --
> > <https://www.alejandro-colomar.es/>
>
>
>
> --
> <https://www.alejandro-colomar.es/>