restrictness of strtoi(3bsd) and strtol(3)

Sun Dec 3 10:59:07 UTC 2023

On Sat, 2 Dec 2023 at 18:05, Alejandro Colomar via Gcc-help
<gcc-help at gcc.gnu.org> wrote:
>
> On Sat, Dec 02, 2023 at 01:29:01PM +0100, Alejandro Colomar wrote:
> > On Sat, Dec 02, 2023 at 12:50:28PM +0100, Alejandro Colomar wrote:
> > > Hi,
> > >
> > > I've been implementing my own copy of strto[iu](3bsd), to avoid the
> > > complexity of calling strtol(3) et al.  In the process, I've noticed
> > > that all of these functions use restrict for their parameters.
> > >
> > > Why do these functions use restrict?  While the second parameter is not
> > > used for accessing nptr memory (**endptr is not accessed), it can point
> > > to the same memory.  Here is an example of how these functions can have
> > > pointers to the same memory in the two arguments.
> > >
> > >     l = strtol(p, &p, 0);
> > >
> > > The use of restrict in the prototype of the function could result in
> > > compiler warnings, no?  Currently, I don't see any warnings, but I
> > > suspect the compiler could complain, since the same memory is available
> > > to the function via two different arguments (albeit with a different
> > > number of references).
> > >
> > > The use of restrict in the definition of the function doesn't help the
> > > optimizer, since it already knows that the second parameter is out-only,
> > > so even if it weren't restrict, the only way to access memory is via the
> > > first parameter.
> >
> > In the case of strto[iu](3bsd), I have even more doubts.
> >
> > Here's libbsd's version of it (omitting unimportant parts):
> >
> >       $ grepc -tfd strtoi .
> >       ./src/strtoi.c:intmax_t
> >       strtoi(const char *__restrict nptr,
> >              char **__restrict endptr, int base,
> >              intmax_t lo, intmax_t hi, int *rstatus)
> >       {
> >               ...
> >
> >               im = strtoimax(nptr, endptr, base);
> >
> >               *rstatus = errno;
> >               errno = serrno;
> >
> >               if (*rstatus == 0) {
> >                       /* No digits were found */
> >                       if (nptr == *endptr)
> >                               *rstatus = ECANCELED;
> >                       /* There are further characters after number */
> >                       else if (**endptr != '\0')
> >                               *rstatus = ENOTSUP;
> >               }
> >
> >               ...
> >
> >               return im;
> >       }
> >
> > Let's say the base is unsupported (e.g., -42), and endptr initially
> > points to nptr-1.  Imagine this call:
> >
> >       i = strtoimax(p + 1, &p, -42);
> >
> > ISO C doesn't specify what happens if the base is not between 0 and 36,
> > so the behavior is probably undefined in ISO C.
> >
> > POSIX says it returns 0 and sets errno to EINVAL, but doesn't say what
> > happens to endptr.  I expect two possible implementations:
> >
> > -  Leave endptr untouched.
> > -  Set *endptr = nptr.
> >
> > Let's suppose it leaves endptr untouched (otherwise, it would be
> > impossible to portably differentiate an EINVAL due to unsupported base
> > from an EINVAL due to no digits in the string).
> >
> > So, the test (nptr == *endptr) would be false (because p+1 != p), and
> > the code would jump into accessing **endptr without having derived
> > that pointer from nptr, which is a violation of restrict.
>
> Oops, it's within an (errno == 0) path, so *endptr is guaranteed to be
> derived from nptr here.
>
> So no bug, but still unclear to me what's the benefit of using restrict,

The section "7. Library" at [1] has some information about the 'restrict'
keyword.

I think the restrict qualifiers require the caller to keep the string
(or at least the portion of it that strtol actually accesses) and the
pointer object that endptr points to in non-overlapping memory regions.
Calling strtol(p, &p, 0) should be well-defined as long as that holds,
because the function reads the characters through nptr and writes only
the pointer p through endptr.
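
To make that concrete, here is a minimal sketch of the usual idiom, as
I understand it.  The helper name sum_longs is made up for this mail
(it is not from libbsd or glibc), and the sketch ignores overflow of
the running total.  The characters of the string and the pointer
object p live in different objects, so the two restrict-qualified
parameters never designate overlapping memory:

```c
#include <errno.h>
#include <stdlib.h>

/*
 * Hypothetical helper: sum the whitespace-separated integers in s.
 * Parsing stops at the first token that is not a number or that is
 * out of range for long.
 */
long
sum_longs(const char *s)
{
	/* strtol itself hands back a non-const pointer into s. */
	char *p = (char *) s;
	long total = 0;

	for (;;) {
		char *start = p;
		long l;

		errno = 0;
		/* nptr accesses the chars; endptr accesses only the object p. */
		l = strtol(p, &p, 0);

		if (p == start)		/* no digits were found */
			break;
		if (errno == ERANGE)	/* value does not fit in a long */
			break;

		total += l;
	}

	return total;
}
```

For example, sum_longs("0x10 -1") walks the string with repeated
strtol(p, &p, 0) calls and stops when no further digits are found.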
-------------------
[1] https://www.open-std.org/jtc1/sc22/wg14/www/docs/n881.pdf
-Amol

> and also unclear why GCC doesn't warn about it at call site.
>
> > I made many assumptions here, where the standards are not clear, so I
> > may be wrong in some of them.  But it looks to me like a bug.
> >
> > CCing libbsd.
> >
> > Cheers,
> > Alex
> >
> > --
> > <https://www.alejandro-colomar.es/>
>
>
>
> --
> <https://www.alejandro-colomar.es/>