[BUG] etnaviv: broken timeouts

Arnd Bergmann arnd at arndb.de
Mon Nov 6 15:42:56 UTC 2017


On Mon, Nov 6, 2017 at 4:01 PM, Russell King - ARM Linux
<linux at armlinux.org.uk> wrote:
> On Mon, Nov 06, 2017 at 03:43:37PM +0100, Arnd Bergmann wrote:
>> On Mon, Nov 6, 2017 at 3:17 PM, Arnd Bergmann <arnd at arndb.de> wrote:

>> However, the comment you introduce is a bit misleading in the 'timespec'
>> version:
>>
>> > +/*
>> > + * Etnaviv timeouts are specified wrt CLOCK_MONOTONIC, not jiffies.
>> > + * We need to calculate the timeout in terms of number of jiffies
>> > + * between the specified timeout and the current CLOCK_MONOTONIC time.
>> > + * Note: clock_gettime() is 32-bit on 32-bit arch.  Using 64-bit
>> > + * timespec math here just means that when a wrap occurs, the
>> > + * specified timeout goes into the past and we can't request a
>> > + * timeout in the future: IOW, the code breaks.
>> > + */
>>
>> We already know that we will fix clock_gettime() to return 64-bit
>> seconds in the future, and at that point the comment no longer
>> makes any sense.
>>
>> We also know that CLOCK_MONOTONIC doesn't overflow 32-bit
>> interfaces in practice, and we rely on that in lots of places in the
>> kernel, but we still use ktime_t or timespec64 with clock_monotonic
>> for consistency with the future fixed user space.
>
> If CLOCK_MONOTONIC doesn't overflow the 31-bit value (so doesn't go
> negative) we still end up with times in the future via the 64-bit
> timeout API, because what userspace will do is something like this:
>
> foo(struct drm_etnaviv_timespec *ts, unsigned ms)
> {
>         struct timespec now;
>
>         clock_gettime(CLOCK_MONOTONIC, &now);
>
>         ts->tv_sec = now.tv_sec + timeout / 1000;
>         ts->tv_nsec = now.tv_nsec + (timeout % 1000) * 1000 * 1000;
>         if (ts->tv_nsec > 1000 * 1000 * 1000) {
>                 ts->tv_nsec -= 1000 * 1000 * 1000;
>                 ts->tv_sec += 1;
>         }
> }
>
> What that code ends up doing is quite subtle, and probably not what
> is obvious.  ts->tv_sec is a unsigned 64-bit variable.  now.tv_sec
> is a 32-bit signed variable.
>
> What we actually end up with is ts->tv_sec being assigned a 32-bit
> unsigned quantity which is the addition of those two values.

But the 68-year uptime limit can't hit us before the y2038 limit,
so if that user space snippet ever approaches 68 years of
uptime, it will have been built with a new glic that makes
now.tv_sec 64-bit as well, or it will have crashed long before
then in some other function.

> However, if we use the timespec version in the kernel, then the
> 68 year limit still applies - although tv_sec in drm_etnaviv_timespec
> is a positive number, when casted down to a timespec, it becomes a
> negative number.  So, things go wrong until the 32-bit ktime_get()
> also wraps to a negative number.  This means the failure is temporary
> and self-recovers.

Right, but if we use 64-bit time_t in user space and pass the
correct time, then the failure won't even happen to start with, it
only occurs because of the cast down to timespec ;-)

> I'm not saying that one or other is better, just pointing out the
> effects of the approaches.

Ok, can we then do the timespec64 variant please?

There is already an agreement to remove all timespec users
and ktime_get_ts() (as of the latest tip/timers branch) is now
deprecated, so I'd prefer not to add any new users that we
just need to replace again in the future.

Even though this particular interface has no y2038 issue,
removing all 'timespec' usage in the kernel one at a time
helps tremendously in figuring out the remaining issues.

> Note that the man page for clock_gettime() doesn't define what
> happens to CLOCK_MONOTONIC when it gets to the wrap point.  Neither
> does the open group spec issue 7.

I'm pretty sure it's not really concerned an issue in practice:
CLOCK_REALTIME is guaranteed to wrap before CLOCK_MONOTONIC,
and we already rely on that not wrapping.

      Arnd


More information about the etnaviv mailing list