[BUG] etnaviv: broken timeouts

Russell King - ARM Linux linux at armlinux.org.uk
Mon Nov 6 16:18:31 UTC 2017


On Mon, Nov 06, 2017 at 04:42:56PM +0100, Arnd Bergmann wrote:
> On Mon, Nov 6, 2017 at 4:01 PM, Russell King - ARM Linux
> <linux at armlinux.org.uk> wrote:
> > On Mon, Nov 06, 2017 at 03:43:37PM +0100, Arnd Bergmann wrote:
> >> On Mon, Nov 6, 2017 at 3:17 PM, Arnd Bergmann <arnd at arndb.de> wrote:
> 
> >> However, the comment you introduce is a bit misleading in the 'timespec'
> >> version:
> >>
> >> > +/*
> >> > + * Etnaviv timeouts are specified wrt CLOCK_MONOTONIC, not jiffies.
> >> > + * We need to calculate the timeout in terms of number of jiffies
> >> > + * between the specified timeout and the current CLOCK_MONOTONIC time.
> >> > + * Note: clock_gettime() is 32-bit on 32-bit arch.  Using 64-bit
> >> > + * timespec math here just means that when a wrap occurs, the
> >> > + * specified timeout goes into the past and we can't request a
> >> > + * timeout in the future: IOW, the code breaks.
> >> > + */
> >>
> >> We already know that we will fix clock_gettime() to return 64-bit
> >> seconds in the future, and at that point the comment no longer
> >> makes any sense.
> >>
> >> We also know that CLOCK_MONOTONIC doesn't overflow 32-bit
> >> interfaces in practice, and we rely on that in lots of places in the
> >> kernel, but we still use ktime_t or timespec64 with clock_monotonic
> >> for consistency with the future fixed user space.
> >
> > If CLOCK_MONOTONIC doesn't overflow the 31-bit value (so doesn't go
> > negative) we still end up with times in the future via the 64-bit
> > timeout API, because what userspace will do is something like this:
> >
> > foo(struct drm_etnaviv_timespec *ts, unsigned ms)
> > {
> >         struct timespec now;
> >
> >         clock_gettime(CLOCK_MONOTONIC, &now);
> >
> >         ts->tv_sec = now.tv_sec + timeout / 1000;
> >         ts->tv_nsec = now.tv_nsec + (timeout % 1000) * 1000 * 1000;
> >         if (ts->tv_nsec > 1000 * 1000 * 1000) {
> >                 ts->tv_nsec -= 1000 * 1000 * 1000;
> >                 ts->tv_sec += 1;
> >         }
> > }
> >
> > What that code ends up doing is quite subtle, and probably not what
> > is obvious.  ts->tv_sec is a unsigned 64-bit variable.  now.tv_sec
> > is a 32-bit signed variable.
> >
> > What we actually end up with is ts->tv_sec being assigned a 32-bit
> > unsigned quantity which is the addition of those two values.
> 
> But the 68-year uptime limit can't hit us before the y2038 limit,
> so if that user space snippet ever approaches 68 years of
> uptime, it will have been built with a new glic that makes
> now.tv_sec 64-bit as well, or it will have crashed long before
> then in some other function.
> 
> > However, if we use the timespec version in the kernel, then the
> > 68 year limit still applies - although tv_sec in drm_etnaviv_timespec
> > is a positive number, when casted down to a timespec, it becomes a
> > negative number.  So, things go wrong until the 32-bit ktime_get()
> > also wraps to a negative number.  This means the failure is temporary
> > and self-recovers.
> 
> Right, but if we use 64-bit time_t in user space and pass the
> correct time, then the failure won't even happen to start with, it
> only occurs because of the cast down to timespec ;-)
> 
> > I'm not saying that one or other is better, just pointing out the
> > effects of the approaches.
> 
> Ok, can we then do the timespec64 variant please?
> 
> There is already an agreement to remove all timespec users
> and ktime_get_ts() (as of the latest tip/timers branch) is now
> deprecated, so I'd prefer not to add any new users that we
> just need to replace again in the future.
> 
> Even though this particular interface has no y2038 issue,
> removing all 'timespec' usage in the kernel one at a time
> helps tremendously in figuring out the remaining issues.
> 
> > Note that the man page for clock_gettime() doesn't define what
> > happens to CLOCK_MONOTONIC when it gets to the wrap point.  Neither
> > does the open group spec issue 7.
> 
> I'm pretty sure it's not really concerned an issue in practice:
> CLOCK_REALTIME is guaranteed to wrap before CLOCK_MONOTONIC,
> and we already rely on that not wrapping.

I don't think you understood my point there properly.

The starting point of CLOCK_MONOTONIC is undefined, so we can't really
say "it'll always be zero or positive".

With the userspace snippet I gave, if CLOCK_MONOTONIC returns a negative
tv_sec (eg, because it starts off as a negative number), the
drm_etnaviv_timespec gets a large 32-bit positive tv_sec which will be
way in the future, far longer than is expected, even for a short timeout.

This makes use of timespec64 end up with a timeout way into the future
and timeouts are far longer (years) than they were intended (maybe
milliseconds).

So, what should a correct userspace snippet for this look like today
given that we need to take a 32-bit signed timespec and calculate a
timestamp for the expiry time and place it into a 64-bit unsigned
timespec?

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 8.8Mbps down 630kbps up
According to speedtest.net: 8.21Mbps down 510kbps up


More information about the etnaviv mailing list