[BUG] etnaviv: broken timeouts

Russell King - ARM Linux linux at armlinux.org.uk
Mon Nov 6 14:04:39 UTC 2017


On Mon, Nov 06, 2017 at 01:42:10PM +0100, Arnd Bergmann wrote:
> On Mon, Nov 6, 2017 at 11:35 AM, Lucas Stach <l.stach at pengutronix.de> wrote:
> > +CC Arnd, who seems to care about timeout and year 2038 issues.
> >
> > Am Freitag, den 03.11.2017, 18:41 +0000 schrieb Russell King - ARM
> > Linux:
> >> On Fri, Nov 03, 2017 at 07:07:44PM +0100, Christian Gmeiner wrote:
> >> > If you send this change as patch you can add my
> >> > Reviewed-by: Christian Gmeiner <christian.gmeiner at gmail.com>
> >> >
> >> > Hopefully Lucas will pick it up your patch soon.
> >>
> >> I would really like to hear from Thomas on it due to the problem
> >> that remains - that is that timespec_compare() will return negative
> >> if timeout->tv_sec is numerically less than ts.tv_sec.
> >>
> >> So, when we get close to 2037, and our timeout overflows the wrap,
> >> timeout->tv_sec becomes negative, and is immediately numerically
> >> less than ts.tv_sec, so this function starts failing.
> >>
> >> It's not just "current time close to 2038" - if we're reading the
> >> current time and adding (eg) (u32)~0 milliseconds to it (as my
> >> DDX driver does for "infinite") we'll start seeing problems about
> >> 50 days prior to the roll over.
> >>
> >> Using the 64-bit math version in the kernel may help push the date
> >> up to the roll-over date, but doesn't solve the problem.
> >>
> >> I think we have a few years before we need to be concerned about it,
> >> but I'd like to have a solution.
> >>
> >> Kind'a feels like we need some way to ask the kernel for an
> >> drm_etnaviv_timespec that is (u64) X nsec into the future - that'd
> >> also eliminate the "which clock_gettime() clockid should we be using"
> >> question.
> >
> > There is a bunch of drivers using absolute timeouts and we probably
> > need to deal with 32bit systems for a while still. I would really like
> > to see a solution for this which isn't specific to etnaviv, but adds a
> > core helper for those situations which has the blessing of the clock
> > guys. Hopefully Arnd or Thomas can provide some advise.
> 
> As far as I can tell, the driver does not suffer from an overflow, since
> all the timeouts that get passed from user space are in
> CLOCK_MONOTONIC time, which never overflows a 32-bit seconds
> value unless you run the kernel uninterrupted for 136 years without
> rebooting.

I think you're wrong on that.  "struct timespec" is defined such that
tv_sec is of type __kernel_time_t, which is signed.  It overflows to
negative after 68 years, not 136 years.  It's still quite a time.

> There is also no problem with the user space ABI, since the
> drm_etnaviv_wait_fence structure uses fixed-length fields
> rather than 'time_t' or 'struct timespec' that will get redefined by
> glibc to use 64-bit timestamps in the next few years. (thanks
> a lot to whoever was paying attention while defining the ioctls!)

Whether glibc translates the kernel interface's timespec into its own
64-bit type is neither here nor there - as long as something in that
path truncates it to a 32-bit signed type, there's a 68 year wrap
problem.

> However, there are a few other problems with the code in question:
> 
> ...

See my patches that correct the etnaviv_timeout_to_jiffies() function.
It's silly to comment on something that we already know is broken and
which have patches proposed.

> - Using 'timespec' in drivers is generally a bad idea because of the
>   overflow that Russell mentioned. We have already converted
>   hundreds of drivers to use either 'timespec64' or ktime_t instead,
>   which is generally a safer option. I hope to get rid of all uses of
>   timespec in the kernel at some point, and I see that as a prerequisite
>   for shipping products that want to live beyond 2038.

The problem with using timespec64 while userspace still uses a 32-bit
timespec is that the wrap is still very much a hard deadline, because
even if we have etnaviv accept timestamps beyond that, userspace can't
create timestamps that are beyond the wrap timestamp plus whatever
timeout they want.  When their idea of time obtained via a struct
timespec wraps, then it becomes negative, and is therefore "before"
the kernel's idea of 64-bit time.

> I have prototyped a patch to address all of the above, will send
> that later today after doing a bit of build testing. I hope you can
> give it some testing, and clarify what the timebase is that
> user space expects to put into the timeout (MONOTONIC,
> MONOTONIC_RAW, or REALTIME).

Could you look at the patches I proposed, which have been tested,
instead of recreating my work?

If you want to recreate my work, we're going to have to wait another
50 odd days before we can say whether they work or not, because that's
how long it takes for the current code to go wrong, and I've already
rebooted the platforms.

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 8.8Mbps down 630kbps up
According to speedtest.net: 8.21Mbps down 510kbps up


More information about the etnaviv mailing list