[BUG] etnaviv: broken timeouts

Arnd Bergmann arnd at arndb.de
Mon Nov 6 14:17:02 UTC 2017


On Mon, Nov 6, 2017 at 3:04 PM, Russell King - ARM Linux
<linux at armlinux.org.uk> wrote:
> On Mon, Nov 06, 2017 at 01:42:10PM +0100, Arnd Bergmann wrote:
>> On Mon, Nov 6, 2017 at 11:35 AM, Lucas Stach <l.stach at pengutronix.de> wrote:
>> > +CC Arnd, who seems to care about timeout and year 2038 issues.

>> As far as I can tell, the driver does not suffer from an overflow, since
>> all the timeouts that get passed from user space are in
>> CLOCK_MONOTONIC time, which never overflows a 32-bit seconds
>> value unless you run the kernel uninterrupted for 136 years without
>> rebooting.
>
> I think you're wrong on that.  "struct timespec" is defined such that
> tv_sec is of type __kernel_time_t, which is signed.  It overflows to
> negative after 68 years, not 136 years.  It's still quite a time.

Right, that's what I meant of course.

>> There is also no problem with the user space ABI, since the
>> drm_etnaviv_wait_fence structure uses fixed-length fields
>> rather than 'time_t' or 'struct timespec' that will get redefined by
>> glibc to use 64-bit timestamps in the next few years. (thanks
>> a lot to whoever was paying attention while defining the ioctls!)
>
> Whether glibc translates the kernel interface's timespec into its own
> 64-bit type is neither here nor there - as long as something in that
> path truncates it to a 32-bit signed type, there's a 68 year wrap
> problem.

The problem I'm referring to here affects all drivers that define a
structure like

struct foo {
    struct timespec timeout;
    ...
};

and pass that through an ioctl. Since 'timespec' is defined in the libc
header, changing 'time_t' in user space will also change the
layout of 'struct foo' in an incompatible way, regardless of whether
there is an overflow or not/

>> However, there are a few other problems with the code in question:
>>
>> ...
>
> See my patches that correct the etnaviv_timeout_to_jiffies() function.
> It's silly to comment on something that we already know is broken and
> which have patches proposed.

I had not seen any of that, I was only put on Cc today and looked at
what's currently in linux-next.

>> - Using 'timespec' in drivers is generally a bad idea because of the
>>   overflow that Russell mentioned. We have already converted
>>   hundreds of drivers to use either 'timespec64' or ktime_t instead,
>>   which is generally a safer option. I hope to get rid of all uses of
>>   timespec in the kernel at some point, and I see that as a prerequisite
>>   for shipping products that want to live beyond 2038.
>
> The problem with using timespec64 while userspace still uses a 32-bit
> timespec is that the wrap is still very much a hard deadline, because
> even if we have etnaviv accept timestamps beyond that, userspace can't
> create timestamps that are beyond the wrap timestamp plus whatever
> timeout they want.  When their idea of time obtained via a struct
> timespec wraps, then it becomes negative, and is therefore "before"
> the kernel's idea of 64-bit time.

Obviously we have to change user space to never wrap 'time_t',
but we can't really do that before fixing the kernel to expose interfaces
based on 64-bit time_t, so I really want to have timespec eliminated
from the kernel.

In case of etnaviv, the ABI is fine as I said, it already uses fixed-length
64-bit tv_sec, so that part is good and won't overflow as long as the
application calls the clock_gettime() replacement that we yet have to
add.

>> I have prototyped a patch to address all of the above, will send
>> that later today after doing a bit of build testing. I hope you can
>> give it some testing, and clarify what the timebase is that
>> user space expects to put into the timeout (MONOTONIC,
>> MONOTONIC_RAW, or REALTIME).
>
> Could you look at the patches I proposed, which have been tested,
> instead of recreating my work?
>
> If you want to recreate my work, we're going to have to wait another
> 50 odd days before we can say whether they work or not, because that's
> how long it takes for the current code to go wrong, and I've already
> rebooted the platforms.

I'll have a look, sure.

       Arnd


More information about the etnaviv mailing list