Fwd: Re: [PATCH xwayland] xwayland-shm: fortify fallocate against EINTR

Mon Apr 25 12:38:21 UTC 2016

On 04/25/16 12:20, Pekka Paalanen wrote:
> On Mon, 25 Apr 2016 11:33:00 +0200
> Marek Chalupa <mchqwerty at gmail.com> wrote:
>
>> If posix_fallocate or ftruncate is interrupted by a signal
>> while working, we return -1 as fd and the allocation process
>> returns a BadAlloc error. That causes xwayland clients to abort
>> with 'BadAlloc (insufficient resources for operation)'
>> even when there are plenty of resources available.
>>
>> Fix it by trying again when we get EINTR.
>>
>> Signed-off-by: Marek Chalupa <mchqwerty at gmail.com>
>> ---
>>   hw/xwayland/xwayland-shm.c | 10 ++++++++--
>>   1 file changed, 8 insertions(+), 2 deletions(-)
>>
>> diff --git a/hw/xwayland/xwayland-shm.c b/hw/xwayland/xwayland-shm.c
>> index e8545b3..c199e5e 100644
>> --- a/hw/xwayland/xwayland-shm.c
>> +++ b/hw/xwayland/xwayland-shm.c
>> @@ -140,14 +140,20 @@ os_create_anonymous_file(off_t size)
>>           return -1;
>>
>>   #ifdef HAVE_POSIX_FALLOCATE
>> -    ret = posix_fallocate(fd, 0, size);
>> +    do {
>> +        ret = posix_fallocate(fd, 0, size);
>> +    } while (ret == EINTR);
>> +
>>       if (ret != 0) {
>>           close(fd);
>>           errno = ret;
>>           return -1;
>>       }
>>   #else
>> -    ret = ftruncate(fd, size);
>> +    do {
>> +        ret = ftruncate(fd, size);
>> +    } while (ret == -1 && errno == EINTR);
>> +
>>       if (ret < 0) {
>>           close(fd);
>>           return -1;
>
> Hi Marek,
>
> curious, how did you hit this case? And is the signal that interrupts
> these calls usually the smart scheduler's SIGALRM?

Hi Pekka,

under gnome-shell it's enough to open terminator and resize it very
quickly. Under Weston (and also gnome-shell) I hit this with Firefox.
Just start it and in a moment it goes down with this error.

Yes, it looks like SIGALRM (I put an assert on a change of
SmartScheduleTime when fallocate fails).

> I am asking, because I have someone suffering from the EINTR issue, but
> a simple restart like what you implemented here results in an endless
> loop. A new signal arrives before fallocate completes every time. It is
> like fallocate is not making any progress.
>
> What is more curious is that the file is supposedly on a tmpfs, yet in
> our case the 5 ms is not enough to fallocate a full-HD frame (8 MB). It
> is a "Low powered NXP arm platform" I am told, I do not have access to
> it myself.
>
> It may be the platform's fault that fallocate takes such a long time.
> Another thing is whether fallocate should make gradual progress or not;
> if not, simple restart will not work against a regular timer signal.

Here the platform is x86_64. Actually, this started happening some time
ago, but it was fine before. I don't know what made the change.

I don't know whether fallocate makes gradual progress, but in my case
it is fast enough most of the time, so it behaves more like a race
condition: it is random whether the signal strikes within the time
frame in which fallocate runs. Therefore a simple restart works fine
here. But you're right that if fallocate is too slow on other
platforms, this solution will not help there.

> That makes me wonder if in case of EINTR, we should revert to
> fallocating a series of small chunks instead. But that could also be
> nonsense and something else is broken, I just don't know.
>
> Any ideas, anyone?
>
> Should we be accounting for the possibility of an endless loop, or will
> that never happen on a good platform?
>
>
> Thanks,
> pq
>

Marek