[systemd-devel] [PATCH] Fix broken syscall(__NR_fanotify_mark... on 32bit mips.

David Daney ddaney at caviumnetworks.com
Wed Apr 20 10:36:49 PDT 2011


On 04/20/2011 09:14 AM, Eric Paris wrote:
> On Wed, 2011-04-20 at 12:16 +0200, Aurelien Jarno wrote:
>> On Wed, Apr 20, 2011 at 03:22:08AM +0200, Lennart Poettering wrote:
>>> On Wed, 20.04.11 09:15, fykcee1 at gmail.com (fykcee1 at gmail.com) wrote:
>>>
>>>>
>>>> 2011/4/20 Lennart Poettering<lennart at poettering.net>
>>>>>
>>>>> On Thu, 14.04.11 17:34, fykcee1 at gmail.com (fykcee1 at gmail.com) wrote:
>>>>>
>>>>>> diff --git a/src/missing.h b/src/missing.h
>>>>>> index 35e209f..b367831 100644
>>>>>> --- a/src/missing.h
>>>>>> +++ b/src/missing.h
>>>>>> @@ -125,7 +125,12 @@ static inline int fanotify_init(unsigned int flags, unsigned int event_f_flags)
>>>>>>
>>>>>>   static inline int fanotify_mark(int fanotify_fd, unsigned int flags, uint64_t mask,
>>>>>>                                   int dfd, const char *pathname) {
>>>>>> -        return syscall(__NR_fanotify_mark, fanotify_fd, flags, mask, dfd, pathname);
>>>>>> +        if (sizeof(void *) == 4)
>>>>>> +                return syscall(__NR_fanotify_mark, fanotify_fd, flags,
>>>>>> +                               *((uint32_t *)&mask), *((uint32_t *)&mask + 1),
>>>>>> +                               dfd, pathname);
>>>>>> +        else /* 64bit */
>>>>>> +                return syscall(__NR_fanotify_mark, fanotify_fd, flags, mask, dfd, pathname);
>>>>>>   }
>>>>>>
>>>>>>   #ifndef BTRFS_IOCTL_MAGIC
>>>>>
>>>>> Hmm, wouldn't this break x86-32?
>>>> It works fine on my asus laptop with a 32bit CPU Pentium-M.
>>>>
>>>> We pass two 32bit arguments instead of one 64bit argument. syscall()
>>>> doesn't know the prototype, I guess it will only fetch 9 native-width
>>>> arguments(1 syscall number + 8 parameters) from the stack. Aurelien,
>>>> am I right?
>>>
>>> Hmm, I am not sure I understand this? How is it possible that both ways
>>> work on x86? There must be quite a difference between passing 6 or 7
>>> arguments to syscall.
>>
>> syscall(64_bit_arg) actually pass the 64-bit argument in two different
>> registers (or on two different 32-bit stack entries). Then the GNU libc
>> copies the registers while converting from the userspace ABI to the
>> kernel ABI (usually passing more values into registers).
>>
>> On a lot of 32-bit cpus which needs alignment, a 64-bit arguments have
>> to be stack aligned, or register aligned. That's why due to the userspace
>> ABI, one register or one stack entry has to be empty. This later break
>> when this empty register arrives at the kernel level.
>>
>>> Or are you suggesting that the current invocation is broken even on x86,
>>> just doesn't become visible since only the lower 32 bit of "mask" are
>>> used?
>>>
>>
>> No, on x86 it just work "by chance", as x86 is very flexible with
>> alignement.
>
> When I implemented the kernel interface I remember someone brought up
> 64-bit alignment and it was suggest I go with the ordering I have.  I
> understood that fanotify_fd would be the first 32 bits, flags be the
> second and then mask should be 64-bit aligned as the 3rd/4th?
>
> I am pretty arch stupid, and I have no idea at all if it is related, but
> I'm looking at kernel commit 5e844b31c2ace282ab8bea630b63e0212d9532d4
> which wires up the fanotify syscalls for mips.  I see that it used a u64
> for a3 and a4 when these are only going to be 32 values.  Does that
> matter?

It is correct, because it is what is expected by merge_64().

>  I also see that some prototype such as sys32_readahead()
> explicitly added a pad into the 32 helper.  Should we be doing that?

No.

The o32 ABI requires that 64-bit arguments be aligned starting in an 
even argument register number.  Since there are already two 32-bit 
arguments, no padding is needed.

>  I
> also notice that the naming of arguments seems wrong (although it
> shouldn't matter)

That was an unfortunate mistake.  If someone wanted to, we could do:

s/a3/a2/
s/a4/a3/

That might reduce the confusing somewhat.

>
> SYSCALL_DEFINE6(32_fanotify_mark, int, fanotify_fd, unsigned int, flags,
>                  u64, a3, u64, a4, int, dfd, const char  __user *, pathname)
>
> I would have thought equivalently:
>    fanotify_fd == a0
>    flags == a1
>    'a3' == a2
>    'a4' == a3
>    dfd == a4
>
> etc.....
>
> Anyway, no idea if it is related, but I thought I designed the
> systemcall with this padding in mind.....

It is difficult to to call fanotify_mark() using the o32 syscall().

You would have to do something like this (untested):

int foo_fanotify_mark(int fanotify_fd, unsigned int flags, u64 mask, int 
dfd, const char  __user * pathname)
{
	u32 mask_low = (u32)mask;
	u32 mask_high = (u32)(mask >> 32);

	return syscall(4337, fanotify_fd, flags, mask_low, mask_high, dfd, 
pathname);

}

The order of mask_low, mask_high in the syscall argument list depends on 
the endianness.  Figuring out the correct order is left as an exercise 
for the reader.


David Daney


More information about the systemd-devel mailing list