[igt-dev] [PATCH i-g-t 2/2] i915/sysfs_clients: Check that client ids are cyclic

Tvrtko Ursulin tvrtko.ursulin at linux.intel.com
Wed Jan 27 09:12:00 UTC 2021


On 26/01/2021 22:04, Chris Wilson wrote:
> Quoting Tvrtko Ursulin (2021-01-26 16:47:14)
>> On 26/01/2021 13:05, Chris Wilson wrote:
>>> The client id used is a cyclic allocator as that reduces the likelihood
>>> of userspace seeing the same id used again (and so confusing the new
>>> client as the old). Verify that each new client has an id greater than
>>> the last.
>>>
>>> Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
>>> Cc: Tvrtko Ursulin <tvrtko.ursulin at linux.intel.com>
>>> ---
>>>    tests/i915/sysfs_clients.c | 129 +++++++++++++++++++++++++++++++------
>>>    1 file changed, 108 insertions(+), 21 deletions(-)
>>>
>>> diff --git a/tests/i915/sysfs_clients.c b/tests/i915/sysfs_clients.c
>>> index a3a1f81e1..d2c1ebc5f 100644
>>> --- a/tests/i915/sysfs_clients.c
>>> +++ b/tests/i915/sysfs_clients.c
>>> @@ -8,6 +8,7 @@
>>>    #include <errno.h>
>>>    #include <fcntl.h>
>>>    #include <inttypes.h>
>>> +#include <limits.h>
>>>    #include <math.h>
>>>    #include <sched.h>
>>>    #include <sys/socket.h>
>>> @@ -47,6 +48,8 @@
>>>    #define assert_within_epsilon(x, ref, tolerance) \
>>>        __assert_within_epsilon(x, ref, tolerance / 100., tolerance / 100.)
>>>    
>>> +#define BUFSZ 280
>>> +
>>>    #define MI_BATCH_BUFFER_START (0x31 << 23)
>>>    #define MI_BATCH_BUFFER_END (0xa << 23)
>>>    #define MI_ARB_CHECK (0x5 << 23)
>>> @@ -75,7 +78,7 @@ static void pidname(int i915, int clients)
>>>    {
>>>        struct dirent *de;
>>>        int sv[2], rv[2];
>>> -     char buf[280];
>>> +     char buf[BUFSZ];
>>>        int me = -1;
>>>        long count;
>>>        pid_t pid;
>>> @@ -180,7 +183,7 @@ static long count_clients(int clients)
>>>    {
>>>        struct dirent *de;
>>>        long count = 0;
>>> -     char buf[280];
>>> +     char buf[BUFSZ];
>>>        DIR *dir;
>>>    
>>>        dir = fdopendir(dup(clients));
>>> @@ -229,32 +232,113 @@ static void create(int i915, int clients)
>>>        igt_assert_eq(count_clients(clients), 1);
>>>    }
>>>    
>>> +static const char *find_client(int clients, pid_t pid, char *buf)
>>> +{
>>> +     DIR *dir = fdopendir(dup(clients));
>>> +
>>> +     /* Reading a dir as it changes does not appear to be stable, SEP */
>>
>> Oh yes, it is POSIX undefined as far as I could figure out. You are
>> creating/destroying clients while reading the dir from different
>> threads?
> 
> Different processes on different directory fd. man readdir(3) does say
> that glibc is threadsafe, but that is not a requirement of POSIX. The
> problem we are seeing is that the directory contents are not stable
> around the readdir loop as clients are being created/destroyed, causing
> dirents to be missed.
> 
> stracing the problem shows that glibc used a 32KiB buffer for getdents
> and only a single syscall was required to grab the entire directory. No
> errors were seen before the missed dirent. It just seemed to be missing.
> 
> Afaict there is no delay in creating the sysfs entry, and I think there
> should also be no delay in creating the kernfs node and inodes, so my
> initial assumption is that it's something not quite happy in the
> kernfs directory that shows up under stress. A second loop ran for a
> long time without incident locally, but CI seems much more adept at
> finding it.

I think we are talking about the same thing..

>> I think best might be to introduce explicit sync points between
>> those two entities to make this reliable. In another review for the same
>> problem I suggested pipes for implementing this handshake. Along the
>> lines of "let child open/close some processes, make it wait for parent;
>> parent scans sysfs, signals child to open/close some more; repeat for a
>> while".
> 
> Sadly, I don't expect userspace to do that :)
> 
> I do expect userspace to search for its own client/ upon creation, and I
> expect there to be many clients being opened at once. So I think so long
> as the use of the library readdir is correct (distinct processes with
> distinct directory fd, so I think there's no accidental sharing) this
> represents. Hmm. fsync(dirfd).

...and I think it is just not guaranteed that reading the dir in 
parallel to dentries being created/deleted is guaranteed to work (no 
guarantee for discovering newly addded or deleted clients since the 
iteration started). Or you are saying that a client opens itself starts 
readdir loop and then fails to find itself?

Otherwise, I was thinking, if we wanted to allow clients to inspect 
themselves, we should really add a getrusage(2) like ioctl.

Regards,

Tvrtko


More information about the igt-dev mailing list