[Intel-gfx] [PATCH i-g-t 2/2] i915/sysfs_clients: Check that client ids are cyclic
Tvrtko Ursulin
tvrtko.ursulin at linux.intel.com
Wed Jan 27 09:12:00 UTC 2021
On 26/01/2021 22:04, Chris Wilson wrote:
> Quoting Tvrtko Ursulin (2021-01-26 16:47:14)
>> On 26/01/2021 13:05, Chris Wilson wrote:
>>> Client ids are handed out by a cyclic allocator, as that reduces the
>>> likelihood of userspace seeing the same id reused (and so confusing
>>> the new client with the old). Verify that each new client has an id
>>> greater than the last.
>>>
>>> Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
>>> Cc: Tvrtko Ursulin <tvrtko.ursulin at linux.intel.com>
>>> ---
>>> tests/i915/sysfs_clients.c | 129 +++++++++++++++++++++++++++++++------
>>> 1 file changed, 108 insertions(+), 21 deletions(-)
>>>
>>> diff --git a/tests/i915/sysfs_clients.c b/tests/i915/sysfs_clients.c
>>> index a3a1f81e1..d2c1ebc5f 100644
>>> --- a/tests/i915/sysfs_clients.c
>>> +++ b/tests/i915/sysfs_clients.c
>>> @@ -8,6 +8,7 @@
>>> #include <errno.h>
>>> #include <fcntl.h>
>>> #include <inttypes.h>
>>> +#include <limits.h>
>>> #include <math.h>
>>> #include <sched.h>
>>> #include <sys/socket.h>
>>> @@ -47,6 +48,8 @@
>>> #define assert_within_epsilon(x, ref, tolerance) \
>>> __assert_within_epsilon(x, ref, tolerance / 100., tolerance / 100.)
>>>
>>> +#define BUFSZ 280
>>> +
>>> #define MI_BATCH_BUFFER_START (0x31 << 23)
>>> #define MI_BATCH_BUFFER_END (0xa << 23)
>>> #define MI_ARB_CHECK (0x5 << 23)
>>> @@ -75,7 +78,7 @@ static void pidname(int i915, int clients)
>>> {
>>> struct dirent *de;
>>> int sv[2], rv[2];
>>> - char buf[280];
>>> + char buf[BUFSZ];
>>> int me = -1;
>>> long count;
>>> pid_t pid;
>>> @@ -180,7 +183,7 @@ static long count_clients(int clients)
>>> {
>>> struct dirent *de;
>>> long count = 0;
>>> - char buf[280];
>>> + char buf[BUFSZ];
>>> DIR *dir;
>>>
>>> dir = fdopendir(dup(clients));
>>> @@ -229,32 +232,113 @@ static void create(int i915, int clients)
>>> igt_assert_eq(count_clients(clients), 1);
>>> }
>>>
>>> +static const char *find_client(int clients, pid_t pid, char *buf)
>>> +{
>>> + DIR *dir = fdopendir(dup(clients));
>>> +
>>> + /* Reading a dir as it changes does not appear to be stable, SEP */
>>
>> Oh yes, it is POSIX undefined as far as I could figure out. You are
>> creating/destroying clients while reading the dir from different
>> threads?
>
> Different processes on different directory fds. man readdir(3) does say
> that glibc is threadsafe, but that is not a requirement of POSIX. The
> problem we are seeing is that the directory contents are not stable
> across the readdir loop as clients are being created/destroyed, causing
> dirents to be missed.
>
> stracing the problem shows that glibc used a 32KiB buffer for getdents
> and only a single syscall was required to grab the entire directory. No
> errors were seen before the missed dirent. It just seemed to be missing.
>
> Afaict there is no delay in creating the sysfs entry, and I think there
> should also be no delay in creating the kernfs node and inodes, so my
> initial assumption is that it's something not quite happy in the
> kernfs directory that shows up under stress. A second loop ran for a
> long time without incident locally, but CI seems much more adept at
> finding it.
I think we are talking about the same thing...
>> I think best might be to introduce explicit sync points between
>> those two entities to make this reliable. In another review for the same
>> problem I suggested pipes for implementing this handshake. Along the
>> lines of "let child open/close some processes, make it wait for parent;
>> parent scans sysfs, signals child to open/close some more; repeat for a
>> while".
>
> Sadly, I don't expect userspace to do that :)
>
> I do expect userspace to search for its own client/ upon creation, and I
> expect there to be many clients being opened at once. So I think, so long
> as the use of the library readdir is correct (distinct processes with
> distinct directory fds, so there should be no accidental sharing), this
> test is representative. Hmm. fsync(dirfd)?
...and I think it is simply not guaranteed that reading the dir in
parallel with dentries being created/deleted works (there is no
guarantee of discovering clients added or deleted since the iteration
started). Or are you saying that a client opens itself, starts the
readdir loop and then fails to find itself?
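
To illustrate, the handshake could be as simple as this (a rough,
untested sketch, written as if dropped into this test file so it can
reuse count_clients(); the loop count and the expected number of
clients are illustrative):

	static void synced_create(int clients)
	{
		int to_child[2], to_parent[2];

		/* Two pipes form the parent <-> child handshake. */
		igt_assert(pipe(to_child) == 0);
		igt_assert(pipe(to_parent) == 0);

		igt_fork(child, 1) {
			for (int i = 0; i < 128; i++) {
				/* Opening a new fd creates a new client. */
				int fd = drm_open_driver(DRIVER_INTEL);
				char go;

				/* Tell the parent a client exists... */
				igt_assert(write(to_parent[1], "", 1) == 1);
				/* ...and wait for it to finish scanning. */
				igt_assert(read(to_child[0], &go, 1) == 1);

				close(fd);
			}
		}

		for (int i = 0; i < 128; i++) {
			char ready;

			/* Wait until the child has opened its client. */
			igt_assert(read(to_parent[0], &ready, 1) == 1);
			/* The dir is stable now: ourselves + the child. */
			igt_assert(count_clients(clients) >= 2);
			/* Let the child close it and open the next one. */
			igt_assert(write(to_child[1], "", 1) == 1);
		}

		igt_waitchildren();
	}

That obviously only exercises the test's own scans rather than clients
racing each other, but it does make each scan deterministic.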
Otherwise, I was thinking that if we want to allow clients to inspect
themselves, we should really add a getrusage(2)-like ioctl, something
along the lines of the strawman below.
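
Purely as a strawman (the names, the ioctl number and the struct layout
are all invented here for illustration, nothing like this exists
today), such a hypothetical uapi could be:

	struct drm_i915_client_rusage {
		__u64 client_id;	/* the cyclic id this series exposes */
		__u64 flags;		/* MBZ */
		__u64 busy_ns[4];	/* e.g. per engine class runtime */
	};

	#define DRM_I915_CLIENT_RUSAGE		0x3f
	#define DRM_IOCTL_I915_CLIENT_RUSAGE \
		DRM_IOWR(DRM_COMMAND_BASE + DRM_I915_CLIENT_RUSAGE, \
			 struct drm_i915_client_rusage)

The point being that a client could then query itself directly, without
having to race a sysfs directory walk at all.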
Regards,
Tvrtko