[Intel-gfx] [PATCH i-g-t 2/2] i915/sysfs_clients: Check that client ids are cyclic
Chris Wilson
chris at chris-wilson.co.uk
Tue Jan 26 22:04:35 UTC 2021
Quoting Tvrtko Ursulin (2021-01-26 16:47:14)
>
> On 26/01/2021 13:05, Chris Wilson wrote:
> > The client id used is a cyclic allocator as that reduces the likelihood
> > of userspace seeing the same id used again (and so confusing the new
> > client as the old). Verify that each new client has an id greater than
> > the last.
> >
> > Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
> > Cc: Tvrtko Ursulin <tvrtko.ursulin at linux.intel.com>
> > ---
> > tests/i915/sysfs_clients.c | 129 +++++++++++++++++++++++++++++++------
> > 1 file changed, 108 insertions(+), 21 deletions(-)
> >
> > diff --git a/tests/i915/sysfs_clients.c b/tests/i915/sysfs_clients.c
> > index a3a1f81e1..d2c1ebc5f 100644
> > --- a/tests/i915/sysfs_clients.c
> > +++ b/tests/i915/sysfs_clients.c
> > @@ -8,6 +8,7 @@
> > #include <errno.h>
> > #include <fcntl.h>
> > #include <inttypes.h>
> > +#include <limits.h>
> > #include <math.h>
> > #include <sched.h>
> > #include <sys/socket.h>
> > @@ -47,6 +48,8 @@
> > #define assert_within_epsilon(x, ref, tolerance) \
> > __assert_within_epsilon(x, ref, tolerance / 100., tolerance / 100.)
> >
> > +#define BUFSZ 280
> > +
> > #define MI_BATCH_BUFFER_START (0x31 << 23)
> > #define MI_BATCH_BUFFER_END (0xa << 23)
> > #define MI_ARB_CHECK (0x5 << 23)
> > @@ -75,7 +78,7 @@ static void pidname(int i915, int clients)
> > {
> > struct dirent *de;
> > int sv[2], rv[2];
> > - char buf[280];
> > + char buf[BUFSZ];
> > int me = -1;
> > long count;
> > pid_t pid;
> > @@ -180,7 +183,7 @@ static long count_clients(int clients)
> > {
> > struct dirent *de;
> > long count = 0;
> > - char buf[280];
> > + char buf[BUFSZ];
> > DIR *dir;
> >
> > dir = fdopendir(dup(clients));
> > @@ -229,32 +232,113 @@ static void create(int i915, int clients)
> > igt_assert_eq(count_clients(clients), 1);
> > }
> >
> > +static const char *find_client(int clients, pid_t pid, char *buf)
> > +{
> > + DIR *dir = fdopendir(dup(clients));
> > +
> > + /* Reading a dir as it changes does not appear to be stable, SEP */
>
> Oh yes, it is POSIX undefined as far as I could figure out. You are
> creating/destroying clients while reading the dir from different
> threads?
Different processes on different directory fd. man readdir(3) does say
that glibc is threadsafe, but that is not a requirement of POSIX. The
problem we are seeing is that the directory contents are not stable
around the readdir loop as clients are being created/destroyed, causing
dirents to be missed.
stracing the problem shows that glibc used a 32KiB buffer for getdents
and only a single syscall was required to grab the entire directory. No
errors were seen before the missed dirent. It just seemed to be missing.
Afaict there is no delay in creating the sysfs entry, and I think there
should also be no delay in creating the kernfs node and inodes, so my
initial assumption is that it's something not quite happy in the
kernfs directory that shows up under stress. A second loop ran for a
long time without incident locally, but CI seems much more adept at
finding it.
> I think best might be to introduce explicit sync points between
> those two entities to make this reliable. In another review for the same
> problem I suggested pipes for implementing this handshake. Along the
> lines of "let child open/close some processes, make it wait for parent;
> parent scans sysfs, signals child to open/close some more; repeat for a
> while".
Sadly, I don't expect userspace to do that :)
I do expect userspace to search for its own client/ upon creation, and I
expect there to be many clients being opened at once. So I think so long
as the use of the library readdir is correct (distinct processes with
distinct directory fd, so I think there's no accidental sharing) this
represents. Hmm. fsync(dirfd).
-Chris
More information about the Intel-gfx
mailing list