[Libreoffice] substantial 'glob' speedup ...

Michael Meeks michael.meeks at novell.com
Tue Apr 5 03:34:58 PDT 2011


Hi guys,

	Over in LibreOffice land, we're transitioning our whole build to use
gnumake - with the goal of having a single gnumake instance able to
rebuild the many thousands of files we have, and to store and act on all
of the dependencies.

	Anyhow - one problem we are seeing is that as we load and parse the
~50MB of dependencies that we need (for part of writer) we stat the same
files involved in dependencies over and over, sometimes a thousand times
or so. We do around 700k stats with lots of duplication.

	Nearly all of these come from calling 'glob'; I append a patch that
tries to call glob only when the name actually contains a wildcard
character - it could be done more prettily:

+      if (nlist != &name)

	not the nicest thing in the world; but I didn't want a big indentation
change. Timings for a make -sr with nothing to do are:

        before      after
real    0m5.795s    0m2.634s
user    0m3.513s    0m2.526s
sys     0m2.274s    0m0.101s

	Which is a worthwhile saving at least in our use case, though
naturally, being spectacularly incompetent - it is probably a
side-effect of me breaking everything ;-) Having said that, the
dependency rules (at least) appear to continue to work nicely when I
test with manual touching, and 'make check' passes ...
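
	For anyone wanting to try the idea outside of make, here is a minimal
standalone sketch of the same fast path. It is not the patch itself:
for_each_name and its visit callback are made-up names for illustration,
it uses plain POSIX glob without make's GLOB_ALTDIRFUNC hooks, and it
leaves out the PARSEFS_EXISTS check and the archive handling.

```c
#include <assert.h>
#include <glob.h>
#include <stddef.h>
#include <string.h>

/* Return 1 if NAME contains a character glob(3) treats as a wildcard.  */
static int
need_to_glob (const char *name)
{
  assert (name != NULL);
  return strpbrk (name, "?*[") != NULL;
}

/* Hypothetical helper, not part of make: call VISIT (if non-NULL) for
   every name that NAME expands to, and return how many there were.  */
static size_t
for_each_name (const char *name, void (*visit) (const char *))
{
  glob_t gl;
  char *literal = (char *) name;
  char **nlist;
  size_t count, i;

  if (!need_to_glob (name))
    {
      /* Fast path: a plain name stands for itself, so glob() and the
         stat calls it makes are skipped entirely.  */
      nlist = &literal;
      count = 1;
    }
  else
    {
      if (glob (name, GLOB_NOSORT, NULL, &gl) != 0)
        {
          /* No match or an error: release whatever glob() set up.  */
          globfree (&gl);
          return 0;
        }
      nlist = gl.gl_pathv;
      count = gl.gl_pathc;
    }

  for (i = 0; i < count; i++)
    if (visit != NULL)
      visit (nlist[i]);

  /* Only free the glob_t if glob() was really called.  */
  if (nlist != &literal)
    globfree (&gl);

  return count;
}
```

	Note that in this sketch a plain name such as "foo.c" counts as one
result whether or not the file exists, whereas the patch additionally
checks file_exists_p when PARSEFS_EXISTS is set.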

	Thoughts much appreciated,

	Thanks,

		Michael.

diff --git a/read.c b/read.c
index a3ad88e..48de4fe 100644
--- a/read.c
+++ b/read.c
@@ -2824,6 +2824,20 @@ tilde_expand (const char *name)
 #endif /* !VMS */
   return 0;
 }
+
+
+static int
+need_to_glob (const char *name)
+{
+  int i;
+  for (i = 0; name[i] != '\0'; i++) {
+    if (name[i] == '?' || name[i] == '*' || name[i] == '[') {
+      return 1;
+    }
+  }
+  return 0;
+}
+
 
 /* Parse a string into a sequence of filenames represented as a chain of
    struct nameseq's and return that chain.  Optionally expand the strings via
@@ -3112,6 +3126,14 @@ parse_file_seq (char **stringp, unsigned int size, int stopchar,
 	}
 #endif /* !NO_ARCHIVES */
 
+      /* glob is expensive - it always stats, so try to avoid it if possible */
+      if (!need_to_glob (name)) {
+	nlist = &name;
+	i = 1;
+	if (flags & PARSEFS_EXISTS && !file_exists_p (name))
+	  i = 0;
+      }
+      else
       switch (glob (name, GLOB_NOSORT|GLOB_ALTDIRFUNC, NULL, &gl))
 	{
 	case GLOB_NOSPACE:
@@ -3174,7 +3196,8 @@ parse_file_seq (char **stringp, unsigned int size, int stopchar,
 #endif /* !NO_ARCHIVES */
           NEWELT (concat (2, prefix, nlist[i]));
 
-      globfree (&gl);
+      if (nlist != &name)
+	globfree (&gl);
 
 #ifndef NO_ARCHIVES
       if (arname)


-- 
 michael.meeks at novell.com  <><, Pseudo Engineer, itinerant idiot



