[poppler] RFC: patch: parallel testing

Carlos Garcia Campos carlosgc at gnome.org
Sat Dec 29 08:13:53 PST 2012


Adam Reichold <adamreichold at myopera.com> writes:

> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> Hello Thomas,
>
> It is not a very sophisticated solution, but I think the attached
> patch should ensure useful messages for interleaved output of several
> worker threads.

I hadn't noticed this problem, and I can only reproduce it when running
the tests in verbose mode, does it only happen when stdout is not a tty?

> Best regards, Adam.
>
> Am 29.12.2012 10:32, schrieb Thomas Freitag:
>> Hi Adam!
>> 
>> I used yesterday the first time Your patch for regtesting: Awfully,
>> congratulations. You probably already encountered my comment on bug
>> 50992. One minor thing: Unfortunately the protocol is not really
>> readable anymore, because each thread i.e. first outputs
>> 
>> Testing '/media/thomas/HD-PCTU3/PDF Suite/01_ump_a_2009.pdf' using
>>  splash backend (9/1137):
>> 
>> and when it finishes i.e.
>> 
>> PASS
>> 
>> Because of the worker threads it's now nearly impossible to match 
>> the test itself with the result of the test. Can You change it in 
>> that way that this is outputted together? Should be quite easy....
>> 
>> Cheers, Thomas
>> 
>> Am 06.12.2012 19:02, schrieb Adam Reichold: Hello again,
>> 
>> I reformatted the patch hopefully solving the whitespace issues. 
>> (It does apply and run for me, but there are still some whitespace 
>> warnings.)
>> 
>> The use of recursive locks is probably not necessary in 
>> "TestRun.py" and "TestReferences.py", but since the performance 
>> impact is probably negligible in this case, I thought allowing for 
>> recursion would help to keep the code more accessible. In 
>> "Printer.py" there is at least one recursion from
>> "printout_update" into "printout".
>> 
>> Best regards, Adam.
>> 
>> Am 06.12.2012 18:19, schrieb Adam Reichold:
>>>>> Hello Carlos,
>>>>> 
>>>>> Am 06.12.2012 18:11, schrieb Carlos Garcia Campos:
>>>>>> Adam Reichold <adamreichold at myopera.com> writes:
>>>>>>> Hello again,
>>>>>>> 
>>>>>>> As suggested by William Bader, I added the option to use 
>>>>>>> the logical CPU count as the number of worker threads 
>>>>>>> automatically.
>>>>>> Thanks for the patch, any improvement in the regression 
>>>>>> test framework is more than welcome, even more if it's to 
>>>>>> improve the performance. What poppler have you sued to 
>>>>>> create the patches? They don't apply for me here using 
>>>>>> poppler from current git master.
>>>>> Yes, it is about performance and I am hesitant about 
>>>>> introducing such complexity into the test driver which is 
>>>>> supposed to test the correctness of program itself, but 
>>>>> faster regression testing probably means more regression 
>>>>> testing... :-)
>>>>> 
>>>>> I am not sure how to correctly format the patch. I built on 
>>>>> the current master but it seems Git thinks that the lines 
>>>>> where I just changed the indentation are whitespace errors.
>>>>> I am not sure how to correct this as changes in whitespace
>>>>> are obviously important for Python scripts... I'll try to 
>>>>> research this, but any advice would be welcome.
>>>>> 
>>>>> Best regards, Adam.
>>>>> 
>>>>>>> Regards, Adam.
>>>>>>> 
>>>>>>> Am 04.12.2012 12:28, schrieb Adam Reichold:
>>>>>>>> Hello Thomas,
>>>>>>>> 
>>>>>>>> Am 04.12.2012 08:41, schrieb Thomas Freitag:
>>>>>>>>> Am 04.12.2012 07:45, schrieb Adam Reichold: Hello 
>>>>>>>>> everyone,
>>>>>>>>> 
>>>>>>>>> I currently try to get myself acquainted with 
>>>>>>>>> Poppler's regression testing framework. Because my 
>>>>>>>>> system has a rather low single-threaded performance, 
>>>>>>>>> I tried to implement parallel testing using Python's 
>>>>>>>>> Queue class.
>>>>>>>>> 
>>>>>>>>> Even though poppler-regtest currently uses two 
>>>>>>>>> processes per test file, rendering even and odd
>>>>>>>>> pages respectively, the test files themselves are
>>>>>>>>> still handled sequentially and both process are
>>>>>>>>> joined for each test file. This will yield suboptimal
>>>>>>>>> system utilization even for a small three-core system
>>>>>>>>> like mine.
>>>>>>>>> 
>>>>>>>>> Using the "-t/--threads N" option in the patched 
>>>>>>>>> poppler-regtest will spawn N worker threads that 
>>>>>>>>> handle all tests they can get from a single queue
>>>>>>>>> for all known tests, allowing to heavily utilize
>>>>>>>>> also large system if using a large set of test cases.
>>>>>>>>> But even for my three-core system, this brought down
>>>>>>>>> the time to create references for the complete test
>>>>>>>>> suite using the Splash backend from 4,5 hours to
>>>>>>>>> 2,75 hours.
>>>>>>>>>> What do You mean with "patched poppler-regtest"? 
>>>>>>>>>> There was no attachment, or do I miss something?
>>>>>>>> No, this on the process level in the test driver and 
>>>>>>>> independent of the your multi-threading work. There
>>>>>>>> was a patch called "parallel_testing.patch" attached,
>>>>>>>> as least my mail client tells me so.
>>>>>>>> 
>>>>>>>> I'll just try again...
>>>>>>>> 
>>>>>>>> Best regards, Adam.
>>>>>>>> 
>>>>>>>>>> If You talk about my multi-threaded testcase, 
>>>>>>>>>> please be aware, that it is still experimental, 
>>>>>>>>>> even if quite close. Here the result of my last 
>>>>>>>>>> regression test last sunday evening: Total 1133 
>>>>>>>>>> tests 1114 tests passed (98.32%) 17 tests failed 
>>>>>>>>>> (1.50%): /media/thomas/HD-PCTU3/PDF 
>>>>>>>>>> Suite/Algorithmics - The Spirit of Computing, 3rd 
>>>>>>>>>> Ed.pdf (splash), /media/thomas/HD-PCTU3/PDF 
>>>>>>>>>> Suite/Essentials of English Grammar - 
>>>>>>>>>> www.ielts4u.blogfa.com.pdf ... 2 tests crashed 
>>>>>>>>>> (0.18%): /media/thomas/HD-PCTU3/PDF 
>>>>>>>>>> Suite/bug157090.pdf (splash), 
>>>>>>>>>> /media/thomas/HD-PCTU3/PDF Suite/sinatr4c.f5.pdf 
>>>>>>>>>> (splash) I'll first have a look at the crashes
>>>>>>>>>> next weekend, then I'll continue with looking at
>>>>>>>>>> the failed tests.... Cheers, Thomas
>>>>>>>>> 
>>>>>>>>> IMHO, the necessary changes seem quite small 
>>>>>>>>> especially since a lot of them are connected to 
>>>>>>>>> indentation handling. What are your thoughts on the 
>>>>>>>>> utility and implementation of this?
>>>>>>>>> 
>>>>>>>>> Best regards, Adam.
>>>>>>>>>> 
>>>>>>>>>> _______________________________________________ 
>>>>>>>>>> poppler mailing list poppler at lists.freedesktop.org
>>>>>>>>>> 
>>>>>>>>>> http://lists.freedesktop.org/mailman/listinfo/poppler
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> 
> _______________________________________________ poppler
>>>>>>>>> mailing list poppler at lists.freedesktop.org 
>>>>>>>>> http://lists.freedesktop.org/mailman/listinfo/poppler
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> 
> _______________________________________________ poppler
>>>>>>>>> mailing list poppler at lists.freedesktop.org 
>>>>>>>>> http://lists.freedesktop.org/mailman/listinfo/poppler
>>>>>
>>>>>
>>>>>>>>>
>>>>>>>>> 
> _______________________________________________ poppler mailing
>>>>> list poppler at lists.freedesktop.org 
>>>>> http://lists.freedesktop.org/mailman/listinfo/poppler
>>>>> 
>>>>> 
>>> 
>>> 
>>> _______________________________________________ poppler mailing 
>>> list poppler at lists.freedesktop.org 
>>> http://lists.freedesktop.org/mailman/listinfo/poppler
>> 
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v2.0.19 (GNU/Linux)
>
> iQEcBAEBAgAGBQJQ3vyQAAoJEPSSjE3STU34/9kH/iOG+N6WLsZKKKYeKtxEOtda
> TMT1s6SsiWCEQIHIZNMoJvpHAO75mDXU3M+/QXuYre6Bhn+1+PlsVI/S74wus9HI
> 8jJC0sDOBYvM3PrEng+kjMsff/iOH6hsMrPbL/R/qAB4qjdCQSeAvloYu4TaSeB4
> 6dyqoZ05qxUt5UMEtVqDQLGHMllL2fwOMy3//elpzCiDArR/okv51p9r+qiEbQAe
> 7a+xpRplFSbQs0AvNqlE1GFFGVRusckzH6aXLng//DiCbJ+8iyGqZ5ALqZyk/HeH
> fRb4e+68siIRf+NvpUktNJCfmvjqfF47yaW+e3JAxTm848sHtmEqHqeEyT/wL98=
> =CKSU
> -----END PGP SIGNATURE-----
> From 5e7461569e3085370326203539fa02de968eb2ca Mon Sep 17 00:00:00 2001
> From: Adam Reichold <adamreichold at myopera.com>
> Date: Sat, 29 Dec 2012 15:17:33 +0100
> Subject: [PATCH] print messages that are useful with interleaved output by
>  several worker threads
>
> ---
>  regtest/TestRun.py | 18 +++++++++---------
>  1 file changed, 9 insertions(+), 9 deletions(-)
>
> diff --git a/regtest/TestRun.py b/regtest/TestRun.py
> index 5564757..750b93c 100644
> --- a/regtest/TestRun.py
> +++ b/regtest/TestRun.py
> @@ -71,7 +71,7 @@ class TestRun:
>          with self._lock:
>              self._n_tests += 1
>  
> -        self.printer.print_test_start("Testing '%s' using %s backend (%d/%d): " % (doc_path, backend.get_name(), n_doc, total_docs))
> +        test_msg = "Testing '%s' using %s backend (%d/%d): " % (doc_path, backend.get_name(), n_doc, total_docs)
>          test_has_md5 = backend.create_refs(doc_path, test_path)
>  
>          if backend.has_stderr(test_path):
> @@ -81,24 +81,24 @@ class TestRun:
>          if ref_has_md5 and test_has_md5:
>              if backend.compare_checksums(refs_path, test_path, not self.config.keep_results, self.config.create_diffs, self.config.update_refs):
>                  # FIXME: remove dir if it's empty?
> -                self.printer.print_test_result("PASS")
> +                self.printer.printout(test_msg + "PASS")
>                  with self._lock:
>                      self._n_passed += 1
>              else:
> -                self.printer.print_test_result_ln("FAIL")
> +                self.printer.printout_ln(test_msg + "FAIL")
>                  with self._lock:
>                      self._failed.append("%s (%s)" % (doc_path, backend.get_name()))
>              return
>          elif test_has_md5:
>              if ref_is_crashed:
> -                self.printer.print_test_result_ln("DOES NOT CRASH")
> +                self.printer.printout_ln(test_msg + "DOES NOT CRASH")
>              elif ref_is_failed:
> -                self.printer.print_test_result_ln("DOES NOT FAIL")
> +                self.printer.printout_ln(test_msg + "DOES NOT FAIL")
>              return
>  
>          test_is_crashed = backend.is_crashed(test_path)
>          if ref_is_crashed and test_is_crashed:
> -            self.printer.print_test_result("PASS (Expected crash)")
> +            self.printer.printout_ln(test_msg + "PASS (Expected crash)")
>              with self._lock:
>                  self._n_passed += 1
>              return
> @@ -106,19 +106,19 @@ class TestRun:
>          test_is_failed = backend.is_failed(test_path)
>          if ref_is_failed and test_is_failed:
>              # FIXME: compare status errors
> -            self.printer.print_test_result("PASS (Expected fail with status error %d)" % (test_is_failed))
> +            self.printer.printout(test_msg + "PASS (Expected fail with status error %d)" % (test_is_failed))
>              with self._lock:
>                  self._n_passed += 1
>              return
>  
>          if test_is_crashed:
> -            self.printer.print_test_result_ln("CRASH")
> +            self.printer.printout_ln(test_msg + "CRASH")
>              with self._lock:
>                  self._crashed.append("%s (%s)" % (doc_path, backend.get_name()))
>              return
>  
>          if test_is_failed:
> -            self.printer.print_test_result_ln("FAIL (status error %d)" % (test_is_failed))
> +            self.printer.printout_ln(test_msg + "FAIL (status error %d)" % (test_is_failed))
>              with self._lock:
>                  self._failed_status_error("%s (%s)" % (doc_path, backend.get_name()))
>              return
> -- 
> 1.8.0.3
>
> _______________________________________________
> poppler mailing list
> poppler at lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/poppler

-- 
Carlos Garcia Campos
PGP key: http://pgp.mit.edu:11371/pks/lookup?op=get&search=0x523E6462
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 197 bytes
Desc: not available
URL: <http://lists.freedesktop.org/archives/poppler/attachments/20121229/451fc14c/attachment-0001.pgp>


More information about the poppler mailing list