[poppler] RFC: patch: parallel testing

Adam Reichold adamreichold at myopera.com
Sat Dec 29 11:13:02 PST 2012


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hello Carlos,

Am 29.12.2012 19:48, schrieb Carlos Garcia Campos:
> Adam Reichold <adamreichold at myopera.com> writes:
> 
> Hello Carlos,
> 
> I think the problem also happens with a tty but it is not
> immediately obvious: The output will look fine, but e.g. the "FAIL"
> appended to the current line might belong to a different test case.
> Consider the following chain of events (serialized by the Printer
> lock) using two worker threads:
> 
> thread A: self.printer.print_test_start("Test A: ") thread B:
> self.printer.print_test_start("Test B: ") thread A:
> self.printer.print_test_result_ln("FAIL")
> 
> which as I understand it should result in
> 
> "Test B: FAIL"
> 
> whereas it was test A that actually failed.
> 
>> Ok, I've pushed a different version. The number of the test is
>> not that important in the end, so I think we could assign the
>> numbers when printing the results to ensure we keep the order.
> 
>> Another issue I've noticed with the threads is that
>> poppler-regtest can't be kille with CTRL + C anymore.

Yes, the reason is that Queue.join() is a blocking call so that the
main thread can't handle the KeyboardInterrupt. There are several ways
to work around this, but I simply opted to print the PID before
spawning the worker threads to use with "kill". (Depending on the
shell used, one can suspend the task and use "kill %%".) More
complicated variants could poll the queue for unprocessed jobs and
handle the KeyboardInterrupt by clearing it.

Best regards, Adam.

> Best regards, Adam.
> 
>> Thanks,
> 
> Am 29.12.2012 17:13, schrieb Carlos Garcia Campos:
>>>> Adam Reichold <adamreichold at myopera.com> writes:
>>>> 
>>>> Hello Thomas,
>>>> 
>>>> It is not a very sophisticated solution, but I think the
>>>> attached patch should ensure useful messages for interleaved
>>>> output of several worker threads.
>>>> 
>>>>> I hadn't noticed this problem, and I can only reproduce it
>>>>> when running the tests in verbose mode, does it only happen
>>>>> when stdout is not a tty?
>>>> 
>>>> Best regards, Adam.
>>>> 
>>>> Am 29.12.2012 10:32, schrieb Thomas Freitag:
>>>>>>> Hi Adam!
>>>>>>> 
>>>>>>> I used yesterday the first time Your patch for
>>>>>>> regtesting: Awfully, congratulations. You probably
>>>>>>> already encountered my comment on bug 50992. One minor
>>>>>>> thing: Unfortunately the protocol is not really
>>>>>>> readable anymore, because each thread i.e. first
>>>>>>> outputs
>>>>>>> 
>>>>>>> Testing '/media/thomas/HD-PCTU3/PDF
>>>>>>> Suite/01_ump_a_2009.pdf' using splash backend
>>>>>>> (9/1137):
>>>>>>> 
>>>>>>> and when it finishes i.e.
>>>>>>> 
>>>>>>> PASS
>>>>>>> 
>>>>>>> Because of the worker threads it's now nearly
>>>>>>> impossible to match the test itself with the result of
>>>>>>> the test. Can You change it in that way that this is
>>>>>>> outputted together? Should be quite easy....
>>>>>>> 
>>>>>>> Cheers, Thomas
>>>>>>> 
>>>>>>> Am 06.12.2012 19:02, schrieb Adam Reichold: Hello
>>>>>>> again,
>>>>>>> 
>>>>>>> I reformatted the patch hopefully solving the
>>>>>>> whitespace issues. (It does apply and run for me, but
>>>>>>> there are still some whitespace warnings.)
>>>>>>> 
>>>>>>> The use of recursive locks is probably not necessary in
>>>>>>>  "TestRun.py" and "TestReferences.py", but since the 
>>>>>>> performance impact is probably negligible in this case,
>>>>>>> I thought allowing for recursion would help to keep the
>>>>>>> code more accessible. In "Printer.py" there is at least
>>>>>>> one recursion from "printout_update" into "printout".
>>>>>>> 
>>>>>>> Best regards, Adam.
>>>>>>> 
>>>>>>> Am 06.12.2012 18:19, schrieb Adam Reichold:
>>>>>>>>>> Hello Carlos,
>>>>>>>>>> 
>>>>>>>>>> Am 06.12.2012 18:11, schrieb Carlos Garcia
>>>>>>>>>> Campos:
>>>>>>>>>>> Adam Reichold <adamreichold at myopera.com>
>>>>>>>>>>> writes:
>>>>>>>>>>>> Hello again,
>>>>>>>>>>>> 
>>>>>>>>>>>> As suggested by William Bader, I added the
>>>>>>>>>>>> option to use the logical CPU count as the
>>>>>>>>>>>> number of worker threads automatically.
>>>>>>>>>>> Thanks for the patch, any improvement in the 
>>>>>>>>>>> regression test framework is more than welcome,
>>>>>>>>>>> even more if it's to improve the performance.
>>>>>>>>>>> What poppler have you sued to create the
>>>>>>>>>>> patches? They don't apply for me here using
>>>>>>>>>>> poppler from current git master.
>>>>>>>>>> Yes, it is about performance and I am hesitant
>>>>>>>>>> about introducing such complexity into the test
>>>>>>>>>> driver which is supposed to test the correctness
>>>>>>>>>> of program itself, but faster regression testing
>>>>>>>>>> probably means more regression testing... :-)
>>>>>>>>>> 
>>>>>>>>>> I am not sure how to correctly format the patch.
>>>>>>>>>> I built on the current master but it seems Git
>>>>>>>>>> thinks that the lines where I just changed the
>>>>>>>>>> indentation are whitespace errors. I am not sure
>>>>>>>>>> how to correct this as changes in whitespace are
>>>>>>>>>> obviously important for Python scripts... I'll
>>>>>>>>>> try to research this, but any advice would be
>>>>>>>>>> welcome.
>>>>>>>>>> 
>>>>>>>>>> Best regards, Adam.
>>>>>>>>>> 
>>>>>>>>>>>> Regards, Adam.
>>>>>>>>>>>> 
>>>>>>>>>>>> Am 04.12.2012 12:28, schrieb Adam Reichold:
>>>>>>>>>>>>> Hello Thomas,
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Am 04.12.2012 08:41, schrieb Thomas
>>>>>>>>>>>>> Freitag:
>>>>>>>>>>>>>> Am 04.12.2012 07:45, schrieb Adam
>>>>>>>>>>>>>> Reichold: Hello everyone,
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> I currently try to get myself acquainted
>>>>>>>>>>>>>> with Poppler's regression testing
>>>>>>>>>>>>>> framework. Because my system has a rather
>>>>>>>>>>>>>> low single-threaded performance, I tried
>>>>>>>>>>>>>> to implement parallel testing using
>>>>>>>>>>>>>> Python's Queue class.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Even though poppler-regtest currently
>>>>>>>>>>>>>> uses two processes per test file,
>>>>>>>>>>>>>> rendering even and odd pages
>>>>>>>>>>>>>> respectively, the test files themselves
>>>>>>>>>>>>>> are still handled sequentially and both
>>>>>>>>>>>>>> process are joined for each test file. 
>>>>>>>>>>>>>> This will yield suboptimal system
>>>>>>>>>>>>>> utilization even for a small three-core
>>>>>>>>>>>>>> system like mine.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Using the "-t/--threads N" option in the 
>>>>>>>>>>>>>> patched poppler-regtest will spawn N
>>>>>>>>>>>>>> worker threads that handle all tests they
>>>>>>>>>>>>>> can get from a single queue for all known
>>>>>>>>>>>>>> tests, allowing to heavily utilize also
>>>>>>>>>>>>>> large system if using a large set of test
>>>>>>>>>>>>>> cases. But even for my three-core system,
>>>>>>>>>>>>>> this brought down the time to create
>>>>>>>>>>>>>> references for the complete test suite
>>>>>>>>>>>>>> using the Splash backend from 4,5 hours 
>>>>>>>>>>>>>> to 2,75 hours.
>>>>>>>>>>>>>>> What do You mean with "patched 
>>>>>>>>>>>>>>> poppler-regtest"? There was no
>>>>>>>>>>>>>>> attachment, or do I miss something?
>>>>>>>>>>>>> No, this on the process level in the test
>>>>>>>>>>>>> driver and independent of the your
>>>>>>>>>>>>> multi-threading work. There was a patch
>>>>>>>>>>>>> called "parallel_testing.patch" attached,
>>>>>>>>>>>>> as least my mail client tells me so.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> I'll just try again...
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Best regards, Adam.
>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> If You talk about my multi-threaded
>>>>>>>>>>>>>>> testcase, please be aware, that it is
>>>>>>>>>>>>>>> still experimental, even if quite
>>>>>>>>>>>>>>> close. Here the result of my last
>>>>>>>>>>>>>>> regression test last sunday evening:
>>>>>>>>>>>>>>> Total 1133 tests 1114 tests passed 
>>>>>>>>>>>>>>> (98.32%) 17 tests failed (1.50%): 
>>>>>>>>>>>>>>> /media/thomas/HD-PCTU3/PDF
>>>>>>>>>>>>>>> Suite/Algorithmics - The Spirit of
>>>>>>>>>>>>>>> Computing, 3rd Ed.pdf (splash),
>>>>>>>>>>>>>>> /media/thomas/HD-PCTU3/PDF 
>>>>>>>>>>>>>>> Suite/Essentials of English Grammar - 
>>>>>>>>>>>>>>> www.ielts4u.blogfa.com.pdf ... 2 tests 
>>>>>>>>>>>>>>> crashed (0.18%):
>>>>>>>>>>>>>>> /media/thomas/HD-PCTU3/PDF 
>>>>>>>>>>>>>>> Suite/bug157090.pdf (splash), 
>>>>>>>>>>>>>>> /media/thomas/HD-PCTU3/PDF 
>>>>>>>>>>>>>>> Suite/sinatr4c.f5.pdf (splash) I'll
>>>>>>>>>>>>>>> first have a look at the crashes next
>>>>>>>>>>>>>>> weekend, then I'll continue with
>>>>>>>>>>>>>>> looking at the failed tests.... Cheers,
>>>>>>>>>>>>>>> Thomas
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> IMHO, the necessary changes seem quite
>>>>>>>>>>>>>> small especially since a lot of them are
>>>>>>>>>>>>>> connected to indentation handling. What
>>>>>>>>>>>>>> are your thoughts on the utility and
>>>>>>>>>>>>>> implementation of this?
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Best regards, Adam.
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> _______________________________________________
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> 
poppler mailing list
>>>>>>>>>>>>>>> poppler at lists.freedesktop.org
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> http://lists.freedesktop.org/mailman/listinfo/poppler
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>
>>>>>>>>>>>>>>>
>
>>>>>>>>>>>>>>> 
_______________________________________________ poppler
>>>>>>>>>>>>>> mailing list
>>>>>>>>>>>>>> poppler at lists.freedesktop.org 
>>>>>>>>>>>>>> http://lists.freedesktop.org/mailman/listinfo/poppler
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>
>>>>>>>>>>>>>>
>
>>>>>>>>>>>>>> 
_______________________________________________ poppler
>>>>>>>>>>>>>> mailing list
>>>>>>>>>>>>>> poppler at lists.freedesktop.org 
>>>>>>>>>>>>>> http://lists.freedesktop.org/mailman/listinfo/poppler
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>
>>>>>>>>>>>>>>
>
>>>>>>>>>>>>>> 
_______________________________________________ poppler mailing
>>>>>>>>>> list poppler at lists.freedesktop.org 
>>>>>>>>>> http://lists.freedesktop.org/mailman/listinfo/poppler
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>>> 
_______________________________________________ poppler
>>>>>>>> mailing list poppler at lists.freedesktop.org 
>>>>>>>> http://lists.freedesktop.org/mailman/listinfo/poppler
>>>>>>>
>>>>>
>>>>>>>> 
- From 5e7461569e3085370326203539fa02de968eb2ca Mon Sep 17 00:00:00
>>>>> 2001 From: Adam Reichold <adamreichold at myopera.com> Date:
>>>>> Sat, 29 Dec 2012 15:17:33 +0100 Subject: [PATCH] print
>>>>> messages that are useful with interleaved output by several
>>>>> worker threads
>>>>> 
>>>>> --- regtest/TestRun.py | 18 +++++++++--------- 1 file
>>>>> changed, 9 insertions(+), 9 deletions(-)
>>>>> 
>>>>> diff --git a/regtest/TestRun.py b/regtest/TestRun.py index 
>>>>> 5564757..750b93c 100644 --- a/regtest/TestRun.py +++ 
>>>>> b/regtest/TestRun.py @@ -71,7 +71,7 @@ class TestRun: with 
>>>>> self._lock: self._n_tests += 1
>>>>> 
>>>>> -        self.printer.print_test_start("Testing '%s' using
>>>>> %s backend (%d/%d): " % (doc_path, backend.get_name(),
>>>>> n_doc, total_docs)) +        test_msg = "Testing '%s' using
>>>>> %s backend (%d/%d): " % (doc_path, backend.get_name(),
>>>>> n_doc, total_docs) test_has_md5 =
>>>>> backend.create_refs(doc_path, test_path)
>>>>> 
>>>>> if backend.has_stderr(test_path): @@ -81,24 +81,24 @@
>>>>> class TestRun: if ref_has_md5 and test_has_md5: if 
>>>>> backend.compare_checksums(refs_path, test_path, not 
>>>>> self.config.keep_results, self.config.create_diffs, 
>>>>> self.config.update_refs): # FIXME: remove dir if it's
>>>>> empty? - self.printer.print_test_result("PASS") + 
>>>>> self.printer.printout(test_msg + "PASS") with self._lock: 
>>>>> self._n_passed += 1 else: - 
>>>>> self.printer.print_test_result_ln("FAIL") + 
>>>>> self.printer.printout_ln(test_msg + "FAIL") with
>>>>> self._lock: self._failed.append("%s (%s)" % (doc_path,
>>>>> backend.get_name())) return elif test_has_md5: if
>>>>> ref_is_crashed: - self.printer.print_test_result_ln("DOES
>>>>> NOT CRASH") + self.printer.printout_ln(test_msg + "DOES NOT
>>>>> CRASH") elif ref_is_failed: - 
>>>>> self.printer.print_test_result_ln("DOES NOT FAIL") + 
>>>>> self.printer.printout_ln(test_msg + "DOES NOT FAIL")
>>>>> return
>>>>> 
>>>>> test_is_crashed = backend.is_crashed(test_path) if
>>>>> ref_is_crashed and test_is_crashed: - 
>>>>> self.printer.print_test_result("PASS (Expected crash)") + 
>>>>> self.printer.printout_ln(test_msg + "PASS (Expected
>>>>> crash)") with self._lock: self._n_passed += 1 return @@
>>>>> -106,19 +106,19 @@ class TestRun: test_is_failed =
>>>>> backend.is_failed(test_path) if ref_is_failed and
>>>>> test_is_failed: # FIXME: compare status errors -
>>>>> self.printer.print_test_result("PASS (Expected fail with
>>>>> status error %d)" % (test_is_failed)) + 
>>>>> self.printer.printout(test_msg + "PASS (Expected fail with
>>>>> status error %d)" % (test_is_failed)) with self._lock:
>>>>> self._n_passed += 1 return
>>>>> 
>>>>> if test_is_crashed: - 
>>>>> self.printer.print_test_result_ln("CRASH") + 
>>>>> self.printer.printout_ln(test_msg + "CRASH") with
>>>>> self._lock: self._crashed.append("%s (%s)" % (doc_path,
>>>>> backend.get_name())) return
>>>>> 
>>>>> if test_is_failed: - 
>>>>> self.printer.print_test_result_ln("FAIL (status error %d)"
>>>>> % (test_is_failed)) +
>>>>> self.printer.printout_ln(test_msg + "FAIL (status error
>>>>> %d)" % (test_is_failed)) with self._lock: 
>>>>> self._failed_status_error("%s (%s)" % (doc_path, 
>>>>> backend.get_name())) return -- 1.8.0.3
>>>>> 
>>>>> _______________________________________________ poppler
>>>>> mailing list poppler at lists.freedesktop.org 
>>>>> http://lists.freedesktop.org/mailman/listinfo/poppler
>>>> 
>> _______________________________________________ poppler mailing
>> list poppler at lists.freedesktop.org 
>> http://lists.freedesktop.org/mailman/listinfo/poppler
> 
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.19 (GNU/Linux)

iQEcBAEBAgAGBQJQ30C+AAoJEPSSjE3STU34zZ4H/2wXadzNy60Qy9t/mqsP+/yL
2DnkY9ix96RdcG8wS1Thp985WlG9F0J0SLMXHWGLZip1FoiIn6gkMCkzHjcycKU5
aiokBX7tI0h9mLV531X2RPRWrTT9cEB+dZuLMZnzsbuKHe072R6+/gTlm8GqOk1/
dJCNLpB4JNAoFWOd2qNZ+LxI2yrrR1HUrD17VlstImd2mGykKUwRua2jfGls9ld4
KwV1UDWDT/gAmrDHm3X38HLTQBXIqhlfgP44A4Q4lOkqMGoPzsHWea6nW2cCwzWo
nBNDwVHD1wNtZC/PYMi4thjgO79beBrhhZD2/rLAtWJ422MJ6e8Gnqya0PSuyPI=
=BXwL
-----END PGP SIGNATURE-----


More information about the poppler mailing list