[poppler] Testing Re: Multicolumn select

Baz brian.ewins at gmail.com
Wed Dec 9 06:51:59 PST 2009


2009/12/8 Albert Astals Cid <aacid at kde.org>:
> What we want is something that makes text extraction/selection better, the
> definition of better is the problem here :D

OK. So it sounds like it would be worth adding tests, so we can be
explicit about what we want text extraction to do.

I could do this in two ways:
- write a test harness that calls the APIs directly (following the
example of cairo). This has the advantage that more APIs could be
tested later, but it complicates writing the tests, and in any case
most other tests will be about rendering, not text extraction. Since
this would be a unit test, it's also fragile to API changes.
- extend pdftotext to allow me to specify start and end points for
text extraction (page,x,y). This would make writing tests easy: just
simple shell scripts along the lines of the git test suite. This
feature could be useful to end users too, I guess.

I like the second plan better, since it supports building ad-hoc tests
with PDFs attached to bugs. Since we already have -f and -l (and -x and
-y do something unrelated to the selection), I'm thinking of int args
-fx, -fy, -lx, -ly, which default to (0,0) and (pageWidth, pageHeight)
respectively.
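
To make the defaults concrete, here is a rough Python sketch of how a
test driver might resolve the selection points and build the command
line. The -fx/-fy/-lx/-ly flags are only the proposal above and do not
exist in pdftotext today; the function names and the page size used in
the example are made up for illustration. Only -f, -l and "-" (write to
stdout) are existing pdftotext options.

```python
from typing import List, Optional, Tuple

Point = Tuple[int, int]


def resolve_selection(page_width: int, page_height: int,
                      fx: Optional[int] = None, fy: Optional[int] = None,
                      lx: Optional[int] = None, ly: Optional[int] = None
                      ) -> Tuple[Point, Point]:
    """Apply the proposed defaults: start at (0,0), end at (pageWidth, pageHeight)."""
    start = (0 if fx is None else fx, 0 if fy is None else fy)
    end = (page_width if lx is None else lx,
           page_height if ly is None else ly)
    return start, end


def build_command(pdf: str, first_page: int, last_page: int,
                  fx: Optional[int] = None, fy: Optional[int] = None,
                  lx: Optional[int] = None, ly: Optional[int] = None
                  ) -> List[str]:
    """Build a pdftotext argv; -fx/-fy/-lx/-ly are HYPOTHETICAL flags."""
    cmd = ["pdftotext", "-f", str(first_page), "-l", str(last_page)]
    # Only pass the proposed flags that were given, so omitted ones
    # fall back to the tool's own defaults.
    for flag, value in (("-fx", fx), ("-fy", fy), ("-lx", lx), ("-ly", ly)):
        if value is not None:
            cmd += [flag, str(value)]
    cmd += [pdf, "-"]  # "-" sends the extracted text to stdout
    return cmd


if __name__ == "__main__":
    # Assumed page size of 612x792, purely for the example.
    print(resolve_selection(612, 792, fx=100, ly=400))
    print(build_command("bug.pdf", 1, 2, fx=100, fy=50))
```

A test would then just run the command and compare stdout against the
expected text stored next to the PDF.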

Does this sound useful to you?

-Baz

