[poppler] Getting a raster image like pdf2ppm

Angus March angus at uducat.com
Fri May 22 12:45:20 PDT 2009


Albert Astals Cid wrote:
> On Thursday, 21 May 2009, Angus March wrote:
>   
>> Albert Astals Cid wrote:
>>     
>>> On Thursday, 21 May 2009, Angus March wrote:
>>>       
>>>> Albert Astals Cid wrote:
>>>>         
>>>>> On Wednesday, 20 May 2009, Angus March wrote:
>>>>>           
>>>>>> Albert Astals Cid wrote:
>>>>>>             
>>>>>>> On Tuesday, 19 May 2009, Angus March wrote:
>>>>>>>               
>>>>>>>> Adrian Johnson wrote:
>>>>>>>>                 
>>>>>>>>> Angus March wrote:
>>>>>>>>>                   
>>>>>>>>>> I tried using Poppler to get a Cairo surface and then saving the
>>>>>>>>>> surface to a PNG. Unfortunately, the resulting image was of
>>>>>>>>>> disastrously low quality.
>>>>>>>>>>                     
>>>>>>>>> Without seeing your code or the output you are getting I can only
>>>>>>>>> guess at what the problem might be. Did you alter the cairo scale
>>>>>>>>> to get the desired image dpi?
>>>>>>>>>                   
>>>>>>>>     It was definitely an improvement, but I think the only thing
>>>>>>>> that did improve was the resolution. The old problems that caused me
>>>>>>>> to abandon Cairo persisted, which are: gradients have ugly stripes
>>>>>>>> on them, a background that should be white and opaque is black and
>>>>>>>> transparent, and some text that has a shadow in the PDF doesn't in
>>>>>>>> the image. I don't suppose you know of a way to deal w/those
>>>>>>>> problems.
>>>>>>>>                 
>>>>>>> ?
>>>>>>>
>>>>>>> I don't see anything obviously wrong.
>>>>>>>
>>>>>>> Basically it is:
>>>>>>>  * Create PDFDoc
>>>>>>>  * Create SplashOutputDev
>>>>>>>  * Call SplashOutputDev::startDoc
>>>>>>>  * Call PDFDoc::displayPageSlice
>>>>>>>               
>>>>>>     Well there definitely is something wrong, because it works with
>>>>>> pdftoppm. I thought of things like the __attribute__((constructor))
>>>>>> attribute, or static objects, but I don't see any evidence of the
>>>>>> attribute and I wouldn't know how to find a static object in all that
>>>>>> code. Maybe multiple processes causes problems for Splash.
>>>>>>
>>>>>>
>>>>>>
>>>>>> It's hard to know where to go.
>>>>>>             
>>>>> The crashes you pasted are from poppler compiled with -O2? If so remove
>>>>> the -O2 and substitute -g with -g3. Optimized poppler backtraces are
>>>>> really misleading.
>>>>>           
>>>>     I figured out a way to get my app to build from the poppler lib I
>>>> rolled myself (although I'd still like to know what the proper procedure
>>>> is to get it to build in debug, and install the Splash stuff) and I got
>>>> some valgrind reports that might be more helpful, but are fewer than
>>>> those I got when I was using the SUSE distro's lib:
>>>>
>>>> ==8577== Conditional jump or move depends on uninitialised value(s)
>>>> ==8577==    at 0x53DACE4: FoFiType1C::parse() (FoFiType1C.cc:1848)
>>>> ==8577==    by 0x53E10AB: FoFiType1C::make(char*, int)
>>>> (FoFiType1C.cc:35)
>>>> ==8577==    by 0x5369A58: Gfx8BitFont::Gfx8BitFont(XRef*, char*, Ref,
>>>> GooString*, GfxFontType, Dict*) (GfxFont.cc:699)
>>>> ==8577==    by 0x536D72C: GfxFont::makeFont(XRef*, char*, Ref, Dict*)
>>>> (GfxFont.cc:143)
>>>> ==8577==    by 0x536D933: GfxFontDict::GfxFontDict(XRef*, Ref*, Dict*)
>>>> (GfxFont.cc:2051)
>>>> ==8577==    by 0x535AD21: GfxResources::GfxResources(XRef*, Dict*,
>>>> GfxResources*) (Gfx.cc:313)
>>>> ==8577==    by 0x535DD6B: Gfx::Gfx(XRef*, OutputDev*, int, Dict*,
>>>> Catalog*, double, double, PDFRectangle*, PDFRectangle*, int, int
>>>> (*)(void*), void*) (Gfx.cc:502)
>>>> ==8577==    by 0x539AF12: Page::createGfx(OutputDev*, double, double,
>>>> int, int, int, int, int, int, int, int, Catalog*, int (*)(void*), void*,
>>>> int (*)(Annot*, void*), void*) (Page.cc:404)
>>>> ==8577==    by 0x539B173: Page::displaySlice(OutputDev*, double, double,
>>>> int, int, int, int, int, int, int, int, Catalog*, int (*)(void*), void*,
>>>> int (*)(Annot*, void*), void*) (Page.cc:433)
>>>> ==8577==    by 0x40A756: pdf2jpg::GetSplash(int) (pdf2jpg.cpp:176)
>>>> ==8577==    by 0x40A9B5: pdf2jpg::TopupJpegThreads(int, astring const&)
>>>> (pdf2jpg.cpp:156)
>>>> ==8577==    by 0x40B3B1: pdf2jpg::Execute(int, char const*, char const*,
>>>> int) (pdf2jpg.cpp:99)
>>>> ==8577==
>>>>         
>>> Are you positively sure this doesn't happen with pdftoppm? Doesn't make
>>> any sense.
>>>       
>>     It doesn't seem to be. I'll try running valgrind on the debug
>> version of pdftoppm that I have here, and see what that does...
>>     Well she hasn't reported any problems so far. I'll see tomorrow
>> morning, then I guess I'll know for sure.
>>     Also, I keep forgetting to point out that another problem my app has
>> is with Splash getting stuck in an infinite loop every so often,
>> requiring a kill -9.
>>     How about this: I send you a sample of something that causes the
>> problems. Compile this and run it through valgrind. It came across a few
>> problems in a short time. BTW, for the sake of simplicity, it doesn't
>> actually output any files. It just gets the raw image data from Splash.
>>     
>
> I see the problem, but I also see that
>
> pages=`/home/tsdgeos/cvs/poppler/build-new/utils/pdfinfo "$filename" | grep Pages: | cut -c 1,2,3,4,5,6 --complement`
> for index in $(seq 1 $pages); do
>   echo -n "$index "
>   /home/tsdgeos/cvs/poppler/build/utils/pdftoppm -f $index -l $index "$filename" old/foo$index &
> done
> wait
>
> Does not have this problem, so there must be something that does not get 
> detached on fork?
>   
    I'm afraid I didn't understand any of that. What do you mean you see
the problem? What is it? And what does that script mean? What's this
about detaching? Are you saying that forking is the problem?
    It could well be that forking is the problem. Fortunately, I don't
need to fork at that point between the PDFDoc and the Splash instances.
Unfortunately, I do need to fork at some point, since this app is a
daemon. But, I keep the instantiation of PDFDoc and Splash in the same
process. I've been running delete_this with the following change to main:
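int main(int argc, char *argv[]) {
    assert(argc == 2);
    // Fork twice so that two worker processes run at the same time.
    bool bWorker = fork() == 0;
    if (!bWorker) bWorker = fork() == 0;
    if (bWorker) {
        // Each worker creates its own pdf2jpg, and so its own PDFDoc.
        pdf2jpg thing;
        thing.Execute(argv[1], "page", 1324);
    }
    else {
        // The parent waits for both workers to finish.
        verify(::wait(NULL) > 0);
        verify(::wait(NULL) > 0);
    }

    return EXIT_SUCCESS;
}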

This way there'll be 2 processes running, but they will have their own
PDFDocs. I've been running it now for a while, and valgrind hasn't had a
single issue w/it yet.
    Why Splash could have an issue w/this is hard to imagine, although
at this point, I don't really care. Two different processes, running on
different computers, in different countries even, can affect each other.
But what Splash might be using that wouldn't be separated by a fork, I
can only guess. Maybe a pipe, or socket. All that matters is that I can
do it. Thanks for your input.
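
    For reference, the arrangement described above (fork first, then create
the document and renderer inside each child) can be sketched roughly as
follows. This is only a minimal sketch under assumptions: Renderer and
run_workers are hypothetical names standing in for the PDFDoc/Splash
objects and the daemon's worker logic. They are not poppler API.

```cpp
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#include <cassert>
#include <vector>

// Hypothetical stand-in for the per-process rendering objects (for example
// a PDFDoc plus a SplashOutputDev). It records which process built it.
struct Renderer {
    pid_t owner;
    Renderer() : owner(getpid()) {}
    // True when called from the process that constructed this object.
    bool render() const { return owner == getpid(); }
};

// Forks `workers` children. Each child constructs its own Renderer *after*
// the fork, so no instance is inherited from the parent. Returns the number
// of children that exited with status 0, or -1 if a fork() call failed.
int run_workers(int workers) {
    assert(workers >= 0);
    std::vector<pid_t> children;
    bool fork_failed = false;

    for (int i = 0; i < workers; ++i) {
        pid_t pid = fork();
        if (pid < 0) {          // fork failed: stop creating workers
            fork_failed = true;
            break;
        }
        if (pid == 0) {         // child: build and use a private Renderer
            Renderer r;
            _exit(r.render() ? 0 : 1);
        }
        children.push_back(pid); // parent: remember the child to reap it
    }

    // Reap every child that was started and count the clean exits.
    int ok = 0;
    for (pid_t pid : children) {
        int status = 0;
        if (waitpid(pid, &status, 0) == pid &&
            WIFEXITED(status) && WEXITSTATUS(status) == 0) {
            ++ok;
        }
    }
    return fork_failed ? -1 : ok;
}
```

Calling run_workers(2) mirrors the two-worker test above: the parent only
forks and waits, and everything that touches the renderer lives in a child.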

