[poppler] annot->getContents() gives garbled text with special characters

gregor.hochschild at gmx.de
Sun Jan 22 08:28:42 PST 2012


I am trying to extract note annotations from a PDF. It works fine as long as there are no special characters. A note that contains "Hi my name is greg", for example, works without problems. "Hallo  göb das äch würklech geyt?", however, fails. I am not sure what the problem is and would be grateful for any help. I am happy to provide more info or an example PDF if necessary.


Here is some code:

// define variables
std::string content("");

// get the first annotation on the page (doc is a PDFDoc object created with createPDFDoc)
Page *currentPage = doc->getPage(page);
Annots *annots = currentPage->getAnnots(doc->getCatalog());
Annot *annot = annots->getAnnot(0);

// get the content of the annotation, if it has any
if (annot->getContents() != 0) content = annot->getContents()->getCString();

// output (f is a FILE * opened with mode "w")
printf("%s", content.c_str());
fprintf(f, "%s", content.c_str());

// printf fails completely for annotations with special characters
// fprintf puts garbled text in the txt file
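
A possible explanation, stated as an assumption and not verified against this particular PDF: the PDF specification allows text strings to be stored as UTF-16BE, marked by a leading 0xFE 0xFF byte order mark. Such a string contains zero bytes, so copying it into a std::string through a plain C string would stop at the first zero byte, and whatever is printed would not be valid text for a UTF-8 terminal or file. The sketch below shows one way to detect the byte order mark and convert the bytes to UTF-8. The helper names (hasUtf16BeBom, appendUtf8, annotTextToUtf8) are hypothetical and not part of poppler, and strings without a byte order mark are passed through unchanged instead of being mapped from PDFDocEncoding.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper (not poppler API): true if the raw bytes start with
// the UTF-16BE byte order mark 0xFE 0xFF.
bool hasUtf16BeBom(const std::string &raw) {
    return raw.size() >= 2 &&
           static_cast<unsigned char>(raw[0]) == 0xFE &&
           static_cast<unsigned char>(raw[1]) == 0xFF;
}

// Append one Unicode code point to out, encoded as UTF-8.
void appendUtf8(std::string &out, std::uint32_t cp) {
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Hypothetical helper (not poppler API): convert BOM-prefixed UTF-16BE bytes
// to UTF-8. Input without a BOM is returned unchanged; no PDFDocEncoding
// mapping is attempted in this sketch.
std::string annotTextToUtf8(const std::string &raw) {
    if (!hasUtf16BeBom(raw)) return raw;

    std::string out;
    // Skip the two BOM bytes, then read one 16-bit big-endian unit at a time.
    for (std::size_t i = 2; i + 1 < raw.size(); i += 2) {
        std::uint32_t cp =
            (static_cast<std::uint32_t>(static_cast<unsigned char>(raw[i])) << 8) |
            static_cast<unsigned char>(raw[i + 1]);

        // Combine a high surrogate with a following low surrogate, if present.
        if (cp >= 0xD800 && cp <= 0xDBFF && i + 3 < raw.size()) {
            std::uint32_t lo =
                (static_cast<std::uint32_t>(static_cast<unsigned char>(raw[i + 2])) << 8) |
                static_cast<unsigned char>(raw[i + 3]);
            if (lo >= 0xDC00 && lo <= 0xDFFF) {
                cp = 0x10000 + ((cp - 0xD800) << 10) + (lo - 0xDC00);
                i += 2;  // the low surrogate has been consumed as well
            }
        }
        appendUtf8(out, cp);
    }
    return out;
}
```

If this assumption holds, the raw bytes would need to be copied together with their length (GooString provides getLength()) before being passed to a helper like annotTextToUtf8, since constructing the std::string from getCString() alone would cut the data off at the first zero byte.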