[poppler] Fixing Offset of Text Vertical Positioning in pdftohtml

Albert Astals Cid aacid at kde.org
Tue Sep 6 09:35:12 PDT 2011

On Wednesday, 31 August 2011, Stephen Reichling wrote:
> Hi all,
>
> Josh and I continue to work on improving the PDF to HTML conversion in
> poppler, but I am running into one persistent bug that I am having trouble
> solving and would like to solicit the list's help on.
> The quick background is that text in the HTML is often slightly vertically
> offset from where it appears in the PDF. The degree to which it is offset
> and whether it is offset upwards or downwards varies depending on the font
> applied to the text and some fonts are not offset at all. My analysis
> indicates that the problem has to do with the ascent/descent metrics of the
> font being used. In PDF, text is vertically positioned by specifying the
> position of the text's baseline. However, absolutely positioned spans in
> HTML must have a top or bottom y-coordinate; the baseline is not an available
> positioning option. So, to convert the y-coordinate from the PDF into
> something usable in HTML, poppler is currently subtracting from it the
> ascent of the current font multiplied by the height in pixels of the
> current font, which produces the behavior described above.
> My attempts to improve this positioning adjustment have been stymied by the
> inconsistent and sometimes conflicting information I have found about font
> metrics. So, if anyone with a good understanding of fonts can help answer
> the following questions, I would appreciate it:
> 1. What is the relationship between the ascent/descent of a font and
> the units per em (UPM) of a font?

I'm not an expert in fonts, but I'd say there is no relation at all.

> 2. Which of the several ascent/descent values describing a font is the
> correct one to pay attention to? In examining various font file types, I
> have often found that the different tables in a font have different,
> conflicting ascent/descent values. For example, in a TrueType font, there
> is one set of ascender/descender values in the "hhea" table while in the
> "OS/2" table there are two more sets of values, one known as the
> typoAscender/typoDescender and the other as the winAscent/winDescent. When
> a browser is positioning a font, does it just look at one of these or is
> there a more complicated relationship between them that I don't understand?
> Furthermore, it is often the case that beyond the ascent/descent
> information included in the embedded font file itself, yet another and
> often different set of values will be included in that font's font
> descriptor in the PDF. Which should I pay attention to then?

Again, I'm not an expert in fonts, but I can tell you that in poppler itself we
seem to ignore the Ascent/Descent values when rendering (only TextOutputDev seems
to use them). What I have learned while dealing with fonts is that you usually
need to render them to get the "real" values, since hinting and lots of other
factors can change the result considerably. But as I said, I am far from a
fonts expert.
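
For illustration, here is a minimal sketch of the adjustment described in the
quoted message (baseline y minus the font ascent times the font's pixel
height). It assumes HTML page coordinates where y grows downward and an ascent
expressed as a fraction of the em. The function names and every metric value
below are hypothetical, chosen only to show how the computed "top" shifts
depending on which ascent value is used; this is not poppler's actual code.

```python
def baseline_to_top(baseline_y_px, ascent_em, font_size_px):
    """Return a CSS `top` value for a span whose baseline is at baseline_y_px.

    Assumptions (not taken from poppler's source):
    - y grows downward, as in HTML page coordinates;
    - ascent_em is the ascent as a fraction of the em (e.g. 0.9).
    """
    return baseline_y_px - ascent_em * font_size_px


def tops_for_candidates(baseline_y_px, font_size_px, candidates):
    """Compute `top` once per candidate ascent value, keyed by its source."""
    return {
        source: baseline_to_top(baseline_y_px, ascent_em, font_size_px)
        for source, ascent_em in candidates.items()
    }


# Hypothetical ascent values for a single font, one per place the
# quoted message mentions. Real fonts will have different numbers.
ASCENT_CANDIDATES = {
    "hhea": 0.905,
    "OS/2 typo": 0.728,
    "OS/2 win": 0.952,
    "PDF font descriptor": 0.891,
}

if __name__ == "__main__":
    # Same baseline and font size, different ascent sources.
    tops = tops_for_candidates(200.0, 16.0, ASCENT_CANDIDATES)
    for source, top in tops.items():
        print(f"{source:>20}: top = {top:.2f}px")
```

With the same baseline and font size, each ascent source yields a different
"top", which is consistent with the per-font offsets described in the quoted
message when the value used does not match what the browser uses for layout.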

> Thanks in advance for your advice!
> -Stephen Reichling
