[poppler] Chinese characters broken

mpsuzuki at hiroshima-u.ac.jp mpsuzuki at hiroshima-u.ac.jp
Mon Aug 23 22:27:46 PDT 2010


# My previous post was refused by the mailing list service
# because of the huge attached file, so I compressed the font
# with bzip2. If you got this mail twice, please discard
# the previous one.

Oops, my memory slipped. The well-known Chinese font
whose glyphs are broken without the hinter is
MingLiU, not SimSun! I have to apologize to the font
manufacturer of SimSun.

Cobra, I think enabling the hinter for all CJKV fonts by
default is slightly overkill. In addition, it is not
easy to detect the character set of an embedded TrueType font.

Albert proposed to update the blacklist of TrueType fonts
in FreeType2, and to let FreeType2 switch the hinter
automatically (hinting is enabled/disabled per font).
It sounds like the most compact and portable idea, but I
think it is not a perfect solution (see the following part
of this message).

If Okular developers don't want to enable the hinter
by default, I propose to add a switch to enable/disable
the hinter.

Also, as the first step for Evince (and other GNOME
applications using poppler-glib), I propose to add a new
feature to CairoOutputDev to switch the hinter, like
the one the SplashOutputDev class has.

-------------------------------------------------------

In FreeType before release 2.4.x, the genuine TrueType
hinter was disabled by default to prevent unexpected
usage of patented technology. 

Some TrueType fonts show poor quality on low-resolution
devices without hinting, so FreeType developers
implemented the so-called "autohint" to improve the quality
at low resolution without patented technology.

However, the rasterization quality of autohint is debatable.
Some people prefer autohinted glyphs, others prefer antialiasing
without autohint, and a few people prefer embedded bitmaps.
This debate is not only about Latin alphabets. In Japan,
the quality of autohint for Japanese script is also debated.

# In fact, the autohint mechanism has a different implementation
# for each character set. You can find the names afcjk,
# aflatin, afindic in freetype-2.x.y/src/autofit/.

This was not the end. Some TrueType fonts use hinting heavily,
and their rasterization results without TrueType hinting are
unreadably broken, even at higher resolution.
FreeType developers improved autohint to provide readable
glyph shapes from such fonts without patented technology.

As some people don't want to use autohint by default,
FreeType has a hardwired blacklist of "tricky" fonts for which
autohint is enabled forcibly. The body of the blacklist is
tt_check_trickyness() in freetype-2.x.y/src/truetype/ttobjs.c.

http://git.savannah.gnu.org/cgit/freetype/freetype2.git/tree/src/truetype/ttobjs.c#n148

It is very simple: it compares the name of the face with a builtin
blacklist. This private function is called by FT_New_Face()
or FT_Open_Face(). When the font is known to be blacklisted,
the autohint is disabled as long as FT_LOAD_NO_AUTOHINT is not
given. After the creation of the FT_Face object, FreeType2
clients can check whether the font is blacklisted with
FT_IS_TRICKY( FT_Face face ).
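
As a rough illustration of the name comparison (not the FreeType
code itself), here is a minimal Python sketch. It assumes the
check is a substring match on the family name. TRICKY_NAMES and
is_tricky() are hypothetical names, and the list holds only
example entries; the real list is in ttobjs.c and changes
between releases.

```python
# Example entries only; see ttobjs.c for the real, release-specific list.
TRICKY_NAMES = ("MingLiU", "DFKai-SB")


def is_tricky(family_name):
    """Return True if family_name contains a blacklisted name.

    None (or an empty string) models an embedded font whose family
    name is unknown, so it can never match the blacklist.
    """
    if not family_name:
        return False
    return any(name in family_name for name in TRICKY_NAMES)


print(is_tricky("MingLiU"), is_tricky("SimSun"), is_tricky(None))
```

The None case matters for the rest of this message: a font
without a family name can never match.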

--

This is the story before FreeType 2.4.x. Since the patents
required for TrueType hinting have expired, FreeType 2.4.x
enables the TrueType hinter by default.
So FreeType2 client developers might still argue
that the genuine TrueType hinter looks worse than the
unhinted result.

-------------------------------------------------------------

Then, what should we do? The most compact solution would be
force-hinting in the FreeType2 layer, but there is a difficulty.

The embedded TrueType (Type42 or Type11) in PDF can lack
the name table, so FreeType2 cannot detect its family name.
Therefore, the blacklist does not work.

The attached font object is extracted from dell440.pdf,
which shows broken shapes if rasterized without the hinter.
If you run ftdump (included in freetype2-demos),
you can see that this font has no "name" table, so FreeType2
cannot know the family name and the blacklist scheme
does not work.
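
If ftdump is not at hand, the same check can be sketched in a
few lines of Python by reading the sfnt table directory
directly. The function names are hypothetical, and the demo
bytes are a synthetic directory, not the attached font.

```python
import struct


def sfnt_table_tags(data):
    """Return the table tags listed in the sfnt table directory."""
    # Offset table: sfnt version (4 bytes), numTables (2 bytes), then
    # three 2-byte search hints that are not needed here.
    num_tables = struct.unpack_from(">H", data, 4)[0]
    tags = []
    for i in range(num_tables):
        # Each 16-byte record holds tag, checkSum, offset, length.
        tag, _checksum, _offset, _length = struct.unpack_from(
            ">4sIII", data, 12 + 16 * i)
        tags.append(tag.decode("latin-1"))
    return tags


def has_name_table(data):
    """True if the directory lists a 'name' table."""
    return "name" in sfnt_table_tags(data)


# Synthetic directory with two records, only to exercise the parser.
demo = struct.pack(">IHHHH", 0x00010000, 2, 32, 1, 0)
demo += struct.pack(">4sIII", b"glyf", 0, 44, 0)
demo += struct.pack(">4sIII", b"head", 0, 44, 0)
print(sfnt_table_tags(demo), has_name_table(demo))
```

For the attached font, you would decompress it with bzip2 first
and pass the file contents to has_name_table().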

# BTW, it seems that the PDF generator does not
# calculate the TrueType table checksums by itself:
# it copies the checksum if the table is not subsetted
# and fills in 0 if the table is subsetted.
#
#         tag cvt  offset 0x0000009c length 0x000002e4 checkSum 0x05bcf058
#         tag fpgm offset 0x00000380 length 0x000087c4 checkSum 0x28233bf1
#         tag hhea offset 0x00008b44 length 0x00000024 checkSum 0x07be58bc
#         tag maxp offset 0x00008b68 length 0x00000020 checkSum 0x6fa88e54
#         tag prep offset 0x00008b88 length 0x000001e1 checkSum 0xa344a1eb
#         tag glyf offset 0x00008d6c length 0x00009716 checkSum 0x00000000
#         tag loca offset 0x00012484 length 0x0000b1c4 checkSum 0x00000000
#         tag hmtx offset 0x0001d648 length 0x0000b038 checkSum 0x00000000
#         tag head offset 0x00028680 length 0x00000036 checkSum 0xb8417ef6
#
# Oh my god.
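
For reference, the checksum a generator is expected to store
can be sketched as below, assuming the usual sfnt rule: the
table is padded with zero bytes to a multiple of 4 and summed
as big-endian 32-bit words, modulo 2**32. table_checksum() is
a hypothetical helper name.

```python
import struct


def table_checksum(table):
    """Sum the table as big-endian uint32 words, modulo 2**32."""
    padded = table + b"\0" * (-len(table) % 4)  # pad to a 4-byte boundary
    words = struct.unpack(">%dI" % (len(padded) // 4), padded)
    return sum(words) & 0xFFFFFFFF


print(hex(table_checksum(b"\x00\x00\x00\x01\x00\x00\x00\x02")))
```

Comparing this value with the checkSum field for glyf, loca and
hmtx would show whether the zeros are placeholders. Note that
the head table is normally summed with its checkSumAdjustment
field treated as 0.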

The font name is written in the PDF document, but is not
available in the TrueType font. So should the PDF renderer decide
whether the hinter is enabled or not? To do that, my idea
would be: expose the FreeType2 blacklist function publicly,
let the FreeType2 client check whether the font is blacklisted
by passing the family name to the function, and if it is
blacklisted, let the client enable the hinter for that font
(not for the whole document).

But this is not a perfect solution. PDF can give a name to
the font object which differs from the original family name
of the TrueType font. For example, even if the embedded
TrueType data (sfnt string array in Type42 font) is taken
from "TimesNewRoman", we can name the Type42 font
object "CourierNew". In addition, another font object
referring to this Type42 font object can be named "Arial".

In the case of dell440.pdf, most fonts have
ASCII font names, but the problematic font has a
Big5 font name (maybe). Unfortunately, the FreeType2 blacklist
of tricky fonts is not fully internationalized.
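
To check the Big5 guess, the #xx escapes of the PDF name can be
expanded and the bytes decoded. This is a minimal sketch: the
helper names are hypothetical, the subset tag is assumed to be
six uppercase letters plus "+", and Big5 is only a guess at the
encoding. With Python's big5 codec the bytes appear to decode
to 新細明體, which I believe is the Chinese name of PMingLiU.

```python
import re


def decode_pdf_name(name):
    """Expand #xx escapes of a PDF name object into raw bytes."""
    raw = name.lstrip("/").encode("latin-1")
    return re.sub(rb"#([0-9A-Fa-f]{2})",
                  lambda m: bytes([int(m.group(1), 16)]), raw)


def strip_subset_tag(raw):
    """Drop a subset prefix of six uppercase letters and '+'."""
    return re.sub(rb"^[A-Z]{6}\+", b"", raw)


raw = strip_subset_tag(decode_pdf_name("/LNQGRD+#b7s#b2#d3#a9#fa#c5#e9"))
print(raw)                                   # raw bytes of the font name
print(raw.decode("big5", errors="replace"))  # Big5 is only a guess
```

So a blacklist holding only the ASCII family name would miss
this font even if the PDF font name were passed to it.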

The reference chain is as follows:

The composite font object:

<<
	/BaseFont	/LNQGRD+#b7s#b2#d3#a9#fa#c5#e9-WinCharSetFFFF-H
	/DescendantFonts	[34 0 R]
	/Subtype	/Type0
	/ToUnicode	39 0 R
	/Type		/Font
	/Encoding	/Identity-H
>>

The CID-keyed component font referred by the composite font:

<<
	/BaseFont	/LNQGRD+#b7s#b2#d3#a9#fa#c5#e9
	/DW		1000
	/CIDSystemInfo	35 0 R
	/Subtype	/CIDFontType2
	/FontDescriptor	36 0 R
	/Type		/Font
>>

The font descriptor referred by the CID-keyed component font:

<<
	/FontName	/LNQGRD+#b7s#b2#d3#a9#fa#c5#e9
	/StemV		150
	/FontFile2	37 0 R
	/Ascent		800
	/Flags		5
	/AvgWidth	1000
	/Descent	-194
	/ItalicAngle	0
	/MaxWidth	1000
	/MissingWidth	1000
	/CIDSet		38 0 R
	/FontBBox	[0 -194 1000 800]
	/Type		/FontDescriptor
	/CapHeight	800
>>

The FontFile2 stream is the attached TrueType font.

# I uncompressed dell440.pdf with pdftk, so the
# object numbers may differ from the original PDF.

I think this reference chain is tracked by the object
numbers, so using a different font name in each object
might be possible. Checking whether each font name is
blacklisted might be troublesome work (if anybody wants
to see it, I will draft it).
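
Just to show the shape of such a check (this is not poppler
code), here is a small Python sketch. The three dictionaries
are reduced to their name keys, candidate_names() and
any_blacklisted() are hypothetical helpers, and the predicate
stands in for whatever blacklist function FreeType2 might
expose.

```python
# The three dictionaries of the chain, reduced to their name keys;
# a real renderer would follow the object references instead.
type0_font = {"BaseFont": "LNQGRD+#b7s#b2#d3#a9#fa#c5#e9-WinCharSetFFFF-H"}
cid_font = {"BaseFont": "LNQGRD+#b7s#b2#d3#a9#fa#c5#e9"}
descriptor = {"FontName": "LNQGRD+#b7s#b2#d3#a9#fa#c5#e9"}


def candidate_names(*dicts):
    """Collect the distinct BaseFont/FontName values, in chain order."""
    names = []
    for d in dicts:
        for key in ("BaseFont", "FontName"):
            value = d.get(key)
            if value is not None and value not in names:
                names.append(value)
    return names


def any_blacklisted(names, is_blacklisted):
    """True if the caller-supplied predicate matches any name."""
    return any(is_blacklisted(n) for n in names)


names = candidate_names(type0_font, cid_font, descriptor)
print(names)
print(any_blacklisted(names, lambda n: "MingLiU" in n))
```

Every name in the chain has to be tried, because nothing
guarantees that they agree with each other or with the
original family name.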

------------------------------------------------------------

In summary, I think auto-switching of the hinter
in the PDF rendering system requires long work. So,
as the first step, introducing a new method
to enable/disable hinting per document is
a good starting point.

Regards,
mpsuzuki



On Tue, 24 Aug 2010 09:54:02 +0800 (CST)
"cobra.yu" <cobra.yu at hyweb.com.tw> wrote:
>Basically, autohinting should be used on CJKV fonts only
>(maybe some others apply too), but not on regular ISO-8859-1
>or similar encoding systems. Before turning on this function,
>users should check the encoding system first.
>
>        Cobra
>-----Original message-----
>From:Albert Astals Cid <aacid at kde.org>
>To:poppler at lists.freedesktop.org
>Date:Mon, 23 Aug 2010 23:21:23 +0100
>Subject:Re: [poppler] Chinese characters broken
>
>On Monday, 23 August 2010, suzuki toshiya wrote:
>> Thanks Cobra.
>> 
>> To add a few words, setRenderHint() method is for poppler-qt4 binding.
>> The insertion of the method should be proposed to Okular maintainer.
>
>Okular maintainer speaking: This is set to false on purpose because lots of 
>people came in giving shit that non hinted fonts looked better. One can't make 
>everyone happy.
>
>Albert
>
>> 
>> I'm not sure similar methods exist in other binding. Evince seems to
>> use poppler-glib binding, and its default backend is CairoOutputDev.
>> Because Cairo renders the TrueType font internally, so more detailed
>> code check is needed.
>> 
>> Regards,
>> mpsuzuki
>> 
>> cobra.yu wrote:
>> > Hi,
>> > 
>> >      The "hinting" issue has been fixed by Freetype 2 w/o patent
>> >      troubles, but Poppler doesn't turn on this function by default.
>> >      I've used
>> > 
>> > "doc->setRenderHint(Poppler::Document::TextHinting);" in my application
>> > and fixed this problem.
>> > 
>> >             Cobra
>> > 
>> > -----Original message-----
>> > From:ddreamer<ddreamer at ms93.url.com.tw>
>> > To:mpsuzuki<mpsuzuki at hiroshima-u.ac.jp>
>> > Cc:poppler<poppler at lists.freedesktop.org>
>> > Date: Sun, 22 Aug 2010 22:35:47 +0800 (CST)
>> > Subject: Re: [poppler] Chinese characters broken
>> > The screenshot1.png is normal result while Screenshot2.png is abnormal
>> > result generated by Okular and Evince
>> > _______________________________________________
>> > poppler mailing list
>> > poppler at lists.freedesktop.org
>> > http://lists.freedesktop.org/mailman/listinfo/poppler
>> 

-------------- next part --------------
A non-text attachment was scrubbed...
Name: dell440_037-000.ttf.bz2
Type: application/octet-stream
Size: 37460 bytes
Desc: not available
URL: <http://lists.freedesktop.org/archives/poppler/attachments/20100824/af4f1fc7/attachment-0001.obj>

