[poppler] [PATCH] per-collection fallback for missing CID-keyed fonts on Win32

mpsuzuki at hiroshima-u.ac.jp mpsuzuki at hiroshima-u.ac.jp
Mon Apr 2 02:50:16 PDT 2012


Dear Thomas,

On Mon, 02 Apr 2012 11:21:51 +0200
Thomas Freitag <Thomas.Freitag at kabelmail.de> wrote:

>Am 01.04.2012 19:59, schrieb suzuki toshiya:
>> Dear Thomas,
>>
>> I uploaded a preliminary patch for this issue to
>> 	https://bugs.freedesktop.org/show_bug.cgi?id=48046
>> The patch is to git head, a5f4936dfb3e60ca37f932cc066aa10765f3cbc9
>> Please check.

>Tested, it is okay, but you should remove the fprintf calls in replaceSuffix
>and makeWindowsFont, or guard them with the _DEBUG directive.

Thanks. Of course, the debug messages should be removed.
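
For illustration, a minimal sketch of the second option, keeping the messages but compiling them only when _DEBUG is defined. The macro name DEBUG_MSG and the function describeFont are hypothetical and are not taken from the patch.

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper macro: prints only in builds compiled with -D_DEBUG.
#ifdef _DEBUG
#define DEBUG_MSG(...) std::fprintf(stderr, __VA_ARGS__)
#else
#define DEBUG_MSG(...) ((void)0)
#endif

// Hypothetical stand-in for a function that used to call fprintf() directly.
std::string describeFont(const std::string &name) {
    DEBUG_MSG("describeFont: %s\n", name.c_str());
    return "font:" + name;
}
```

In a release build the macro expands to nothing, so the function behaves the same with or without the message.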

>> Checking the font-related members of the GlobalParams class, I think
>> the current code that substitutes external fonts is confused,
>> and the FontInfo class is affected too (e.g. the result of "pdffonts -subst"
>> will differ from how SplashOutput substitutes the non-embedded
>> fonts). Some overhaul is needed, and I'm willing to do it after 0.20.
>
>Not the fonts which are replaced by findSystemFontFile. For those,
>pdffonts -subst will hopefully show the correct names. The problem is the
>base14 names from GfxFont::base14FontMap. Here the behaviour
>changed completely with the merge. Before the merge,
>the member "name" of the Font object was already "mangled" to the
>substituted name; after the merge it is still the name from the PDF, and
>the substituted name is found in "((Gfx8BitFont*)this)->base14->base14Name".

Thanks again; I had not checked the history of pdffonts.
It sounds like the behaviour before the merge was better
than the current behaviour.

>But of course, a correction is always welcome....

I think pdffonts should primarily report how the fonts actually fall
back during rasterization (or during conversion to a vector output
device). As an extra feature, an indication of the last rule that
chose the final result would be helpful once poppler has an external
configuration file to control the font substitution (it's since 0.20!).

Regards,
mpsuzuki

>> mpsuzuki at hiroshima-u.ac.jp wrote:
>>> Dear Thomas,
>>>
>>> Thank you for reporting the issue caused by my patch
>>> before the official release of 0.20.
>>>
>>> On Sun, 01 Apr 2012 11:32:43 +0200
>>> Thomas Freitag<Thomas.Freitag at kabelmail.de>  wrote:
>>>>>> Albert Astals Cid wrote:
>>>>>>>> * Adobe-CNS1 (Taiwan) ->   fallback to MingLiU
>>>>>>>> * Adobe-GB1 (China mainland) ->   fallback to SimSun
>>>>>>>> * Adobe-Japan1 (Japan) ->   fallback to MS-Mincho
>>>>>>>> * Adobe-Japan2 (Japan) ->   fallback to MS-Mincho
>>>>>>>> * Adobe-Korea1 (Republic of Korea) ->   fallback to Batang
>>>>>>> Does windows ship with those fonts?
>>>>>> Yes, of course, at least, after Windows 2000.
>>>>>> * MingLiU is available since Microsoft Windows 95 for Traditional Chinese,
>>>>>> * SimSun is available since Microsoft Windows 2000 (at least).
>>>>>> * MS-Mincho is available since Microsoft Windows 3.1 for Japanese,
>>>>>> * Batang is available since Microsoft Windows 2000 (at least).
>>>>>>
>>>>>> You may want to see a list showing which versions of Microsoft Windows
>>>>>> (or which versions of Microsoft Office) ship which fonts. So do I; please
>>>>>> give me more time to check. I checked
>>>>>> 	http://www.microsoft.com/typography/fonts/family.aspx ,
>>>>>> but it does not list the history before Windows 2000.
>>>> I'm still working with Windows XP, and you're right: when I look at
>>>> the link, click on "Find fonts" and select Windows XP, the fonts should
>>>> be bundled. But if I run pdftoppm, it says:
>>>>
>>>> Syntax Error: No display font for 'MingLiU'
>>>> Syntax Error: No display font for 'SimSun'
>>>> Syntax Error: No display font for 'MS-Mincho'
>>>> Syntax Error: No display font for 'Batang'
>>>>
>>>> SimSun, MingLiU and Batang are really not in my Windows font directory,
>>> Hmm. So http://www.microsoft.com/typography/fonts/family.aspx
>>> may describe the combined coverage of all localized versions.
>>> When I can get in contact with Microsoft people, I will ask for
>>> more information about the localizations.
>>>
>>>> MS-Mincho is not found because its extension is ".ttf" and NOT ".ttc".
>>> OK, ".ttc" ->  ".ttf" (then ->  ".otf" ->  ".pfb" ?) fallback is
>>> already in my todo list, I will finish it within 24 hours.
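
The suffix fallback described above can be sketched as follows. This is only an illustration under assumptions: the function name suffixCandidates is hypothetical, the suffix order is the one written in this mail, and the real patch may be organized differently. A caller would try each candidate in turn until one exists on disk.

```cpp
#include <string>
#include <vector>

// Sketch only: build candidate file names by swapping the suffix.
// The order ".ttc" -> ".ttf" -> ".otf" -> ".pfb" follows the mail above;
// the function name is hypothetical.
std::vector<std::string> suffixCandidates(const std::string &path) {
    static const char *const suffixes[] = {".ttc", ".ttf", ".otf", ".pfb"};
    std::vector<std::string> out;
    const std::string::size_type dot = path.rfind('.');
    const std::string::size_type sep = path.find_last_of("/\\");
    // Treat the dot as a suffix separator only if it is in the file name part.
    const bool hasSuffix = dot != std::string::npos &&
                           (sep == std::string::npos || dot > sep);
    const std::string stem = hasSuffix ? path.substr(0, dot) : path;
    for (const char *s : suffixes) {
        out.push_back(stem + s);
    }
    return out;
}
```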
>>>
>>> 	https://bugs.freedesktop.org/show_bug.cgi?id=48046
>>>
>>> I will post my preliminary patch there, and once you are
>>> satisfied with it, I will post the patch to this mailing list.
>>>
>>>> Would it be an idea to ignore non-existing CJK fonts and fall back to
>>>> ArialUnicode in this case?
>>> Of course, that is a reasonably expected feature, and I have a
>>> draft patch that supplies a list of candidate fonts (each of
>>> which may or may not be found).
>>>
>>>> Sorry for the late test, was very busy in the last days,
>>> I also have to apologize to Albert, and I have to thank you.
>>>
>>> Regards,
>>> mpsuzuki
>>>
>>>
>>>> Thomas
>>>>
>>>>>> Regards,
>>>>>> mpsuzuki
>>>>>>
>>>>>> Albert Astals Cid wrote:
>>>>>>>> Hi all,
>>>>>>> Hi
>>>>>>>
>>>>>>>> Considering the forthcoming deadline for 0.20 feature
>>>>>>>> freeze, here I propose a small patch as a first step
>>>>>>>> to better fallback for missing CID-keyed CJK fonts.
>>>>>>>>
>>>>>>>> As I discussed with Thomas, current poppler always
>>>>>>>> tries to use a serif typeface for the Japanese market
>>>>>>>> (MS-Mincho) if the user does not provide a special font
>>>>>>>> fallback definition table. The attached patch is a small
>>>>>>>> enhancement of Thomas's work; it checks the collection
>>>>>>>> of the missing CID-keyed font, and if it is a known
>>>>>>>> Adobe collection (Adobe-CNS1, -GB1, -Japan1, -Japan2,
>>>>>>>> -Korea1), the fallback TrueType font defined for that
>>>>>>>> collection is used.
>>>>>>>>
>>>>>>>> * Adobe-CNS1 (Taiwan) ->   fallback to MingLiU
>>>>>>>> * Adobe-GB1 (China mainland) ->   fallback to SimSun
>>>>>>>> * Adobe-Japan1 (Japan) ->   fallback to MS-Mincho
>>>>>>>> * Adobe-Japan2 (Japan) ->   fallback to MS-Mincho
>>>>>>>> * Adobe-Korea1 (Republic of Korea) ->   fallback to Batang
>>>>>>> Does windows ship with those fonts?
>>>>>>>
>>>>>>> Albert
>>>>>>>
>>>>>>>> I'm working on a further enhancement (I think a missing
>>>>>>>> Sans Serif CJK typeface should fall back to another
>>>>>>>> Sans Serif CJK typeface, as long as one is available),
>>>>>>>> but the investigation of historical typeface availability
>>>>>>>> on Microsoft Windows will need some time.
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>> mpsuzuki
>>>>>>>>
>>>>>>>>
>>>>>>>> On Fri, 23 Mar 2012 22:28:27 +0100
>>>>>>>>
>>>>>>>> Thomas Freitag<Thomas.Freitag at kabelmail.de>   wrote:
>>>>>>>>> Am 23.03.2012 14:06, schrieb mpsuzuki at hiroshima-u.ac.jp:
>>>>>>>>>> On Fri, 23 Mar 2012 08:42:01 +0100
>>>>>>>>>>
>>>>>>>>>> Thomas Freitag<Thomas.Freitag at kabelmail.de>    wrote:
>>>>>>>>>>> Am 23.03.2012 08:21, schrieb suzuki toshiya:
>>>>>>>>>>>> Excuse me, this is the 3rd issue in your first post (adding support
>>>>>>>>>>>> for the cidfmap generated by ghostscript), which is not what I
>>>>>>>>>>>> care about.
>>>>>>>>>>>> What I care about is the hardwired MS-Mincho fallback.
>>>>>>>>>>> It's only hardwired if a CID font is expected and no appropriate
>>>>>>>>>>> substitute font is found. Probably a better idea is to use
>>>>>>>>>>> arialuni.ttf instead of MS Mincho, but when I started with it, I
>>>>>>>>>>> only knew that MS Mincho is always installed and has some CJK chars.
>>>>>>>>>> One important problem with using a single CJK font (e.g.
>>>>>>>>>> MS Mincho) as a generic fallback is that the character
>>>>>>>>>> coverage of CJK fonts depends heavily on the intended
>>>>>>>>>> market.
>>>>>>>>>>
>>>>>>>>>> For example, the fonts designed for China mainland, Taiwan
>>>>>>>>>> and Japan are usually missing Hangul. They should not be
>>>>>>>>>> used for Adobe-Korea1 fallback. In addition, the fonts
>>>>>>>>>> designed for Japan, Taiwan, Korea are usually missing the
>>>>>>>>>> simplified characters currently used in China mainland.
>>>>>>>>>> Also, the latest version of Adobe-GB1 includes Yi script
>>>>>>>>>> (U+A000 - U+A4BF), but the fonts for Taiwan, Japan and Korea
>>>>>>>>>> are usually missing them.
>>>>>>>>>>
>>>>>>>>>> Needless to say, for Japanese customers, using MS Mincho or
>>>>>>>>>> MS Gothic as the generic fallback would be better than using
>>>>>>>>>> SimSun (for China mainland), MingLiU (for Taiwan) or Batang
>>>>>>>>>> (for Korea) as the generic fallback, but it is an unfair solution.
>>>>>>>>>>
>>>>>>>>>> Using Arial Unicode as generic fallback would be neutral,
>>>>>>>>>> although its typeface quality for CJK scripts is often
>>>>>>>>>> disrespected. In addition, its vertical writing mode support
>>>>>>>>>> is insufficient.
>>>>>>>>>>
>>>>>>>>>> I attached 1 PDF and 2 pictures; one picture shows the fallback
>>>>>>>>>> to MS Mincho, the other shows the fallback to Arial Unicode.
>>>>>>>>>>
>>>>>>>>>> Thus, I will propose a patch to prepare per-collection
>>>>>>>>>> fallback fonts (for Adobe-CNS1, Adobe-GB1, Adobe-Japan1,
>>>>>>>>>> Adobe-Japan2, Adobe-Korea1) and finally fallback to Arial
>>>>>>>>>> Unicode when no appropriate one is found.
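
The per-collection scheme proposed above can be sketched roughly as below. It is an illustration under assumptions, not the patch itself: the function names fallbackCandidates and pickFallback and the isInstalled callback are hypothetical, and the font names are simply spelled as they appear elsewhere in this thread.

```cpp
#include <string>
#include <vector>

// Sketch only: ordered fallback candidates for a missing CID-keyed font.
// Collection and font names are the ones listed in this thread.
std::vector<std::string> fallbackCandidates(const std::string &collection) {
    std::vector<std::string> out;
    if (collection == "Adobe-CNS1") {
        out.push_back("MingLiU");
    } else if (collection == "Adobe-GB1") {
        out.push_back("SimSun");
    } else if (collection == "Adobe-Japan1" || collection == "Adobe-Japan2") {
        out.push_back("MS-Mincho");
    } else if (collection == "Adobe-Korea1") {
        out.push_back("Batang");
    }
    out.push_back("ArialUnicode"); // neutral last resort for any collection
    return out;
}

// Return the first candidate that the (hypothetical) isInstalled callback
// accepts, or an empty string if none is available.
template <typename Pred>
std::string pickFallback(const std::string &collection, Pred isInstalled) {
    for (const std::string &name : fallbackCandidates(collection)) {
        if (isInstalled(name)) {
            return name;
        }
    }
    return std::string();
}
```

Returning an empty string leaves it to the caller to report that no substitute was found, instead of silently choosing a font designed for another market.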
>>>>>>>>> In my opinion: sounds great. I implemented my "poor" patch because I
>>>>>>>>> sometimes debug problems with CJK fonts under Windows and see nothing.
>>>>>>>>> So a better implementation is always welcome for me.
>>>>>>>>>
>>>>>>>>>>>>> I'm not really sure where the poppler data dir is expected on
>>>>>>>>>>>>> MinGW; it should be /usr/local/share/poppler. Otherwise you can
>>>>>>>>>>>>> patch the code where the GlobalParams constructor is called, as I
>>>>>>>>>>>>> normally do under Windows:
>>>>>>>>>>>>>
>>>>>>>>>>>>> globalParams = new GlobalParams("E:\\Downloads\\poppler\\poppler-data-0.4.5");
>>>>>>>>>>>>>
>>>>>>>>>>>>> and copy cidfmap to that directory.
>>>>>>>>>>>>> If you don't do this (and only then), all CJK fonts fall back to MS
>>>>>>>>>>>>> Mincho.
>>>>>>>>>>>> Yes, that (the case without cidfmap) is what I care about. I think
>>>>>>>>>>>> "all CJK fonts fall back to MS Mincho" is worse than falling back to
>>>>>>>>>>>> Helvetica, as shown by my 2 sample pictures. BTW, in your environment
>>>>>>>>>>>> with the cidfmap generated by ghostscript, is my sample PDF
>>>>>>>>>>>> (referring to CJK CID-keyed fonts) processed correctly?
>>>>>>>>>>> We can't fall back to Helvetica if a CID font is expected. In this
>>>>>>>>>>> case locateFont returns a NULL pointer!
>>>>>>>>>> I think the caller of locateFont should handle the case that
>>>>>>>>>> no appropriate substitute font is found (if a NULL pointer
>>>>>>>>>> is not appropriate to indicate such a case, some error should be
>>>>>>>>>> caught).
>>>>>>>>> That's what I did :-). Perhaps the error message is not clear, but that
>>>>>>>>> wasn't mine :-)
>>>>>>>>>
>>>>>>>>>>> No, I have to insert additional lines in cidfmap (I attach it):
>>>>>>>>>>> mkcidfm.ps doesn't search for Pr-fonts. I added lines for only 4
>>>>>>>>>>> fonts; of course I could also do it for the others. I just want to
>>>>>>>>>>> show how easy it is.
>>>>>>>>>> Good to know. I had thought that Ghostscript's current CJK font
>>>>>>>>>> handling is not very intelligent, so I expected that having
>>>>>>>>>> Ghostscript generate some configuration data would not be a
>>>>>>>>>> one-stop solution.
>>>>>>>>> As I already mentioned, would be nice if You give us a better solution.
>>>>>>>>>
>>>>>>>>> Thanks in advance,
>>>>>>>>> Thomas
>>>>>>>>>
>>>>>>>>>> Regards,
>>>>>>>>>> mpsuzuki
>>>>>>>>>>
>>>>>>>>>>> I attach my cidfmap (be careful, my Windows home directory is
>>>>>>>>>>> f:/windows). With these additional lines I got the attached result
>>>>>>>>>>> and these warnings:
>>>>>>>>>>>
>>>>>>>>>>> Syntax Error: Couldn't find a font for 'MS-PMincho', subst is 'MS-Mincho'
>>>>>>>>>>> Syntax Error: Couldn't find a font for 'MS-Gothic', subst is 'MS-Mincho'
>>>>>>>>>>> Syntax Error: Couldn't find a font for 'MS-PGothic', subst is 'MS-Mincho'
>>>>>>>>>>> Syntax Error: Couldn't find a font for 'MS-UIGothic', subst is 'MS-Mincho'
>>>>>>>>>>> Syntax Error: Couldn't find a font for 'RyuminPr6-Light-Identity-H', subst is 'ArialUnicodeMS-JP;'
>>>>>>>>>>> Syntax Error: Couldn't find a font for 'RyuminPr6-Light-Identity-H'
>>>>>>>>>>> Syntax Error: Couldn't find a font for 'RyuminPr6-Light-Identity-H', subst is 'ArialUnicodeMS-JP;'
>>>>>>>>>>> Syntax Error: Couldn't find a font for 'RyuminPr6-Light-Identity-H'
>>>>>>>>>>> Syntax Error: Couldn't find a font for 'GothicBBBPr6-Medium-Identity-H', subst is 'ArialUnicodeMS-JP'
>>>>>>>>>>> Syntax Error: Couldn't find a font for 'HiraMinPro-W3-Identity-H', subst is 'ArialUnicodeMS-JP'
>>>>>>>>>>> Syntax Error: Couldn't find a font for 'HiraKakuStd-W3-Identity-H', subst is 'ArialUnicodeMS-JP'
>>>>>>>>>>>
>>>>>>>>>>>>>> Thus, I'm afraid more effort is needed for the hardwired CID-keyed
>>>>>>>>>>>>>> font fallback. At the least, using MS-Mincho is not a good idea, and
>>>>>>>>>>>>>> an appropriate warning should be printed. Of course, I'm willing to
>>>>>>>>>>>>>> work on this issue.
>>>>>>>>>>>>> Isn't
>>>>>>>>>>>>> error(-1, "Couldn't find a font for '%s', subst is '%s'",
>>>>>>>>>>>>> fontName->getCString(), substFontName);
>>>>>>>>>>>>>
>>>>>>>>>>>>> an appropriate warning???
>>>>>>>>>>>> I think it's slightly insufficient, substitution of CID-keyed font
>>>>>>>>>>>> by non-CID-keyed is warned with more detail (Adobe-Japan1 font blah
>>>>>>>>>>>> blah blah is substituted by non-CID-keyed blah blah blah).
>>>>>>>>>>> That's not completely right: a non-CID-keyed font is still
>>>>>>>>>>> substituted by Helvetica; only a CID-keyed font is replaced by MS
>>>>>>>>>>> Mincho. But if you want another warning, just feel free to change
>>>>>>>>>>> the code.
>>>>>>>>>>>
>>>>>>>>>>> Cheers,
>>>>>>>>>>> Thomas
>>>>>>>>>>>
>>>>>>>>>>>>>> Thomas Freitag wrote:
>>>>>>>>>>>>>>> Am 03.03.2012 17:40, schrieb suzuki toshiya:
>>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I'm quite sorry that no CJK helpers have gotten involved in this
>>>>>>>>>>>>>>>> issue. Is the required help a rewrite of your patch to fit the
>>>>>>>>>>>>>>>> poppler coding conventions, for the maintainers working with Unix
>>>>>>>>>>>>>>>> systems? If it is possible to do without Visual Studio, I will
>>>>>>>>>>>>>>>> try.
>>>>>>>>>>>>>>> Hopefully done now. As far as I remember, it was your patch (bug
>>>>>>>>>>>>>>> 11413) that I just applied to PSOutputDev.cc
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> BTW, I have not yet checked your patch in detail, but is it
>>>>>>>>>>>>>>>> trying to replace all missing (non-embedded) CID-keyed CJK fonts
>>>>>>>>>>>>>>>> with MS Mincho? I think that is not a good idea for the users of
>>>>>>>>>>>>>>>> Adobe-GB1 (PRC, Singapore), Adobe-CNS1 (Taiwan, HongKong),
>>>>>>>>>>>>>>>> Adobe-Korea1 (ROK). I'm not sure if Ghostscript does so, but
>>>>>>>>>>>>>>>> even if Ghostscript does so, poppler should not follow it.
>>>>>>>>>>>>>>>> In fact, the coverage of CJK Ideographs is designed
>>>>>>>>>>>>>>>> differently to fit each market.
>>>>>>>>>>>>>>> No, it was not my goal to substitute all CID-keyed fonts by MS
>>>>>>>>>>>>>>> Mincho. The problem under Windows is just that, if no font with
>>>>>>>>>>>>>>> the used name is installed, poppler tried to replace it with
>>>>>>>>>>>>>>> Helvetica, but because this is not a CID font no characters at
>>>>>>>>>>>>>>> all will be shown. So I thought that MS Mincho is, at least for
>>>>>>>>>>>>>>> this case, a better idea as a default CID font.
>>>>>>>>>>>>>>> But if you copy the cidfmap produced by mkcidfm.ps from
>>>>>>>>>>>>>>> ghostscript into the poppler data dir, that substitute font table
>>>>>>>>>>>>>>> will be used if (and only if) the font is not embedded and not
>>>>>>>>>>>>>>> installed under Windows. I hope that fits for CJK users; I
>>>>>>>>>>>>>>> thought it was better to use an existing substitution algorithm
>>>>>>>>>>>>>>> than to do nothing. And as far as I understand mkcidfm.ps, it
>>>>>>>>>>>>>>> will also try to find suitable fonts for GB1, CNS1 and the
>>>>>>>>>>>>>>> others, but I'm no expert for CJK fonts. The cidfmap I produced
>>>>>>>>>>>>>>> on my system would use arialuni.ttf for all CJK fonts, but I
>>>>>>>>>>>>>>> have just the Microsoft default fonts.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Regards,
>>>>>>>>>>>>>>> Thomas
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> _______________________________________________
>>>>>>>>>>>>>>> poppler mailing list
>>>>>>>>>>>>>>> poppler at lists.freedesktop.org
>>>>>>>>>>>>>>> http://lists.freedesktop.org/mailman/listinfo/poppler