[poppler] [PATCH] per-collection fallback for missing CID-keyed fonts on Win32

mpsuzuki at hiroshima-u.ac.jp
Sun Apr 1 03:23:14 PDT 2012


Dear Thomas,

Thank you for reporting the issue caused by my patch
before the official release of 0.20.

On Sun, 01 Apr 2012 11:32:43 +0200
Thomas Freitag <Thomas.Freitag at kabelmail.de> wrote:
>>> Albert Astals Cid wrote:
>>>>> * Adobe-CNS1 (Taiwan) ->  fallback to MingLiU
>>>>> * Adobe-GB1 (China mainland) ->  fallback to SimSun
>>>>> * Adobe-Japan1 (Japan) ->  fallback to MS-Mincho
>>>>> * Adobe-Japan2 (Japan) ->  fallback to MS-Mincho
>>>>> * Adobe-Korea1 (Republic of Korea) ->  fallback to Batang
>>>> Does Windows ship with those fonts?
>>> Yes, of course, at least, after Windows 2000.
>>> * MingLiU is available since Microsoft Windows 95 for Traditional Chinese,
>>> * SimSun is available since Microsoft Windows 2000 (at least).
>>> * MS-Mincho is available since Microsoft Windows 3.1 for Japanese,
>>> * Batang is available since Microsoft Windows 2000 (at least).
>>>
>>> You may want to see a list showing which versions of Microsoft Windows
>>> (or which versions of Microsoft Office) ship which fonts. So do I; please
>>> give me more time to check. I checked
>>> 	http://www.microsoft.com/typography/fonts/family.aspx ,
>>> but it does not list the history before Windows 2000.
>I'm still working with Windows XP, and you're right: when I look at 
>the link, click on "Find fonts" and select Windows XP, the fonts should 
>be bundled. But if I run pdftoppm, it says:
>
>Syntax Error: No display font for 'MingLiU'
>Syntax Error: No display font for 'SimSun'
>Syntax Error: No display font for 'MS-Mincho'
>Syntax Error: No display font for 'Batang'
>
>SimSun, MingLiU and Batang are really not in my Windows font directory, 

Hmm. So the http://www.microsoft.com/typography/fonts/family.aspx
page may describe the combined coverage of all localized versions.
When I get in contact with Microsoft people, I will ask them for
more information about localizations.

>MS-Mincho is not found because its extension is ".ttf", NOT ".ttc". 

OK, a ".ttc" -> ".ttf" (then -> ".otf" -> ".pfb"?) fallback is
already on my to-do list; I will finish it within 24 hours.

	https://bugs.freedesktop.org/show_bug.cgi?id=48046

I will post my preliminary patch there, and once it satisfies
you, I will post the patch to this mailing list.
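A minimal sketch of the extension fallback mentioned above, in C++17 and for illustration only: it is not poppler's code, the helper names are hypothetical, and the ".otf" and ".pfb" steps are the tentative ones from the to-do item. It builds the candidate file names in probe order and returns the first one that exists as a regular file in the given font directory.

```cpp
#include <filesystem>
#include <optional>
#include <string>
#include <system_error>
#include <vector>

namespace fs = std::filesystem;

// Hypothetical helper: candidate file names for one font, in probe order.
// ".ttc" comes first; ".otf" and ".pfb" are the tentative later steps.
std::vector<std::string> candidateFontFileNames(const std::string &baseName)
{
    static const std::vector<std::string> exts = {".ttc", ".ttf", ".otf", ".pfb"};
    std::vector<std::string> names;
    for (const std::string &ext : exts) {
        names.push_back(baseName + ext);
    }
    return names;
}

// Hypothetical helper: return the first candidate that exists as a regular
// file in fontDir, or std::nullopt if none does (the caller must then try
// another font instead of assuming the file is there).
std::optional<fs::path> findFontFileWithExtFallback(const fs::path &fontDir,
                                                    const std::string &baseName)
{
    for (const std::string &name : candidateFontFileNames(baseName)) {
        fs::path candidate = fontDir / name;
        std::error_code ec;  // non-throwing overload: errors count as "missing"
        if (fs::is_regular_file(candidate, ec)) {
            return candidate;
        }
    }
    return std::nullopt;
}
```

In the situation Thomas reports (only the ".ttf" file present), the ".ttc" probe fails and the ".ttf" path is returned. The font directory and base name are left to the caller.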

>Would it be an idea to ignore non-existent CJK fonts and fall back to 
>ArialUnicode in this case?

Of course, it is a reasonably expected feature, and I have a
draft patch that keeps a list of candidate fonts (each of which
may or may not be found).
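The candidate-list idea, combined with Thomas's suggestion of Arial Unicode as a last resort, could look roughly like the C++ sketch below. It is an illustration under assumptions, not the draft patch itself: the function name, the installed-font callback, and the "ArialUnicodeMS" name are hypothetical. The per-collection entries are the ones proposed for 0.20; each list can hold more than one candidate.

```cpp
#include <functional>
#include <map>
#include <string>
#include <vector>

// Hypothetical sketch: pick a substitute for a missing CID-keyed font.
// registry/ordering come from the font's CIDSystemInfo (e.g. "Adobe", "GB1").
// isInstalled reports whether a font of the given name is available.
std::string chooseCidFallback(const std::string &registry,
                              const std::string &ordering,
                              const std::function<bool(const std::string &)> &isInstalled)
{
    // Per-collection candidates, tried in order; any of them may be missing.
    static const std::map<std::string, std::vector<std::string>> candidates = {
        {"Adobe-CNS1", {"MingLiU"}},
        {"Adobe-GB1", {"SimSun"}},
        {"Adobe-Japan1", {"MS-Mincho"}},
        {"Adobe-Japan2", {"MS-Mincho"}},
        {"Adobe-Korea1", {"Batang"}},
    };
    const auto it = candidates.find(registry + "-" + ordering);
    if (it != candidates.end()) {
        for (const std::string &name : it->second) {
            if (isInstalled(name)) {
                return name;
            }
        }
    }
    // Collection-neutral last resort (assumed font name).
    if (isInstalled("ArialUnicodeMS")) {
        return "ArialUnicodeMS";
    }
    return {};  // empty: no CID-capable fallback; the caller must handle this
}
```

On a setup like Thomas's Windows XP, where SimSun is missing, an Adobe-GB1 font would then end up on Arial Unicode if that is installed. An unknown collection goes straight to the last resort, and an empty result tells the caller that no usable substitute exists.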

>Sorry for the late test; I was very busy the last few days,

I also have to apologize to Albert, and I have to thank you.

Regards,
mpsuzuki


>Thomas
>
>>>
>>> Regards,
>>> mpsuzuki
>>>
>>> Albert Astals Cid wrote:
>>>> wrote:
>>>>> Hi all,
>>>> Hi
>>>>
>>>>> Considering the forthcoming deadline for the 0.20 feature
>>>>> freeze, here I propose a small patch as a first step
>>>>> toward better fallback for missing CID-keyed CJK fonts.
>>>>>
>>>>> As I discussed with Thomas, current poppler always
>>>>> falls back to a serif typeface designed for the Japanese
>>>>> market (MS-Mincho) if the user does not provide a special
>>>>> font fallback definition table. The attached patch is a small
>>>>> enhancement of Thomas's work; it checks the collection
>>>>> of the missing CID-keyed font, and if it is a known
>>>>> Adobe collection (Adobe-CNS1, -GB1, -Japan1, -Japan2,
>>>>> -Korea1), the fallback TrueType font defined for that
>>>>> collection is used.
>>>>>
>>>>> * Adobe-CNS1 (Taiwan) ->  fallback to MingLiU
>>>>> * Adobe-GB1 (China mainland) ->  fallback to SimSun
>>>>> * Adobe-Japan1 (Japan) ->  fallback to MS-Mincho
>>>>> * Adobe-Japan2 (Japan) ->  fallback to MS-Mincho
>>>>> * Adobe-Korea1 (Republic of Korea) ->  fallback to Batang
>>>> Does Windows ship with those fonts?
>>>>
>>>> Albert
>>>>
>>>>> I'm working on a further enhancement (I think a missing
>>>>> sans-serif CJK typeface should fall back to another
>>>>> sans-serif CJK typeface, as long as one is available),
>>>>> but investigating the historical typeface availability
>>>>> on Microsoft Windows will need some time.
>>>>>
>>>>> Regards,
>>>>> mpsuzuki
>>>>>
>>>>>
>>>>> On Fri, 23 Mar 2012 22:28:27 +0100
>>>>>
>>>>> Thomas Freitag<Thomas.Freitag at kabelmail.de>  wrote:
>>>>>> Am 23.03.2012 14:06, schrieb mpsuzuki at hiroshima-u.ac.jp:
>>>>>>> On Fri, 23 Mar 2012 08:42:01 +0100
>>>>>>>
>>>>>>> Thomas Freitag<Thomas.Freitag at kabelmail.de>   wrote:
>>>>>>>> Am 23.03.2012 08:21, schrieb suzuki toshiya:
>>>>>>>>> Excuse me, this is the 3rd issue in your first post (add a support
>>>>>>>>> to reflect cidfmap generated by ghostscript), that is not what I
>>>>>>>>> care.
>>>>>>>>> What I care is about hardwired MS-Mincho fallback.
>>>>>>>> It's only hardwired if a CID font is expected and no appropriate
>>>>>>>> substitute font is found. Probably a better idea is to use
>>>>>>>> arialuni.ttf instead of MS Mincho, but when I started on it, I only
>>>>>>>> knew that MS Mincho is always installed and has some CJK characters.
>>>>>>> One important problem with using a single CJK font (e.g.
>>>>>>> MS Mincho) as a generic fallback is that the character
>>>>>>> coverage of CJK fonts depends heavily on the target
>>>>>>> market.
>>>>>>>
>>>>>>> For example, the fonts designed for China mainland, Taiwan
>>>>>>> and Japan are usually missing Hangul. They should not be
>>>>>>> used for Adobe-Korea1 fallback. In addition, the fonts
>>>>>>> designed for Japan, Taiwan, Korea are usually missing the
>>>>>>> simplified characters currently used in China mainland.
>>>>>>> Also, the latest version of Adobe-GB1 includes Yi script
>>>>>>> (U+A000 - U+A4BF), but the fonts for Taiwan, Japan and Korea
>>>>>>> are usually missing them.
>>>>>>>
>>>>>>> Needless to say, for Japanese customers, using MS Mincho or
>>>>>>> MS Gothic as a generic fallback would be better than using
>>>>>>> SimSun (for China mainland), MingLiU (for Taiwan) or Batang
>>>>>>> (for Korea), but it is an unfair solution.
>>>>>>>
>>>>>>> Using Arial Unicode as a generic fallback would be neutral,
>>>>>>> although its typeface quality for CJK scripts is often
>>>>>>> criticized. In addition, its vertical writing mode support
>>>>>>> is insufficient.
>>>>>>>
>>>>>>> I attached 1 PDF and 2 pictures; one picture uses MS Mincho
>>>>>>> as the fallback, the other uses Arial Unicode.
>>>>>>>
>>>>>>> Thus, I will propose a patch to prepare per-collection
>>>>>>> fallback fonts (for Adobe-CNS1, Adobe-GB1, Adobe-Japan1,
>>>>>>> Adobe-Japan2, Adobe-Korea1) and finally fallback to Arial
>>>>>>> Unicode when no appropriate one is found.
>>>>>> In my opinion: sounds great. I implemented my "poor" patch because I
>>>>>> sometimes debug problems with CJK fonts under Windows and see nothing.
>>>>>> So a better implementation is always welcome for me.
>>>>>>
>>>>>>>>>> I'm not really sure where the poppler data dir is expected on
>>>>>>>>>> MinGW; it should be /usr/local/share/poppler. Otherwise you can
>>>>>>>>>> patch the code where the GlobalParams constructor is called, as I
>>>>>>>>>> normally do under Windows:
>>>>>>>>>>
>>>>>>>>>> globalParams = new GlobalParams("E:\\Downloads\\poppler\\poppler-data-0.4.5");
>>>>>>>>>>
>>>>>>>>>> and copy cidfmap to that directory.
>>>>>>>>>> If you don't do this (and only then), all CJK fonts fall back to MS
>>>>>>>>>> Mincho.
>>>>>>>>> Yes, that case (without cidfmap) is what I care about. I think "all
>>>>>>>>> CJK fonts fall back to MS Mincho" is worse than a fallback to
>>>>>>>>> Helvetica, as shown by my 2 sample pictures. BTW, in your environment
>>>>>>>>> with the cidfmap generated by Ghostscript, is my sample PDF
>>>>>>>>> (referring to CJK CID-keyed fonts) processed correctly?
>>>>>>>> We can't fall back to Helvetica if a CID font is expected. In this
>>>>>>>> case locateFont returns a NULL pointer!
>>>>>>> I think the caller of locateFont should handle the case where
>>>>>>> no appropriate substitute font is found (if a NULL pointer
>>>>>>> is not appropriate to indicate such a case, some error should be
>>>>>>> caught).
>>>>>> That's what I did :-). Perhaps the error message is not clear, but that
>>>>>> wasn't mine :-)
>>>>>>
>>>>>>>> No, I had to insert additional lines into cidfmap; I attach it.
>>>>>>>> mkcidfm.ps doesn't search for Pr-fonts. I added lines for only 4
>>>>>>>> fonts; of course I could do it for the others too. I just want to
>>>>>>>> show how easy it is.
>>>>>>> Good to know. I thought the current Ghostscript CJK font
>>>>>>> handling was not very sophisticated, so I expected that generating
>>>>>>> configuration data with Ghostscript would not be a one-stop
>>>>>>> solution.
>>>>>> As I already mentioned, it would be nice if you gave us a better solution.
>>>>>>
>>>>>> Thanks in advance,
>>>>>> Thomas
>>>>>>
>>>>>>> Regards,
>>>>>>> mpsuzuki
>>>>>>>
>>>>>>>> I attach my cidfmap (be careful, my Windows home directory is
>>>>>>>> f:/windows). With these additional lines I got the attached result
>>>>>>>> and these warnings:
>>>>>>>>
>>>>>>>> Syntax Error: Couldn't find a font for 'MS-PMincho', subst is 'MS-Mincho'
>>>>>>>> Syntax Error: Couldn't find a font for 'MS-Gothic', subst is 'MS-Mincho'
>>>>>>>> Syntax Error: Couldn't find a font for 'MS-PGothic', subst is 'MS-Mincho'
>>>>>>>> Syntax Error: Couldn't find a font for 'MS-UIGothic', subst is 'MS-Mincho'
>>>>>>>> Syntax Error: Couldn't find a font for 'RyuminPr6-Light-Identity-H', subst is 'ArialUnicodeMS-JP;'
>>>>>>>> Syntax Error: Couldn't find a font for 'RyuminPr6-Light-Identity-H'
>>>>>>>> Syntax Error: Couldn't find a font for 'RyuminPr6-Light-Identity-H', subst is 'ArialUnicodeMS-JP;'
>>>>>>>> Syntax Error: Couldn't find a font for 'RyuminPr6-Light-Identity-H'
>>>>>>>> Syntax Error: Couldn't find a font for 'GothicBBBPr6-Medium-Identity-H', subst is 'ArialUnicodeMS-JP'
>>>>>>>> Syntax Error: Couldn't find a font for 'HiraMinPro-W3-Identity-H', subst is 'ArialUnicodeMS-JP'
>>>>>>>> Syntax Error: Couldn't find a font for 'HiraKakuStd-W3-Identity-H', subst is 'ArialUnicodeMS-JP'
>>>>>>>>
>>>>>>>>>>> Thus, I'm afraid more effort is needed for the hardwired CID-keyed
>>>>>>>>>>> font fallback. At least, using MS-Mincho is not a good idea, and
>>>>>>>>>>> an appropriate warning should be printed. Of course, I'm willing to
>>>>>>>>>>> work on this issue.
>>>>>>>>>> Isn't
>>>>>>>>>> error(-1, "Couldn't find a font for '%s', subst is '%s'",
>>>>>>>>>>       fontName->getCString(), substFontName);
>>>>>>>>>>
>>>>>>>>>> an appropriate warning?
>>>>>>>>> I think it's slightly insufficient; the substitution of a CID-keyed
>>>>>>>>> font by a non-CID-keyed font should be warned about in more detail
>>>>>>>>> (Adobe-Japan1 font blah blah blah is substituted by non-CID-keyed
>>>>>>>>> blah blah blah).
>>>>>>>> You're not completely right: a non-CID-keyed font is still
>>>>>>>> substituted by Helvetica; only a CID-keyed font is replaced by MS
>>>>>>>> Mincho. But if you want another warning, just feel free to change
>>>>>>>> the code.
>>>>>>>>
>>>>>>>> Cheers,
>>>>>>>> Thomas
>>>>>>>>
>>>>>>>>>>> Thomas Freitag wrote:
>>>>>>>>>>>> Am 03.03.2012 17:40, schrieb suzuki toshiya:
>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>
>>>>>>>>>>>>> I'm quite sorry that no CJK helpers have been involved in this
>>>>>>>>>>>>> issue. Is the required help a rewrite of your patch to fit the
>>>>>>>>>>>>> poppler coding convention and to work for the maintainers on Unix
>>>>>>>>>>>>> systems? If it is possible without Visual Studio, I will try.
>>>>>>>>>>>> Hopefully done now. As far as I remember, it was your patch (bug
>>>>>>>>>>>> 11413) that I just applied to PSOutputDev.cc
>>>>>>>>>>>>
>>>>>>>>>>>>> BTW, although I have not checked your patch in detail yet, is your
>>>>>>>>>>>>> patch trying to replace all missing (non-embedded) CID-keyed CJK
>>>>>>>>>>>>> fonts with MS Mincho? I think that is not a good idea for the users
>>>>>>>>>>>>> of Adobe-GB1 (PRC, Singapore), Adobe-CNS1 (Taiwan, Hong Kong),
>>>>>>>>>>>>> Adobe-Korea1 (ROK). I'm not sure if Ghostscript does so, but
>>>>>>>>>>>>> even if Ghostscript does so, poppler should not follow it.
>>>>>>>>>>>>> In fact, the coverage of CJK Ideographs is designed differently
>>>>>>>>>>>>> to fit each market.
>>>>>>>>>>>> No, it was not my goal to substitute all CID-keyed fonts with MS
>>>>>>>>>>>> Mincho. The problem under Windows is just that, if no font with
>>>>>>>>>>>> the used name is installed, poppler tried to replace it with
>>>>>>>>>>>> Helvetica, but because this is not a CID font, no characters at all
>>>>>>>>>>>> are shown. So I thought that MS Mincho is, at least for this case,
>>>>>>>>>>>> a better default CID font.
>>>>>>>>>>>> But if you copy the cidfmap produced by mkcidfm.ps from Ghostscript
>>>>>>>>>>>> into the poppler data dir, that substitute font table will be used
>>>>>>>>>>>> if (and only if) the font is neither embedded nor installed under
>>>>>>>>>>>> Windows. I hope that fits CJK users; I thought it was better to use
>>>>>>>>>>>> an existing substitution algorithm than to do nothing. And as far as
>>>>>>>>>>>> I understand mkcidfm.ps, it will also try to find suitable fonts for
>>>>>>>>>>>> GB1, CNS1 and the others, but I'm no expert on CJK fonts. The
>>>>>>>>>>>> cidfmap I produced on my system would use arialuni.ttf for all CJK
>>>>>>>>>>>> fonts, but I have just the Microsoft default fonts.
>>>>>>>>>>>>
>>>>>>>>>>>> Regards,
>>>>>>>>>>>> Thomas
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> _______________________________________________
>>>>>>>>>>>> poppler mailing list
>>>>>>>>>>>> poppler at lists.freedesktop.org
>>>>>>>>>>>> http://lists.freedesktop.org/mailman/listinfo/poppler
>
>

