[poppler] [PATCH] per-collection fallback for missing CID-keyed fonts on Win32

suzuki toshiya mpsuzuki at hiroshima-u.ac.jp
Tue Mar 27 05:01:43 PDT 2012


Dear Albert,

I'm sorry for the late reply to your question. However, Thomas's
latest patch has already superseded my patch, so please let me
reconsider it.

Albert Astals Cid wrote:
>> * Adobe-CNS1 (Taiwan) -> fallback to MingLiU
>> * Adobe-GB1 (China mainland) -> fallback to SimSun
>> * Adobe-Japan1 (Japan) -> fallback to MS-Mincho
>> * Adobe-Japan2 (Japan) -> fallback to MS-Mincho
>> * Adobe-Korea1 (Republic of Korea) -> fallback to Batang
>
> Does Windows ship with those fonts?

Yes, at least since Windows 2000:
* MingLiU has been available since Microsoft Windows 95 (Traditional Chinese).
* SimSun has been available since at least Microsoft Windows 2000.
* MS-Mincho has been available since Microsoft Windows 3.1 (Japanese).
* Batang has been available since at least Microsoft Windows 2000.
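A minimal sketch of the per-collection table quoted above, assuming the missing font's collection is available as a "Registry-Ordering" string such as "Adobe-Japan1". The function name `fallbackFontForCollection` is hypothetical and not part of poppler; it only shows the lookup, not font loading or checking that the chosen font is actually installed.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical helper (not an existing poppler function): returns the
// Windows TrueType font proposed as the fallback for a missing CID-keyed
// font of the given character collection, or nullptr if the collection
// is not one of the five Adobe CJK collections listed in the proposal.
const char *fallbackFontForCollection(const char *collection) {
  assert(collection != nullptr);  // caller must pass a collection name

  struct Entry {
    const char *collection;  // "Registry-Ordering", e.g. "Adobe-Japan1"
    const char *font;        // proposed fallback font name
  };
  static const Entry table[] = {
      {"Adobe-CNS1", "MingLiU"},      // Taiwan
      {"Adobe-GB1", "SimSun"},        // China mainland
      {"Adobe-Japan1", "MS-Mincho"},  // Japan
      {"Adobe-Japan2", "MS-Mincho"},  // Japan
      {"Adobe-Korea1", "Batang"},     // Republic of Korea
  };

  for (const Entry &e : table) {
    if (std::strcmp(e.collection, collection) == 0) {
      return e.font;
    }
  }
  return nullptr;  // unknown collection: the caller picks a generic fallback
}
```

For a collection outside the table (for example Adobe-Identity) the sketch returns nullptr, leaving the choice of a generic fallback to the caller.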

You may want a list showing which versions of Microsoft Windows
(or of Microsoft Office) ship which fonts. So do I; please give me
more time to check. I checked
	http://www.microsoft.com/typography/fonts/family.aspx,
but it does not cover the history before Windows 2000.

Regards,
mpsuzuki


Albert Astals Cid wrote:
> On Tuesday, 27 March 2012, at 00:19:24, mpsuzuki at hiroshima-u.ac.jp
> wrote:
>> Hi all,
> 
> Hi
> 
>> Considering the upcoming deadline for the 0.20 feature
>> freeze, I propose a small patch as a first step
>> toward better fallback for missing CID-keyed CJK fonts.
>>
>> As I discussed with Thomas, current poppler always
>> falls back to a serif typeface for the Japanese market
>> (MS-Mincho) if the user has not provided a font
>> fallback definition table. The attached patch is a small
>> enhancement of Thomas's work: it checks the collection
>> of the missing CID-keyed font, and if it is a known
>> Adobe collection (Adobe-CNS1, -GB1, -Japan1, -Japan2,
>> -Korea1), the fallback TrueType font defined for that
>> collection is used.
>>
>> * Adobe-CNS1 (Taiwan) -> fallback to MingLiU
>> * Adobe-GB1 (China mainland) -> fallback to SimSun
>> * Adobe-Japan1 (Japan) -> fallback to MS-Mincho
>> * Adobe-Japan2 (Japan) -> fallback to MS-Mincho
>> * Adobe-Korea1 (Republic of Korea) -> fallback to Batang
> 
> Does Windows ship with those fonts?
> 
> Albert
> 
>> I'm working on a further enhancement (I think a missing
>> sans-serif CJK typeface should fall back to another
>> sans-serif CJK typeface whenever one is available),
>> but investigating the historical availability of typefaces
>> on Microsoft Windows will take some time.
>>
>> Regards,
>> mpsuzuki
>>
>>
>> On Fri, 23 Mar 2012 22:28:27 +0100,
>> Thomas Freitag <Thomas.Freitag at kabelmail.de> wrote:
>>> On 23.03.2012 14:06, mpsuzuki at hiroshima-u.ac.jp wrote:
>>>> On Fri, 23 Mar 2012 08:42:01 +0100,
>>>> Thomas Freitag <Thomas.Freitag at kabelmail.de> wrote:
>>>>> On 23.03.2012 08:21, suzuki toshiya wrote:
>>>>>> Excuse me, that is the 3rd issue in your first post (adding support
>>>>>> for the cidfmap generated by Ghostscript), which is not what I care
>>>>>> about. What I care about is the hardwired MS-Mincho fallback.
>>>>> It's only hardwired for the case where a CID font is expected and no
>>>>> appropriate substitute font is found. Probably a better idea is to use
>>>>> arialuni.ttf instead of MS Mincho, but when I started on it, I only knew
>>>>> that MS Mincho is always installed and has some CJK characters.
>>>> One important problem with using a single CJK font (e.g.
>>>> MS Mincho) as a generic fallback is that the character
>>>> coverage of CJK fonts depends heavily on the target
>>>> market.
>>>>
>>>> For example, fonts designed for China mainland, Taiwan
>>>> and Japan usually lack Hangul, so they should not be
>>>> used as the Adobe-Korea1 fallback. In addition, fonts
>>>> designed for Japan, Taiwan and Korea usually lack the
>>>> simplified characters currently used in China mainland.
>>>> Also, the latest version of Adobe-GB1 includes the Yi script
>>>> (U+A000 - U+A4BF), but fonts for Taiwan, Japan and Korea
>>>> usually lack it.
>>>>
>>>> Needless to say, for Japanese customers, using MS Mincho or
>>>> MS Gothic as the generic fallback would be better than using
>>>> SimSun (China mainland), MingLiU (Taiwan) or Batang
>>>> (Korea), but it is an unfair solution for everyone else.
>>>>
>>>> Using Arial Unicode as the generic fallback would be neutral,
>>>> although the quality of its CJK glyphs is often
>>>> criticized. In addition, its support for vertical writing
>>>> mode is insufficient.
>>>>
>>>> I attached 1 PDF and 2 pictures: one picture shows the fallback
>>>> to MS Mincho, the other shows the fallback to Arial Unicode.
>>>>
>>>> Thus, I will propose a patch that prepares per-collection
>>>> fallback fonts (for Adobe-CNS1, Adobe-GB1, Adobe-Japan1,
>>>> Adobe-Japan2, Adobe-Korea1) and finally falls back to Arial
>>>> Unicode when no appropriate one is found.
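The proposed order (per-collection font first, Arial Unicode as the last resort) could look roughly like the following. This is a sketch under stated assumptions: `chooseCidFallback` is a hypothetical name, installed fonts are modelled as a plain set of names, and the key "ArialUnicodeMS" is an assumed spelling rather than something taken from poppler. It requires C++17 for `std::optional`.

```cpp
#include <optional>
#include <set>
#include <string>

// Hypothetical sketch of the proposed lookup order (not poppler's code):
// 1. the per-collection fallback font, if one is known and installed;
// 2. otherwise Arial Unicode, as a market-neutral last resort;
// 3. otherwise no substitute at all (std::nullopt), which the caller
//    must handle instead of silently using a non-CID font.
std::optional<std::string> chooseCidFallback(
    const std::optional<std::string> &perCollectionFont,
    const std::set<std::string> &installedFonts) {
  if (perCollectionFont && installedFonts.count(*perCollectionFont) > 0) {
    return perCollectionFont;
  }
  const std::string arialUnicode = "ArialUnicodeMS";  // assumed font key
  if (installedFonts.count(arialUnicode) > 0) {
    return arialUnicode;
  }
  return std::nullopt;
}
```

Returning `std::nullopt` rather than picking an arbitrary non-CID font keeps the "nothing suitable is installed" case explicit for the caller.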
>>> In my opinion, that sounds great. I implemented my "poor" patch because I
>>> sometimes debug problems with CJK fonts under Windows and see nothing.
>>> So a better implementation is always welcome.
>>>
>>>>>>> I'm not really sure where the poppler data dir is expected on MinGW;
>>>>>>> it should be /usr/local/share/poppler. Otherwise you can patch the code
>>>>>>> where the GlobalParams constructor is called, as I normally do under
>>>>>>> Windows:
>>>>>>>
>>>>>>> globalParams = new GlobalParams("E:\\Downloads\\poppler\\poppler-data-0.4.5");
>>>>>>>
>>>>>>> and copy cidfmap to that directory.
>>>>>>> If you don't do this (and only then), all CJK fonts fall back to MS
>>>>>>> Mincho.
>>>>>> Yes, that case (without cidfmap) is what I care about. I think "all CJK
>>>>>> fonts fall back to MS Mincho" is worse than falling back to Helvetica,
>>>>>> as shown by my 2 sample pictures. BTW, in your environment with the
>>>>>> cidfmap generated by Ghostscript, is my sample PDF (referring to CJK
>>>>>> CID-keyed fonts) processed correctly?
>>>>> We can't fall back to Helvetica if a CID font is expected. In this case
>>>>> locateFont returns a NULL pointer!
>>>> I think the caller of locateFont should handle the case where
>>>> no appropriate substitute font is found (if a NULL pointer
>>>> is not an appropriate way to indicate that, some error should be
>>>> raised and caught).
>>> That's what I did :-). Perhaps the error message is not clear, but that
>>> wasn't mine :-)
>>>
>>>>> No, I had to insert additional lines into cidfmap; I attach it.
>>>>> mkcidfm.ps doesn't search for Pr-fonts. I added lines for only 4 fonts;
>>>>> of course I could do the same for the others. I just wanted to show how
>>>>> easy it is.
>>>> Good to know. I thought current Ghostscript CJK font
>>>> handling was not very sophisticated, so I expected that
>>>> generating configuration data with Ghostscript would not be
>>>> a one-stop solution.
>>> As I already mentioned, it would be nice if you gave us a better solution.
>>>
>>> Thanks in advance,
>>> Thomas
>>>
>>>> Regards,
>>>> mpsuzuki
>>>>
>>>>> I attach my cidfmap (be careful, my Windows home directory is
>>>>> f:/windows). With these additional lines I got the attached result and
>>>>> these warnings:
>>>>>
>>>>> Syntax Error: Couldn't find a font for 'MS-PMincho', subst is 'MS-Mincho'
>>>>> Syntax Error: Couldn't find a font for 'MS-Gothic', subst is 'MS-Mincho'
>>>>> Syntax Error: Couldn't find a font for 'MS-PGothic', subst is 'MS-Mincho'
>>>>> Syntax Error: Couldn't find a font for 'MS-UIGothic', subst is 'MS-Mincho'
>>>>> Syntax Error: Couldn't find a font for 'RyuminPr6-Light-Identity-H', subst is 'ArialUnicodeMS-JP;'
>>>>> Syntax Error: Couldn't find a font for 'RyuminPr6-Light-Identity-H'
>>>>> Syntax Error: Couldn't find a font for 'RyuminPr6-Light-Identity-H', subst is 'ArialUnicodeMS-JP;'
>>>>> Syntax Error: Couldn't find a font for 'RyuminPr6-Light-Identity-H'
>>>>> Syntax Error: Couldn't find a font for 'GothicBBBPr6-Medium-Identity-H', subst is 'ArialUnicodeMS-JP'
>>>>> Syntax Error: Couldn't find a font for 'HiraMinPro-W3-Identity-H', subst is 'ArialUnicodeMS-JP'
>>>>> Syntax Error: Couldn't find a font for 'HiraKakuStd-W3-Identity-H', subst is 'ArialUnicodeMS-JP'
>>>>>
>>>>>>>> Thus, I'm afraid more effort is needed for the hardwired CID-keyed
>>>>>>>> font fallback. At least, using MS-Mincho is not a good idea, and an
>>>>>>>> appropriate warning should be printed. Of course, I'm willing to
>>>>>>>> work on this issue.
>>>>>>> Isn't
>>>>>>> error(-1, "Couldn't find a font for '%s', subst is '%s'",
>>>>>>> fontName->getCString(), substFontName);
>>>>>>>
>>>>>>> an appropriate warning?
>>>>>> I think it's slightly insufficient: substitution of a CID-keyed font
>>>>>> by a non-CID-keyed one should be warned about in more detail (e.g.
>>>>>> "Adobe-Japan1 font blah blah blah is substituted by non-CID-keyed
>>>>>> blah blah blah").
>>>>> You're not entirely right: a non-CID-keyed font is still substituted by
>>>>> Helvetica; only a CID-keyed font is replaced by MS Mincho. But if you
>>>>> want another warning, feel free to change the code.
>>>>>
>>>>> Cheers,
>>>>> Thomas
>>>>>
>>>>>>>> Thomas Freitag wrote:
>>>>>>>>> On 03.03.2012 17:40, suzuki toshiya wrote:
>>>>>>>>>> Hi,
>>>>>>>>>>
>>>>>>>>>> I'm quite sorry that no CJK helpers have been involved in this issue...
>>>>>>>>>> Is the required help a rewrite of your patch to fit the poppler
>>>>>>>>>> coding conventions for the maintainers working on Unix
>>>>>>>>>> systems? If it is possible to do without Visual Studio, I will
>>>>>>>>>> try.
>>>>>>>>> Hopefully done now. As far as I remember, it was your patch (bug 11413)
>>>>>>>>> that I just applied to PSOutputDev.cc.
>>>>>>>>>
>>>>>>>>>> BTW, I have not yet checked your patch in detail, but is it
>>>>>>>>>> trying to replace all missing (non-embedded) CID-keyed CJK fonts
>>>>>>>>>> with MS Mincho? I think that is not a good idea for users of
>>>>>>>>>> Adobe-GB1 (PRC, Singapore), Adobe-CNS1 (Taiwan, Hong Kong) and
>>>>>>>>>> Adobe-Korea1 (ROK). I'm not sure whether Ghostscript does so, but
>>>>>>>>>> even if it does, poppler should not follow it.
>>>>>>>>>> In fact, the coverage of CJK ideographs is designed differently
>>>>>>>>>> to fit each market.
>>>>>>>>> No, it was not my goal to substitute all CID-keyed fonts with MS
>>>>>>>>> Mincho. The problem under Windows is just that, if no font with the
>>>>>>>>> used name is installed, poppler tried to replace it with Helvetica,
>>>>>>>>> but because that is not a CID font, no characters at all are shown.
>>>>>>>>> So I thought that, at least for this case, MS Mincho was a better
>>>>>>>>> default CID font.
>>>>>>>>> But if you copy the cidfmap produced by Ghostscript's mkcidfm.ps into
>>>>>>>>> the poppler data dir, that substitute font table will be used if (and
>>>>>>>>> only if) the font is not embedded and not installed under Windows.
>>>>>>>>> I hope that fits CJK users; I thought it was better to use an
>>>>>>>>> existing substitution algorithm than to do nothing. And as far as I
>>>>>>>>> understand mkcidfm.ps, it will also try to find suitable fonts for
>>>>>>>>> GB1, CNS1 and the others, but I'm no expert on CJK fonts. The cidfmap
>>>>>>>>> I produced on my system would use arialuni.ttf for all CJK fonts, but
>>>>>>>>> I have just the Microsoft default fonts.
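The lookup order described here (embedded font, then an installed font of the same name, then a cidfmap entry, then the hardwired default) can be modelled as a simple decision sequence. This is an illustrative sketch only; `FontSource` and `resolveFontSource` are hypothetical names, and the real poppler code path is not reproduced here.

```cpp
// Hypothetical model of the lookup order (not poppler's actual code):
// an embedded font always wins, then a system font installed under the
// same name, then a cidfmap entry; only when all of these are missing is
// the hardwired default CID font used.
enum class FontSource { Embedded, Installed, Cidfmap, HardwiredDefault };

FontSource resolveFontSource(bool isEmbedded, bool isInstalled,
                             bool hasCidfmapEntry) {
  if (isEmbedded) return FontSource::Embedded;
  if (isInstalled) return FontSource::Installed;
  if (hasCidfmapEntry) return FontSource::Cidfmap;
  return FontSource::HardwiredDefault;
}
```

The "if (and only if)" condition above corresponds to the third branch: a cidfmap entry is consulted only after both the embedded and the installed checks have failed.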
>>>>>>>>>
>>>>>>>>> Regards,
>>>>>>>>> Thomas
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> _______________________________________________
>>>>>>>>> poppler mailing list
>>>>>>>>> poppler at lists.freedesktop.org
>>>>>>>>> http://lists.freedesktop.org/mailman/listinfo/poppler