[poppler] How to recognize the Japan Font.
William Bader
williambader at hotmail.com
Wed May 29 16:45:05 UTC 2019
pdffonts shows that the fonts are embedded but subsetted. Subsetting preserves the glyphs but sometimes loses the mapping from each glyph back to a Unicode code point, which can make the text impossible to extract. See for example https://forums.adobe.com/thread/1990373
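If it helps, here is a rough Python sketch of how one might check this automatically. It assumes the usual pdffonts table layout (a header row, a dashed rule, then one row per font ending in the emb, sub, uni and object ID columns) and reads the columns from the right, because the type column can contain spaces. The helper names parse_pdffonts and fonts_missing_unicode_map are made up for this illustration. A "no" in the uni column only means there is no explicit ToUnicode map; it does not prove that extraction will fail, and a "yes" does not prove that the map is correct.

```python
import subprocess
import sys
from typing import Dict, List


def parse_pdffonts(output: str) -> List[Dict[str, object]]:
    """Parse the table printed by `pdffonts` into one dict per font.

    Assumes each data row ends with: encoding emb sub uni <obj num> <obj gen>.
    Font names containing spaces are not handled by this sketch.
    """
    fonts = []
    seen_rule = False
    for line in output.splitlines():
        if not seen_rule:
            # Everything up to and including the dashed rule is header.
            if line.startswith("---"):
                seen_rule = True
            continue
        tokens = line.split()
        if len(tokens) < 8:
            continue  # blank or unexpected line
        fonts.append({
            "name": tokens[0],
            "type": " ".join(tokens[1:-6]),  # e.g. "CID TrueType"
            "encoding": tokens[-6],
            "emb": tokens[-5] == "yes",
            "sub": tokens[-4] == "yes",
            "uni": tokens[-3] == "yes",
        })
    return fonts


def fonts_missing_unicode_map(fonts: List[Dict[str, object]]) -> List[object]:
    """Return the names of fonts whose `uni` column is not "yes"."""
    return [f["name"] for f in fonts if not f["uni"]]


if __name__ == "__main__" and len(sys.argv) > 1:
    # Requires the pdffonts binary from poppler-utils on PATH.
    result = subprocess.run(
        ["pdffonts", sys.argv[1]],
        capture_output=True, text=True, check=True,
    )
    for name in fonts_missing_unicode_map(parse_pdffonts(result.stdout)):
        print("no ToUnicode map:", name)
```

Saved under any file name, the script takes the PDF path as its only argument. Any font it lists is a candidate for the kind of unextractable text described above.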
Regards, William
________________________________
From: poppler <poppler-bounces at lists.freedesktop.org> on behalf of Zhong, Steven <Steven.Zhong at fil.com>
Sent: Tuesday, May 28, 2019 11:34 PM
To: 'poppler at lists.freedesktop.org'
Subject: [poppler] How to recognize the Japan Font.
Hi All,
I want to convert the PDF at https://www.fidelity.jp/static/pdf/fund/5111893-FD30BA/Reports/Monthly/FD30BA-MF-201904.pdf to text,
but the text does not come out correctly. I found that the font is
MS-PGothic-90ms-RKSJ-H
and the encoding is Identity-H.
The converted text looks like the line below. I guess a font is missing. How can I install the font so that the text is read correctly? Many thanks.
ᅜෆ⥲⏕⏘䠄㻳㻰㻼䠅ᡂ㛗⋡䛜๓ᅄ༙ᮇ䛸ྠỈ‽䛻䛺䜚䚸୰ᅜᬒẼ䛻ᗏධ䜜ឤ䛜ฟጞ䜑䛯䛣䛸䜒㈙䛔Ᏻᚰឤ䛻䛴䛺
vcap at e0779423-b47e-499c-4c1b-4ecd:~/app/pop/bin$ ./pdfinfo -v
pdfinfo version 0.62.0
Copyright 2005-2017 The Poppler Developers - http://poppler.freedesktop.org
Copyright 1996-2011 Glyph & Cog, LLC
My poppler version is 0.62.0.
vcap at e0779423-b47e-499c-4c1b-4ecd:~/app/pop/bin$ ./pdfinfo -listenc
Available encodings are:
ASCII7
Big5
Big5ascii
EUC-CN
EUC-JP
GBK
ISO-2022-CN
ISO-2022-JP
ISO-2022-KR
ISO-8859-6
ISO-8859-7
ISO-8859-8
ISO-8859-9
KOI8-R
Latin1
Latin2
Shift-JIS
Symbol
TIS-620
UTF-16
UTF-8
Windows-1255
ZapfDingbats