What encoding is used?
julien2412
serval2412 at yahoo.fr
Sat Apr 12 15:32:20 PDT 2014
Hello,
Using cppcheck-htmlreport to convert raw cppcheck error reports to HTML fails
for some files because of their encoding.
Here's an example message:
  cppcheck/htmlreport/cppcheck-htmlreport", line 287, in <module>
    content = input_file.read()
  File "/usr/lib/python2.7/codecs.py", line 296, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf8' codec can't decode byte 0xfc in position 3546: invalid start byte
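The failure can be reproduced without cppcheck. Below is a minimal sketch that assumes Python 3 (the traceback above comes from Python 2.7, which spells the codec 'utf8'). In UTF-8 the bytes 0xF5 to 0xFF never occur (RFC 3629), so a strict UTF-8 decode stops at the first 0xFC. In ISO-8859-1, 0xFC is "ü". The helper name find_utf8_error and the sample bytes are made up for illustration.

```python
def find_utf8_error(data: bytes):
    """Return (offset, byte value) of the first invalid UTF-8 byte, or None."""
    try:
        data.decode("utf-8")  # strict mode: raises on the first invalid sequence
    except UnicodeDecodeError as exc:
        return exc.start, data[exc.start]
    return None


# "Grün" stored as ISO-8859-1 (0xFC for "ü") instead of UTF-8 (0xC3 0xBC).
sample = "Grün".encode("iso-8859-1")
print(find_utf8_error(sample))                  # (2, 252), i.e. byte 0xfc at offset 2
print(find_utf8_error("Grün".encode("utf-8")))  # None
```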
Here is the list of files that trigger this problem:
./hwpfilter/source/hcode.cxx
./hwpfilter/source/hwpread.cxx
./hwpfilter/source/hbox.h
./hwpfilter/source/formula.cxx
./hwpfilter/source/hwpfile.cxx
./hwpfilter/source/hwpeq.cxx
./chart2/source/view/charttypes/Splines.cxx (it contained two "ü" characters and
"file -i" detected it as iso-8859-1 rather than utf-8); it has now been converted
(see http://cgit.freedesktop.org/libreoffice/core/commit/?id=42d494e7925249c36f62206e7268d849437e219d)
./hwpfilter/source/hbox.cxx
./hwpfilter/source/hinfo.cxx
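A list like the one above can also be produced mechanically. The sketch below assumes Python 3. The extension list and the helper name non_utf8_files are hypothetical. It walks a directory tree and reports each source file whose bytes are not valid UTF-8, together with the offset of the first bad byte. Unlike "file -i", it does not guess the actual encoding. It only flags files that a strict UTF-8 reader, like the one in the traceback, would reject.

```python
import os

# Assumption: the extensions worth checking; adjust for the tree being scanned.
SOURCE_EXTENSIONS = (".cxx", ".hxx", ".h", ".c", ".cpp")


def non_utf8_files(root):
    """Yield (path, offset of first invalid byte) for non-UTF-8 source files under root."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            if not name.endswith(SOURCE_EXTENSIONS):
                continue
            path = os.path.join(dirpath, name)
            with open(path, "rb") as f:  # raw bytes, no decoding yet
                data = f.read()
            try:
                data.decode("utf-8")  # strict decode, as in the failing report tool
            except UnicodeDecodeError as exc:
                yield path, exc.start
```

Usage, for example: for path, offset in non_utf8_files("./hwpfilter"): print(path, offset)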
I had a closer look at ./hwpfilter/source/hinfo.cxx.
This is how it initially looks in vi (Debian testing x86-64, French):
56 /**
57 * ¹®¼Á¤º¸¸¦ ÀоîµéÀÌ´Â ÇÔ¼ö ( 128 bytes )
58 * ¹®¼Á¤º¸´Â ÆÄÀÏÀνÄÁ¤º¸( 30 bytes ) ´ÙÀ½¿¡ À§Ä¡ÇÑ Á¤º¸ÀÌ´Ù.
59 */
60 bool HWPInfo::Read(HWPFile & hwpf)
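The garbled comment is consistent with a viewer showing each byte of a two-byte EUC-KR character as a separate ISO-8859-1 character. The sketch below assumes Python 3 and its standard euc_kr codec. It reproduces the effect with 문서정보 ("document information"), the first word of the comment as shown in the converted output below. Both helper names are hypothetical.

```python
def as_latin1_mojibake(text: str) -> str:
    """Encode text as EUC-KR, then misread every byte as one ISO-8859-1 character."""
    return text.encode("euc-kr").decode("iso-8859-1")


def repair_mojibake(garbled: str) -> str:
    """Undo the misreading: ISO-8859-1 maps each character back to the same byte."""
    return garbled.encode("iso-8859-1").decode("euc-kr")


word = "문서정보"  # "document information", from the comment in hinfo.cxx
garbled = as_latin1_mojibake(word)
print(ascii(garbled))  # escaped Latin-1 characters, two per Hangul syllable
print(repair_mojibake(garbled) == word)  # True
```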
Since the README in hwpfilter mentions "Hangul Word Processor" and "Korea", I
tried "iconv -f EUC-KR -t utf8 hwpfilter/source/hinfo.cxx > stdout.txt" and
got this:
56 /**
57 * 문서정보를 읽어들이는 함수 ( 128 bytes )
58 * 문서정보는 파일인식정보( 30 bytes ) 다음에 위치한 정보이다.
59 */
60 bool HWPInfo::Read(HWPFile & hwpf)
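The same conversion can be sketched in Python 3, although iconv is what was actually used above. The helper name convert_to_utf8 is hypothetical. Decoding is strict, so an invalid EUC-KR byte raises an error instead of being silently dropped. This is similar in spirit to GNU iconv, which also stops at an illegal input sequence unless it is given -c.

```python
from pathlib import Path


def convert_to_utf8(src: Path, dst: Path, source_encoding: str = "euc-kr") -> None:
    """Rough equivalent of `iconv -f EUC-KR -t utf8 src > dst`.

    Bytes are decoded strictly (UnicodeDecodeError on invalid input) and
    written back as UTF-8; line endings are preserved because no text-mode
    newline translation is involved.
    """
    text = src.read_bytes().decode(source_encoding)  # strict by default
    dst.write_bytes(text.encode("utf-8"))
```

Usage, for example: convert_to_utf8(Path("hwpfilter/source/hinfo.cxx"), Path("stdout.txt"))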
I then tried Google Translate, which detected the language as Korean
(fortunately! :-)) and translated it as:
"Function to read the document information"
which is consistent with the function name, HWPInfo::Read.
Remark: I don't know what "( 128 bytes )" or "( 30 bytes )" means; could it be a
problem with the conversion?
In any case, would this conversion be safe for these files, or might we lose
some information?
Of course, I would rather have cppcheck fail the HTML conversion of some
reports than lose important information in these files.
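Part of the question can be checked mechanically. The sketch below assumes Python 3, and converts_losslessly is a hypothetical helper. If the bytes decode strictly as EUC-KR and re-encode to exactly the same bytes, nothing was dropped or altered. UTF-8 can represent every decoded character, so the UTF-8 file then carries the same text. The check cannot confirm that EUC-KR is the right guess, though. ISO-8859-1 passes for any byte string, so a human reading of the result, like the Google Translate check above, is still needed.

```python
def converts_losslessly(data: bytes, source_encoding: str = "euc-kr") -> bool:
    """True if data decodes strictly and re-encodes to the identical bytes."""
    try:
        text = data.decode(source_encoding)  # strict: fails on invalid sequences
    except UnicodeDecodeError:
        return False
    # A byte-exact round trip means decoding lost nothing.
    return text.encode(source_encoding) == data
```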
It may also be a cppcheck or Python bug that should be fixed.
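If the report tool itself were made more tolerant, one option could look like the minimal Python 3 sketch below. It is not cppcheck's actual code or fix, and read_source_text is a hypothetical helper. The helper decodes as UTF-8 and falls back to errors="replace" only when that fails. Undecodable bytes then show up as U+FFFD in the HTML, and the source file on disk stays untouched. The second return value lets the caller warn about the affected file.

```python
def read_source_text(path):
    """Return (text, lossy) for displaying a source file in an HTML report."""
    with open(path, "rb") as f:
        data = f.read()
    try:
        return data.decode("utf-8"), False
    except UnicodeDecodeError:
        # Keep going: invalid bytes become U+FFFD in the report only.
        return data.decode("utf-8", errors="replace"), True
```

This keeps report generation going but displays replacement characters for such files. Converting the files themselves, as was done for Splines.cxx, avoids that.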
Any idea?
Julien
--
View this message in context: http://nabble.documentfoundation.org/What-encoding-is-used-tp4105106.html
Sent from the Dev mailing list archive at Nabble.com.