What encoding is used?

Sat Apr 12 15:32:20 PDT 2014

Hello,

Using cppcheck-htmlreport to convert raw cppcheck error reports to HTML
fails for some files because of their encodings.
Here is an example error message:
cppcheck/htmlreport/cppcheck-htmlreport", line 287, in <module>
    content = input_file.read()
  File "/usr/lib/python2.7/codecs.py", line 296, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf8' codec can't decode byte 0xfc in position 3546: invalid start byte
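
For reference, here is a minimal reproduction in Python 3. The traceback above comes from Python 2.7, which names the codec 'utf8' instead of 'utf-8'. The sample bytes are hypothetical: "Splüne" stored as ISO-8859-1, where "ü" is the single byte 0xfc.

```python
# Hypothetical sample: "Splüne" stored as ISO-8859-1, where "ü" is the byte 0xfc.
raw = "Splüne".encode("iso-8859-1")

# Decoding as UTF-8 (as in the traceback above) fails on that byte.
try:
    raw.decode("utf-8")
    message = None
except UnicodeDecodeError as exc:
    message = str(exc)
print(message)  # 'utf-8' codec can't decode byte 0xfc in position 3: invalid start byte

# Decoding with the codec the bytes were actually written in recovers the text exactly.
recovered = raw.decode("iso-8859-1")
print(recovered)  # Splüne

# A tolerant read avoids the exception but replaces the byte with U+FFFD.
lossy = raw.decode("utf-8", errors="replace")
print(lossy)  # Spl�ne
```

Note that errors="replace" only hides the problem: the report no longer crashes, but the original character is gone.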

Here is the list of files that trigger this problem:
./hwpfilter/source/hcode.cxx
./hwpfilter/source/hwpread.cxx
./hwpfilter/source/hbox.h
./hwpfilter/source/formula.cxx
./hwpfilter/source/hwpfile.cxx
./hwpfilter/source/hwpeq.cxx
./chart2/source/view/charttypes/Splines.cxx (it contained two "ü" characters
and "file -i" detected it as ISO-8859-1 rather than UTF-8); it has now been
converted (see
http://cgit.freedesktop.org/libreoffice/core/commit/?id=42d494e7925249c36f62206e7268d849437e219d)
./hwpfilter/source/hbox.cxx
./hwpfilter/source/hinfo.cxx
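
A list like this can also be produced up front, independently of cppcheck, by trying a strict UTF-8 decode of every source file. The sketch below assumes Python 3; the function name and the set of C/C++ suffixes are hypothetical. Like "file -i", it cannot tell which legacy encoding a failing file uses, only that the file is not valid UTF-8.

```python
from pathlib import Path

# Assumption: these suffixes identify the C/C++ sources to check; adjust as needed.
SOURCE_SUFFIXES = {".c", ".cxx", ".cpp", ".h", ".hxx", ".hpp"}


def non_utf8_sources(root):
    """Yield (path, byte offset, byte value) for each source file under `root`
    that is not valid UTF-8, reporting the first offending byte."""
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file() or path.suffix not in SOURCE_SUFFIXES:
            continue
        data = path.read_bytes()
        try:
            data.decode("utf-8")  # strict: raises on the first invalid byte
        except UnicodeDecodeError as exc:
            yield path, exc.start, data[exc.start]
```

For example, `for path, offset, byte in non_utf8_sources("hwpfilter"): print(path, hex(byte), offset)` would print one line per affected file.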

I had a look at ./hwpfilter/source/hinfo.cxx.
Here is what vi initially shows (Debian testing x86-64, French locale):
     56 /**
     57  * ¹®¼Á¤º¸¸¦ ÀÐ¾îµéÀÌ´Â ÇÔ¼ö ( 128 bytes )
     58  * ¹®¼Á¤º¸´Â ÆÄÀÏÀÎ½ÄÁ¤º¸( 30 bytes ) ´ÙÀ½¿¡ À§Ä¡ÇÑ Á¤º¸ÀÌ´Ù.
     59  */
     60 bool HWPInfo::Read(HWPFile & hwpf)

Since the hwpfilter README mentions "Hangul Word Processor" and "Korea", I
tried "iconv -f EUC-KR -t utf8 hwpfilter/source/hinfo.cxx > stdout.txt"
and got this:
     56 /**
     57  * 문서정보를 읽어들이는 함수 ( 128 bytes )
     58  * 문서정보는 파일인식정보( 30 bytes ) 다음에 위치한 정보이다.
     59  */
     60 bool HWPInfo::Read(HWPFile & hwpf)
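
The same conversion can be sketched in Python 3, which also suggests where the vi output comes from: EUC-KR bytes displayed as ISO-8859-1. The sample string is the first comment line from hinfo.cxx, encoded as EUC-KR for the demo on the assumption that this is how the file stores it; `euc_kr_to_utf8` is a hypothetical helper, not an existing tool.

```python
def euc_kr_to_utf8(data: bytes) -> bytes:
    """Rough counterpart of `iconv -f EUC-KR -t utf8`.

    Strict error handling: any byte sequence that is not valid EUC-KR raises
    UnicodeDecodeError instead of being silently altered.
    """
    return data.decode("euc_kr").encode("utf-8")


# Comment text from hinfo.cxx, encoded as EUC-KR (assumed to match the file's bytes).
comment = "문서정보를 읽어들이는 함수 ( 128 bytes )"
legacy = comment.encode("euc_kr")

# Reading those bytes as ISO-8859-1 gives garbled text like the vi view above.
print(legacy.decode("iso-8859-1"))

# Converting to UTF-8 gives back the Korean text.
converted = euc_kr_to_utf8(legacy)
print(converted.decode("utf-8"))  # 문서정보를 읽어들이는 함수 ( 128 bytes )
```

Each Hangul syllable takes 2 bytes in EUC-KR and 3 in UTF-8, so converted files grow slightly, but the text itself is unchanged.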

I tried Google Translate, which detected the language as Korean
(hopefully! :-)) and translated the first comment line as:
"Function to read the document information"
which matches the name of the function.
Remark: I don't know what "( 128 bytes )" or "( 30 bytes )" means. Is it a
problem with the conversion?

Anyway, would this conversion be OK for these files, or might we lose some
information?
Of course, I would rather have cppcheck fail the HTML conversion of some
reports than lose important information in these files.
Perhaps it is also a cppcheck bug or a Python bug that should be fixed.
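
Whether a conversion loses anything can be checked mechanically with a strict round trip: decode with the assumed source encoding, encode to UTF-8, then convert back and compare with the original bytes. A minimal Python 3 sketch follows; the function name is hypothetical. A True result only shows the conversion is reversible. It does not prove the guessed encoding is right, since any byte string round-trips through ISO-8859-1, so reading the decoded text (as above) still matters.

```python
def converts_losslessly(data: bytes, source_encoding: str = "euc_kr") -> bool:
    """Return True if `data` survives source_encoding -> UTF-8 -> source_encoding unchanged."""
    try:
        text = data.decode(source_encoding)  # strict: invalid bytes raise
        utf8 = text.encode("utf-8")          # what the converted file would contain
        return utf8.decode("utf-8").encode(source_encoding) == data
    except (UnicodeDecodeError, UnicodeEncodeError):
        return False  # not valid in the assumed encoding, so conversion would have to guess


print(converts_losslessly("문서정보".encode("euc_kr")))  # True
print(converts_losslessly(b"abc\xb9"))                  # False: truncated EUC-KR sequence
print(converts_losslessly(b"Spl\xfcne", "iso-8859-1"))   # True
```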

Any idea?

Julien

--
View this message in context: http://nabble.documentfoundation.org/What-encoding-is-used-tp4105106.html
Sent from the Dev mailing list archive at Nabble.com.