What encoding is used?

Sun Apr 13 04:46:59 PDT 2014

Hi *,

On Sun, Apr 13, 2014 at 12:32 AM, julien2412 <serval2412 at yahoo.fr> wrote:
> Hello,
>
> The use of cppcheck-htmlreport to convert raw cppcheck reports errors to
> html fails for some files because of the encodings.
> [...]
> Here's the list of files which give this problem:
> ./hwpfilter/source/hcode.cxx
> [...]
> ./hwpfilter/source/hinfo.cxx
>
> I gave a try to ./hwpfilter/source/hinfo.cxx
> Initial view on vi (Debian testing x86-64, French)

vim just tries the encodings listed in fileencodings, in order, when
guessing a charset. On my system it is
fileencodings=ucs-bom,utf-8,default,latin1
where latin1 is basically the catch-all (it always loads, but of
course won't do any good with the file's content :-))

So what you see is the file decoded as latin1 (unless you have other
charsets in front of it that match first)
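
To illustrate that fallback, here is a minimal sketch in Python (not
vim's actual code). guess_encoding and its candidate list are made up
for the example; it only mimics "try each encoding in order, take the
first one that decodes":

```python
def guess_encoding(data: bytes, candidates=("utf-8", "latin-1")):
    """Return (encoding, text) for the first candidate that decodes cleanly.

    Hypothetical helper: it mimics an ordered fallback list, nothing more.
    """
    for enc in candidates:
        try:
            return enc, data.decode(enc)
        except UnicodeDecodeError:
            continue
    raise ValueError("no candidate encoding matched")


# EUC-KR bytes are not valid UTF-8, so the latin-1 catch-all wins and
# the text comes out as mojibake instead of Hangul.
raw = "문서".encode("euc_kr")
enc, text = guess_encoding(raw)

# The bytes themselves are untouched: re-reading them as EUC-KR works.
recovered = text.encode("latin-1").decode("euc_kr")
```

latin-1 maps every byte value to a character, which is why it never
fails and why it has to come last in the list.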

>      56 /**
>      57  * ¹®¼Á¤º¸¸¦ ÀÐ¾îµéÀÌ´Â ÇÔ¼ö ( 128 bytes )
> [...]
> since README from hwpfilter indicates "Hangul Word Processor" and "Korea", I
> gave a try with "iconv -f EUC-KR -t utf8 hwpfilter/source/hinfo.cxx >
> stdout.txt", I retrieved this:
>       56 /**
>      57  * 문서정보를 읽어들이는 함수 ( 128 bytes )
>      58  * 문서정보는 파일인식정보( 30 bytes ) 다음에 위치한 정보이다.
>      59  */
>      60 bool HWPInfo::Read(HWPFile & hwpf)
>
> I gave a try to Google translate which detected the language as Korean
> (hopefully! :-)) and translated this:
> "Function to read the document information"
> which seems ok according to the name of the function.
> Remark : I don't know what means "( 128 bytes )" or "( 30 bytes)", is it a
> pb in conversion?

Nah, that is probably just the length of the data block that contains
the mentioned info.

> Anyway, would this conversion be ok on these files or might we lose some
> information?

You won't lose information by converting to UTF-8 - UTF-8 can
represent every character that EUC-KR can.
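
If you want to double-check that before committing, a round trip is
enough. A minimal sketch, assuming Python 3 and that the files really
are EUC-KR (convert_euckr_to_utf8 is just a name picked for the
example):

```python
def convert_euckr_to_utf8(data: bytes) -> bytes:
    """Convert EUC-KR bytes to UTF-8, refusing anything that is not lossless."""
    # Raises UnicodeDecodeError if the input is not valid EUC-KR.
    text = data.decode("euc_kr")
    # Encoding back must reproduce the original bytes exactly.
    if text.encode("euc_kr") != data:
        raise ValueError("round trip mismatch, conversion would lose data")
    return text.encode("utf-8")


# Sample taken from the comment in hinfo.cxx quoted above.
sample = "문서정보를 읽어들이는 함수 ( 128 bytes )".encode("euc_kr")
converted = convert_euckr_to_utf8(sample)
```

For a real file you would read it in binary mode, pass the bytes
through the function and write the result back in binary mode.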

see also 73d3ad1375c2bfc60bda66bbf4bffd14c9842da2 (which had a
different cause, though)

> Any idea?

Go for it.

ciao
Christian