Character classification

Stephan Bergmann sbergman at
Thu Apr 6 09:45:23 UTC 2017

On 04/06/2017 12:02 AM, Chris Sherlock wrote:
> On 23 Mar 2017, at 10:47 pm, Stephan Bergmann <sbergman at> wrote:
>> What remains is the source of the five C programs
>>  rsc/ (rsc/source/rscpp/cpp{2,3,5,6}.c)
>>  shell/ (shell/source/unix/misc/uri-encode.c)
>>  solenv/ (solenv/bin/concat-deps.c)
>>  soltools/ (soltools/cpp/_{tokens,unix}.c)
>>  soltools/ (soltools/mkdepend/{cppsetup,ifparser,parse}.c)
>> For one, I have added any casts from char to unsigned char where missing.  (But note that in some cases the input already was of the expected form.)
> So the recommendation is to avoid C string functions in LibreOffice code in future?

I'm not sure how you read that recommendation into what I wrote.  (Though, 
generally, the brittle low-level memory management that comes with using 
<string.h> is indeed best avoided where possible.)
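
To illustrate the char-to-unsigned-char casts mentioned in the quoted 
text: the <ctype.h> functions take an int that must be representable as 
unsigned char (or be EOF), so a plain char holding a byte above 0x7F can 
be negative and trigger undefined behavior.  The following is only a 
minimal sketch of the pattern; upcase_in_place and count_alpha are 
made-up names, not functions from any of the five programs.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: upper-case a NUL-terminated string in place.
 * Each char is converted to unsigned char before it reaches toupper,
 * because passing a negative value other than EOF is undefined. */
static void upcase_in_place(char *s)
{
    for (; *s != '\0'; ++s) {
        *s = (char)toupper((unsigned char)*s);
    }
}

/* Hypothetical helper: count alphabetic characters in s, applying
 * the same cast before calling isalpha. */
static size_t count_alpha(const char *s)
{
    size_t n = 0;
    for (; *s != '\0'; ++s) {
        if (isalpha((unsigned char)*s)) {
            ++n;
        }
    }
    return n;
}
```

The cast changes nothing for input that is already in the 0..127 range, 
which matches the note that in some cases the input was already of the 
expected form.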

> I realise this may be a silly question, but does this mean we have a portable, cross-culture string handling module that makes things like character case handling consistent across platforms?

For culture-aware string operations, we have ICU.

>> For another, with a recent set of commits to master I have removed all but one call to setlocale from the LO code base itself.  (The remaining one is in SetSystemLocale in vcl/unx/generic/app/i18n_im.cxx, and smells like it is necessary for proper IME support in VCL-based applications on Linux.  None of those five C programs should be affected by it.)  So barring any calls to setlocale in external code, and ignoring the somewhat fuzzy definition of isprint as called from rsc/source/rscpp/cpp{5,6}.c, those five C programs should not (any longer) be affected by locale issues.
> Was this done because of the character casing challenges mentioned above? Or was calling on this causing problems elsewhere?

The main short-term motivation was to avoid any locale-specific behavior 
in the five remaining programs mentioned above.  But apart from that, 
changing such global state at random places in the program is hardly a 
good idea, especially given the multithreading issues that come with 
setlocale.
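
As a rough illustration of what locale-independent classification can 
look like: the sketch below checks ASCII ranges directly, so its results 
do not depend on whether anything has called setlocale.  It assumes an 
ASCII-compatible execution character set, and the ascii_* names are 
hypothetical, not existing LibreOffice functions.

```c
#include <stdbool.h>

/* Hypothetical helper: true for the printable ASCII range 0x20..0x7E,
 * regardless of the current locale. */
static bool ascii_isprint(unsigned char c)
{
    return c >= 0x20 && c <= 0x7E;
}

/* Hypothetical helper: true only for the ASCII letters A-Z and a-z. */
static bool ascii_isalpha(unsigned char c)
{
    return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z');
}

/* Hypothetical helper: upper-case ASCII letters only; every other
 * byte is returned unchanged. */
static char ascii_toupper(char c)
{
    return (c >= 'a' && c <= 'z') ? (char)(c - 'a' + 'A') : c;
}
```

Unlike isprint from <ctype.h>, these helpers treat every byte above 0x7E 
as non-printable and non-alphabetic, which may or may not be what a 
given caller wants.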
