Character classification
Stephan Bergmann
sbergman at redhat.com
Thu Apr 6 09:45:23 UTC 2017
On 04/06/2017 12:02 AM, Chris Sherlock wrote:
> On 23 Mar 2017, at 10:47 pm, Stephan Bergmann <sbergman at redhat.com> wrote:
>> What remains is the source of the five C programs
>>
>> rsc/Executable_rsc.mk (rsc/source/rscpp/cpp{2,3,5,6}.c)
>> shell/Executable_uri_encode.mk (shell/source/unix/misc/uri-encode.c)
>> solenv/Executable_concat-deps.mk (solenv/bin/concat-deps.c)
>> soltools/Executable_cpp.mk (soltools/cpp/_{tokens,unix}.c)
>> soltools/Executable_mkdepend.mk (soltools/mkdepend/{cppsetup,ifparser,parse}.c)
>>
>> For one, I have added any casts from char to unsigned char where missing. (But note that in some cases the input already was of the expected form.)
>
> So the recommendation is to avoid C string functions in LibreOffice code in future?
I'm not sure how you read that recommendation into what I wrote. (Though,
generally, the brittle low-level memory management that comes with using
<string.h> is indeed best avoided where possible.)
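To illustrate the casts mentioned in the quoted text above: the <ctype.h> functions take an int whose value must be representable as unsigned char (or be EOF). On platforms where plain char is signed, a byte with the high bit set becomes a negative value, and passing it directly is undefined behavior. The following is only a minimal sketch of the pattern, not code taken from any of the five programs; the helper name upcase_in_place is hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper (not from the LibreOffice sources): upper-case a
   NUL-terminated string in place using <ctype.h>. */
void upcase_in_place(char *s)
{
    assert(s != NULL);
    for (; *s != '\0'; ++s) {
        /* Where plain char is signed, a byte such as 0xE9 is negative.
           Converting to unsigned char first gives toupper() an argument
           in the range it is defined for. */
        *s = (char)toupper((unsigned char)*s);
    }
}
```

Calling upcase_in_place on a buffer holding "abc" leaves "ABC" in it. In the default "C" locale, bytes outside the ASCII letters are left unchanged.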
> I realise this may be a silly question, but does this mean we have a portable, cross-culture string handling module that makes things like character case handling consistent across platforms?
For culture-aware string operations, we have ICU.
>> For another, with a recent set of commits to master I have removed all but one call to setlocale from the LO code base itself. (The remaining one is in SetSystemLocale in vcl/unx/generic/app/i18n_im.cxx, and smells like it is necessary for proper IME support in VCL-based applications on Linux. None of those five C programs should be affected by it.) So barring any calls to setlocale in external code, and ignoring the somewhat fuzzy definition of isprint as called from rsc/source/rscpp/cpp{5,6}.c, those five C programs should not (any longer) be affected by locale issues.
>
> Was this done because of the character casing challenges mentioned above? Or was calling on this causing problems elsewhere?
The main short-term motivation was to avoid any locale-specific behavior
in the five remaining programs mentioned above. But apart from that,
changing such global state at random places in the program is hardly a
good idea, especially given the multi-threading issues that come with
setlocale.
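As a rough sketch of what "not affected by locale issues" means in practice: a C program starts in the "C" locale and stays there unless something calls setlocale with a different value, and passing a null pointer to setlocale only queries the current setting. Both helper names below are hypothetical, and ascii_isupper additionally assumes an ASCII-compatible execution character set.

```c
#include <locale.h>
#include <string.h>

/* Hypothetical helper: returns nonzero if LC_CTYPE is still the "C"
   locale that every C program starts in. */
int ctype_locale_is_c(void)
{
    /* A null second argument queries the locale without changing it. */
    const char *name = setlocale(LC_CTYPE, NULL);
    return name != NULL && strcmp(name, "C") == 0;
}

/* Hypothetical locale-independent check: an explicit range test that
   never consults the global locale (assumes an ASCII-compatible
   execution character set). */
int ascii_isupper(unsigned char c)
{
    return c >= 'A' && c <= 'Z';
}
```

A tool that must never depend on the locale could use checks like ascii_isupper throughout. A tool that relies on <ctype.h> could use ctype_locale_is_c as a sanity check that no external code has changed the global state.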