[Libreoffice-bugs] [Bug 105711] FIREBIRD: Couldn' t create a varchar-field with less than 20 characters in GUI

bugzilla-daemon at bugs.documentfoundation.org bugzilla-daemon at bugs.documentfoundation.org
Sun May 28 18:09:44 UTC 2017


https://bugs.documentfoundation.org/show_bug.cgi?id=105711

--- Comment #10 from Tamas Bunth <btomi96 at gmail.com> ---
(In reply to Lionel Elie Mamane from comment #9)
> (In reply to Tamas Bunth from comment #7)
> > The driver uses UTF-8 encoding for text-based columns (char, varchar, clob).
> 
> > A character now occupies up to 4 bytes, so when the 'length' option in the
> > GUI is set to 5 it will allocate 20 bytes.
> 
> > You can put more than 5 characters, because most of the characters don't
> > need all the 4 bytes.
> 
> > SO what is the correct behavior of the 'length' GUI option set to 5?
> 
> The expected behaviour is 5 characters ("codepoint" in Unicode terminology),
> independently of how many bytes that is.
> 
> I understand Firebird deviates from the SQL standard in that respect, and
> takes a length in bytes. We could try to implement the "characters" thing
> within the Firebird SDBC driver (as your patch on gerrit tries to do, but I
> wouldn't bother. If we try to do that, there will always be cases that we
> won't catch. We will correctly implement that when the field is updated
> through the SDBC API, but then the user will do:
> 
> SELECT LENGTH(field) as len FROM table;
> or
> UPDATE TABLE SET field='+' || field || '+';
> 
> and get unexpected results (results inconsistent with the GUI).

Hmm.. Right.

> I'd just let it be, and let the Firebird "misfeature" bubble up. A length of
> 5 in the GUI should be a VARCHAR(5) in Firebird. If firebird understands
> that as 5 bytes, then so be it.

A length of 5 in the GUI is a VARCHAR(5) in Firebird, which allocates 20 bytes
(as long as the character set is UTF-8).

The problem is rather the following:
Firebird truncates the input string at 20 bytes no matter how many characters
it contains.

I think it's a bug in Firebird (but I couldn't find it in the bug tracker).

> Maybe a middle ground would be to use UTF16 instead of UTF8 and let the
> length be UTF16 code units (that is length 5 is 10 bytes), if that's
> possible with Firebird. This is still a bug, but confusing character and
> UTF16 code units is so common that it usually not perceived as a bug. In my
> opinion, if this gives as a result that LENGTH(field) is in bytes and not in
> UTF16 code units, it is inconsistent and not desirable.

There are two kind of length functions (actually there are three of them):
char_length and octet_length[1]

and they are working as expected.

[1] https://firebirdsql.org/refdocs/langrefupd20-char-length.html

-- 
You are receiving this mail because:
You are the assignee for the bug.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.freedesktop.org/archives/libreoffice-bugs/attachments/20170528/127f7634/attachment.html>


More information about the Libreoffice-bugs mailing list