[Libreoffice-bugs] [Bug 125110] CalcSpreadsheet: issues converting .CSV where there are more than 30K rows of data

bugzilla-daemon at bugs.documentfoundation.org
Mon Aug 16 17:18:48 UTC 2021


https://bugs.documentfoundation.org/show_bug.cgi?id=125110

Eike Rathke <erack at redhat.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 OS|Windows (All)               |All
           Hardware|x86-64 (AMD64)              |All

--- Comment #13 from Eike Rathke <erack at redhat.com> ---
(In reply to Mike Kaganski from comment #7)
> So I suppose that what should have been done here is:
> 1. Seeing the opening double quote in the beginning of the field, start
> "quote-enclosed field" mode.
> 2. If it encounters something *invalid* for such a mode, it should re-read
> the field again, this time without the "quote-enclosed field" mode (to
> properly re-consume possible field separators that could have been read in
> the first pass as the quoted field content).
> 
> This way, this sample would be read properly, without introducing any
> ambiguity.
It would fail in other cases that are currently handled well, like

"abc "def" ghi, jkl"

where
|abc "def" ghi, jkl|
is supposed to be *one* field content because the generator didn't escape
quotes by doubling them. Your approach would result in
|"abc "def" ghi| jkl"|
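
To make the difference concrete, here is a minimal Python sketch of the two
strategies applied to that sample line. It is not the actual Calc import code;
the function names and the simplified rules (comma separator, single line, a
quote counts as closing only when a separator or end of line follows) are
assumptions made purely for illustration.

```python
def _plain_field(line, i, sep):
    """Read an unquoted field starting at i; return (text, next_start)."""
    j = line.find(sep, i)
    if j == -1:
        j = len(line)
    return line[i:j], j + 1


def split_lenient(line, sep=","):
    """Quote mode where a quote closes the field only if a separator or
    end of line follows; any other quote is kept as literal content."""
    fields, i, n = [], 0, len(line)
    while i <= n:
        if i < n and line[i] == '"':
            j, buf = i + 1, []
            while j < n:
                if line[j] == '"':
                    if j + 1 < n and line[j + 1] == '"':
                        buf.append('"')  # doubled quote = escaped quote
                        j += 2
                        continue
                    if j + 1 == n or line[j + 1] == sep:
                        break  # genuine closing quote
                buf.append(line[j])  # ordinary character or stray quote
                j += 1
            fields.append("".join(buf))
            i = j + 2  # skip closing quote and separator
        else:
            text, i = _plain_field(line, i, sep)
            fields.append(text)
    return fields


def _strict_quoted_end(line, start, sep):
    """Index of the closing quote, or None if the quoted field is invalid."""
    j, n = start + 1, len(line)
    while j < n:
        if line[j] == '"':
            if j + 1 < n and line[j + 1] == '"':
                j += 2
                continue
            if j + 1 == n or line[j + 1] == sep:
                return j
            return None  # quote followed by other content: invalid
        j += 1
    return None  # unterminated quoted field


def split_with_fallback(line, sep=","):
    """Strict quote mode; on invalid content, re-read the field unquoted
    (the approach suggested in comment #7, as understood here)."""
    fields, i, n = [], 0, len(line)
    while i <= n:
        end = None
        if i < n and line[i] == '"':
            end = _strict_quoted_end(line, i, sep)
        if end is not None:
            fields.append(line[i + 1:end].replace('""', '"'))
            i = end + 2
        else:
            text, i = _plain_field(line, i, sep)
            fields.append(text)
    return fields


sample = '"abc "def" ghi, jkl"'
print(split_lenient(sample))        # one field
print(split_with_fallback(sample))  # two fields, quotes left in place
```

With these assumptions the lenient reader keeps the sample as one field, while
the fallback reader splits it at the comma, and both agree on well-formed input
such as "a,b",c.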

Whatever we do, it will make things fail differently for other data from
broken generators. You could throw more logic at it, like thinking in "words"
to be ignored, where quotes that re-trigger quote handling have to have a space
to the left (opening quote) or to the right (closing quote), but that would
fail for data that simply doesn't follow that assumption. Things get even worse
if space is a field separator.

Take a look at what is done with the field start mode and quote state to fix
known broken data cases, and at bug 48621 for test case sample files and
related reports.

I tend to close this as caused by a too-broken generator, but if you can come
up with some loose magic that doesn't break any of the already handled cases,
then fine.
