[Libreoffice-bugs] [Bug 135538] New: Search-Replace: Regular Expression engine fails on zero length matches
bugzilla-daemon at bugs.documentfoundation.org
Fri Aug 7 15:51:56 UTC 2020
https://bugs.documentfoundation.org/show_bug.cgi?id=135538
Bug ID: 135538
Summary: Search-Replace: Regular Expression engine fails on
zero length matches
Product: LibreOffice
Version: 7.0.0.3 release
Hardware: All
OS: All
Status: UNCONFIRMED
Severity: normal
Priority: medium
Component: UI
Assignee: libreoffice-bugs at lists.freedesktop.org
Reporter: masz0 at yahoo.co.uk
Description:
It seems the regular expression engine (in Search-Replace) expects, in most
cases, to match a string of length > 0. It fails to find zero-length
matches.
Steps to reproduce:
1. Enter text in a cell in Calc, or a paragraph in Writer.
E.g. "abcde".
2. Attempt a Search-Replace using a regular expression whose "match" is
zero-width (using any valid combination of look-behind and look-ahead that
matches the text).
E.g. "(?<=ab)"
(but not "(?<=de)" matching text at the end of a Calc cell; see notes below)
Current behavior:
No match is found.
Expected behavior:
- Minimum:
Instead of returning a result of "no match", report "matching to zero-length
string not allowed" (or some such).
Something to indicate that there isn't necessarily anything wrong with the
logic of the regular expression used; LO just hasn't implemented a way to
process it, regardless of whether that is a design decision, unfinished
functionality, or a bug. I personally spent hours trying to get this to work,
thinking it was a user/application configuration error, or even an OS
configuration error.
This would also be an adequate stop-gap measure if a more comprehensive
solution (like my "preferred" scheme below) were adopted but would take a long
time to arrive due to prioritization or delays.
- Preferred:
Zero-width matches should be found normally, at least as long as they have
some meaningful anchor, so that they aren't pathological and don't match at
every position the way "(?=.?)" does.
If matching at every position (the pathological case) is not allowed, more
accurate reporting would be preferable: "matching at every position not
allowed". Alternatively, limit matching at every position to a selection, and
return "matching at every position only allowed for selection" when it is
attempted elsewhere.
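The minimum and preferred behaviours above amount to telling apart a few outcomes that a plain "no match" currently hides. The sketch below illustrates that distinction using Python's `re` module as a stand-in engine (it is not the engine LibreOffice uses). `classify` is a hypothetical helper, not a LibreOffice API, and treating "a match starts at every index from 0 to len(text)" as the pathological case is an assumption about what "matches at every position" means.

```python
import re


def classify(pattern: str, text: str) -> str:
    """Hypothetical helper (not a LibreOffice API): sort a search result
    into the outcomes the report wants the dialog to distinguish."""
    matches = list(re.finditer(pattern, text))
    if not matches:
        return "no match"
    starts = [m.start() for m in matches]
    # Assumed definition of the pathological case: a match starts at every
    # index 0..len(text), including the position after the last character.
    if starts == list(range(len(text) + 1)):
        return "matches at every position"
    # Every match found has length 0 (e.g. a bare look-behind).
    if all(m.start() == m.end() for m in matches):
        return "zero-length match"
    return "match"


for p in (r"(?<=ab)", r"(?=.?)", r"(?<=ab)c", r"xyz"):
    print(f"{p!r:12} -> {classify(p, 'abcde')}")
```

Under this sketch only a pattern such as "xyz" would be reported as "no match"; "(?<=ab)" would be recognised as a zero-length match and "(?=.?)" flagged as matching at every position.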
Reproducible: Always
User Profile Reset: Yes
This problem affects at least Calc and Writer; presumably the entire suite
shares the same regex engine.
It is present in both the current 7.0.0.3 and the 6.x version I used a few days
ago. (I thought my install might be borked because of this, so I went to
download the latest version to reinstall. It turned out 7.0 had just come out.)
Additional notes/confirmation testing:
Assuming source text "abcde", these will all match:
(?<=ab)c
c(?=de)
(?<=ab)c(?=de)
But if your match is zero width (you want to add something after, before, or
between), it won't match:
(?<=ab)
(?=cd)
(?<=ab)(?=cd)
or even
^
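For comparison, a short sketch using Python's `re` module (a stand-in; it is not the engine LibreOffice uses) suggests that a conventional regex engine reports each zero-width pattern listed above as a match of length 0 rather than as no match. `first_span` is a hypothetical helper introduced only for this illustration.

```python
import re

TEXT = "abcde"

# Patterns from the notes above: the first three consume "c",
# the rest are zero-width.
PATTERNS = [
    r"(?<=ab)c",
    r"c(?=de)",
    r"(?<=ab)c(?=de)",
    r"(?<=ab)",
    r"(?=cd)",
    r"(?<=ab)(?=cd)",
    r"^",
]


def first_span(pattern, text):
    """Hypothetical helper: (start, end) of the first match, or None."""
    m = re.search(pattern, text)
    return None if m is None else m.span()


for p in PATTERNS:
    span = first_span(p, TEXT)
    if span is None:
        print(f"{p!r:20} no match")
    else:
        start, end = span
        print(f"{p!r:20} span={span} length={end - start}")
```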
Of course, depending on the situation, this problem can be sidestepped by doing
something like "(ab)" -> "$1addthis".
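As a sketch of why the workaround is equivalent, the snippet below runs both replacements through Python's `re.sub` (again only a stand-in for the LibreOffice engine). Note that the backreference written `$1` in the report is spelled `\1` in Python's replacement syntax.

```python
import re

TEXT = "abcde"

# Workaround from the report: capture "ab" and re-emit it with a
# backreference ($1 in the report, \1 in Python's re.sub).
workaround = re.sub(r"(ab)", r"\1addthis", TEXT)

# Direct insertion at the zero-width position after "ab", which is what
# the report expects Search-Replace to be able to do.
zero_width = re.sub(r"(?<=ab)", "addthis", TEXT)

print(workaround)  # abaddthiscde
print(zero_width)  # abaddthiscde
```

The capture-group form only works when the anchoring text can be consumed and written back; the zero-width form inserts without touching the surrounding text at all.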
Something special is going on with "end of line", in that
$
(?=$)
both work (in Calc and Writer).
In Calc, still assuming text "abcde", even
(?<=de)
works when "de" is found at the end of a cell, but not elsewhere.
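For reference, a conventional engine treats these end-of-line patterns no differently from other zero-width matches. The sketch below uses Python's `re` as a stand-in; the second sample string "dexyz" is an illustrative input chosen to show a look-behind matching away from the end of the text, which is the case the report says Calc misses.

```python
import re

# Zero-width end-of-line patterns from the report (Python's re as stand-in).
end_dollar = re.search(r"$", "abcde").span()
end_lookahead = re.search(r"(?=$)", "abcde").span()

# A look-behind is not tied to the end of the text: it matches wherever
# "de" immediately precedes the current position.
behind_at_end = re.search(r"(?<=de)", "abcde").span()
behind_inside = re.search(r"(?<=de)", "dexyz").span()

print(end_dollar, end_lookahead)     # (5, 5) (5, 5)
print(behind_at_end, behind_inside)  # (5, 5) (2, 2)
```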
My 2 systems:
Windows 10 64-bit 1909 (Windows Beta Unicode UTF-8 support enabled)
Windows 10 64-bit 2004 (Windows Beta Unicode UTF-8 support enabled/disabled;
also tried resetting profile)