[Libreoffice-bugs] [Bug 135538] New: Search-Replace: Regular Expression engine fails on zero length matches

bugzilla-daemon at bugs.documentfoundation.org
Fri Aug 7 15:51:56 UTC 2020


https://bugs.documentfoundation.org/show_bug.cgi?id=135538

            Bug ID: 135538
           Summary: Search-Replace: Regular Expression engine fails on
                    zero length matches
           Product: LibreOffice
           Version: 7.0.0.3 release
          Hardware: All
                OS: All
            Status: UNCONFIRMED
          Severity: normal
          Priority: medium
         Component: UI
          Assignee: libreoffice-bugs at lists.freedesktop.org
          Reporter: masz0 at yahoo.co.uk

Description:
It seems the regular expression engine (in Search-Replace) in most cases
expects a match of some length > 0. It fails on zero-length matches.

Steps to reproduce:
1. Enter text in a cell in Calc, or a paragraph in Writer.
   E.g. "abcde".
2. Attempt to Search-Replace using a regular expression that would make the
"match" zero-width (using any valid and text-matching combination of
look-behind and look-ahead).
   E.g. "(?<=ab)"
   (but not "(?<=de)", which matches text at the end of a Calc cell - see
   notes below)

Current behavior:
No match is found.

Expected behavior:
- Minimum:
Do not return a result of "no match", but rather "matching to zero-length
string not allowed" (or some such).

That is, something to indicate that there isn't necessarily anything wrong with
the logic of the regular expression used - LO just hasn't implemented a way to
process it - regardless of whether that is a design decision, unfinished
functionality, or a bug. I personally spent hours trying to get this to work,
thinking it was a user/application configuration error - even an OS
configuration error.

This would also be an adequate stop-gap measure if it were decided to go ahead
with a more comprehensive solution (like my "preferred" scheme below) that,
due to prioritization or delays, would take a long time to arrive.

- Preferred:
Zero-width matches should be found normally - at least as long as they have
some meaningful anchor, so that they aren't pathological patterns that match
at every position, like "(?=.?)".

If matching every position (pathological case) is not allowed, more accurate
reporting would be preferable: "matching at every position not allowed".

Alternatively, allow matching at every position only within a selection, and
return "matching at every position only allowed for selection" when it is
attempted elsewhere.
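
As an illustration of the reporting proposed above, here is a minimal sketch in
Python. It uses Python's `re` module, which is not the engine LibreOffice uses,
and the function name `classify_search` is hypothetical; the message strings
are the ones suggested in this report. It only shows that "no match",
"zero-length match" and "match at every position" can be told apart.

```python
import re


def classify_search(pattern: str, text: str) -> str:
    """Return a status message for a search (hypothetical helper)."""
    spans = [m.span() for m in re.finditer(pattern, text)]
    if not spans:
        return "no match"
    if all(start == end for start, end in spans):
        # Zero-length matches at all len(text) + 1 positions:
        # the pathological case described above.
        if len(spans) == len(text) + 1:
            return "matching at every position not allowed"
        return "zero-length match"
    return "match"


print(classify_search(r"(?<=ab)", "abcde"))
print(classify_search(r"(?=.?)", "abcde"))
print(classify_search(r"xyz", "abcde"))
```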

Reproducible: Always

User Profile Reset: Yes


This problem affects at least Calc and Writer - I suppose the entire suite
shares the same regex engine.

It is present in both the current 7.0.0.3 and the 6.x version I used a few days
ago. (I thought my install might be borked due to this, so went to download the
latest version to reinstall. Turns out 7 had just come out.)


Additional notes/confirmation testing:

Assuming source text "abcde", these all will match:
    (?<=ab)c
    c(?=de)
    (?<=ab)c(?=de)

But if the match is zero-width (because you want to add something after,
before, or between), it won't match:
    (?<=ab)
    (?=cd)
    (?<=ab)(?=cd)
or even
    ^

Of course, depending on the situation, this problem can be sidestepped by
capturing the neighbouring text and putting it back, e.g. "(ab)" ->
"$1addthis".
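
A short sketch of that workaround, again in Python's `re` rather than in
LibreOffice: Python writes the back-reference as `\g<1>` where the
Search-Replace dialog uses `$1`. The capture-group version and the zero-width
version should give the same text.

```python
import re

text = "abcde"

# Workaround: match "ab" itself and put it back in front of the new text.
workaround = re.sub(r"(ab)", r"\g<1>addthis", text)

# What the zero-width pattern would do directly.
direct = re.sub(r"(?<=ab)", "addthis", text)

print(workaround)
print(direct)
```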

Something special is going on with "end of line", in that
    $
    (?=$)
both work (in Calc and Writer).

In Calc, still assuming text "abcde", even
    (?<=de)
works when "de" is found at the end of a cell, but not elsewhere.
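
The three patterns that do work all match the same empty string at the end of
the text, which may be why they behave alike. A quick check in Python's `re`
(a different engine, so this is only an analogy) shows identical match spans:

```python
import re

text = "abcde"

# Span (start, end) of the first match for each end-of-text pattern.
end_spans = {p: re.search(p, text).span() for p in (r"$", r"(?=$)", r"(?<=de)")}

for p, span in end_spans.items():
    print(f"{p!r:12} -> {span}")
```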



My 2 systems:
Windows 10 64-bit 1909 (Windows Beta Unicode UTF-8 support enabled)
Windows 10 64-bit 2004 (Windows Beta Unicode UTF-8 support enabled/disabled;
also tried resetting profile)
