<html>
<head>
<base href="https://bugs.documentfoundation.org/">
</head>
<body><table border="1" cellspacing="0" cellpadding="8">
<tr>
<th>Bug ID</th>
<td><a class="bz_bug_link
bz_status_UNCONFIRMED "
title="UNCONFIRMED - Search-Replace: Regular Expression engine fails on zero length matches"
href="https://bugs.documentfoundation.org/show_bug.cgi?id=135538">135538</a>
</td>
</tr>
<tr>
<th>Summary</th>
<td>Search-Replace: Regular Expression engine fails on zero length matches
</td>
</tr>
<tr>
<th>Product</th>
<td>LibreOffice
</td>
</tr>
<tr>
<th>Version</th>
<td>7.0.0.3 release
</td>
</tr>
<tr>
<th>Hardware</th>
<td>All
</td>
</tr>
<tr>
<th>OS</th>
<td>All
</td>
</tr>
<tr>
<th>Status</th>
<td>UNCONFIRMED
</td>
</tr>
<tr>
<th>Severity</th>
<td>normal
</td>
</tr>
<tr>
<th>Priority</th>
<td>medium
</td>
</tr>
<tr>
<th>Component</th>
<td>UI
</td>
</tr>
<tr>
<th>Assignee</th>
<td>libreoffice-bugs@lists.freedesktop.org
</td>
</tr>
<tr>
<th>Reporter</th>
<td>masz0@yahoo.co.uk
</td>
</tr></table>
<p>
<div>
<pre>Description:
The regular expression engine in Search-Replace appears to require, in most
cases, a match of length > 0. Zero-length matches are not found.
Steps to reproduce:
1. Enter text in a cell in Calc, or a paragraph in Writer.
E.g. "abcde".
2. Attempt to Search-Replace using a regular expression that would make the
"match" zero-width (using any valid and text-matching combination of
look-behind and look-ahead).
E.g. "(?<=ab)"
(but not "(?<=de)" matching text at the end of a Calc cell - see notes below)
Current behavior:
No match is found.
Expected behavior:
- Minimum:
Instead of reporting "no match", report "matching a zero-length string is not
allowed" (or similar).
The message should indicate that there isn't necessarily anything wrong with
the logic of the regular expression, and that LO simply has no way to process
it, regardless of whether that is a design decision, unfinished functionality,
or a bug. I personally spent hours trying to get this to work, thinking it was
a user/application configuration error, or even an OS configuration error.
This would also be an adequate stop-gap measure if a more comprehensive
solution (like my "preferred" scheme below) were chosen but, due to
prioritization or delays, would take a long time to arrive.
- Preferred:
Zero-width matches should be found normally, at least as long as they have
some meaningful anchor, so that they are not pathological patterns that match
at every position, like "(?=.?)".
If matching at every position (the pathological case) is not allowed, more
accurate reporting would be preferable: "matching at every position not
allowed".
Alternatively, limit matching at every position to a selection, and return
"matching at every position only allowed for selection" when attempted
elsewhere.
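To illustrate the difference between an anchored zero-width pattern and the
pathological case, here is a minimal sketch using Python's re module (3.7 or
later). Python is an assumption chosen for illustration only; it is not the
engine LO uses, and other engines may differ in detail.

```python
import re

text = "abcde"

# "(?=.?)" succeeds at every position, including the end of the text.
everywhere = [m.start() for m in re.finditer(r"(?=.?)", text)]
print(everywhere)  # [0, 1, 2, 3, 4, 5]

# An anchored zero-width pattern matches only where its context fits.
anchored = [m.start() for m in re.finditer(r"(?<=ab)", text)]
print(anchored)  # [2]
```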
Reproducible: Always
User Profile Reset: Yes
This problem affects at least Calc and Writer; I suppose the entire suite
shares the same regex engine.
It is present in both the current 7.0.0.3 and the 6.x version I used a few days
ago. (I thought my install might be broken because of this, so I went to
download the latest version to reinstall. It turned out 7 had just come out.)
Additional notes/confirmation testing:
Assuming the source text "abcde", these will all match:
(?<=ab)c
c(?=de)
(?<=ab)c(?=de)
But a zero-width match (where you want to insert something after, before, or
between) is not found:
(?<=ab)
(?=cd)
(?<=ab)(?=cd)
or even
^
Of course, depending on the situation, this problem can be sidestepped by
matching the anchor text itself and re-emitting it with the addition, e.g.
"(ab)" -> "$1addthis".
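For comparison, a minimal sketch of what another engine returns for the
patterns above. It uses Python's re module (3.7 or later), which is an
assumption for illustration only and not the engine LO uses; the marker "X"
is likewise just a placeholder replacement.

```python
import re

text = "abcde"

# Zero-width patterns from the report, each replaced with the marker "X".
lookbehind = re.sub(r"(?<=ab)", "X", text)
lookahead = re.sub(r"(?=cd)", "X", text)
both = re.sub(r"(?<=ab)(?=cd)", "X", text)
start = re.sub(r"^", "X", text)
print(lookbehind, lookahead, both, start)  # abXcde abXcde abXcde Xabcde

# The workaround: capture the anchor text and re-emit it.
# Python writes the backreference as \1 where the report uses $1.
workaround = re.sub(r"(ab)", r"\1addthis", text)
print(workaround)  # abaddthiscde
```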
Something special is going on with "end of line", in that
$
(?=$)
both work (in Calc and Writer).
In Calc, still assuming text "abcde", even
(?<=de)
works when "de" is found at the end of a cell, but not elsewhere.
My 2 systems:
Windows 10 64-bit 1909 (Windows Beta Unicode UTF-8 support enabled)
Windows 10 64-bit 2004 (Windows Beta Unicode UTF-8 support enabled/disabled;
also tried resetting profile)</pre>
</div>
</p>
</body>
</html>