[Libreoffice-bugs] [Bug 140708] New: The REGEX function accepts all (ismx) but one (w) flags and only directly in the regular expression and does not allow all matches to be found at once

bugzilla-daemon at bugs.documentfoundation.org bugzilla-daemon at bugs.documentfoundation.org
Sun Feb 28 08:11:30 UTC 2021


https://bugs.documentfoundation.org/show_bug.cgi?id=140708

            Bug ID: 140708
           Summary: The REGEX function accepts all (ismx) but one (w)
                    flags and only directly in the regular expression and
                    does not allow all matches to be found at once
           Product: LibreOffice
           Version: 7.0.4.2 release
          Hardware: All
                OS: All
            Status: UNCONFIRMED
          Severity: enhancement
          Priority: medium
         Component: Calc
          Assignee: libreoffice-bugs at lists.freedesktop.org
          Reporter: eeigor at inbox.ru

Description:
Current:
 REGEX(Text;Expression[;[Replacement][;Flags|Occurrence]])
 Flags: "g" only ("global")
Desired:
 REGEX(Text;Expression[;[Replacement][;Flags][;Occurrence]])
 Flags: "g" plus "ismxw"

Flag - Description
i - Ignore case (case-insensitive matching)
s - Make . match line terminators too (single-line, dot-all)
m - Make ^ and $ match at the start and end of each line (multi-line)
x - Allow whitespace and comments in the pattern (free-spacing)
w - Make \b and \B follow the Unicode (UAX #29) word-boundary rules
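Item 2 of "Actual Results" notes that these options already work when written inline, so one way a separate Flags argument could be handled is to turn it into an inline prefix. The sketch below is a minimal illustration under that assumption. The function name `flags_to_inline` is hypothetical and is not part of LibreOffice. Python's `re` only stands in for checking the prefix, because `re` has no "w" option.

```python
import re

# Flag letters listed in the report (ICU-style inline option letters).
ICU_FLAG_LETTERS = "ismxw"


def flags_to_inline(flags):
    """Hypothetical helper: turn a Flags string such as "gim" into "(?im)".

    "g" is not a pattern option (it controls how many matches are processed),
    so it is dropped. Duplicate letters are kept only once.
    """
    letters = []
    for f in flags:
        if f == "g":
            continue
        if f not in ICU_FLAG_LETTERS:
            raise ValueError(f"unknown flag: {f!r}")
        if f not in letters:
            letters.append(f)
    return f"(?{''.join(letters)})" if letters else ""


# Usage with Python's re (only i, s, m and x are valid here).
print(re.findall(flags_to_inline("gi") + "fox", "Fox, FOX and fox"))  # ['Fox', 'FOX', 'fox']
```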

Steps to Reproduce:
See "Actual Results".

Actual Results:
1. Only the first match, or the match selected by Occurrence, is extracted. If
the Replacement parameter is omitted, the "g" flag is ignored, so it is not
possible to extract all matches at once.
2. The flags i, s, m and x work only when written inline in the regular
expression, as "(?ismx)…" or "(?ismx:…)", which turns the corresponding options
on. The w flag is the exception.
3. The "w" flag. For example:
=REGEX("The quick (""brown"") fox can’t jump 32.3 feet, right?";"(?w)\b\w+\b";;5)
returns "jump", not "can’t". Why?
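The "jump" result can be modelled if we assume REGEX uses ICU's engine (as the flag letters suggest), where "w" makes \b follow UAX #29. Under those rules the apostrophe U+2019 joins "can’t" into one word, so there is no boundary between "can" and "t". However, \w does not match the apostrophe, so \b\w+\b cannot match any part of "can’t". The same applies to "32.3", which makes "jump" the fifth match. The sketch below is not ICU's implementation. It is a simplified model that covers only rules WB1-2 and WB5-WB12, uses small subsets of the character classes, and ignores WB4 and ExtendNumLet. All names in it are hypothetical.

```python
import re

# Simplified stand-ins for a few UAX #29 Word_Break classes (subsets only).
MID_LETTER = {":", "\u00b7"}          # MidLetter
MID_NUM = {",", ";"}                  # MidNum
MID_NUM_LET = {".", "'", "\u2019"}    # MidNumLet / Single_Quote


def is_word_char(ch):
    r"""True if ch matches \w in a Python str pattern."""
    return re.fullmatch(r"\w", ch) is not None


def default_boundaries(text):
    r"""Traditional \b: a boundary wherever word and non-word characters meet."""
    def w(i):
        return 0 <= i < len(text) and is_word_char(text[i])
    return {i for i in range(len(text) + 1) if w(i - 1) != w(i)}


def uax29_boundaries(text):
    """Word boundaries after a simplified subset of UAX #29 (WB1-2, WB5-WB12)."""
    def kind(i):
        if not 0 <= i < len(text):
            return None
        ch = text[i]
        return "L" if ch.isalpha() else "N" if ch.isdigit() else ch

    bounds = {0, len(text)}                      # WB1, WB2
    for i in range(1, len(text)):
        a, b = kind(i - 1), kind(i)
        if a in ("L", "N") and b in ("L", "N"):
            continue                             # WB5, WB8, WB9, WB10
        if a == "L" and b in MID_LETTER | MID_NUM_LET and kind(i + 1) == "L":
            continue                             # WB6
        if kind(i - 2) == "L" and a in MID_LETTER | MID_NUM_LET and b == "L":
            continue                             # WB7
        if a == "N" and b in MID_NUM | MID_NUM_LET and kind(i + 1) == "N":
            continue                             # WB12
        if kind(i - 2) == "N" and a in MID_NUM | MID_NUM_LET and b == "N":
            continue                             # WB11
        bounds.add(i)                            # WB999: break everywhere else
    return bounds


def find_words(text, bounds):
    r"""Emulate the successive matches of \b\w+\b for a given boundary set."""
    found, i = [], 0
    while i < len(text):
        if i in bounds and is_word_char(text[i]):
            j = i
            while j < len(text) and is_word_char(text[j]):
                j += 1
            # \w+ is greedy, so back off to the last boundary inside the run.
            end = next((k for k in range(j, i, -1) if k in bounds), None)
            if end is not None:
                found.append(text[i:end])
                i = end
                continue
        i += 1
    return found


TEXT = 'The quick ("brown") fox can\u2019t jump 32.3 feet, right?'
print(find_words(TEXT, default_boundaries(TEXT))[4])  # can
print(find_words(TEXT, uax29_boundaries(TEXT))[4])    # jump
```

In this model, a pattern that is meant to return "can’t" would have to let the part between the two boundaries match the apostrophe as well.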


Expected Results:
1. When the "g" flag is set, all matches should be returned, including when no
Replacement is given. The combined "Flags|Occurrence" parameter should be split
into separate Flags and Occurrence parameters.
2. The Flags parameter should accept "g" plus "ismxw".
3. Word boundaries in the example above should be recognized according to the
Unicode word-boundary specification
(https://www.unicode.org/reports/tr29/tr29-33.html#Word_Boundaries).
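The proposed signature can be modelled with Python's `re` as a minimal sketch of the behaviour requested in items 1 and 2. Several assumptions apply. `proposed_regex` is a hypothetical name. A Python list stands in for whatever multi-value result Calc would return. "No match" is represented as `None`. Replacements use Python's `\1` syntax rather than Calc's `$1`. "g" takes precedence over Occurrence, and Occurrence defaults to 1. "w" is rejected because Python's `re` has no equivalent.

```python
import re

# Hypothetical mapping of the requested flag letters onto Python's re flags.
_FLAG_MAP = {"i": re.IGNORECASE, "s": re.DOTALL, "m": re.MULTILINE, "x": re.VERBOSE}


def proposed_regex(text, expr, replacement=None, flags="", occurrence=None):
    """Sketch of REGEX(Text;Expression[;[Replacement][;Flags][;Occurrence]])."""
    re_flags = 0
    for f in flags:
        if f == "g":
            continue
        if f not in _FLAG_MAP:
            raise ValueError(f"unsupported flag: {f!r}")
        re_flags |= _FLAG_MAP[f]
    pattern = re.compile(expr, re_flags)
    is_global = "g" in flags
    n = occurrence or 1

    if replacement is None:
        # Extraction: with "g", return every match at once.
        if is_global:
            return [m.group(0) for m in pattern.finditer(text)]
        matches = list(pattern.finditer(text))
        return matches[n - 1].group(0) if len(matches) >= n else None

    # Replacement: with "g", replace every match; otherwise only the n-th one.
    if is_global:
        return pattern.sub(replacement, text)
    for k, m in enumerate(pattern.finditer(text), start=1):
        if k == n:
            return text[:m.start()] + m.expand(replacement) + text[m.end():]
    return text


print(proposed_regex("a1b22c333", r"\d+", flags="g"))        # ['1', '22', '333']
print(proposed_regex("a1b22c333", r"\d+", "#", occurrence=2))  # a1b#c333
```

Keeping Flags and Occurrence as separate arguments avoids the ambiguity of the current combined parameter. The same call can then carry both an option such as "i" and an occurrence number.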


Reproducible: Always


User Profile Reset: No



Additional Info:
The effect of the "w" flag remains unclear. For example, words containing
accented letters are recognized even with the "w" flag disabled ((?-w)), while
the words in the example above are not recognized at all.
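One possible explanation for the accented-word observation is that, in a Unicode-aware engine, accented letters are already ordinary \w characters. In that case \b\w+\b finds such words whether or not "w" is set, because "w" only changes where \b falls. The snippet below uses Python's `re` as an analogy. It assumes that ICU's default \w likewise covers Unicode letters and does not show ICU itself.

```python
import re

# Precomposed accented letters (written as escapes) are \w characters here,
# so no special flag is needed to match these words.
TEXT = "Caf\u00e9 cr\u00e8me, d\u00e9j\u00e0 vu"
words = re.findall(r"\b\w+\b", TEXT)
print(words)  # ['Café', 'crème', 'déjà', 'vu']
```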
