[Libreoffice-ux-advise] [Bug 91192] AutoCorrect: Writer not recognizing a URL's trailing carat, hash mark, question mark, backslash, or pipe

bugzilla-daemon at bugs.documentfoundation.org
Thu Feb 4 17:52:08 UTC 2021


https://bugs.documentfoundation.org/show_bug.cgi?id=91192

Stephan Bergmann <sbergman at redhat.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |sbergman at redhat.com

--- Comment #17 from Stephan Bergmann <sbergman at redhat.com> ---
The code that guesses which part of a larger text shall be auto-detected as a
URI is URIHelper::FindFirstURLInText (svl/source/misc/urihelper.cxx, containing
detailed documentation).  Of necessity, it needs to apply some heuristics, and,
also of necessity, the algorithm's outcome will not necessarily match any given
user's exact expectations.  That said:

(In reply to sdc.blanco from comment #12)
> Asking for UXEval:  Two questions.
> 
> 1.  Is it considered a "bug" that a potential URL that ends with #  (or ?)
> does not include the # (or ?) in the URL recognition?
> 
> (but, as noted, no problem if text follows # or ? )

Especially with "?" (and similarly with e.g. "," and "."), the heuristics
conservatively try to avoid including trailing punctuation, on the assumption
that it was not meant to be part of the URI.
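
As a rough illustration of that idea only (this is not the algorithm in
svl/source/misc/urihelper.cxx, which is considerably more involved), trimming
trailing punctuation from a candidate could be sketched as below.  The function
name and the character set are assumptions made for this example.

```python
# Hypothetical sketch; NOT the logic of URIHelper::FindFirstURLInText.
# Assumed set: only the characters named in the comment above.
TRAILING_PUNCTUATION = "?,."


def trim_trailing_punctuation(candidate: str) -> str:
    """Drop punctuation that sits at the very end of a candidate URI.

    Punctuation followed by further URI characters is left untouched,
    because only the tail of the string is examined.
    """
    return candidate.rstrip(TRAILING_PUNCTUATION)


if __name__ == "__main__":
    for text in ("http://example.org/?", "http://example.org/?q=1"):
        print(text, "->", trim_trailing_punctuation(text))
```

With this sketch a "?" at the end is dropped, while a "?" that is followed by
a query stays part of the candidate, which matches the behaviour described in
the question.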

> 2.  Is it a problem that the three characters:  ^ | \ are not recognized as
> part of a URL (and URL recognition stops with these characters)?
> 
> Relevant to note that these three characters are considered "unsafe" and
> should have percent-encoding ( https://www.ietf.org/rfc/rfc1738.txt )

That's not a "should" but a "must".  None of those three characters can appear
in a URI as-is; they always need to be percent-encoded.  In general, the
heuristics do not treat a character that cannot appear in a URI as part of a
to-be-detected URI.
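
For reference, a minimal sketch of what percent-encoding those three characters
looks like.  The helper name is made up for this example, and the code is
unrelated to the LibreOffice sources; it uses Python's standard urllib only as
a cross-check.

```python
from urllib.parse import quote

# Percent-encoded forms of the three characters discussed above.
UNSAFE_MAP = {
    ord("^"): "%5E",
    ord("|"): "%7C",
    ord("\\"): "%5C",
}


def percent_encode_unsafe(text: str) -> str:
    """Replace ^, | and \\ by their percent-encoded forms (hypothetical helper).

    All other characters are passed through unchanged.
    """
    return text.translate(UNSAFE_MAP)


if __name__ == "__main__":
    raw = "http://example.org/a^b|c\\d"
    print(percent_encode_unsafe(raw))
    # The standard library encodes the same three characters the same way.
    print(quote("^|\\"))
```

A URI written with the encoded forms contains only characters that are allowed
to appear in a URI, so the objection about the raw characters does not apply to
it.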

-- 
You are receiving this mail because:
You are on the CC list for the bug.

