[Libreoffice-bugs] [Bug 42893] New: [EDITING] [ProposedEasyHack] Improve Autocorrect: Capitalize first letter of sentence

bugzilla-daemon at freedesktop.org bugzilla-daemon at freedesktop.org
Sun Nov 13 18:56:24 PST 2011


https://bugs.freedesktop.org/show_bug.cgi?id=42893

             Bug #: 42893
           Summary: [EDITING] [ProposedEasyHack] Improve Autocorrect:
                    Capitalize first letter of sentence
    Classification: Unclassified
           Product: LibreOffice
           Version: LibO 3.4.4 release
          Platform: All
        OS/Version: All
            Status: UNCONFIRMED
          Severity: minor
          Priority: medium
         Component: Writer
        AssignedTo: libreoffice-bugs at lists.freedesktop.org
        ReportedBy: ryan.jendoubi at gmail.com


In addition to the issue identified in
https://bugs.freedesktop.org/show_bug.cgi?id=35515, there are other instances
where the "Capitalize first letter of every sentence" option is more trouble
than it's worth.

The 'start of sentence detection' should be improved to recognise the following
for what they are, and therefore not perform any capitalization:

1. Common contractions, e.g. "esp." for "especially", "incl." for "including",
"temp." for "temporary", "e.g.", "i.e.", etc.

2. Things which are clearly acronyms, e.g. "U.S.", "Y.M.C.A.", etc. In regex
terms I'd imagine the pattern to be /([a-zA-Z]\.){2,}/, i.e., any two or more
occurrences of a letter followed by a period.

You could make a judgment call about whether you wanted to limit it to capital
letters. On the plus side you're more likely to be looking at something really
intended as an acronym, but on the negative side I often use acronyms like
"w.r.t." for "with regard to", and suchlike. This might matter more if you
thought "e.g." and "i.e." are more accurately classed as acronyms than
contractions; I'm not sure the conceptual distinction would make a difference
here in practice.

3. Did you spot the 'intentional mistake' in number 1. above? :-) The case
where a contraction or acronym falls at the end of a sentence is tricky. Some
cursory research ([1],[2]) confirms that in these situations the correct thing
to do is to have only one period, which 'does double duty', both indicating the
shortening and ending the sentence.

Therefore, in these situations LO would probably miss the new sentence and not
be able to capitalize. However, both ending sentences with acronyms and
(hopefully) the occurrence of people forgetting to capitalize are pretty rare,
so I'd vote to suffer this possible intermittent inconvenience in order to have
the benefit which 1. and 2. above would bring.
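
As a rough illustration of 1. to 3., here is a minimal Python sketch of the
check I have in mind. It is not LO code: the function name
should_capitalize_after, the contents of the abbreviation list, and the
decision to accept lower-case acronyms are all assumptions made for the
example, and it ignores details such as leading quotes or brackets.

```python
import re

# Hypothetical abbreviation list for rule 1; a real list would be per-language.
ABBREVIATIONS = {"esp.", "incl.", "temp.", "etc."}

# Rule 2: two or more occurrences of a single letter followed by a period.
ACRONYM = re.compile(r"(?:[a-zA-Z]\.){2,}")


def should_capitalize_after(word: str) -> bool:
    """Return True if `word` (the token just typed) looks like a sentence end."""
    if not word.endswith((".", "!", "?")):
        return False  # no sentence-ending punctuation at all
    if word.lower() in ABBREVIATIONS:
        return False  # rule 1: known abbreviation
    if ACRONYM.fullmatch(word):
        return False  # rule 2: acronym such as "U.S.", "e.g." or "w.r.t."
    return True
```

With this sketch, should_capitalize_after("etc.") is False even when "etc."
really does end the sentence, which is exactly the inconvenience described in
3.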

As a pie-in-the-sky concept, I guess it'd be possible to do some heuristics
using the grammar engine to determine if the writer probably intended to finish
the sentence at a certain point, but that seems like a disproportionate amount
of effort.

Localization issues
-------------------
/[a-zA-Z]/ is Unicode-unfriendly for a start. I can't remember if LO's regex
engine supports Unicode-aware character classes like [[:alpha:]]: if it does,
we can use them; if it doesn't, that's another bug report :p
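
To show the difference, here is a small Python comparison. Python's re module
has no [[:alpha:]] class, so the sketch uses the common idiom [^\W\d_] (word
characters minus digits and underscore) as an approximation of "any letter";
it is not an exact match for the Unicode letter categories. The pattern names
and the Russian sample "т.е." are only illustrative, and nothing here says
what LO's own engine supports.

```python
import re

# ASCII-only version of the acronym pattern from point 2.
ASCII_ACRONYM = re.compile(r"(?:[a-zA-Z]\.){2,}")

# Approximate Unicode-aware version: \w minus digits and underscore.
UNICODE_ACRONYM = re.compile(r"(?:[^\W\d_]\.){2,}")


def is_acronym(word: str, unicode_aware: bool = True) -> bool:
    """Return True if the whole word is letter-period repeated at least twice."""
    pattern = UNICODE_ACRONYM if unicode_aware else ASCII_ACRONYM
    return pattern.fullmatch(word) is not None
```

Here is_acronym("т.е.", unicode_aware=False) is False, while the Unicode-aware
pattern accepts it; both accept "U.S." and both reject "1.2.".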

In addition, it's likely that all the rules above would have to be
language-contingent. The possible scope of this might be taking us outside the
realms of an EasyHack, but it should be possible to lay the groundwork easily
enough.
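
One possible shape for that groundwork, sketched in Python: keep the
abbreviation lists in a table keyed by language and fall back to an empty list
for languages nobody has filled in yet. The table name, the function name, the
use of simple "en-GB"-style tags and the sample entries are all assumptions
for illustration, not how LO stores its autocorrect data.

```python
# Hypothetical per-language abbreviation lists; entries are examples only.
ABBREVIATIONS_BY_LANGUAGE = {
    "en": frozenset({"esp.", "incl.", "temp.", "etc."}),
    "de": frozenset({"bzw.", "usw.", "ca."}),
}


def abbreviations_for(language_tag: str) -> frozenset:
    """Look up abbreviations by primary language, e.g. "en-GB" uses "en"."""
    primary = language_tag.split("-")[0].lower()
    # Unknown languages get an empty set, so nothing is suppressed for them.
    return ABBREVIATIONS_BY_LANGUAGE.get(primary, frozenset())
```

An unknown language then simply behaves as the feature does today, and
abbreviation lists can be contributed one language at a time.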


[1] http://ethnicity.rutgers.edu/~jlynch/Writing/p.html#periods
[2] http://english.stackexchange.com/search?q=[punctuation]+etc
