[Grammar checker] Undocumented change in the API for LO 4
Stephan Bergmann
sbergman at redhat.com
Tue Mar 5 07:42:11 PST 2013
I have zero insight into that area of the code, but from what I gather:
GrammarCheckingIterator::GetSuggestedEndOfSentence(rText, ...) -- where
rText apparently is one single paragraph -- had for the last few years
been convoluted code that always returned rText.getLength(), whether
that was intentional or not. (Judging from
<http://cgit.freedesktop.org/libreoffice/core/commit/?id=9f2fde7ab5de20926bb25a6b298b4e5dffb66eb2>
"#i103496#: split svtools; improve ConfitItems", it would look odd if it
were really intentional -- why not clean the function up to a single
line then? But who knows.)
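As a hedged illustration of the difference, here is a simplified model in plain C++ with std::string. It is not the actual gciterator.cxx code, and both function names are hypothetical. `suggestedEndWholeParagraph` models what the function effectively returned before LO 4. `suggestedEndNextSentence` is a deliberately naive stand-in for a sentence break iterator, not ICU's, and including trailing blanks in the "behind end" position is only a modelling choice. Note how such a rule splits after an abbreviation like "e.g.".

```cpp
#include <cstddef>
#include <string>

// Hypothetical model of the effective pre-LO 4 behaviour: whatever the
// start position, the suggested end is always the end of the paragraph.
std::size_t suggestedEndWholeParagraph(const std::string& text, std::size_t /*start*/) {
    return text.size();
}

// Hypothetical, naive stand-in for a break-iterator-based sentence end
// (NOT ICU): finds the first '.', '!' or '?' at or after `start` that is
// followed by a space or by the end of the paragraph, and returns the
// "behind end" index, i.e. one past that punctuation plus any following
// spaces. Without such a terminator, the rest of the paragraph is used.
std::size_t suggestedEndNextSentence(const std::string& text, std::size_t start) {
    for (std::size_t i = start; i < text.size(); ++i) {
        const char c = text[i];
        const bool terminator = (c == '.' || c == '!' || c == '?');
        if (terminator && (i + 1 == text.size() || text[i + 1] == ' ')) {
            std::size_t end = i + 1;
            while (end < text.size() && text[end] == ' ') {
                ++end;  // skip blanks between this sentence and the next
            }
            return end;
        }
    }
    return text.size();
}
```

For example, on "See e.g. the docs. Done." (24 characters) the whole-paragraph variant suggests 24 from any start, whereas the naive splitter suggests 9, then 19, then 24, wrongly breaking right after "e.g.".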
From what I understand of linguistic/source/gciterator.cxx, the two
calls n = GrammarCheckingIterator::GetSuggestedEndOfSentence(...) sit in
two loops, each of which passes that n as the
nSuggestedBehindEndOfSentencePosition argument to a
css.linguistic2.XProofreader.doProofreading call and then decides
whether to iterate further based on the returned
css.linguistic2.ProofreadingResult, especially its
nBehindEndOfSentencePosition.
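That loop structure can be sketched roughly as follows. This is a simplified model under stated assumptions, not the code in linguistic/source/gciterator.cxx: `Result` keeps only two members of css.linguistic2.ProofreadingResult, the checker is any callable shaped roughly like doProofreading, and `proofreadParagraph` as well as its no-progress guard are hypothetical.

```cpp
#include <cstddef>
#include <functional>
#include <string>

// Hypothetical, trimmed-down mirror of css.linguistic2.ProofreadingResult:
// only the two positions that matter for the iteration are modelled.
struct Result {
    std::size_t nStartOfSentencePosition;
    std::size_t nBehindEndOfSentencePosition;
};

// Checker: (paragraph, start, suggested behind-end) -> Result, loosely
// modelled on XProofreader::doProofreading.
using Checker = std::function<Result(const std::string&, std::size_t, std::size_t)>;
// Suggestion: (paragraph, start) -> suggested behind-end position.
using SuggestEnd = std::function<std::size_t(const std::string&, std::size_t)>;

// Simplified model of one such loop: ask for a suggested end, call the
// checker, then continue from whatever position the checker reports as
// consumed. Returns how many times the checker was called.
int proofreadParagraph(const std::string& text, const SuggestEnd& suggestEnd,
                       const Checker& check) {
    int calls = 0;
    std::size_t start = 0;
    while (start < text.size()) {
        const std::size_t suggested = suggestEnd(text, start);
        const Result r = check(text, start, suggested);
        ++calls;
        // Guard for this model only: stop if the checker made no progress.
        if (r.nBehindEndOfSentencePosition <= start) {
            break;
        }
        start = r.nBehindEndOfSentencePosition;
    }
    return calls;
}
```

With a checker that simply echoes the suggested end back, a whole-paragraph suggestion yields one call per paragraph, while a per-sentence suggestion yields one call per sentence, i.e. the same paragraph is passed in several times.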
Now, it beats me why anybody designed css.linguistic2.ProofreadingResult
that way, to contain all the data already passed into
css.linguistic2.XProofreader.doProofreading anyway. But could it be
that clients that observe that "[with] LibreOffice 4, each paragraph of
a text is passed several times to [doProofreading]" fail to set
nBehindEndOfSentencePosition in the css.linguistic2.ProofreadingResult
they return so that it properly reflects their idea of how much of the
paragraph they have already consumed?
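If that hypothesis is right, a checker that does its own sentence segmentation would need to report in nBehindEndOfSentencePosition how far it actually got. Below is a minimal hypothetical skeleton in plain C++; `Result` is again a trimmed-down stand-in for css.linguistic2.ProofreadingResult and both function names are invented. Whether LibreOffice honours a value beyond the suggested position is exactly the open question here, so treat that part as an assumption.

```cpp
#include <cstddef>
#include <string>

// Hypothetical, trimmed-down mirror of css.linguistic2.ProofreadingResult
// (the real struct has more members, e.g. the text and the errors found).
struct Result {
    std::size_t nStartOfSentencePosition;
    std::size_t nBehindEndOfSentencePosition;
};

// Hypothetical skeleton of a checker that ignores the suggested end and
// handles the rest of the paragraph with its own segmentation. It reports
// the whole remainder as consumed, so a driver loop that continues from
// nBehindEndOfSentencePosition has nothing left to pass in again.
Result checkRestOfParagraph(const std::string& text, std::size_t start,
                            std::size_t /*suggestedBehindEnd*/) {
    // The checker's own sentence splitting and rule matching would go here.
    return Result{start, text.size()};
}

// For contrast: a checker that merely echoes the suggested end back. Under
// sentence-level suggestions it gets called once per LO-determined sentence.
Result checkUpToSuggestedEnd(const std::string& /*text*/, std::size_t start,
                             std::size_t suggestedBehindEnd) {
    return Result{start, suggestedBehindEnd};
}
```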
Stephan
On 03/05/2013 11:12 AM, Marcin Milkowski wrote:
> What's the supposed regression, exactly? Do we now only get sentences
> as segmented by LO? That would be a serious drawback, as ICU methods are
> less than perfect and our results are much more reliable (the
> BreakIterator simply uses a static list of abbreviations, which is a vast
> simplification that cannot capture a lot of ambiguous dots, so
> it's broken by design).
>
> Best,
> Marcin
>
> On Mon, Mar 4, 2013 at 9:58 PM, Németh László <nemeth at numbertext.org> wrote:
>
> Hi,
>
> If I understand correctly, that was an intended change by the original
> author, Thomas Lange, supported by contributors such as Marcin Miłkowski
> and Daniel Naber, to meet real needs: better sentence boundary
> disambiguation and grammar checking by LanguageTool and other grammar
> checker components. So the current state is a step back. I suggest
> reverting it (and perhaps adding some comments to
> ProofreadingResult.idl to prevent similar changes).
>
> Best regards,
> László
>
> 2013/3/4 Olivier R. <olivier.noreply at gmail.com>:
> > Caolán McNamara wrote
> >> do you get the pre LO 4 behaviour ?
> >
> > Probably.
> > With LO 3, in doProofreading:
> > - nStartOfSentencePos was always the beginning of the paragraph (=0)
> > - nSuggestedSentenceEndPos was always the end of the paragraph (=length of rText)
> >
> > And each paragraph was passed once to the GC.
> >
> >
> >
> >> Assuming that you do, it appears to me that the current LO4
> >> behaviour is the original programmer intent and that the intermediate
> >> behaviour was a bug (from the programmer intent perspective anyway) in
> >> whatever versions got released between
> >> 9f2fde7ab5de20926bb25a6b298b4e5dffb66eb2 and LO4
> >
> > Yes, we can assume that was the original programmer intent.
> > But it worked another way for 3 years and nobody complained about it. :)
> > I prefer the unintended behavior, as then LO does not wrongly assume
> > where sentences end.
> >
> > So what will LO do?
More information about the LibreOffice mailing list