[Libreoffice] regression tests - for word count ...

LeMoyne jlc at mail2lee.com
Mon Nov 1 23:35:37 PDT 2010

[PATCH] Fixes char overcount when selection ends in middle of word
The patch is in sw and changes only one file: sw/source/core/txtedt.cxx

Error was reported by Sophie Gautier on bug 30550

The basic fix is very simple: change the SwScanner constructor's last
argument to true (clip) from false (don't clip) so it clips any text after
the given range.
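
To make the effect of the clip flag concrete, here is a minimal standalone sketch in Python. It is not the SwScanner code: the function name count_selection and its parameters are made up for illustration, and it assumes only what is described above, namely that an unclipped scan runs the last word out to its natural end even when the selection stops in the middle of it.

```python
def count_selection(text, start, end, clip):
    """Count words and chars for the selection text[start:end].

    Hypothetical illustration only. A 'word' is any run of non-space
    characters. With clip=False a word that straddles `end` is scanned
    to its natural boundary, so chars past the selection are counted.
    """
    words = 0
    chars = 0
    pos = start
    while pos < end:
        if text[pos].isspace():
            # Blanks inside the selection count as chars, not words.
            chars += 1
            pos += 1
            continue
        # Find the natural end of the word starting at pos.
        word_end = pos
        while word_end < len(text) and not text[word_end].isspace():
            word_end += 1
        if clip:
            # Clip the word at the end of the selection.
            word_end = min(word_end, end)
        words += 1
        chars += word_end - pos
        pos = word_end
    return words, chars
```

For the text "hello world" with the selection "hello wo" (start 0, end 8), the clipped call counts 8 chars while the unclipped call counts all 11, which is the kind of overcount the patch is meant to remove.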

I also made additional changes at the TxtNode level: 
  --  reinstated the original count logic that was in Mattias' patch,
  --  added the chars excluding blanks accumulation to the counting of
outline numbers and bullets,
  --  flattened the logic by placing escape/no-count tests at the top, 
  --  removed a few unused/extra vars, and
  --  commented the code

The count still excludes hidden paras, hidden text and red-lined text.  I am
also pretty sure it doesn't count headers and footers and may count notes: it
has been a while since I looked at those (at the Doc level).  It treats any
block of non-space characters as a word.  It counts outline numbers and
bullets.  I was tempted to rip out the bullet count (1 word, 1 char and 1
non-blank char) but left it as pre-existing comic relief behavior.
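
The counting rules above can be sketched roughly as follows. This is a simplified Python illustration, not the TxtNode code: the names Counts and count_paragraph and the has_bullet/hidden flags are hypothetical, and red-lining, headers, footers and notes are left out entirely.

```python
from dataclasses import dataclass


@dataclass
class Counts:
    # Hypothetical stand-in for the word/char fields of a stats record.
    words: int = 0
    chars: int = 0
    chars_no_blanks: int = 0


def count_paragraph(text, has_bullet=False, hidden=False):
    """Count one paragraph under the rules described in the post."""
    counts = Counts()
    # Escape/no-count test at the top: hidden paragraphs add nothing.
    if hidden:
        return counts
    # A bullet or outline number adds 1 word, 1 char, 1 non-blank char.
    if has_bullet:
        counts.words += 1
        counts.chars += 1
        counts.chars_no_blanks += 1
    # Any block of non-space characters is treated as a word.
    for block in text.split():
        counts.words += 1
        counts.chars_no_blanks += len(block)
    counts.chars += len(text)
    return counts
```

For example, count_paragraph("two  words") gives 2 words, 10 chars and 8 chars excluding blanks; adding has_bullet=True raises each of the three by one.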

I tested the fix with a variety of selections including starting/ending
before/after spaces and in the middle of a word.  The count is correct for the
small tests where I could count the chars by hand.  I thought I had tested the
'selection ends in middle of word' case before, but obviously not.  That
earlier miss says more testing is still in order.

I also tested with Andrew Pitonyak's Macro document (>64k paras) and with the
Oasis Metadata Examples odt.  With the Metadata Examples odt (Mattias' sample
doc), the word and char counts agree between OOo 3.2 Linux and LibO as built
on Lucid with the attached patch.  With the larger test documents, both OOo
and LibO give different initial counts for all categories.  For every
document opened from disk, the count excluding blanks is 0 (zero) until
the document is 'dirtied' by inserting a single character or an empty
paragraph.  Re-opening the Word Count dialog after the change shows a slight
but noticeable delay (the count time for a mostly clean document?) and the
counts change.  The counts on the Metadata Examples document stabilize after
a change, and the OOo and LibO counts agree thereafter.

On the very large Andrew Macro odt, the counts are not stable in either
version: they vary by tens or hundreds upon each insert of one or a few blank
lines.  Closing this large document (>64k paras) also crashes LibO
and hangs OOo.  The hung OOo process shows status futex_wait_queue_me in the
Ubuntu Lucid system monitor, where I went to kill it.  But that is a separate
issue already reported in the OOo issue db.  The point is that neither OOo nor
LibO handles large documents well - counting included.

I will test on Word to get an 'independent' count on the larger documents.  

I suggest a 2-column tabular layout for the Word Count dialog, as in gedit.
The paragraph count is held on the same DocStat record as the word and char
counts and is already available to the dialog.  If nobody has redone the
dialog layout a few days from now, I will do it.

Blessed Be! 

View this message in context: http://nabble.documentfoundation.org/PATCH-Fix-for-bug-feature-request-30550-Character-count-without-spaces-tp1778667p1826333.html
Sent from the Dev mailing list archive at Nabble.com.
