<html>
    <head>
      <base href="https://bugs.documentfoundation.org/">
    </head>
    <body><table border="1" cellspacing="0" cellpadding="8">
        <tr>
          <th>Bug ID</th>
          <td><a class="bz_bug_link 
          bz_status_UNCONFIRMED "
   title="UNCONFIRMED - Erroneous word count (for French at least)"
   href="https://bugs.documentfoundation.org/show_bug.cgi?id=131557">131557</a>
          </td>
        </tr>

        <tr>
          <th>Summary</th>
          <td>Erroneous word count (for French at least)
          </td>
        </tr>

        <tr>
          <th>Product</th>
          <td>LibreOffice
          </td>
        </tr>

        <tr>
          <th>Version</th>
          <td>unspecified
          </td>
        </tr>

        <tr>
          <th>Hardware</th>
          <td>All
          </td>
        </tr>

        <tr>
          <th>OS</th>
          <td>All
          </td>
        </tr>

        <tr>
          <th>Status</th>
          <td>UNCONFIRMED
          </td>
        </tr>

        <tr>
          <th>Severity</th>
          <td>normal
          </td>
        </tr>

        <tr>
          <th>Priority</th>
          <td>medium
          </td>
        </tr>

        <tr>
          <th>Component</th>
          <td>Writer
          </td>
        </tr>

        <tr>
          <th>Assignee</th>
          <td>libreoffice-bugs@lists.freedesktop.org
          </td>
        </tr>

        <tr>
          <th>Reporter</th>
          <td>phdebar@protonmail.com
          </td>
        </tr></table>
      <p>
        <div>
        <pre>Word count is wrong for French (at least): it counts many more words than
there are. I gather that Writer counts words by counting the runs of whitespace
that separate them.
French typographic rules (and, hence, normal usage) call for certain common
punctuation marks (most notably quotation marks, semicolons, colons, question
and exclamation marks, and dashes) to be separated from words by white space.
This wrongly inflates the word count.

Such punctuation should not be counted as words.


So, for French at least, the number of punctuation marks surrounded by white
space should be subtracted from the current word count. (Adding these
punctuation marks as white space to the counting regex (I guess?) would instead
mess up the word count for gender-neutral writing, with words such as
"développeur·se", "développeur(se)", "développeur/se" or "développeu-se".)</pre>
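        <pre>The subtraction proposed above can be sketched as follows. This is a minimal
illustration in Python, not Writer's actual implementation. It assumes, as the
report guesses, that words are whitespace-separated tokens. The punctuation set
and the function names are hypothetical.

```python
# Hypothetical set of French punctuation marks typeset with a space on one or
# both sides: ; : ? ! guillemets, en dash and em dash. The list is an assumption.
SPACED_PUNCTUATION = set(";:?!\u00ab\u00bb\u2013\u2014")


def naive_word_count(text):
    """Count every whitespace-separated token as a word (the suspected behaviour)."""
    # str.split() with no argument also splits on U+00A0 and U+202F,
    # the no-break spaces commonly used in French typography.
    return len(text.split())


def adjusted_word_count(text):
    """Like naive_word_count, but skip tokens made only of SPACED_PUNCTUATION."""
    return sum(
        1
        for token in text.split()
        if not all(ch in SPACED_PUNCTUATION for ch in token)
    )


sample = "Il a dit : « Bonjour ! » \u2013 puis il est parti ; pourquoi ?"
print(naive_word_count(sample))     # 16
print(adjusted_word_count(sample))  # 9
```

Tokens such as "développeur·se" or "développeur/se" contain letters, so this
approach still counts each of them as a single word, unlike treating the
punctuation itself as a separator.</pre>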
        </div>
      </p>


    </body>
</html>