Thanks for your reply, Caolán.<div>I have submitted a bug and assigned it to you. I really appreciate your willingness to look into this!</div><div>Here's the bug URL: <a href="https://www.libreoffice.org/bugzilla/show_bug.cgi?id=52020" target="_top" rel="nofollow" link="external">https://www.libreoffice.org/bugzilla/show_bug.cgi?id=52020</a><br>
Please let me know if there is anything else I can provide. I have a little working knowledge of ICU: I helped implement the break iterator for Khmer by providing the dictionary and tests, but I am not a programmer by trade.</div>
<div><br></div><div>> There was something similar done in the past IIRC to <br>> pass around soft-page-break information so that export filters could <br>> know where the layout last put the page breaks. I forget the details of <br>
> that though.
</div><div><br></div><div>This would be a very useful feature for Cambodians, and I would assume for Thai users as well, although more programs already support word breaking for Thai. Would it be best to do this as an extension rather than in LibreOffice core?</div>
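<div><br></div><div>To illustrate what I mean by "real" breaks, here is a minimal Python sketch. It assumes the break offsets have already been computed elsewhere, for example by a word break iterator; the offsets and both function names are hypothetical, and this is not a description of how LibreOffice's layout or export filters work.</div>

```python
ZWSP = "\u200b"  # U+200B ZERO WIDTH SPACE


def insert_zwsp(text, boundaries):
    """Return text with a zero-width space at each interior break offset.

    boundaries is an iterable of character offsets into text. They are
    assumed to come from a word break iterator; nothing here computes them.
    """
    pieces = []
    previous = 0
    for offset in sorted(set(boundaries)):
        # Ignore the start and end of the text and any out-of-range offset.
        if offset not in range(1, len(text)):
            continue
        pieces.append(text[previous:offset])
        previous = offset
    pieces.append(text[previous:])
    return ZWSP.join(pieces)


def split_words(text):
    """Split text on zero-width spaces, as a program without ICU might."""
    return [word for word in text.split(ZWSP) if word]
```

<div>With placeholder text, insert_zwsp("abcdef", [2, 4]) puts a zero-width space after "ab" and after "cd", and split_words on the result gives back the three pieces. A program that knows nothing about Khmer could then break lines or spell-check at those points.</div>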
<div><br></div><div>Thanks again for your time,</div><div>Nathan</div><div><br></div><div><br><div class="gmail_quote"><span class="GingerNoCheckStart"></span>On Thu, Jul 12, 2012 at 11:10 PM, Caolán McNamara [via Document Foundation Mail Archive] <span dir="ltr"><<a href="/user/SendEmail.jtp?type=node&node=3995138&i=0" target="_top" rel="nofollow" link="external">[hidden email]</a>></span> wrote:<br>
<blockquote style='border-left:2px solid #CCCCCC;padding:0 1em' class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div class="im">
On Sun, 2012-07-08 at 08:08 -0700, sungkhum wrote:
<br></div><div class="im">> I have two questions: is there a way to have the LibreOffice spelling
<br>> checker (Hunspell) also recognize word-breaks using the ICU break iterator
<br>> for Khmer so that Cambodians no longer have to add zero-width spaces
<br>> manually (as it seems to work for Thai now?)? Currently, lines without
<br>> zero-width spaces are seen as one long word to the spelling checker in
<br>> LibreOffice 3.6. But since the line-breaking is working, it would seem
<br>> breaking words for the spelling checker should also be able to work. Should
<br>> I submit a bug? How should I proceed?
<br><br></div><div class="im">Sounds like a bug really. I mean, hunspell itself generally doesn't do
<br>the parsing of text into words, the app gives each word to hunspell. And
<br>we're *supposed* to be using the icu breakiterator to split words. I
<br>suspect it's a similar bug as this original one.
<br><br></div>So... sure, file a bug, assign it to me (<a href="http://user/SendEmail.jtp?type=node&node=3995127&i=0" rel="nofollow" link="external" target="_blank">[hidden email]</a>) and paste a
<br><div class="im">short two word example text into the bug and indicate where the word
<br>break should be and I'll add a regression test for it and see if it's a
<br>trivial fix for Khmer too now that we're using the latest-and-greatest
<br>icu.
<br><br></div><div class="im">> Also, since many other programs do not incorporate ICU's code, is there a
<br>> way to make the line breaks "real" when a document is saved in another
<br>> format (such as a .doc?). And by "real" I mean that a zero-width space is
<br>> actually added to the text where a line-break should be.
<br><br></div><div class="im">That should at least be theoretically possible, albeit a bit tricky
<br>seeing as the layout code is the bit that knows the width of the page
<br>and does the line breaking, while the export filters don't get to know
<br>that information. There was something similar done in the past IIRC to
<br>pass around soft-page-break information so that export filters could
<br>know where the layout last put the page breaks. I forget the details of
<br>that though.
<br><br>C.
<br><br></div>_______________________________________________
<br>LibreOffice mailing list
<br><a href="http://user/SendEmail.jtp?type=node&node=3995127&i=1" rel="nofollow" link="external" target="_blank">[hidden email]</a>
<br><a href="http://lists.freedesktop.org/mailman/listinfo/libreoffice" rel="nofollow" link="external" target="_blank">http://lists.freedesktop.org/mailman/listinfo/libreoffice</a><br>
</blockquote></div><span class="GingerNoCheckEnd"></span><br></div>
<br/><hr align="left" width="300" />
View this message in context: <a href="http://nabble.documentfoundation.org/Adding-Extension-for-Experimental-Thai-Spelling-tp3735637p3995138.html">Re: Adding Extension for Experimental Thai Spelling</a><br/>
Sent from the <a href="http://nabble.documentfoundation.org/Dev-f1639786.html">Dev mailing list archive</a> at Nabble.com.<br/>