<div dir="ltr">** “I will have to shape the entire paragraph” (not “I will have to shape the entire sentence”)<br></div><div class="gmail_extra"><br><div class="gmail_quote">On Mon, Jun 13, 2016 at 11:16 PM, Kelvin Ma <span dir="ltr">&lt;<a href="mailto:kelvinsthirteen@gmail.com" target="_blank">kelvinsthirteen@gmail.com</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><br><div class="gmail_extra"><span class="">On Mon, Jun 13, 2016 at 10:53 PM, Simon Cozens <span dir="ltr">&lt;<a href="mailto:simon@simon-cozens.org" target="_blank">simon@simon-cozens.org</a>&gt;</span> wrote:<br></span><div class="gmail_quote"><span class=""><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><span>On 14/06/2016 12:42, Kelvin Ma wrote:<br> > What I need is something to bridge that gap between the 1-line of<br> > unbroken text that harfbuzz generates, and the fragments I need to be<br> > able to assemble a multi-line paragraph.<br> <br> </span>Right. You need that, but it's not Harfbuzz's job. Write some code. :-)<br> <span><br> > The only way to get these<br> > pieces is to find the spots in the shaped text where the whole line can<br> > be shaped in two pieces with an identical result.<br> <br> </span>Wrong. What you need to find is the potential line breaks. That's not a<br> shaping issue specifically; it's a text issue, and needs to be dealt<br> with at the text level. </blockquote><div><br></div></span><div>No, this is also a shaping issue and I’ll explain why.<br><br></div><div><i>Take these five sentences which I need to break into a paragraph. The shaper is always going to be involved in this. Did you only count two?</i><br><br></div><div>It has potential breakpoints here:<br><br>|Take |these |five |sen-|ten-|ces |which |I |need |to |break |into |a |para-|graph. 
|The |sha-|per |is |al-|ways |go-|ing |to |be |in-|vol-|ved |in |this. |Did |you |on-|ly |count |two?|<br><br></div><div>The problem is, I have no idea where, in terms of x-coordinate, any of these breakpoints are going to be until I shape them. So I will have to shape the entire sentence.<br><br></div><div>Then I find that the first glyph that overruns the width of the line is the ‘e’ in “sentences”:<br><br></div><div><i>Take these five sente</i><br><br></div><div>Now I know that I can cut this down to a correct line break by just shaping the text “<i>Take these five sen-</i>” and testing to see if that fits (with the “safe-to-break” thing, I can probably just keep the old “Take these five se” glyphs and append a newly shaped “n-”.)<br><br></div><div>The problem comes with what to do with the text that comes after the breakpoint. Without “safe-to-break” I have to reshape the *entire* remainder of the paragraph, the whole text “<i>tences which I need to break into a paragraph. The shaper is always going to be involved in this. Did you only count two?</i>”. If the paragraph is long, this can be a very long string. If I had the “safe-to-break” thing, I could find that I could keep that portion of the originally shaped line, or at worst, maybe have to reshape a “te” or something and append the old “<i>nces which I need to break into a paragraph. The shaper is always going to be involved in this. Did you only count two?</i>” to it.<br><br></div><div>The amount of text that has to be laid out is the entire length of the paragraph, PLUS <i>half the entire length of the paragraph times the number of lines</i>. That last part is crucial. 
With “safe-to-break” it’s just the length of the paragraph, plus a few bits and pieces of fractured text here and there.<br></div><span class=""><div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"> <br> Taking the example of a ligature, it *is* allowable to break (with<br> hyphenation) in the middle of a ligature like "fi". Indeed, your<br> justification engine might decide, for the good of the rest of the lines<br> in the paragraph, that this is the best place to break. If all you are<br> dealing with is the glyph output from Harfbuzz, you won't be able to<br> spot that breakpoint.<br> <br> Once you get into non-Latin scripts, things get worse. Finding<br> breakpoints is a matter that depends entirely on the rules of the script<br> or language that your text is written in. Right now I'm fighting with<br> Javanese, where line breaks are permissible at the end of syllables. You<br> need to parse the text, not the glyphs, to determine the appropriate<br> breaks. Like others have said: use ICU or similar.<br> <br> And so you need to deal with two sets of information at the same time:<br> the text-level information about breaks, and the shaper-level<br> information about glyphs. This is why Harfbuzz returns you an index into<br> your text string, so that you can keep those two sets of information in<br> sync. The hard part of writing a typesetting system is dealing with the<br> interplay between those two representations of a text.<br></blockquote><div><br></div></span><div>You are right. 
But I hope I explained why the shaping information has to come before the textual-breakpoint information, because without shaping, you don’t know *where* the breakpoints lie, and if you don’t know where they lie, they don’t function as breakpoints anymore.<br></div><span class=""><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"> <br> It took me quite a while to get my head around this, and a lot of help<br> from others. You can see the record of me banging my head against this<br> particular wall at <a href="https://github.com/simoncozens/sile/issues/179" rel="noreferrer" target="_blank">https://github.com/simoncozens/sile/issues/179</a> ,<br> which has a nice explanation of the issues involved.<br> </blockquote></span></div><br></div></div> </blockquote></div><br></div>
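<div dir="ltr"><br>To put rough numbers on the cost argument above, here is a minimal sketch. It is only a counting model, not real shaping code: it assumes every line holds the same number of characters, it ignores the small re-shape needed for the hyphenated line end, and the names <i>shaping_cost</i>, <i>paragraph_len</i>, <i>line_len</i> and <i>patch_len</i> are made up for illustration.<br><br></div>

```python
def shaping_cost(paragraph_len, line_len, patch_len=None):
    """Count characters handed to a (hypothetical) shaper while breaking a paragraph.

    patch_len=None models having no safe-to-break information: after every
    line break the whole remainder of the paragraph is reshaped.
    Otherwise only patch_len characters next to each break are reshaped.
    """
    cost = paragraph_len        # first pass: shape the entire paragraph once
    remaining = paragraph_len   # characters not yet placed on a finished line
    while remaining > line_len:
        remaining -= line_len   # one full line is broken off
        if patch_len is None:
            cost += remaining   # reshape everything after the break
        else:
            cost += min(patch_len, remaining)  # reshape only a small patch
    return cost


if __name__ == "__main__":
    # 1000-character paragraph, 100 characters per line (assumed figures)
    print(shaping_cost(1000, 100))               # no safe-to-break info
    print(shaping_cost(1000, 100, patch_len=4))  # with safe-to-break info
```

<div dir="ltr">With these assumed figures, the first call shapes 5500 characters in total and the second 1036, which is in line with the “paragraph plus roughly half the paragraph per line” estimate versus “paragraph plus a few bits and pieces”.<br></div>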