<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/xhtml;charset=UTF-8"/>
<meta http-equiv="X-UA-Compatible" content="IE=9"/>
<meta name="generator" content="Doxygen 1.8.8"/>
<title>Dillo: Changes in Line-Breaking and Hyphenation</title>
<link href="tabs.css" rel="stylesheet" type="text/css"/>
<script type="text/javascript" src="https://www.dillo.org/dw/html/jquery.js"></script>
<script type="text/javascript" src="dynsections.js"></script>
<link href="doxygen.css" rel="stylesheet" type="text/css" />
</head>
<body>
<div id="top"><!-- do not remove this div, it is closed by doxygen! -->
<div id="titlearea">
<table cellspacing="0" cellpadding="0">
<tbody>
<tr style="height: 56px;">
<td style="padding-left: 0.5em;">
<div id="projectname">Dillo
</div>
</td>
</tr>
</tbody>
</table>
</div>
<!-- end header part -->
<!-- Generated by Doxygen 1.8.8 -->
<div id="navrow1" class="tabs">
<ul class="tablist">
<li><a href="index.html"><span>Main Page</span></a></li>
<li class="current"><a href="pages.html"><span>Related Pages</span></a></li>
<li><a href="namespaces.html"><span>Namespaces</span></a></li>
<li><a href="annotated.html"><span>Classes</span></a></li>
<li><a href="files.html"><span>Files</span></a></li>
</ul>
</div>
</div><!-- top -->
<div class="header">
<div class="headertitle">
<div class="title">Changes in Line-Breaking and Hyphenation </div> </div>
</div><!--header-->
<div class="contents">
<div class="textblock"><div style="border: 2px solid #ffff00; margin-bottom: 0.5em;
padding: 0.5em 1em; background-color: #ffffe0"><b>Info:</b> Should be incorporated into <a class="el" href="classdw_1_1Textblock.html" title="A Widget for rendering text blocks, i.e. paragraphs or sequences of paragraphs. ">dw::Textblock</a>.</div><h1>Introduction </h1>
<p>To implement hyphenation in dillo, not only was a hyphenation algorithm added, but line breaking was also changed to a simple optimization per line. Aside from the improvement brought by this change per se, an important aspect is the introduction of "penalties". Before this change, dillo put all words into a line which fitted into it; now, a "badness" is calculated for each possible breakpoint, and the best breakpoint, i. e. the breakpoint with the smallest value for "badness", is chosen. This can easily be refined to define "good" and "bad" breakpoints by assigning a "penalty"; the best breakpoint is then the one with the smallest value of "badness + penalty". Details can be found below.</p>
<p>Example: Normal spaces have a penalty of 0, while hyphenation points get a penalty of, say, 1, since hyphenation is generally considered as a bit "ugly" and should rather be avoided. Consider a situation where the word "dillo" could be hyphenated, with the following badnesses:</p>
<ul>
<li>before "dillo": 0.6;</li>
<li>between "dil-" and "lo": 0.2;</li>
<li>after "dillo": 0.5.</li>
</ul>
<p>Since the penalty is added, the last value is the best one, so "dillo" is put at the end of the line, without hyphenation.</p>
<p>Under other circumstances (e. g. narrower lines), the values might be different:</p>
<ul>
<li>before "dillo": infinite;</li>
<li>between "dil-" and "lo": 0.3;</li>
<li>after "dillo": 1.5.</li>
</ul>
<p>In this case, even the addition of the penalty makes hyphenation the best choice.</p>
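<p>The two situations above can be checked with a few lines of Python. This is only an illustrative sketch, not dillo code: the function name <em>best_breakpoint</em> and the tuple layout are assumptions, and the hyphenation penalty of 1 is the example value used above.</p>

```python
import math

HYPHEN_PENALTY = 1.0  # example value from the text, not dillo's real setting


def best_breakpoint(candidates):
    """Return the label of the candidate with the smallest badness + penalty.

    Each candidate is a tuple (label, badness, penalty).
    """
    return min(candidates, key=lambda c: c[1] + c[2])[0]


wide_line = [
    ("before dillo", 0.6, 0.0),
    ("dil-/lo", 0.2, HYPHEN_PENALTY),
    ("after dillo", 0.5, 0.0),
]
narrow_line = [
    ("before dillo", math.inf, 0.0),
    ("dil-/lo", 0.3, HYPHEN_PENALTY),
    ("after dillo", 1.5, 0.0),
]

print(best_breakpoint(wide_line))    # after dillo
print(best_breakpoint(narrow_line))  # dil-/lo
```

<p>In the first case the sums are 0.6, 1.2 and 0.5; in the second they are infinite, 1.3 and 1.5.</p>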
<h1>Literature </h1>
<h2>Breaking Paragraphs Into Lines </h2>
<p>Although dillo does not (yet?) implement the algorithm T<sub>E</sub>X uses for line breaking, this document shares much of the notation used by the article <em>Breaking Paragraphs Into Lines</em> by Donald E. Knuth and Michael F. Plass; originally published in: Software – Practice and Experience <b>11</b> (1981), 1119-1184; reprinted in: <em>Digital Typography</em> by Donald E. Knuth, CSLI Publications 1999. It is interesting reading in any case.</p>
<h2>Hyphenation </h2>
<p>Dillo uses the algorithm by Frank Liang, which is described in his doctoral dissertation found at <a href="http://www.tug.org/docs/liang/">http://www.tug.org/docs/liang/</a>. There is also a description in appendix H ("Hyphenation") of <em>The T<sub>E</sub>Xbook</em> by Donald E. Knuth, Addison-Wesley 1984.</p>
<p>Pattern files can be found at <a href="http://www.ctan.org/tex-archive/language/hyphenation">http://www.ctan.org/tex-archive/language/hyphenation</a>.</p>
<h1>Overview of Changes </h1>
<p>Starting with this change, <a class="el" href="textblock_8cc.html">dw/textblock.cc</a> has been split up; anything related to line breaking has been moved into <a class="el" href="textblock__linebreaking_8cc.html">dw/textblock_linebreaking.cc</a>. This will also be done for other aspects like floats. (Better, however, would be a clean logical split.)</p>
<p>An important change relates to the way that lines are added: before, dillo would add a line as soon as a new word for this line was added. Now, a line is not added until the <em>last</em> word of this line is known. This has two important implications:</p>
<ul>
<li>Some values in <a class="el" href="structdw_1_1Textblock_1_1Line.html">dw::Textblock::Line</a>, which represented values accumulated within the line, could be removed, since now, these values can be calculated simply in a loop.</li>
<li>On the other hand, this means that some words may not belong to any line. For this reason, in some cases (e. g. in <a class="el" href="classdw_1_1Textblock.html#adb2f79277f25d9e2bb406214ae7af83f">dw::Textblock::sizeRequestImpl</a>) <a class="el" href="classdw_1_1Textblock.html#a4c35a8ce0013873f50140813f87961a6">dw::Textblock::showMissingLines</a> is called, which creates temporary lines, which must, under other circumstances, be removed again by <a class="el" href="classdw_1_1Textblock.html#a11fae0856db072b01557bcd0a8e55d5d">dw::Textblock::removeTemporaryLines</a>, since they have been created based on limited information, and so possibly in a wrong way. (See below for details.)</li>
</ul>
<p>When a word can be hyphenated, an instance of <a class="el" href="structdw_1_1Textblock_1_1Word.html">dw::Textblock::Word</a> is used for each part. Notice that soft hyphens are evaluated immediately, but automatic hyphenation is done in a lazy way (details below), so the number of instances may change. There are some new attributes: automatic hyphenation is allowed only when dw::Textblock::Word::canBeHyphenated is set to <em>true</em>; it is set to <em>false</em> when soft hyphens are used for a word, and (of course) by the automatic hyphenation itself. Furthermore, <a class="el" href="structdw_1_1Textblock_1_1Word.html#a43213df387f43204dee8cbbd7fdd183e">dw::Textblock::Word::hyphenWidth</a> (more details in the comment there) has to be included when calculating line widths.</p>
<p>Some values should be configurable: dw::Textblock::HYPHEN_BREAK, the penalty for hyphens. Also dw::Textblock::Word::stretchability, dw::Textblock::Word::shrinkability, which are both set in <a class="el" href="classdw_1_1Textblock.html#a1f2b341f34d9570c3082dc743a9c9441">dw::Textblock::addSpace</a>.</p>
<h1>Criteria for Line-Breaking </h1>
<p>Before these changes to line breaking, a word (represented by <a class="el" href="structdw_1_1Textblock_1_1Word.html">dw::Textblock::Word</a>) had the following attributes related to line-breaking:</p>
<ul>
<li>the width of the word itself, represented by <a class="el" href="structdw_1_1Textblock_1_1Word.html#a25fa8e4fef5ae970a027ee25f1633390">dw::Textblock::Word::size</a>;</li>
<li>the width of the space following the word, represented by <a class="el" href="structdw_1_1Textblock_1_1Word.html#a9d33e37e72c7d77749e93d0671949aab">dw::Textblock::Word::origSpace</a>.</li>
</ul>
<p>In a more mathematical notation, the <img class="formulaInl" alt="$i$" src="form_0.png"/>th word has a width <img class="formulaInl" alt="$w_i$" src="form_1.png"/> and a space <img class="formulaInl" alt="$s_i$" src="form_2.png"/>.</p>
<p>A break was possible, when there was a space between the two words, and the first possible break was chosen.</p>
<p>With hyphenation, the criteria are refined. Hyphenation should only be used when otherwise line breaking results in very large spaces. We define:</p>
<ul>
<li>the badness <img class="formulaInl" alt="$\beta$" src="form_3.png"/> of a line, which is greater the more the spaces between the words differ from the ideal space;</li>
<li>a penalty <img class="formulaInl" alt="$p$" src="form_4.png"/> for any possible break point.</li>
</ul>
<p>The goal is to find those break points, where <img class="formulaInl" alt="$\beta + p$" src="form_5.png"/> is minimal.</p>
<p>Examples for the penalty <img class="formulaInl" alt="$p$" src="form_4.png"/>:</p>
<ul>
<li>0 for normal line breaks (between words);</li>
<li><img class="formulaInl" alt="$\infty$" src="form_6.png"/> to prevent a line break at all costs;</li>
<li><img class="formulaInl" alt="$-\infty$" src="form_7.png"/> to force a line break;</li>
<li>a positive, but finite, value for hyphenation points.</li>
</ul>
<p>So we need the following values:</p>
<ul>
<li><img class="formulaInl" alt="$w_i$" src="form_1.png"/> (the width of the word <img class="formulaInl" alt="$i$" src="form_0.png"/> itself);</li>
<li><img class="formulaInl" alt="$s_i$" src="form_2.png"/> (the width of the space following the word <img class="formulaInl" alt="$i$" src="form_0.png"/>);</li>
<li>the stretchability <img class="formulaInl" alt="$y_i$" src="form_8.png"/>, a value denoting how much the space after word <img class="formulaInl" alt="$i$" src="form_0.png"/> can be stretched (typically <img class="formulaInl" alt="${1\over 2} s_i$" src="form_9.png"/> for justified text; otherwise 0, since the spaces are not stretched);</li>
<li>the shrinkability <img class="formulaInl" alt="$z_i$" src="form_44.png"/>, a value denoting how much the space after word <img class="formulaInl" alt="$i$" src="form_0.png"/> can be shrunk (typically <img class="formulaInl" alt="${1\over 3} s_i$" src="form_10.png"/> for justified text; otherwise 0, since the spaces are not shrunk);</li>
<li>the penalty <img class="formulaInl" alt="$p_i$" src="form_11.png"/>, if the line is broken after word <img class="formulaInl" alt="$i$" src="form_0.png"/>;</li>
<li><p class="startli">a width <img class="formulaInl" alt="$h_i$" src="form_12.png"/>, which is added, when the line is broken after word <img class="formulaInl" alt="$i$" src="form_0.png"/>.</p>
<p class="startli"><img class="formulaInl" alt="$h_i$" src="form_12.png"/> is the width of the hyphen, if the word <img class="formulaInl" alt="$i$" src="form_0.png"/> is a part of the hyphenated word (except the last part); otherwise 0.</p>
</li>
</ul>
<p>Let <img class="formulaInl" alt="$l$" src="form_13.png"/> be the (ideal) width (length) of the line, which is, e. g. at the top level, given by the browser window width. Furthermore, all words from <img class="formulaInl" alt="$a$" src="form_14.png"/> to <img class="formulaInl" alt="$b$" src="form_15.png"/> are added to the line. <img class="formulaInl" alt="$a$" src="form_14.png"/> is fixed: we do not modify the previous lines anymore; but our task is to find a suitable <img class="formulaInl" alt="$b$" src="form_15.png"/>.</p>
<p>We define:</p>
<p class="formulaDsp">
<img class="formulaDsp" alt="\[W_a^b = \sum_{i=a}^{b} w_i + \sum_{i=a}^{b-1} s_i + h_b\]" src="form_16.png"/>
</p>
<p class="formulaDsp">
<img class="formulaDsp" alt="\[Y_a^b = {Y_0}_a^b + \sum_{i=a}^{b-1} y_i\]" src="form_17.png"/>
</p>
<p class="formulaDsp">
<img class="formulaDsp" alt="\[Z_a^b = {Z_0}_a^b + \sum_{i=a}^{b-1} z_i\]" src="form_18.png"/>
</p>
<p><img class="formulaInl" alt="$W_a^b$" src="form_19.png"/> is the total width, <img class="formulaInl" alt="$Y_a^b$" src="form_20.png"/> the total stretchability, and <img class="formulaInl" alt="$Z_a^b$" src="form_21.png"/> the total shrinkability. <img class="formulaInl" alt="${Y_0}_a^b$" src="form_22.png"/> and <img class="formulaInl" alt="${Z_0}_a^b$" src="form_23.png"/> are the stretchability and shrinkability defined per line, and applied at the borders; they are 0 for justified text, but <img class="formulaInl" alt="${Y_0}_a^b$" src="form_22.png"/> has a positive value otherwise, see below for details.</p>
<p>Furthermore the <em>adjustment ratio</em> <img class="formulaInl" alt="$r_a^b$" src="form_24.png"/>:</p>
<ul>
<li>in the ideal case that <img class="formulaInl" alt="$W_a^b = l$" src="form_25.png"/>: <img class="formulaInl" alt="$r_a^b = 0$" src="form_26.png"/>;</li>
<li>if <img class="formulaInl" alt="$W_a^b < l$" src="form_27.png"/>: <img class="formulaInl" alt="$r_a^b = (l - W_a^b) / Y_a^b$" src="form_28.png"/> (<em>r</em><sub><em>a</em></sub><sup><em>b</em></sup> &gt; 0 in this case);</li>
<li>if <img class="formulaInl" alt="$W_a^b > l$" src="form_30.png"/>: <img class="formulaInl" alt="$r_a^b = (l - W_a^b) / Z_a^b$" src="form_31.png"/> ( <img class="formulaInl" alt="$r_a^b < 0$" src="form_29.png"/> in this case).</li>
</ul>
<p>The badness <img class="formulaInl" alt="$\beta_a^b$" src="form_32.png"/> is defined as follows:</p>
<ul>
<li>if <img class="formulaInl" alt="$r_a^b$" src="form_24.png"/> is undefined or <img class="formulaInl" alt="$r_a^b < -1$" src="form_33.png"/>: <img class="formulaInl" alt="$\beta_a^b = \infty$" src="form_34.png"/>;</li>
<li>otherwise: <img class="formulaInl" alt="$\beta_a^b = |r_a^b|^3$" src="form_35.png"/></li>
</ul>
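<p>The definitions of the adjustment ratio and the badness translate directly into code. The following is a floating-point sketch under the definitions above; the function names are assumptions, and dillo itself uses integer arithmetic for this calculation.</p>

```python
import math


def adjustment_ratio(total_width, line_width, stretch, shrink):
    """Return the adjustment ratio r, or None when it is undefined."""
    if total_width == line_width:
        return 0.0
    if line_width > total_width:
        # Line is too short: the spaces must be stretched, so r is positive.
        return (line_width - total_width) / stretch if stretch > 0 else None
    # Line is too long: the spaces must be shrunk, so r is negative.
    return (line_width - total_width) / shrink if shrink > 0 else None


def badness(ratio):
    """Return the badness for a given adjustment ratio."""
    if ratio is None or -1 > ratio:
        return math.inf
    return abs(ratio) ** 3


print(badness(adjustment_ratio(90, 100, 20, 10)))   # 0.125
print(badness(adjustment_ratio(105, 100, 20, 10)))  # 0.125
print(badness(adjustment_ratio(120, 100, 20, 10)))  # inf
```

<p>The last call shows a line that would have to be shrunk by twice its shrinkability, which is not allowed.</p>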
<p>The goal is to find the value of <img class="formulaInl" alt="$b$" src="form_15.png"/> where <img class="formulaInl" alt="$\beta_a^b + p_b$" src="form_36.png"/> is minimal. ( <img class="formulaInl" alt="$a$" src="form_14.png"/> is given, since we do not modify the previous lines.)</p>
<p>After a couple of words, it is in general not predictable whether this minimum has already been reached. There are two cases where this can be decided for a given <img class="formulaInl" alt="$b'$" src="form_37.png"/>:</p>
<ul>
<li><em>β</em><sub><em>a</em></sub><sup><em>b'</em></sup> = ∞ (line gets too tight): <img class="formulaInl" alt="$a \le b < b'$" src="form_39.png"/>, the minimum has to be searched between these two values;</li>
<li><img class="formulaInl" alt="$p_{b'} = -\infty$" src="form_40.png"/> (forced line break): <img class="formulaInl" alt="$a \le b \le b'$" src="form_41.png"/> (there may be another minimum of <img class="formulaInl" alt="$\beta_a^b$" src="form_32.png"/> before; note the <img class="formulaInl" alt="$\le$" src="form_42.png"/> instead of <img class="formulaInl" alt="$<$" src="form_43.png"/>).</li>
</ul>
<p>This leads to the problem that the last words of a text block are not displayed this way, since they do not fulfill these rules for being added to a line. For this reason, there are the "temporary" lines already described above.</p>
<p>(Note that the actual calculation differs from this description, since integer arithmetic is used for performance, which makes the actual code more complicated. See <a class="el" href="classdw_1_1Textblock_1_1BadnessAndPenalty.html">dw::Textblock::BadnessAndPenalty</a> for details.)</p>
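<p>The search for the break index described in this section can be sketched as follows. This is a simplified floating-point model, not the code in dw/textblock_linebreaking.cc: the class <em>Word</em> and the function <em>choose_break</em> are hypothetical, the per-line terms are taken as 0 (justified text), a forced break is always taken as soon as it is reached, and a single word that is too wide is put on a line of its own.</p>

```python
import math
from dataclasses import dataclass


@dataclass
class Word:
    width: float                # w_i
    space: float = 0.0          # s_i
    stretch: float = 0.0        # y_i
    shrink: float = 0.0         # z_i
    penalty: float = 0.0        # p_i
    hyphen_width: float = 0.0   # h_i


def badness(total, line_width, stretch, shrink):
    if total == line_width:
        return 0.0
    flexibility = stretch if line_width > total else shrink
    if flexibility == 0:
        return math.inf  # adjustment ratio is undefined
    ratio = (line_width - total) / flexibility
    return math.inf if -1 > ratio else abs(ratio) ** 3


def choose_break(words, a, line_width):
    """Return the break index b for a line starting at word a.

    None means that neither stopping condition was reached, so the
    remaining words could only be shown in a temporary line.
    """
    best = None
    best_cost = math.inf
    for b in range(a, len(words)):
        inner = words[a:b]  # words whose following space lies inside the line
        total = (sum(w.width for w in words[a:b + 1])
                 + sum(w.space for w in inner) + words[b].hyphen_width)
        beta = badness(total, line_width,
                       sum(w.stretch for w in inner),
                       sum(w.shrink for w in inner))
        if total > line_width and beta == math.inf:
            # Too tight: the minimum lies among the earlier candidates.
            return best if best is not None else b
        forced = words[b].penalty == -math.inf
        cost = -math.inf if forced else beta + words[b].penalty
        if best is None or best_cost >= cost:
            best, best_cost = b, cost
        if forced:
            return best
    return None


words = [Word(30, 10, 5, 5)] * 4
print(choose_break(words, 0, 100))      # 2
print(choose_break(words[:2], 0, 100))  # None
```

<p>In the first call, breaking after the third word gives an adjustment ratio of -1 and a badness of 1, while a fourth word would make the line too tight. In the second call, two words do not yet decide anything, which is the situation handled by the temporary lines.</p>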
<h2>Ragged Borders </h2>
<p>For other than justified text (left-, right-aligned and centered), the spaces between the words are not shrunk or stretched (so <img class="formulaInl" alt="$y_i$" src="form_8.png"/> and <img class="formulaInl" alt="$z_i$" src="form_44.png"/> are 0), but additional space is added to the left or right border or to both. For this reason, an additional stretchability <img class="formulaInl" alt="${Y_0}_a^b$" src="form_22.png"/> is added (see definition above). Since this space at the border is 0 in an ideal case ( <img class="formulaInl" alt="$W_a^b = l$" src="form_25.png"/>), it cannot be shrunk, so <img class="formulaInl" alt="${Z_0}_a^b$" src="form_23.png"/> is 0.</p>
<p>This is not equivalent to the calculation of the total stretchability as done for justified text, since in this case, the stretchability depends on the number of words: consider the typical case that all spaces and stretchabilities are equal ( <img class="formulaInl" alt="$y_a = y_{a + 1} = \ldots = y_b$" src="form_45.png"/>). With <img class="formulaInl" alt="$n$" src="form_46.png"/> words, the total stretchability would be <img class="formulaInl" alt="$n \cdot y_a$" src="form_47.png"/>, and so would increase with an increasing number of words ( <img class="formulaInl" alt="$y_a$" src="form_48.png"/> is constant). This is correct for justified text, but for other alignments, where only one space (or two, for centered text) is changed, this would mean that a line with many narrow words is more stretchable than a line with few wide words.</p>
<p>It is obvious that left-aligned text can be handled in the same way as right-aligned text. [... Centered text? ...]</p>
<p>The default value for the stretchability is the line height without the space between the lines (more precisely: the maximum of all word heights). The exact value is not so important when comparing different possible values for the badness <img class="formulaInl" alt="$\beta_a^b$" src="form_32.png"/>, when <img class="formulaInl" alt="${Y_0}_a^b$" src="form_22.png"/> is nearly constant for different <img class="formulaInl" alt="$b$" src="form_15.png"/> (which is the case for the actual value), but it is important for the comparison with penalties, which are constant. It must also be considered that for non-justified text, hyphenation is less desirable; this effect can be achieved by enlarging the stretchability, which will lead to a smaller badness, and so make hyphenation less likely. The user can configure the stretchability by changing the preference value <em>stretchability_factor</em> (default: 1.0).</p>
<p>(Comparison to T<sub>E</sub>X: Knuth and Plass describe a method for ragged borders, which is effectively the same as described here (Knuth 1999, pp. 93–94). The value for the stretchability of the line is slightly less, 1 em (ibid., see also p. 72 for the definition of the units). However, this article suggests a value for the hyphenation penalty, which is ten times larger than the value for justified text; this would suggest a larger value for <em>stretchability_factor</em>.)</p>
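<p>The effect of the per-line stretchability on the badness can be illustrated numerically. The sketch below assumes that the per-line stretchability is the line height multiplied by <em>stretchability_factor</em>; this multiplication and the function name <em>ragged_badness</em> are assumptions made for illustration, and the line height is assumed to be positive.</p>

```python
import math


def ragged_badness(total_width, line_width, line_height,
                   stretchability_factor=1.0):
    """Badness of a non-justified line whose word spaces are rigid."""
    if total_width > line_width:
        # Nothing can be shrunk, so an overfull line is infinitely bad.
        return math.inf
    border_stretch = line_height * stretchability_factor
    ratio = (line_width - total_width) / border_stretch
    return abs(ratio) ** 3


print(ragged_badness(84, 100, 16))       # 1.0
print(ragged_badness(84, 100, 16, 2.0))  # 0.125
```

<p>Doubling the factor divides the badness by eight, so a fixed hyphenation penalty weighs more heavily in comparison and hyphenation becomes less likely.</p>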
<h1>Hyphens </h1>
<p>Words (instances of <a class="el" href="structdw_1_1Textblock_1_1Word.html">dw::Textblock::Word</a>), which are actually part of a hyphenated word, are always drawn as a whole, not separately. This way, the underlying platform is able to apply kerning, ligatures, etc.</p>
<p>Calculating the width of such words causes some problems, since it is not required that the width of text "AB" is identical to the width of "A" plus the width of "B", just for the reasons mentioned above. It gets even a bit more complicated, since it is required that a word part (instance of <a class="el" href="structdw_1_1Textblock_1_1Word.html">dw::Textblock::Word</a>) always has the same length, independent of whether hyphenation is applied or not. Furthermore, the hyphen length is fixed for a word; for practical reasons, it is always the width of a hyphen, in the given font.</p>
<p>For calculating the widths, consider a word of four syllables: A-B-C-D. There are 3 hyphenation points, and so 2<sup>3</sup> = 8 possible ways of hyphenation: ABCD, ABC-D, AB-CD, AB-C-D, A-BCD, A-BC-D, A-B-CD, A-B-C-D. (Some of them, like the last one, are only probable for very narrow lines.)</p>
<p>Let w(A), w(B), w(C), w(D) be the word widths (part of <a class="el" href="structdw_1_1Textblock_1_1Word.html#a25fa8e4fef5ae970a027ee25f1633390">dw::Textblock::Word::size</a>), which have to be calculated, and l be a shorthand for <a class="el" href="classdw_1_1core_1_1Platform.html#a09dc3a0148c4284719ec1cfdfc1cfb82" title="Return the width of a text, with a given length and font. ">dw::core::Platform::textWidth</a>. Without considering this problem, the calculation would be simple: w(A) = l(A) etc. However, it gets a bit more complicated. Since all non-hyphenated runs of parts are drawn as a whole, the following conditions can be concluded:</p>
<ul>
<li>from drawing "ABCD" (not hyphenated at all): w(A) + w(B) + w(C) + w(D) = l(ABCD);</li>
<li>from drawing "BCD", when hyphenated as "A-BCD" ("A-" is not considered here): w(B) + w(C) + w(D) = l(BCD);</li>
<li>likewise, from drawing "CD" (cases "AB-CD" and "A-B-CD"): w(C) + w(D) = l(CD);</li>
<li>finally, for the cases "ABC-D", "AB-C-D", "A-BC-D", and "A-B-C-D": w(D) = l(D).</li>
</ul>
<p>So, the calculation is simple:</p>
<ul>
<li>w(D) = l(D)</li>
<li>w(C) = l(CD) - w(D)</li>
<li>w(B) = l(BCD) - (w(C) + w(D))</li>
<li>w(A) = l(ABCD) - (w(B) + w(C) + w(D))</li>
</ul>
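<p>The back-to-front calculation above generalizes to any number of parts. In the following sketch, <em>part_widths</em> and <em>fake_text_width</em> are hypothetical names; the fake measuring function merely stands in for dw::core::Platform::textWidth and imitates kerning by making the pairs "AB" and "CD" one unit narrower.</p>

```python
def part_widths(parts, text_width):
    """Compute w(part) so that every suffix sums to its measured width."""
    widths = [0] * len(parts)
    suffix_total = 0  # w of all parts to the right of the current one
    for i in range(len(parts) - 1, -1, -1):
        measured = text_width("".join(parts[i:]))
        widths[i] = measured - suffix_total
        suffix_total = measured
    return widths


KERNED_PAIRS = {"AB", "CD"}  # made-up kerning data for the demonstration


def fake_text_width(text):
    """Ten units per character, minus one unit per kerned pair."""
    pairs = [text[i:i + 2] for i in range(len(text) - 1)]
    return 10 * len(text) - sum(1 for p in pairs if p in KERNED_PAIRS)


widths = part_widths(["A", "B", "C", "D"], fake_text_width)
print(widths)       # [9, 10, 9, 10]
print(sum(widths))  # 38
```

<p>The sum of all parts equals the measured width of "ABCD", and likewise for "BCD", "CD" and "D", which are exactly the four conditions listed above.</p>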
<p>For calculating the hyphen widths, the exact conditions would be over-determined, even if individual hyphen widths (instead of simply the text width of a hyphen character) were used. However, a simple approach of fixed hyphen widths will have near-perfect results, so this is kept simple.</p>
<h1>Automatic Hyphenation </h1>
<p>When soft hyphens are used, words are immediately divided into different parts, and so different instances of <a class="el" href="structdw_1_1Textblock_1_1Word.html">dw::Textblock::Word</a>. Automatic hyphenation (using Liang's algorithm) is, however, not applied always, but only when possibly needed, after calculating a line without hyphenation:</p>
<ul>
<li>When the line is tight, the last word of the line is hyphenated; possibly this will result in a line with fewer parts of this word, and so a less tight line.</li>
<li>When the line is loose, and there is another word (for the next line) available, this word is hyphenated; possibly, some parts of this word are taken into this line, making it less loose.</li>
</ul>
<p>After this, the line is re-calculated.</p>
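<p>For orientation, the pattern matching at the core of Liang's algorithm can be sketched in a few lines. This is not dillo's implementation: the function names are assumptions, the small pattern list is only an illustrative selection (real pattern files contain thousands of entries), and the minimum part lengths of 2 and 3 letters are assumed defaults.</p>

```python
import re


def parse_pattern(pattern):
    """Split a pattern such as 'hy3ph' into its letters and gap levels."""
    letters = re.sub(r"[0-9]", "", pattern)
    levels = [0] * (len(letters) + 1)
    pos = 0
    for ch in pattern:
        if ch.isdigit():
            levels[pos] = int(ch)  # level of the gap before letter 'pos'
        else:
            pos += 1
    return letters, levels


def hyphenate(word, patterns, left_min=2, right_min=3):
    """Return the parts of 'word'; odd gap levels allow a break."""
    text = "." + word.lower() + "."  # dots mark the word boundaries
    points = [0] * (len(text) + 1)
    for pattern in patterns:
        letters, levels = parse_pattern(pattern)
        start = text.find(letters)
        while start != -1:
            for offset, level in enumerate(levels):
                points[start + offset] = max(points[start + offset], level)
            start = text.find(letters, start + 1)
    parts, last = [], 0
    for i in range(left_min, len(word) - right_min + 1):
        # points[i + 1] is the gap between word[i - 1] and word[i]
        if points[i + 1] % 2 == 1:
            parts.append(word[last:i])
            last = i
    parts.append(word[last:])
    return parts


PATTERNS = ["hy3ph", "he2n", "hena4", "hen5at", "1na", "n2at",
            "1tio", "2io", "o2n"]
print(hyphenate("hyphenation", PATTERNS))  # ['hy', 'phen', 'ation']
```

<p>Each returned part would correspond to one instance of dw::Textblock::Word after automatic hyphenation has been applied.</p>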
<p>A problem arises when the textblock is rewrapped, e. g. when the user changes the window width. In this case, some new instances of <a class="el" href="structdw_1_1Textblock_1_1Word.html">dw::Textblock::Word</a> must be inserted into the word list, <a class="el" href="classdw_1_1Textblock.html#a1f7c19fb947a0be347f69ebf116a4df9">dw::Textblock::words</a>. This word list is implemented as an array, which is dynamically increased; a simple approach would involve moving all of the <em>n</em> elements after position <em>i</em>, so <em>n</em> - <em>i</em> steps are necessary. This alone would not be a problem, since only O(n) steps are necessary; however, this will be necessary again for the next hyphenated word (at the end of a following line), and so on, so that (<em>n</em> - <em>i</em><sub>1</sub>) + (<em>n</em> - <em>i</em><sub>2</sub>) + ... steps are necessary, with <em>i</em><sub>1</sub> < <em>i</em><sub>2</sub> < ..., which results in O(n<sup>2</sup>) steps. For this reason, the word list is managed by the class <a class="el" href="classlout_1_1misc_1_1NotSoSimpleVector.html" title="Container similar to lout::misc::SimpleVector, but some cases of insertion optimized (used for hyphen...">lout::misc::NotSoSimpleVector</a>, which uses a trick (a second array) to deal with exactly this problem. See there for more details.</p>
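<p>The cost estimate can be made concrete by counting element moves. The sketch below only illustrates the arithmetic; it does not reproduce how lout::misc::NotSoSimpleVector works. The function names are hypothetical, and the second function models a generic rebuild in one pass, used here purely as a linear-cost reference.</p>

```python
def naive_moves(n, positions):
    """Moves needed when every insertion shifts the tail of the array.

    'positions' are indices into the original array of n elements; an
    insertion at position i has to move the n - i elements after it.
    """
    return sum(n - i for i in positions)


def single_pass_moves(n, positions):
    """Moves needed when the array is rebuilt once, copying each element."""
    return n + len(positions)


n = 1000
positions = range(0, n, 10)  # one hyphenated word every ten words
print(naive_moves(n, positions))        # 50500
print(single_pass_moves(n, positions))  # 1100
```

<p>With the number of insertions proportional to <em>n</em>, the first count grows quadratically and the second one linearly.</p>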
<h1>Tests </h1>
<p>There are test HTML files in the <em>test</em> directory. Also, there is a program testing automatic hyphenation, <em>test/liang</em>, which can be easily extended.</p>
<h1>Bugs and Things Needing Improvement </h1>
<h2>High Priority </h2>
<p>None.</p>
<h2>Medium Priority </h2>
<p>None.</p>
<h2>Low Priority </h2>
<p><b>Mark the end of a paragraph:</b> Should <a class="el" href="structdw_1_1core_1_1Content.html#a41c29111b049db05a8de25b2e1ca4bd5ac265431892ee39615e075a70f71f182f">dw::core::Content::BREAK</a> still be used? Currently, this is redundant to <a class="el" href="classdw_1_1Textblock_1_1BadnessAndPenalty.html">dw::Textblock::BadnessAndPenalty</a>.</p>
<h2>Solved (Must Be Documented) </h2>
<p>These have been solved recently and should be documented above.</p>
<p><em>Bugs in hyphenation:</em> There seem to be problems when breaking words containing hyphens already. Example: "Abtei-Stadt", which is divided into "Abtei-" and "Stadt", resulting possibly in "Abtei-<span></span>-[new line]Stadt". See also below under "Medium Priority", on how to deal with hyphens and dashes.</p>
<p><b>Solution:</b> See next.</p>
<p><em>Break hyphens and dashes:</em> The following rules seem to be relevant:</p>
<ul>
<li>In English, an em-dash is used with no spaces around. Breaking before and after the dash should be possible, perhaps with a penalty > 0. (In German, an en-dash (Halbgeviert) with spaces around is used instead.)</li>
<li>After a hyphen, which is part of a compound word, a break should be possible. As described above ("Abtei-Stadt"), this collides with hyphenation.</li>
</ul>
<p>Where to implement? In the same dynamic, lazy way like hyphenation? As part of hyphenation?</p>
<p>Notice that Liang's algorithm may behave differently regarding hyphens: "Abtei-Stadt" is (using the patterns from CTAN) divided into "Abtei-" and "Stadt", but "Nordrhein-Westfalen" is divided into "Nord", "rhein-West", "fa", "len": the part containing the hyphen ("rhein-West") is untouched. (Sorry for the German words; if you have English examples, send them to me.)</p>
<p><b>Solution for both:</b> This has been implemented in <a class="el" href="classdw_1_1Textblock.html#a7a4c5d306e62cd51e2279bcb652340ad">dw::Textblock::addText</a>, in a similar way to soft hyphens. Liang's algorithm now only operates on the parts: "Abtei" and "Stadt"; "Nordrhein" and "Westfalen".</p>
<p><em>Hyphens in adjacent lines:</em> It should be simple to assign a larger penalty for hyphens, when the line before is already hyphenated. This way, hyphens in adjacent lines are penalized further.</p>
<p><b>Solved:</b> There are always two penalties. Must be documented in detail.</p>
<p><em>Incorrect calculation of extremes:</em> The minimal width of a text block (as part of the width extremes, which are mainly used for tables) is defined by everything between two possible breaks. A possible break may also be a hyphenation point; however, hyphenation points are calculated in a lazy way, when the lines are broken, and not when extremes are calculated. So, it is a matter of chance whether the calculation of the minimal width will take the two parts "dil-" and "lo" into account (when "dillo" has already been hyphenated), or only one part, "dillo" (when "dillo" has not yet been hyphenated), resulting possibly in a different value for the minimal width.</p>
<p>Possible strategies to deal with this problem:</p>
<ul>
<li>Ignore. The implications should be minimal.</li>
<li>Any solution will make it necessary to hyphenate at least some words when calculating extremes. Since the minimal widths of all words are used to calculate the minimal width of the text block, the simplest approach will hyphenate all words. This would, of course, eliminate the performance gains of the current lazy approach.</li>
<li><p class="startli">The latter approach could be optimized in some ways. Examples: (i) If a word is already narrower than the current accumulated value for the minimal width, it makes no sense to hyphenate it. (ii) In other cases, heuristics may be used to estimate the number of syllables, the width of the widest of them etc.</p>
<p class="startli"><b>Solved:</b> Hyphenated parts of a word are not considered anymore for width extremes, but only whole words. This is also one reason for the introduction of the paragraphs list.</p>
<p class="startli"><b>Also:</b></p>
</li>
<li>Configuration of penalties. </li>
</ul>
</div></div><!-- contents -->
<!-- start footer part -->
<hr class="footer"/><address class="footer"><small>
Generated on Sat May 28 2016 11:47:43 for Dillo by  <a href="http://www.doxygen.org/index.html">
<img class="footer" src="doxygen.png" alt="doxygen"/>
</a> 1.8.8
</small></address>
</body>
</html>