First part of hyphenation! And some minor changes in "lout".

author: sgeerke <devnull@localhost> 2012-06-24 12:06:15 +0200
committer: sgeerke <devnull@localhost> 2012-06-24 12:06:15 +0200
commit: 39985d2c5158742fc9e00e7dd90559f9f5c8c75f (patch)
tree: 1950c633fd73c1b7ef5056ed18b9dcd527552d3b /doc
parent: e02ec1bd1405715f050a71d40a9e3c212dd43123 (diff)
4 files changed, 250 insertions, 45 deletions
diff --git a/doc/dw-line-breaking.doc b/doc/dw-line-breaking.doc
new file mode 100644
index 00000000..43d7d448
--- /dev/null
+++ b/doc/dw-line-breaking.doc
@@ -0,0 +1,205 @@
+/** \page dw-line-breaking Changes in Line-Breaking
+
+<div style="border: 2px solid #ff0000; padding: 0.5em 1em;
+background-color: #ffe0e0"><b>Warning:</b> Unsorted collection of
+notes. Should be incorporated into dw::Textblock.</div>
+
+<ul>
+<li>Motivation: opimized line breaking and introduction of penalties.</li>
+<li>Definition of word: hyphenationed word consists of multiple words.</li>
+<li>Splitting up: dw/textblock.cc and dw/textblock_linebreaking.cc</li>
+<li>Test HTML files in the <i>test</i> directory.</li>
+<li>Adjust dw::Textblock::HYPHEN_BREAK.</li>
+</ul>
+
+<h2>Literature</h2>
+
+Although dillo does not (yet?) implement the algorithm TeX uses for
+line breaking, this document shares much of the notation used by the
+article <i>Breaking Paragraphs Into Lines</i> by Donald E. Knuth and
+Michael F. Plass; originally published in: Software -- Practice and
+Experience <b>11</b> (1981), 1119-1184; reprinted in: <i>Digital
+Typography</i> by Donalt E. Knuth, CSLI Publications 1999. Anyway an
+interesting reading.
+
+
+<h2>Criteria for Line-Breaking</h2>
+
+Currently, a word (represented by dw::Textblock::Word) has the
+following attributes related to line-breaking:
+
+<ul>
+
+<li> the width of the word itself, represented by dw::Textblock::Word::size;
+<li> the width of the space following the word, represented by
+dw::Textblock::Word::origSpace.
+</ul>
+
+[TODO: When is breaking between two words not permitted?]
+
+In a more mathematical notation, the \f$i\f$th word has got a width
+\f$w_i\f$ and a space \f$s_i\f$.
+
+With hyphenation, the criteria are refined. Hyphenation should only be
+used when otherwise line breaking results in very large spaces. So, we
+define:
+
+<ul>
+<li>the badness \f$\beta\f$ of a line, which is the greater the more the
+spaces between the words differ from the ideal space;
+<li>a penalty \f$p\f$ for any possible break point.
+</ul>
+
+The goal is to find those break points, where \f$\beta + p\f$ is
+minimal.
+
+Examples for the penalty \f$p\f$:
+
+<ul>
+<li>0 for normal line breaks (between words);
+<li>\f$\infty\f$ to prevent a line break at all costs;
+<li>\f$-\infty\f$ to force a line <li>a value > 0 for hyphens.
+</ul>
+
+[TODO: which value exactly for hyphens?]
+
+So we need the following values:
+
+<ul>
+<li> \f$w_i\f$ (the width of the word \f$i\f$ itself);
+<li> \f$s_i\f$ (the width of the space following the word \f$i\f$);
+<li> the strechability \f$y_i\f$, a value denoting how much the space
+after word\f$i\f$ can be streched (typically \f${1\over 2} s_i\f$);	
+<li> the shrinkability \f$y_i\f$, a value denoting how much the space
+after word\f$i\f$ can be shrunken (typically \f${1\over 3} s_i\f$);
+<li> the penalty \f$p_i\f$, if the line is broken after word \f$i\f$;
+<li> a width \f$h_i\f$, which is added, when the line is broken after
+word \f$i\f$.
+</ul>
+
+\f$h_i\f$ is the width of the hyphen, if the word \f$i\f$ is a part of
+the hyphenated word (except the last part); otherwise 0.
+
+Let \f$l\f$ be the (ideal) width (length) of the line. Furthermore,
+all words from \f$a\f$ to \f$b\f$ are added to the line. \f$a\f$ is
+fixed: we do not modify the previous lines anymore; but our task is to
+find a suitable \f$b\f$.
+
+We define:
+
+\f[W_a^b = \sum_{i=a}^{b} w_i + \sum_{i=a}^{b-1} s_i + h_b\f]
+
+\f[Y_a^b = \sum_{i=a}^{b-1} y_i\f]
+
+\f[Z_a^b = \sum_{i=a}^{b-1} z_i\f]
+
+
+\f$W_a^b\f$ is the total width, \f$Y_a^b\f$ the total strechability, and
+\f$Z_a^b\f$ the total shrinkability.
+
+Furthermore the <i>adjustment ratio</i> \f$r_a^b\f$:
+
+<ul>
+<li>in the ideal case that \f$W_a^b = l\f$: \f$r_a^b = 0\f$;
+<li>if \f$W_a^b < l\f$: \f$r_a^b = (l - W_a^b) / Y_a^b\f$ (\f$r_a^b < 0\f$ in
+this case);
+<li>if \f$W_a^b > l\f$: \f$r_a^b = (l - W_a^b) / Z_a^b\f$ (\f$r_a^b < 0\f$ in
+this case).
+</ul>
+
+The badness \f$\beta_a^b\f$ is defined as follows:
+
+<ul>
+<li>if \f$r_a^b\f$ is undefined or \f$r_a^b < -1\f$: \f$\beta_a^b = \infty\f$;
+<li>otherwise: \f$\beta_a^b = |r_a^b|^3\f$
+</ul>
+
+The goal is to find the value of \f$b\f$ where \f$\beta_a^b + p_b\f$
+is minimal. (\f$a\f$ is given, since we do not modify the previous
+lines.)
+
+After a couple of words, it is not predictable weather this minimum
+has already been reached. There are two cases where this is possible
+for a given \f$b'\f$:
+
+<ul>
+<li>\f$\beta_{b'}^a = \infty\f$ (line gets too tight): \f$a \le b <
+b'\f$, the minimun has to be searched between these two values;
+<li>\f$p_{b'} = -\infty\f$ (forced line break): \f$a \le b \le b'\f$
+(there may be another minimum of \f$\beta_a^b\f$ before; note the
+\f$\le\f$ instead of \f$<\f$).
+</ul>
+
+This also means, that all lines must be finalized with a forced line
+break.
+
+[TODO: Change: only adding complete lines, which leads to "hanging"
+words; temporary lines.]
+
+<h2>Soft Hyphens</h2>
+
+Calculating the width causes some problems, since it is not required
+that the width of text "AB" is identical to the width of "A" plus the
+width of "B". Instead, kerning, ligatures etc. may lead to other
+results. For this reason, a word is also always drawn as a whole.
+
+[TODO: Translate the following text to English.]
+
+<pre>
+Als Beispiel Wort aus vier trennbaren Silben: A-B-C-D
+
+3 mögliche Trennstelle, daher 8 mögliche Trennungen (allerdings einige
+unwahrscheinlich):
+
+ABCD, ABC-D, AB-CD, AB-C-D, A-BCD, A-BC-D, A-B-CD, A-B-C-D,
+
+w sei Wortbreite, l das Ergebnis von textWidth. Zwingende Bedingungen
+(bezieht sich auf die Teile, wo kein Trennstrich vorkommt; bei
+Trennstrichen kann man eventuell improvisieren):
+
+ABCD                           => w(A) + w(B) + w(C) + w(D) = l(ABCD)
+A-BCD                          => w(B) + w(C) + w(D) = l(BCD)
+AB-CD, A-B-CD                  => w(C) + w(D) = l(CD)
+ABC-D, AB-C-D, A-BC-D, A-B-C-D => w(D) = l(D)
+
+Also einfache Berechnung:
+
+w(D) = l(D)
+w(C) = l(CD) - w(D)
+w(B) = l(BCD) - (w(C) + w(D))
+w(A) = l(ABCD) - (w(B) + w(C) + w(D))
+
+Bei den Breiten inklusive Trennstrich ergibt sich eine
+Überbestimmtheit (bei konsanter Trennstrichbreite pro Wort). Daher
+wird einfach eine feste Trennstrichbreite angenommen:
+
+w(A-)  = w(A) + l(-)
+w(AB-) = w(A) + w(B) + l(-)
+
+usw.
+</pre>
+
+<h2>Bugs</h2>
+
+<h3>Major</h3>
+
+<ul>
+<li>Collapsing spaces and collapsing margins do not work yet. (Are
+    collapsing spaces still needed anyway?)</li>
+<li>List items (and aligned table cells) have to be reviewed (usage of
+    dw::Textblock::line1Offset).</li>
+<li>Sometimes, lines are too wide. It seems that this difference is
+    exacly the width of a hyphen.
+</ul>
+
+<h3>Minor</h3>
+
+<ul>
+<li>Should dw::core::Content::BREAK still be used? Currently, this is
+    redundant to dw::Textblock::BadnessAndPenalty.</li>
+<li>The calculation of badness is designed for justified text. For
+    other alignments, it may be modified. (TODO: document this in
+    detail.)</li>
+</ul>
+
+*/
diff --git a/doc/index.doc b/doc/index.doc
index 9892f177..59de8cd8 100644
--- a/doc/index.doc
+++ b/doc/index.doc
@@ -23,23 +23,23 @@ GLib. For an overview on all this, take a look at \ref lout.
 GtkObject is replaced by the following:
 
 <ul>
-<li> object::Object is a common base class for many classes used dillo. In
-     the namespace ::object, there are also some more common classes and
-     interfaces.
+<li> lout::object::Object is a common base class for many classes used
+     dillo. In the namespace lout::object, there are also some more common
+     classes and interfaces.
 
-<li> A sub class of object::Object is identity::IdentifiableObject, which
-     allows to determine the class at run-time (equivalent to GTK_CHECK_CAST
-     in GtkObject).
+<li> A sub class of lout::object::Object is
+     lout::identity::IdentifiableObject, which allows to determine the
+     class at run-time (equivalent to GTK_CHECK_CAST in GtkObject).
 
-<li> For signals, there is the namespace ::signal.
+<li> For signals, there is the namespace lout::signal.
 </ul>
 
-Hash tables, linked lists etc. can be found in the ::container namespace,
+Hash tables, linked lists etc. can be found in the lout::container namespace,
 several useful macros from GLib have been implemented as inline functions
-in the ::misc namespace.
+in the lout::misc namespace.
 
 As an alternative to the macros defined in list.h, there is also a template
-class, misc::SimpleVector, which does the same.
+class, lout::misc::SimpleVector, which does the same.
 
 <h3>Changes in Dw</h3>
 
diff --git a/doc/lout.doc b/doc/lout.doc
index 0d5be679..7f00d7b8 100644
--- a/doc/lout.doc
+++ b/doc/lout.doc
@@ -6,7 +6,7 @@ overview.
 
 <h2>Common Base Class</h2>
 
-Many classes are derived from object::Object, which defines some
+Many classes are derived from lout::object::Object, which defines some
 general methods. See there for more information.
 
 For the case, that you need primitive C++ types, there are some
@@ -14,47 +14,47 @@ wrappers:
 
 <table>
 <tr><th>C++ Type         <th>Wrapper Class
-<tr><td>void*            <td>object::Pointer
-<tr><td>specific pointer <td>object::TypedPointer (template class)
-<tr><td>int              <td>object::Integer
-<tr><td>const char*      <td>object::ConstString
-<tr><td>char*            <td>object::String
+<tr><td>void*            <td>lout::object::Pointer
+<tr><td>specific pointer <td>lout::object::TypedPointer (template class)
+<tr><td>int              <td>lout::object::Integer
+<tr><td>const char*      <td>lout::object::ConstString
+<tr><td>char*            <td>lout::object::String
 </table>
 
 
 <h2>Containers</h2>
 
-In the namespace ::container, several container classes are defined,
-which all deal with instances of object::Object.
+In the namespace lout::container, several container classes are defined,
+which all deal with instances of lout::object::Object.
 
 <h3>Untyped Containers</h3>
 
-In container::untyped, there are the following containers:
+In lout::container::untyped, there are the following containers:
 
 <ul>
-<li>container::untyped::Vector, a dynamically increases array,
-<li>container::untyped::List, a linked list,
-<li>container::untyped::HashTable, a hash table, and
-<li>container::untyped::Stack, a stack.
+<li>lout::container::untyped::Vector, a dynamically increases array,
+<li>lout::container::untyped::List, a linked list,
+<li>lout::container::untyped::HashTable, a hash table, and
+<li>lout::container::untyped::Stack, a stack.
 </ul>
 
 All provide specific methods, but since they have a common base class,
-container::untyped::Collection, they all provide iterators, by the
-method container::untyped::Collection::iterator.
+lout::container::untyped::Collection, they all provide iterators, by the
+method lout::container::untyped::Collection::iterator.
 
 <h3>Typed Containers</h3>
 
-container::typed provides wrappers for the container classes defined
-in container::untyped, which are more type safe, by using C++
+lout::container::typed provides wrappers for the container classes defined
+in lout::container::untyped, which are more type safe, by using C++
 templates.
 
 
 <h2>Signals</h2>
 
 For how to connect objects at run-time (to reduce dependencies), take a
-look at the ::signal namespace.
+look at the lout::signal namespace.
 
-There is also a base class signal::ObservedObject, which implements
+There is also a base class lout::signal::ObservedObject, which implements
 signals for deletion.
 
 
@@ -67,28 +67,28 @@ see the file for mor informations.
 <h2>Identifying Classes at Runtime</h2>
 
 If the class of an object must be identified at runtime,
-identity::IdentifiableObject should be used as the base class, see
-there for more details.
+lout::identity::IdentifiableObject should be used as the base class,
+see there for more details.
 
 
 <h2>Miscellaneous</h2>
 
-The ::misc namespace provides several miscellaneous stuff:
+The lout::misc namespace provides several miscellaneous stuff:
 
 <ul>
 <li> In some contexts, it is necessary to compare objects
-     (less/greater), for this, also misc::Comparable must be
-     implemented. For example., container::untyped::Vector::sort and
-     container::typed::Vector::sort cast the elements to
-     misc::Comparable. This can be mixed with object::Object.
-<li> misc::SimpleVector, a simple, template based vector class (not
-     depending on object::Object),
-<li> misc::StringBuffer, class for fast concatenation of a large number
+     (less/greater), for this, also lout::misc::Comparable must be
+     implemented. For example., lout::container::untyped::Vector::sort and
+     lout::container::typed::Vector::sort cast the elements to
+     lout::misc::Comparable. This can be mixed with lout::object::Object.
+<li> lout::misc::SimpleVector, a simple, template based vector class (not
+     depending on lout::object::Object),
+<li> lout::misc::StringBuffer, class for fast concatenation of a large number
      of strings,
-<li> misc::BitSet implements a bitset.
-<li> useful (template) functions (misc::min, misc::max), and
-<li> some functions useful for runtime checks (misc::assert,
-     misc::assertNotReached).
+<li> lout::misc::BitSet implements a bitset.
+<li> useful (template) functions (lout::misc::min, lout::misc::max), and
+<li> some functions useful for runtime checks (lout::misc::assert,
+     lout::misc::assertNotReached).
 </ul>
 
 */
diff --git a/doc/rounding-errors.doc b/doc/rounding-errors.doc
index 433d6ed9..133a1fe5 100644
--- a/doc/rounding-errors.doc
+++ b/doc/rounding-errors.doc
@@ -15,10 +15,10 @@ implementation would result in something, for which
 because of rounding errors, due to the integer division. This can be
 avoided by transforming the formula into
 
-\f[y_i = {(\sum_{j=0}^{j=i} x_j) a \over b} - \sum_{j=0}^{j=i} y_j\f]
+\f[y_i = {(\sum_{j=0}^{j=i} x_j) a \over b} - \sum_{j=0}^{j=i-1} y_j\f]
 
 Of corse, when all \f$y_i\f$ are calculated in a sequence,
-\f$\sum_{j=0}^{j=i} x_j\f$ and \f$\sum_{j=0}^{j=i} y_j\f$ can be
+\f$\sum_{j=0}^{j=i} x_j\f$ and \f$\sum_{j=0}^{j=i-1} y_j\f$ can be
 accumulated in the same loop.
 
 */
 \ No newline at end of file
author	sgeerke <devnull@localhost>	2012-06-24 12:06:15 +0200
committer	sgeerke <devnull@localhost>	2012-06-24 12:06:15 +0200
commit	39985d2c5158742fc9e00e7dd90559f9f5c8c75f (patch)
tree	1950c633fd73c1b7ef5056ed18b9dcd527552d3b /doc
parent	e02ec1bd1405715f050a71d40a9e3c212dd43123 (diff)