author     Sebastian Geerken <devnull@localhost>  2015-06-03 22:34:03 +0200
committer  Sebastian Geerken <devnull@localhost>  2015-06-03 22:34:03 +0200
commit     da88ede8c401449729ae9c4b78f4c9914d81fa09 (patch)
tree       a66a438f5523269bf470b087d1f38410a688bb1a /devdoc/IO.txt
parent     d03e77b0aa3f1f70dcc1d1377b2262ad674ad87e (diff)
parent     59b76c75b64578edac35d19c914067a0bd7791e9 (diff)
Merge with main repo.
Diffstat (limited to 'devdoc/IO.txt')
-rw-r--r--  devdoc/IO.txt  468
1 files changed, 468 insertions, 0 deletions
diff --git a/devdoc/IO.txt b/devdoc/IO.txt
new file mode 100644
index 00000000..cd62a4f5
--- /dev/null
+++ b/devdoc/IO.txt
@@ -0,0 +1,468 @@
+
+This is the updated base of a paper I wrote with Horst.
+It provides a good introduction to Dillo's internals.
+(Highly recommended if you plan to patch or develop in Dillo)
+
+It may not be exactly accurate (it's quite old), but it explains
+the underlying theory in some detail, so it's highly recommended
+reading material.
+--Jcid
+
+
+-----------------------------------------------------
+Parallel network programming of the Dillo web browser
+-----------------------------------------------------
+
+ Jorge Arellano-Cid <jcid@dillo.org>
+ Horst H. von Brand <vonbrand@inf.utfsm.cl>
+
+
+--------
+Abstract
+--------
+
+ Network programs face several delay sources when sending or
+retrieving data. This is particularly problematic in programs
+which interact directly with the user, most notably web browsers.
+We present a hybrid approach, using threads that communicate
+through pipes together with signal-driven I/O, which allows a
+non-blocking main thread and overlapping of waiting times.
+
+
+------------
+Introduction
+------------
+
+ The Dillo project didn't start from scratch but mainly by
+working on the code base of gzilla (a light web browser written
+by Raph Levien). As the project progressed, the whole source code
+was standardized, and the networking engine was replaced with a
+new, faster design. Later, the parser was changed, the streamed
+handling of data and its error control were put under the control
+of the CCC (Concomitant Control Chain), and the old widget system
+was replaced with a new one (Dw). The source code is currently
+regarded as "very stable beta", and is available at
+<http://www.dillo.org>. Dillo is licensed under the GNU General
+Public License.
+
+ This paper covers basic design aspects of the hybrid approach
+that the Dillo web browser uses to solve several latency
+problems. After introducing the main delay-sources, the main
+points of the hybrid design will be addressed.
+
+
+-------------
+Delay sources
+-------------
+
+ Network programs face several delay-sources while sending or
+retrieving data. In the particular case of a web browser, they
+are found in:
+
+ DNS querying:
+ The time required to resolve a name.
+
+ Initiating the TCP connection:
+ The three way handshake of the TCP protocol.
+
+ Sending the query:
+ The time spent uploading queries to the remote server.
+
+ Retrieving data:
+ The time spent waiting for and receiving the answer to the query.
+
+ Closing the TCP connection:
+ The four packet-sending closing sequence of the TCP protocol.
+
+ In a WAN context, every single item on this list has an
+associated delay that is non-deterministic and often measured in
+seconds. If we add several connections per browsed page (each one
+requiring at least the last four steps), the total latency can be
+considerable.
+
+
+-----------------------------------
+The traditional (blocking) approach
+-----------------------------------
+
+ The main problems with the blocking approach are:
+
+ When issuing an operation that can't be completed
+ immediately, the process is put to sleep waiting for
+ completion, and the program doesn't do any other
+ processing in the meantime.
+
+ When waiting for a specific socket operation to complete,
+ packets that belong to other connections may be arriving,
+ and have to wait for service.
+
+ Web browsers handle many small transactions; if waiting
+ times are not overlapped, the latency perceived by the
+ user can be very annoying.
+
+ If the user interface is just put to sleep during network
+ operations, the program becomes unresponsive, confusing
+ and perhaps alarming the user.
+
+ Not overlapping waiting times and processing makes
+ graphical rendering (which is arguably the central function
+ of a browser) unnecessarily slow.
+
+
+---------------------
+Dillo's hybrid design
+---------------------
+
+ Dillo uses threads and signal-driven I/O extensively to
+overlap waiting times and computation. Handling the user
+interface in a thread that never blocks gives a good interactive
+``feel.'' The use of GTK+, a sophisticated widget framework for
+graphical user interfaces, helped greatly in accomplishing this
+goal. The whole interface, the rendering and the I/O engine were
+built upon its facilities.
+
+ The design is said to be ``hybrid'' because it uses threads
+for DNS querying and for reading local files, and signal-driven
+I/O for TCP connections. The threaded DNS scheme is potentially
+concurrent (this depends on the underlying hardware), while the
+I/O handling (both local files and remote connections) is
+definitely parallel.
+
+ To simplify the structure of the browser, local files are
+encapsulated into HTTP streams and presented to the rest of the
+browser as such, in exactly the same way a remote connection is
+handled. To create this illusion, a thread is launched. This
+thread opens a pipe to the browser, synthesizes an appropriate
+HTTP header, and sends it together with the file to the browser
+proper. In this way, all the browser sees is a handle; the data
+on it can come from a remote connection or from a local file.
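+
+ The following sketch (hypothetical names, not Dillo's actual
+code) illustrates the idea: a helper thread writes a synthesized
+HTTP header followed by the file contents into a pipe, while the
+rest of the browser only ever sees the reading end:
+
+   /* Sketch: serve a local file as an HTTP stream over a pipe. */
+   #include <pthread.h>
+   #include <stdio.h>
+   #include <unistd.h>
+
+   typedef struct {
+      int write_fd;          /* writing end of pipe(2) */
+      const char *filename;  /* local file to serve    */
+   } FileJob;
+
+   static void *file_serving_thread(void *data)
+   {
+      FileJob *job = data;
+      const char hdr[] = "HTTP/1.0 200 OK\r\n\r\n";
+      FILE *f = fopen(job->filename, "r");
+      char buf[4096];
+      size_t n;
+
+      /* Synthesize a minimal HTTP header so the browser proper
+       * can treat this pipe like any remote HTTP connection.   */
+      write(job->write_fd, hdr, sizeof(hdr) - 1);
+      if (f) {
+         while ((n = fread(buf, 1, sizeof(buf), f)) > 0)
+            write(job->write_fd, buf, n);
+         fclose(f);
+      }
+      close(job->write_fd);   /* EOF for the reading end */
+      return NULL;
+   }
+
+ The caller creates the pipe with pipe(2), hands the writing end
+to a thread running this function, and keeps the reading end as
+the "connection" handle.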
+
+ Handling a remote connection is more complex. In this case,
+the browser asks the cache manager for the URL. The name in the
+URL has to be resolved through the DNS engine, a TCP socket
+connection must be established, the HTTP request has to be sent,
+and finally the result retrieved. Each of the steps mentioned
+could give rise to errors, which have to be handled and somehow
+communicated to the rest of the program. For performance reasons,
+it is critical that responses are cached locally, so the remote
+connection doesn't hand the data directly to the browser;
+the response is passed to the cache manager, which then relays it
+to the rest of the browser. The DNS engine caches DNS responses
+and answers requests either from its cache or by querying the
+DNS. Querying is done in a separate thread, so that the rest of
+the browser isn't blocked by long waits here.
+
+ The activities mentioned do not happen strictly in the order
+stated above. It is even possible that several URLs are being
+handled at the same time, in order to overlap waiting and
+downloading. The functions called directly from the user
+interface have to return quickly to maintain interactive
+response. Sometimes they return connection handlers that haven't
+been completely set up yet. As stated, I/O is signal-driven: when
+one of the descriptors is ready for data transfer (reading or
+writing), it wakes up the I/O engine.
+
+ Data transfer between threads inside the browser is handled by
+pipes; shared memory is rarely used. This almost obviates the
+need for explicit synchronization, which is one of the main areas
+of complexity and bugs in concurrent programs. Dillo handles its
+threads in such a way that its developers can think of it as
+running in a single thread of control. This is accomplished by
+making the DNS engine call-backs happen within the main thread,
+and by isolating file loading with pipes.
+
+ Using threads in this way has three big advantages:
+
+ The browser doesn't block when one of its child threads
+ blocks. In particular, the user interface is responsive
+ even while resolving a name or downloading a file.
+
+ Developers don't need to deal with complex concurrency
+ concerns. Concurrency is hard to handle, and few developers
+ are adept at it. This gives access to a much larger pool of
+ potential developers, something which can be critical
+ in an open-source development project.
+
+ By making the code mostly sequential, debugging the code
+ with traditional tools like gdb is possible. Debugging
+ parallel programs is very hard, and appropriate tools are
+ hard to come by.
+
+ Because of simplicity and portability concerns, DNS querying
+is done in a separate thread. The standard C library doesn't
+provide a function for making DNS queries that don't block. The
+alternative would be to implement a new, custom DNS querying
+function that doesn't block. This is certainly a complex task;
+integrating the blocking call into the thread structure of the
+program is much simpler.
+
+ Using a thread and a pipe to read a local file adds a
+buffering step to the process (and a certain latency), but it has
+a couple of significant advantages:
+
+ By handling local files in the same way as remote
+ connections, a significant amount of code is reused.
+
+ A preprocessing step of the file data can be added easily,
+ if needed. In fact, the file is encapsulated into an HTTP
+ data stream.
+
+
+-----------
+DNS queries
+-----------
+
+ Dillo handles DNS queries with threads, letting a child thread
+wait until the DNS server answers the request. When the answer
+arrives, a call-back function is called, and the program resumes
+what it was doing at DNS-request time. The interesting thing is
+that the call-back happens in the main thread, while the child
+thread simply exits when done. This is implemented through a
+server-channel design.
+
+
+The server channel
+------------------
+
+ There is one thread for each channel, and each channel can
+have multiple clients. When the program requests an IP address,
+the server first looks for a cached match; on a hit, the client
+call-back is invoked immediately. Otherwise, the client is put
+into a queue, a thread is spawned to query the DNS, and a GTK+
+idle client is set to poll the channel five times per second for
+completion; when the query finally succeeds, every client of that
+channel is serviced.
+
+ This scheme allows all further processing to continue in the
+same thread where it began: the main thread.
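+
+ A rough sketch of this server-channel scheme follows (all names
+are hypothetical and the real implementation differs in detail):
+a detached thread performs the blocking lookup, and a periodic
+GTK+ callback polls for completion from the main thread before
+servicing the queued clients.
+
+   /* Sketch of a DNS server channel (hypothetical names). */
+   #include <gtk/gtk.h>
+   #include <pthread.h>
+   #include <netinet/in.h>
+   #include <netdb.h>
+   #include <string.h>
+
+   typedef struct {
+      char hostname[256];
+      struct in_addr ip;      /* result of the lookup            */
+      volatile int done;      /* set by the lookup thread; a real
+                               * implementation would lock this   */
+      GSList *clients;        /* call-backs waiting for the IP    */
+   } DnsChannel;
+
+   static void *dns_lookup_thread(void *data)
+   {
+      DnsChannel *ch = data;
+      /* Blocking call; a real implementation would prefer a
+       * reentrant variant such as gethostbyname_r().         */
+      struct hostent *h = gethostbyname(ch->hostname);
+
+      if (h)
+         memcpy(&ch->ip, h->h_addr_list[0], sizeof(ch->ip));
+      ch->done = 1;
+      return NULL;
+   }
+
+   /* Runs in the main thread, roughly five times per second. */
+   static gint dns_channel_poll(gpointer data)
+   {
+      DnsChannel *ch = data;
+
+      if (!ch->done)
+         return TRUE;      /* keep polling */
+      /* ...invoke every queued client call-back with ch->ip... */
+      return FALSE;        /* finished: remove this timeout */
+   }
+
+   void dns_channel_start(DnsChannel *ch)
+   {
+      pthread_t tid;
+
+      pthread_create(&tid, NULL, dns_lookup_thread, ch);
+      pthread_detach(tid);
+      gtk_timeout_add(200 /* ms */, dns_channel_poll, ch);
+   }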
+
+
+------------------------
+Handling TCP connections
+------------------------
+
+ Establishing a TCP connection requires the well-known
+three-way handshake packet-sending sequence. Depending on network
+traffic and several other issues, significant delay can occur at
+this phase.
+
+ Dillo handles the connection with a non-blocking socket
+scheme. Basically, a socket file descriptor of AF_INET type is
+requested and set to non-blocking I/O. When the DNS server has
+resolved the name, the socket connection process begins by
+calling connect(2)
+ {We use the Unix convention of identifying the manual section
+ where the concept is described, in this case
+ section 2 (system calls).}
+which returns immediately with an EINPROGRESS error.
+
+ After the connection reaches the EINPROGRESS ``state,'' the
+socket waits in the background until the connection succeeds (or
+fails). When that happens, a callback function is invoked to
+perform the following steps: set up the I/O engine to send the
+query and to expect its answer (both in the background).
+
+ The advantage of this scheme is that every required step is
+quickly done without blocking the browser. Finally, the socket
+will generate a signal whenever I/O is possible.
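+
+ A condensed sketch of the non-blocking connect sequence, using
+only standard POSIX calls (hypothetical function name, error
+handling trimmed):
+
+   #include <sys/types.h>
+   #include <sys/socket.h>
+   #include <netinet/in.h>
+   #include <arpa/inet.h>
+   #include <string.h>
+   #include <fcntl.h>
+   #include <errno.h>
+   #include <unistd.h>
+
+   /* Start connecting; returns the fd, or -1 on immediate error. */
+   int start_tcp_connection(struct in_addr ip, unsigned short port)
+   {
+      struct sockaddr_in sin;
+      int fd = socket(AF_INET, SOCK_STREAM, 0);
+
+      if (fd == -1)
+         return -1;
+      /* Every later operation on this socket must not block. */
+      fcntl(fd, F_SETFL, fcntl(fd, F_GETFL) | O_NONBLOCK);
+
+      memset(&sin, 0, sizeof(sin));
+      sin.sin_family = AF_INET;
+      sin.sin_port = htons(port);
+      sin.sin_addr = ip;
+
+      if (connect(fd, (struct sockaddr *)&sin, sizeof(sin)) == -1 &&
+          errno != EINPROGRESS) {
+         close(fd);              /* immediate, real failure */
+         return -1;
+      }
+      /* Usually errno == EINPROGRESS here: the three-way handshake
+       * continues in the background, and the socket becomes
+       * writable (G_IO_OUT) once it has succeeded or failed.      */
+      return fd;
+   }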
+
+
+----------------
+Handling queries
+----------------
+
+ In the case of an HTTP URL, queries typically translate into a
+short transmission (the HTTP query) and a lengthy retrieval
+process. Queries are not always short though, especially when
+submitting forms (all the form data is attached within the
+query), and also when requesting CGI programs.
+
+ Regardless of query length, query sending is handled in the
+background. The thread that was initiated at TCP connecting time
+has all the transmission framework already set up; at this point,
+packet sending is just a matter of waiting for the
+write signal (G_IO_OUT) to come and then sending the data. When
+the socket gets ready for transmission, the data is sent using
+IO_write.
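+
+ In glib terms this boils down to a watch on the socket, roughly
+as sketched below (hypothetical names; the real IO_write path
+keeps its state in the request structure described later):
+
+   /* Sketch: transmit a query in the background. */
+   #include <glib.h>
+   #include <string.h>
+   #include <unistd.h>
+
+   typedef struct {
+      const char *buf;     /* query bytes still to be sent */
+      size_t left;         /* how many of them remain      */
+   } SendJob;
+
+   static gboolean query_send_cb(GIOChannel *ch, GIOCondition cond,
+                                 gpointer data)
+   {
+      SendJob *job = data;
+      ssize_t n = write(g_io_channel_unix_get_fd(ch),
+                        job->buf, job->left);
+
+      if (n < 0)
+         return FALSE;        /* error: give up on this watch */
+      job->buf  += n;         /* the kernel may take only part */
+      job->left -= n;
+      return job->left > 0;   /* keep the watch until all is sent */
+   }
+
+   void send_query(int fd, const char *query)
+   {
+      SendJob *job = g_new(SendJob, 1);  /* freeing omitted here */
+
+      job->buf  = query;
+      job->left = strlen(query);
+      g_io_add_watch(g_io_channel_unix_new(fd), G_IO_OUT,
+                     query_send_cb, job);
+   }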
+
+
+--------------
+Receiving data
+--------------
+
+ Although conceptually similar to sending queries, retrieving
+data is very different, as the data received can easily exceed
+the size of the query by many orders of magnitude (for example
+when downloading images or files). This is one of the main
+sources of latency: the retrieval can take several seconds or
+even minutes when downloading large files.
+
+ The data retrieving process for a single file, which began by
+setting up the receiving framework at TCP connecting time, simply
+waits for the read signal (G_IO_IN). When it arrives, the
+low-level I/O engine gets called, the data is read into
+pre-allocated buffers, and the appropriate call-backs are
+performed. Technically, whenever a G_IO_IN event is generated,
+data is received from the socket file descriptor by using the
+IO_read function. This iterative process finishes upon EOF (or on
+an error condition).
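+
+ Schematically (again a simplified sketch with hypothetical
+names), the receiving side looks like this:
+
+   /* Sketch: read what the socket has, pass it up, stop on EOF. */
+   #include <glib.h>
+   #include <unistd.h>
+
+   static gboolean data_read_cb(GIOChannel *ch, GIOCondition cond,
+                                gpointer data)
+   {
+      char buf[8192];   /* a stack buffer here; Dillo reads into
+                         * preallocated buffers instead          */
+      ssize_t n = read(g_io_channel_unix_get_fd(ch), buf, sizeof(buf));
+
+      if (n > 0) {
+         /* ...hand the bytes to the cache call-back chain... */
+         return TRUE;   /* wait for the next G_IO_IN */
+      }
+      /* n == 0 is EOF, n < 0 is an error condition; either way
+       * the iterative process ends and the watch is removed.   */
+      return FALSE;
+   }
+
+   void expect_answer(int fd)
+   {
+      g_io_add_watch(g_io_channel_unix_new(fd), G_IO_IN | G_IO_HUP,
+                     data_read_cb, NULL);
+   }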
+
+
+----------------------
+Closing the connection
+----------------------
+
+ Closing a TCP connection requires four data segments, not an
+impressive amount but twice the round-trip time, which can be
+substantial. When data retrieval finishes, socket closing is
+triggered; it amounts to nothing but an IO_close_fd call on the
+socket's file descriptor. This process was originally designed to
+split the four-segment close into two partial closes, one when
+query sending is done and the other when all the data is in. This
+scheme is not currently used because the write close also stops
+the reading part.
+
+
+The low-level I/O engine
+------------------------
+
+ Dillo I/O is carried out in the background. This is achieved
+by using low level file descriptors and signals. Anytime a file
+descriptor shows activity, a signal is raised and the signal
+handler takes care of the I/O.
+
+ The low-level I/O engine ("I/O engine" from here on) was
+designed as an internal abstraction layer for background file
+descriptor activity. It is intended to be used by the cache
+module only; higher level routines should ask the cache for its
+URLs. Every operation that is meant to be carried out in
+background should be handled by the I/O engine. In the case of
+TCP sockets, they are created and submitted to the I/O engine for
+any further processing.
+
+ The submitting process (client) must fill a request structure
+and let the I/O engine handle the file descriptor activity until
+it receives a call-back to finally process the data. This is
+better understood by examining the request structure:
+
+   typedef struct {
+      gint Key;            /* Primary Key (for klist) */
+      gint Op;             /* IORead | IOWrite | IOWrites */
+      gint FD;             /* Current File Descriptor */
+      gint Flags;          /* Flag array */
+      glong Status;        /* Number of bytes read, or -errno code */
+
+      void *Buf;           /* Buffer place */
+      size_t BufSize;      /* Buffer length */
+      void *BufStart;      /* PRIVATE: only used inside IO.c! */
+
+      void *ExtData;       /* External data reference (not used by IO.c) */
+      void *Info;          /* CCC Info structure for this IO */
+      GIOChannel *GioCh;   /* IO channel */
+      guint watch_id;      /* glib's event source id */
+   } IOData_t;
+
+ To request an I/O operation, this structure must be filled and
+passed to the I/O engine.
+
+ 'Op', 'Buf' and 'BufSize' MUST be provided.
+
+ 'ExtData' MAY be provided.
+
+ 'Status', 'FD' and 'GioCh' are set by I/O engine internal
+routines.
+
+ When there is new data in the file descriptor, 'IO_callback'
+gets called (by glib). Only after the I/O engine finishes
+processing the data are the upper layers notified.
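+
+ For example, a client wanting a background read into a buffer
+of its own might set the request up roughly like this (the
+submission routine is shown with a hypothetical name):
+
+   /* Sketch; assumes the IOData_t definition above. */
+   void request_background_read(void *client_state)
+   {
+      IOData_t *io = g_new0(IOData_t, 1);
+
+      io->Op      = IORead;          /* background read           */
+      io->Buf     = g_malloc(8192);  /* client-supplied buffer    */
+      io->BufSize = 8192;
+      io->ExtData = client_state;    /* optional, MAY be provided */
+
+      /* 'Status', 'FD' and 'GioCh' are filled in by the engine. */
+      submit_io_request(io);         /* hypothetical entry point */
+   }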
+
+
+The I/O engine transfer buffer
+------------------------------
+
+ The 'Buf' and 'BufSize' fields of the request structure
+provide the transfer buffer for each operation. This buffer must
+be set by the client (to increase performance by avoiding copying
+data).
+
+ On reads, the client specifies the amount and where to place
+the retrieved data; on writes, it specifies the amount and source
+of the data segment that is to be sent. Although this scheme
+increases complexity, it has proven very fast and powerful. For
+instance, when the size of a document is known in advance, a
+buffer for all the data can be allocated at once, eliminating the
+need for multiple memory reallocations. Moreover, if the size is
+known and the data transfer takes the form of multiple small
+chunks of data, the client only needs to update 'Buf' and
+'BufSize' to point to the next byte in its large preallocated
+reception buffer (by adding the chunk size to 'Buf'). On the
+other hand, if the size of the transfer isn't known in advance,
+the reception buffer can remain untouched until the connection
+closes, but the client must then perform the usual buffer
+copying and reallocation.
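+
+ In other words, when the total size is known beforehand the
+per-chunk bookkeeping reduces to pointer arithmetic on the
+request (a sketch; 'io' and 'nbytes' are assumed names):
+
+   /* After a chunk of 'nbytes' has arrived in io->Buf, aim the
+    * next read at the following position of the big buffer.    */
+   io->Buf      = (char *)io->Buf + nbytes;
+   io->BufSize -= nbytes;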
+
+ The I/O engine also lets the client specify a full-length
+transfer buffer when sending data. It doesn't matter (from the
+client's point of view) whether the data fits in a single packet
+or not; it's the I/O engine's job to divide it into smaller
+chunks if needed and to perform the operation accordingly.
+
+
+------------------------------------------
+Handling multiple simultaneous connections
+------------------------------------------
+
+ The previous sections describe the internal work for a single
+connection; the I/O engine handles several of them in parallel.
+This is the normal downloading behavior for a web page. Normally,
+after retrieving the main document (HTML code), several
+references to other files (typically images) and sometimes even
+to other sites (mostly advertising today) are found inside the
+page. In order to parse and complete the page rendering, those
+other documents must be fetched and displayed, so it is not
+uncommon to have multiple downloading connections (each one
+requiring the whole fetching process) happening at the same time.
+
+ Even though socket activity can reach a hectic pace, the
+browser never blocks. Note also that the I/O engine is the one
+that directs the execution flow of the program by triggering a
+call-back chain whenever a file descriptor operation succeeds or
+fails.
+
+ A key point of this call-back-chained I/O engine is that every
+single function in the chain must be guaranteed to return
+quickly. Otherwise, the whole system blocks until that function
+returns.
+
+
+-----------
+Conclusions
+-----------
+
+ Dillo is currently in a very stable beta state. It already shows
+impressive performance, and its interactive ``feel'' is much
+better than that of other web browsers.
+
+ The modular structure of Dillo and its reliance on GTK1 allow
+it to be very small. Not every feature of HTML 4.01 has been
+implemented yet, but no significant problems are foreseen in
+doing so.
+
+ The fact that Dillo's central I/O engine is written using
+advanced features of POSIX and TCP/IP networking makes its
+performance possible, but on the other hand this also means that
+only a fraction of the interested hackers are able to work on it.
+
+ A simple code base is critical when trying to attract hackers
+to work on a project like this one. Using the GTK+ framework
+helped both in creating the graphical user interface and in
+handling the concurrency inside the browser. By having threads
+communicate through pipes the need for explicit synchronization
+is almost completely eliminated, and with it most of the
+complexity of concurrent programming disappears.
+
+ A clean, strictly applied layering approach based on clear
+abstractions is vital in any programming project. A good,
+supportive framework is of much help here.
+
+