author     Sebastian Geerken <devnull@localhost>  2015-06-01 22:00:10 +0200
committer  Sebastian Geerken <devnull@localhost>  2015-06-01 22:00:10 +0200
commit     1463c3936ce6a57352590b901c9dbd6bc2f2086d (patch)
tree       3e7983b72fe63770fd2870b57683afd9421a36bd /devdoc/IO.txt
parent     eb7ee4703ced8a02404eb0ebfa5b771fc5e916d5 (diff)
Split up user and developer documentation.
Diffstat (limited to 'devdoc/IO.txt')
-rw-r--r--  devdoc/IO.txt  468
1 file changed, 468 insertions, 0 deletions
diff --git a/devdoc/IO.txt b/devdoc/IO.txt
new file mode 100644
index 00000000..cd62a4f5
--- /dev/null
+++ b/devdoc/IO.txt
@@ -0,0 +1,468 @@

This is the updated base of a paper I wrote with Horst.
It provides a good introduction to Dillo's internals.
(Highly recommended if you plan to patch or develop in Dillo.)

It may not be exactly accurate (it's quite old), but it explains
the theory behind it in some detail, so it is more than recommended
reading material.
--Jcid


------------------------------------------------------
Parallel network programming of the Dillo web browser
------------------------------------------------------

   Jorge Arellano-Cid <jcid@dillo.org>
   Horst H. von Brand <vonbrand@inf.utfsm.cl>


--------
Abstract
--------

   Network programs face several delay sources when sending or
retrieving data. This is particularly problematic in programs that
interact directly with the user, most notably web browsers. We
present a hybrid approach using threads that communicate through
pipes plus signal-driven I/O, which keeps the main thread
non-blocking and overlaps waiting times.


------------
Introduction
------------

   The Dillo project didn't start from scratch; it mainly built on
the code base of gzilla (a light web browser written by Raph
Levien). As the project progressed, the whole source code was
standardized, and the networking engine was replaced with a new,
faster design. Later, the parser was changed, the streamed handling
of data and its error control was put under the control of the CCC
(Concomitant Control Chain), and the old widget system was replaced
with a new one (Dw). The source code is currently regarded as "very
stable beta", and is available at <http://www.dillo.org>. Dillo is
licensed under the GNU General Public License.

   This paper covers the basic design aspects of the hybrid approach
that the Dillo web browser uses to solve several latency problems.
After introducing the main delay sources, the main points of the
hybrid design are addressed.


-------------
Delay sources
-------------

   Network programs face several delay sources while sending or
retrieving data. In the particular case of a web browser, they are
found in:

   DNS querying:
      The time required to resolve a name.

   Initiating the TCP connection:
      The three-way handshake of the TCP protocol.

   Sending the query:
      The time spent uploading queries to the remote server.

   Retrieving data:
      The time spent waiting for and receiving the query answer.

   Closing the TCP connection:
      The four-packet closing sequence of the TCP protocol.

   In a WAN context, every single item of this list has an
associated delay that is non-deterministic and often measured in
seconds. If we add several connections per browsed page (each one
requiring at least the last four steps), the total latency can be
considerable.


-----------------------------------
The traditional (blocking) approach
-----------------------------------

   The main problems with the blocking approach are:

      When issuing an operation that can't be completed
      immediately, the process is put to sleep waiting for
      completion, and the program doesn't do any other processing
      in the meantime.

      When waiting for a specific socket operation to complete,
      packets that belong to other connections may be arriving,
      and have to wait for service.

      Web browsers handle many small transactions; if waiting
      times are not overlapped, the latency perceived by the user
      can be very annoying.

      If the user interface is simply put to sleep during network
      operations, the program becomes unresponsive, confusing and
      perhaps alarming the user.

      Not overlapping waiting times and processing makes graphical
      rendering (which is arguably the central function of a
      browser) unnecessarily slow.
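
   For contrast, here is a minimal sketch of the traditional
blocking approach in C. This is not Dillo code; the function and
argument names are invented, and error handling is reduced to early
returns. Every marked call can put the whole program to sleep for
seconds:

   /* fetch_blocking: illustrative only, not part of Dillo. */
   #include <stdio.h>
   #include <string.h>
   #include <unistd.h>
   #include <netdb.h>
   #include <sys/socket.h>

   int fetch_blocking(const char *host, const char *request)
   {
      struct addrinfo hints, *res;
      char buf[4096];
      ssize_t n;
      int fd;

      memset(&hints, 0, sizeof(hints));
      hints.ai_socktype = SOCK_STREAM;
      if (getaddrinfo(host, "80", &hints, &res) != 0)   /* blocks on DNS */
         return -1;

      fd = socket(res->ai_family, res->ai_socktype, res->ai_protocol);
      if (fd < 0) {
         freeaddrinfo(res);
         return -1;
      }
      if (connect(fd, res->ai_addr, res->ai_addrlen) < 0) {
         freeaddrinfo(res);                   /* blocks on the handshake */
         close(fd);
         return -1;
      }
      freeaddrinfo(res);

      write(fd, request, strlen(request));    /* blocks while sending   */
      while ((n = read(fd, buf, sizeof(buf))) > 0)
         fwrite(buf, 1, n, stdout);           /* blocks on every packet */

      close(fd);
      return 0;
   }

   While any of these calls waits, nothing else happens: no other
connection is serviced and the user interface is frozen.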


---------------------
Dillo's hybrid design
---------------------

   Dillo uses threads and signal-driven I/O extensively to overlap
waiting times and computation. Handling the user interface in a
thread that never blocks gives a good interactive "feel." The use
of GTK+, a sophisticated widget framework for graphical user
interfaces, helped greatly in accomplishing this goal. The whole
interface, rendering and I/O engine was built upon its facilities.

   The design is said to be "hybrid" because it uses threads for
DNS querying and for reading local files, and signal-driven I/O for
TCP connections. The threaded DNS scheme is potentially concurrent
(this depends on the underlying hardware), while the I/O handling
(both local files and remote connections) is definitely parallel.

   To simplify the structure of the browser, local files are
encapsulated into HTTP streams and presented to the rest of the
browser as such, in exactly the same way a remote connection is
handled. To create this illusion, a thread is launched. This thread
opens a pipe to the browser, synthesizes an appropriate HTTP
header, and sends it together with the file contents to the browser
proper. In this way, all the browser sees is a handle; the data on
it can come from a remote connection or from a local file (this
idea is sketched at the end of this section).

   Handling a remote connection is more complex. In this case, the
browser asks the cache manager for the URL. The name in the URL has
to be resolved through the DNS engine, a TCP socket connection must
be established, the HTTP request has to be sent, and finally the
result retrieved. Each of the steps mentioned can give rise to
errors, which have to be handled and somehow communicated to the
rest of the program. For performance reasons, it is critical that
responses are cached locally, so the remote connection doesn't hand
its data directly to the browser; the response is passed to the
cache manager, which then relays it to the rest of the browser. The
DNS engine caches DNS responses, and answers requests either from
the cache or by querying the DNS. Querying is done in a separate
thread, so that the rest of the browser isn't blocked by long waits
here.

   The activities mentioned do not happen strictly in the order
stated above. It is even possible that several URLs are being
handled at the same time, in order to overlap waiting and
downloading. The functions called directly from the user interface
have to return quickly to maintain interactive response; sometimes
they return connection handlers that haven't been completely set up
yet. As stated, I/O is signal-driven: when one of the descriptors
is ready for data transfer (reading or writing), it wakes up the
I/O engine.

   Data transfer between threads inside the browser is handled by
pipes; shared memory is hardly used. This almost obviates the need
for explicit synchronization, which is one of the main areas of
complexity and bugs in concurrent programs. Dillo handles its
threads in such a way that its developers can think of the program
as running on a single thread of control. This is accomplished by
making the DNS engine call-backs happen within the main thread, and
by isolating file loading with pipes.
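
   The file-loading side of this design can be sketched as follows.
This is a simplified illustration, not Dillo's actual file module;
all names are invented. A helper thread writes a synthesized HTTP
header followed by the file contents into a pipe, and the browser
proper only ever sees the read end:

   /* Sketch: serve a local file as an HTTP stream through a pipe. */
   #include <pthread.h>
   #include <stdio.h>
   #include <stdlib.h>
   #include <string.h>
   #include <unistd.h>
   #include <fcntl.h>

   struct file_job {
      char *path;     /* local file to serve   */
      int   wr_fd;    /* write end of the pipe */
   };

   static void *file_feed_thread(void *arg)
   {
      struct file_job *job = arg;
      char buf[4096];
      ssize_t n;
      int fd = open(job->path, O_RDONLY);

      if (fd >= 0) {
         /* Synthesize a minimal HTTP header so the rest of the
            browser can treat this like a remote connection. */
         const char *hdr = "HTTP/1.0 200 OK\r\n"
                           "Content-Type: text/html\r\n\r\n";
         write(job->wr_fd, hdr, strlen(hdr));
         while ((n = read(fd, buf, sizeof(buf))) > 0)
            write(job->wr_fd, buf, n);
         close(fd);
      }
      close(job->wr_fd);            /* EOF for the reader */
      free(job->path);
      free(job);
      return NULL;
   }

   /* Returns a read-only fd that "speaks HTTP", or -1 on error. */
   int file_as_http_fd(const char *path)
   {
      int p[2];
      pthread_t tid;
      struct file_job *job;

      if (pipe(p) < 0)
         return -1;
      job = malloc(sizeof(*job));
      if (!job) {
         close(p[0]);
         close(p[1]);
         return -1;
      }
      job->path = strdup(path);
      job->wr_fd = p[1];
      pthread_create(&tid, NULL, file_feed_thread, job);
      pthread_detach(tid);
      return p[0];                  /* the browser reads from here */
   }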

   Using threads in this way has three big advantages:

      The browser doesn't block when one of its child threads
      blocks. In particular, the user interface is responsive even
      while resolving a name or downloading a file.

      Developers don't need to deal with complex concurrency
      concerns. Concurrency is hard to handle, and few developers
      are adept at it. This gives access to a much larger pool of
      potential developers, something which can be critical in an
      open-source development project.

      By keeping the code mostly sequential, debugging it with
      traditional tools like gdb remains possible. Debugging
      parallel programs is very hard, and appropriate tools are
      hard to come by.

   Because of simplicity and portability concerns, DNS querying is
done in a separate thread. The standard C library doesn't provide a
function for making DNS queries that don't block; the alternative
would be to implement a new, custom DNS querying function. That is
certainly a complex task, whereas running the standard blocking
query inside a thread integrates much more simply into the
structure of the program.

   Using a thread and a pipe to read a local file adds a buffering
step to the process (and a certain latency), but it has a couple of
significant advantages:

      By handling local files in the same way as remote
      connections, a significant amount of code is reused.

      A preprocessing step for the file data can be added easily,
      if needed. In fact, the file is encapsulated into an HTTP
      data stream.


-----------
DNS queries
-----------

   Dillo handles DNS queries with threads, letting a child thread
wait until the DNS server answers the request. When the answer
arrives, a call-back function is invoked, and the program resumes
what it was doing at DNS-request time. The interesting thing is
that the call-back happens in the main thread, while the child
thread simply exits when done. This is implemented through a
server-channel design.


The server channel
------------------

   There is one thread for each channel, and each channel can have
multiple clients. When the program requests an IP address, the
server first looks for a cached match. If it hits, the client
call-back is invoked immediately. If not, the client is put into a
queue, a thread is spawned to query the DNS, and a GTK+ idle client
is set up to poll the channel five times per second for completion;
when the query finally succeeds, every client of that channel is
serviced.

   This scheme allows all further processing to continue on the
same thread it began on: the main thread.


------------------------
Handling TCP connections
------------------------

   Establishing a TCP connection requires the well-known three-way
handshake packet sequence. Depending on network traffic and several
other issues, significant delay can occur at this phase.

   Dillo handles the connection with a non-blocking socket scheme.
Basically, a socket file descriptor of AF_INET type is requested
and set to non-blocking I/O. When the DNS server has resolved the
name, the socket connection process begins by calling connect(2),
which returns immediately with an EINPROGRESS error.
   {We follow the Unix convention of identifying the manual section
   where a concept is described; in this case, section 2 (system
   calls).}
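
   A minimal sketch of that non-blocking connect looks roughly like
this. It is not Dillo's actual code; the function name and the way
the address is passed in are invented for illustration:

   /* Sketch: start a TCP connection without blocking the browser. */
   #include <errno.h>
   #include <fcntl.h>
   #include <string.h>
   #include <unistd.h>
   #include <sys/socket.h>
   #include <netinet/in.h>
   #include <arpa/inet.h>

   /* Returns the connecting fd, or -1 on error. 'addr' would come
      from the DNS engine's call-back. */
   int start_connection(struct in_addr addr, unsigned short port)
   {
      struct sockaddr_in sin;
      int fd = socket(AF_INET, SOCK_STREAM, 0);

      if (fd < 0)
         return -1;
      /* switch the descriptor to non-blocking I/O */
      fcntl(fd, F_SETFL, O_NONBLOCK | fcntl(fd, F_GETFL));

      memset(&sin, 0, sizeof(sin));
      sin.sin_family = AF_INET;
      sin.sin_port = htons(port);
      sin.sin_addr = addr;

      if (connect(fd, (struct sockaddr *)&sin, sizeof(sin)) == 0)
         return fd;            /* connected at once (rare) */
      if (errno == EINPROGRESS)
         return fd;            /* handshake continues in background;
                                  the fd is handed to the I/O engine,
                                  which is signalled when the socket
                                  becomes writable */
      close(fd);               /* real error */
      return -1;
   }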

   After the connection reaches the EINPROGRESS "state," the socket
waits in the background until the connection succeeds (or fails).
When that happens, a call-back function is woken up to set up the
I/O engine to send the query and expect its answer (both in the
background).

   The advantage of this scheme is that every required step is done
quickly, without blocking the browser. Finally, the socket will
generate a signal whenever I/O is possible.


----------------
Handling queries
----------------

   In the case of an HTTP URL, queries typically translate into a
short transmission (the HTTP query) and a lengthy retrieval
process. Queries are not always short though, especially when
submitting forms (all the form data is attached to the query) and
when requesting CGI programs.

   Regardless of query length, query sending is handled in the
background. The thread that was initiated at TCP connection time
has all the transmission framework already set up; at this point,
packet sending is just a matter of waiting for the write signal
(G_IO_OUT) and then sending the data. When the socket gets ready
for transmission, the data is sent using IO_write.


--------------
Receiving data
--------------

   Although conceptually similar to sending queries, retrieving
data is very different, as the data received can easily exceed the
size of the query by many orders of magnitude (for example when
downloading images or files). This is one of the main sources of
latency; the retrieval can take several seconds or even minutes
when downloading large files.

   The data-retrieving process for a single file, which began by
setting up the receiving framework at TCP connection time, simply
waits for the read signal (G_IO_IN). When it arrives, the low-level
I/O engine gets called, the data is read into pre-allocated buffers
and the appropriate call-backs are performed. Technically, whenever
a G_IO_IN event is generated, data is received from the socket file
descriptor using the IO_read function. This iterative process
finishes upon EOF (or on an error condition).


----------------------
Closing the connection
----------------------

   Closing a TCP connection requires four data segments, not an
impressive amount but twice the round-trip time, which can be
substantial. When data retrieval finishes, socket closing is
triggered; it is nothing more than an IO_close_fd call on the
socket's file descriptor. This process was originally designed to
split the four-segment close into two partial closes, one when
query sending is done and the other when all data is in. This
scheme is not currently used because the write close also stops the
reading part.


The low-level I/O engine
------------------------

   Dillo's I/O is carried out in the background. This is achieved
by using low-level file descriptors and signals. Any time a file
descriptor shows activity, a signal is raised and the signal
handler takes care of the I/O.

   The low-level I/O engine ("I/O engine" from here on) was
designed as an internal abstraction layer for background file
descriptor activity. It is intended to be used by the cache module
only; higher-level routines should ask the cache for URLs. Every
operation that is meant to be carried out in the background should
be handled by the I/O engine. In the case of TCP sockets, they are
created and submitted to the I/O engine for any further processing.
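
   As a rough illustration of how such background activity can be
wired up with GTK+/glib, the sketch below wraps a file descriptor
in a GIOChannel and lets a watch deliver G_IO_IN/G_IO_OUT events to
a call-back. It is a simplified stand-in for the real dispatch code
in IO.c; the function names here are invented, and a running GTK+
main loop is assumed:

   /* Sketch of signal-driven I/O with glib; not Dillo's IO.c. */
   #include <glib.h>
   #include <unistd.h>

   static gboolean io_ready(GIOChannel *ch, GIOCondition cond,
                            gpointer data)
   {
      char buf[4096];
      int fd = g_io_channel_unix_get_fd(ch);
      ssize_t n;

      if (cond & G_IO_OUT) {
         /* socket writable: send (part of) the pending query here */
      }
      if (cond & G_IO_IN) {
         n = read(fd, buf, sizeof(buf));  /* never blocks: data is ready */
         if (n <= 0) {
            close(fd);
            return FALSE;                 /* EOF or error: drop the watch */
         }
         /* hand the n bytes just read to the cache via call-backs */
      }
      return TRUE;                        /* keep watching the descriptor */
   }

   /* Called once a connected (or connecting) fd is handed over. */
   static guint watch_fd(int fd, gpointer client_data)
   {
      GIOChannel *ch = g_io_channel_unix_new(fd);
      return g_io_add_watch(ch, G_IO_IN | G_IO_OUT | G_IO_ERR | G_IO_HUP,
                            io_ready, client_data);
   }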

   The submitting process (the client) must fill a request
structure and let the I/O engine handle the file descriptor
activity, until it receives a call-back to finally process the
data. This is better understood by examining the request structure:

   typedef struct {
      gint Key;              /* Primary Key (for klist) */
      gint Op;               /* IORead | IOWrite | IOWrites */
      gint FD;               /* Current File Descriptor */
      gint Flags;            /* Flag array */
      glong Status;          /* Number of bytes read, or -errno code */

      void *Buf;             /* Buffer place */
      size_t BufSize;        /* Buffer length */
      void *BufStart;        /* PRIVATE: only used inside IO.c! */

      void *ExtData;         /* External data reference (not used by IO.c) */
      void *Info;            /* CCC Info structure for this IO */
      GIOChannel *GioCh;     /* IO channel */
      guint watch_id;        /* glib's event source id */
   } IOData_t;

   To request an I/O operation, this structure must be filled and
passed to the I/O engine.

   'Op', 'Buf' and 'BufSize' MUST be provided.

   'ExtData' MAY be provided.

   'Status', 'FD' and 'GioCh' are set by the I/O engine's internal
routines.

   When there is new data in the file descriptor, 'IO_callback'
gets called (by glib). Only after the I/O engine finishes
processing the data are the upper layers notified.


The I/O engine transfer buffer
------------------------------

   The 'Buf' and 'BufSize' fields of the request structure provide
the transfer buffer for each operation. This buffer must be set by
the client (this improves performance by avoiding data copies).

   On reads, the client specifies the amount of data to retrieve
and where to place it; on writes, it specifies the amount and
source of the data segment that is to be sent. Although this scheme
increases complexity, it has proven very fast and powerful. For
instance, when the size of a document is known in advance, a buffer
for all the data can be allocated at once, eliminating the need for
multiple memory reallocations. Moreover, if the size is known and
the data transfer arrives as multiple small chunks, the client only
needs to update 'Buf' and 'BufSize' to point to the next byte in
its large preallocated reception buffer (by adding the chunk size
to 'Buf'). On the other hand, if the size of the transfer isn't
known in advance, the reception buffer can remain untouched until
the connection closes, but the client must then perform the usual
buffer copying and reallocation.

   The I/O engine also lets the client specify a full-length
transfer buffer when sending data. It doesn't matter (from the
client's point of view) whether the data fits in a single packet or
not; it's the I/O engine's job to divide it into smaller chunks if
needed and to perform the operation accordingly.
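
   As an illustration of how a client might fill such a request for
a background read, consider the hedged sketch below. Only the
structure fields come from the text above; the submission function
a_IO_submit and its signature are made up for this example and are
not Dillo's actual API:

   /* Sketch of a client preparing a background read request. */
   #include <glib.h>
   #include "IO.h"       /* assumption: provides IOData_t and IORead */

   extern void a_IO_submit(IOData_t *io, int fd);   /* hypothetical */

   void request_read(int fd, void *client_data, size_t expected_size)
   {
      IOData_t *io = g_new0(IOData_t, 1);

      io->Op = IORead;                    /* MUST: requested operation   */
      io->BufSize = expected_size;        /* MUST: client-provided buffer */
      io->Buf = g_malloc(io->BufSize);    /*       avoids extra copies    */
      io->ExtData = client_data;          /* MAY:  opaque client reference */

      /* 'Status', 'FD' and 'GioCh' are left for the engine to fill in;
         the descriptor is passed along with the hypothetical call. */
      a_IO_submit(io, fd);
   }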

------------------------------------------
Handling multiple simultaneous connections
------------------------------------------

   The previous sections describe the internal work for a single
connection; the I/O engine handles several of them in parallel.
This is the normal behavior when downloading a web page. Normally,
after retrieving the main document (the HTML code), several
references to other files (typically images), and sometimes even to
other sites (mostly advertising these days), are found inside the
page. In order to parse and complete the page rendering, those
other documents must be fetched and displayed, so it is not
uncommon to have multiple downloading connections (each requiring
the whole fetching process) happening at the same time.

   Even though socket activity can reach a hectic pace, the browser
never blocks. Note also that it is the I/O engine that directs the
execution flow of the program, by triggering a call-back chain
whenever a file descriptor operation succeeds or fails.

   A key point of this multiple, call-back chained I/O engine is
that every single function in the chain must be guaranteed to
return quickly; otherwise, the whole system blocks until it
returns.


-----------
Conclusions
-----------

   Dillo is currently in a very stable beta state. It already shows
impressive performance, and its interactive "feel" is much better
than that of other web browsers.

   The modular structure of Dillo and its reliance on GTK+ allow it
to be very small. Not every feature of HTML 4.01 has been
implemented yet, but no significant problems are foreseen in doing
so.

   The fact that Dillo's central I/O engine is written using
advanced features of POSIX and TCP/IP networking makes its
performance possible, but it also means that only a fraction of the
interested hackers are able to work on it.

   A simple code base is critical when trying to attract hackers to
work on a project like this one. Using the GTK+ framework helped
both in creating the graphical user interface and in handling the
concurrency inside the browser. By having threads communicate
through pipes, the need for explicit synchronization is almost
completely eliminated, and with it most of the complexity of
concurrent programming disappears.

   A clean, strictly applied layering approach based on clear
abstractions is vital in any programming project. A good,
supportive framework is of much help here.