From Traditional Fault Tolerance to Blockchain. Wenbing Zhao
Чтение книги онлайн.
Читать онлайн книгу From Traditional Fault Tolerance to Blockchain - Wenbing Zhao страница 25
Reconnection. A process must be able to cope with temporary connection failures and be ready to accept reconnections from other processes. This is an essential requirement for recoverable distributed system. This calls for a design in which the application logic is independent from the transport level events. This can be achieved by using a event-based [8] or document-based distributed computing architecture such as Web services [15], in conjunction with appropriate exception handling.
Message duplicate detection. As mentioned above, a process must be capable of detecting duplicate messages because it may receive such messages replayed by another process during recovery. Even though transport-level protocols such as TCP have build-in mechanism to detect and discard duplicate messages, such mechanism is irrelevant because it works only within the established connection. During failure recovery, the recovering process will inevitably re-establish the connections to other processes, hence, such mechanism cannot be depend on. Furthermore, not all application-level protocols have duplicate detection support (they often depend on the underlying transport-level protocol to do so). In this case, the application-level protocol must be modified to add the capability of message duplicate detection. For XML-based protocols, such as SOAP [15], it is straightforward to do so by introducing an additional header element that carries a <sender-id, sequence-number> tuple, where the sender-id is a unique identifier for the sending process and sequence-number is the sequence number of the message issued by the sending process. The sequence number establishes the order in which the message is sent by a process Pi to another process Pj. It must start from an initial sequence number (assigned to the first message sent) known to both processes and continuously incremented for each additional message sent without any gap. The Web Services Reliable Messaging standard [6] specifies a protocol that satisfies the above requirement.
Atomic message receiving and logging. In the protocol description, we implicitly assumed that the receiving of a message and the logging of the same message are carried out in a single atomic operation. Obviously the use of a reliable communication channel alone does not warrant such atomicity because the process may fail right after it receives a message but before it could successfully log the message, in which case, the message could be permanently lost. This issue is in fact a good demonstration of the end-to-end system design argument [17]. To ensure the atomicity of the message receiving and logging, additional application-level mechanism must be used. (Although the atomic receiving and logging can be achieved via special hardware [4], such solution is not practical for most modern systems.)
As shown in Figure 2.12(a), a reliable channel only ensures that the message sent is temporarily buffered at the sending side until an acknowledgement is received in the transport layer. The receiving side sends an acknowledgement as soon as it receives the message in the transport layer. The receiving side buffers the message received until the application process picks up the message. If the application process at the receiving side fails either before it picks up the message, or before it completes logging the message in stable storage, the sending side would receive no notification and the message sent is no longer available.
Figure 2.12 Transport level (a) and application level (b) reliable messaging.
To ensure application level reliable messaging, the sending process must store a copy of the message sent (in the application level) for possible retransmission until it receives an explicit acknowledgment message from the receiving process in the application level, as shown in Figure 2.12(b). Such an application level reliable messaging protocol does exist in some distributed computing paradigm, such as Web services [6]. Incidentally, the sender-based message logging protocol [13], to be introduced in a later subsection, incorporates a similar mechanism, albeit for a slightly different purpose.
We should note that the use of such an application level reliable messaging protocol is essential not only to ensure the atomicity of message receiving and logging, but also to facilitate the distributed system to recover from process failures (for example, the failure of the process at one end point of a transport level connection, which would cause the breakage of the connection, would have no negative impact on the process at the other end of the connection, and a process is always ready to reconnect if the current connection breaks).
Furthermore, the use of an application level reliable messaging protocol also enables the following optimization: a message received can be executed immediately and the logging of the message in stable storage can be deferred until another message is to be sent [13]. This optimization has a number of benefits, as shown in Figure 2.13:
◾ Message logging and message execution can be done concurrently (illustrated in Figure 2.13(a)), hence, minimizing the latency impact due to logging.
◾ If a process sends out a message after receiving several incoming messages, the logging of such messages can be batched in a single I/O operation (illustrated in Figure 2.13(b)), further reducing the logging latency.
Figure 2.13 Optimization of pessimistic logging: (a) concurrent message logging and execution (b) logging batched messages.
2.3.1.3 Pessimistic Logging Cost.
While much research efforts have been carried out to design optimistic and causal logging to avoid or minimize the number of logging operations (on disks) assuming that synchronous logging would incur significant latency overhead [1, 19, 20, 21] . In this section, we present some experimental results to show that such assumption is often unwarranted. The key reason is that it is easy to ensure sequential disk I/Os by using dedicated disks. It is common nowadays for magnetic disks to offer a maximum sustained data rate of 100MB or more per second. Such transfer rate is approaching or exceeding the effective bandwidth of Gigabit Ethernet networks. Furthermore, with the increasing availability (and reduced cost) of semiconductor solid state disks, the sequential disk I/Os can be made even faster and the latency for random disk I/Os can be dramatically reduced. By using multiple logging disks together with disk striping, Gigabytes per second I/Os have been reported [10].
In the experiment, a simple client-server Java program is used where the server process logs every incoming request message sent by the client and issues a response to the client. The response message is formed by transforming the client’s request and it carries the same length as the request. The server node is equipped with a 2nd generation core i5 processor running the Windows 7 Operating system. The client runs on an iMac computer in the same local area network connected by a Gigabit Ethernet switch. The server node has two hard drives, one traditional magnetic hard drive with a spindle speed of 7,200 RPM, and the other a semiconductor solid state drive. In each run, 100,000 iterations were performed. The logging latency (at the server) and the end-to-end latency (at the client) are measured.