Читать онлайн книгу - From Traditional Fault Tolerance to Blockchain. Wenbing Zhao. Зарубежная компьютерная литература. LiveLib

Новинки Лучшее Рекомендации

Информация о книге:

Название:

Автор:

Жанр:

Серия:

Издательство:

From Traditional Fault Tolerance to Blockchain - Wenbing Zhao

Скачать книгу

For sender-based message logging, unfortunately this side benefit is no longer applicable. The message log is needed for the receiving processes to recover from a failure, and hence, cannot be garbage collected upon a checkpointing operation. Additional mechanism, which will be introduced towards the end of this section, is necessary to ensure that the message log does not grow indefinitely.

The reason why the history list can be garbage collected upon a checkpointing operation is because the receiving sequence number information in the list (i.e., the receiving/execution order of the messages leading to the checkpoint) will no longer be needed for failure recovery. When a process receives a duplicate message and it cannot find the corresponding receiving sequence number in the history list because it has recently checkpointed its state, it may inform the sender that the message can now be purged from its message log – it is no longer needed for failure recovery due to the recent checkpoint.

In addition to the above data structures, the protocol uses the following types of messages:

◾ REGULAR message type. It is used for sending regular messages generated by the application process, and it has the form <REGULAR,seq,rsn,m>, where m refers to the message content. Obviously, at the time of sending of a message, its receiving sequence number, rsn, would not be known to the sending process, in which case, it assumes a special constant value (such as -1) indicating the unknown status. When a logged message is replayed to a recovering process, the sending process might have already learned the rsn value, in which case, a concrete rsn value is supplied.

◾ ORDER message type. It is used for the receiving process is notify the sending process the receiving/execution order of the message. An ORDER message carries the form <ORDER, [m], rsn>, where [m] is the message identifier consisting of a tuple <sender_id, receiver_ id, seq>.

◾ ACK message type. It is used for the sending process (of a regular message) to acknowledge the receipt of the ORDER message. It assumes the form <ACK, [m]>.

2.3.2.2 Normal Operation of the Message Logging Protocol

The normal operation of the protocol is shown in Figure 2.17.

Schematic illustration of normal operation of the sender-based logging protocol.

Figure 2.17 Normal operation of the sender-based logging protocol.

The protocol operates in three steps for each message:

1 A REGULAR message, <REGULAR,seq,rsn,m>, is sent from one process, e.g., Pi, to another process, e.g., Pj.

2 Process Pj determines the receiving/execution order, rsn, of the regular message and informs the determinant information to Pi in an ORDER message <ORDER, [m], rsn>.

3 Process Pj waits until it has received the corresponding acknowledgment message, <ACK, [m]>, before it sends out any REGULAR message.

The original sender-based message logging protocol [13] was designed for use with unreliable channels. Since we have assumed the use of reliable channels, one might wonder if the third step in the protocol is still necessary. The answer is yes because transport-level reliability does not necessarily lead to application-level reliability, as we have argued in section 2.3.1.2. If a process sends the ordering message to a process and another regular message to a different process, and node on which the process runs subsequently crashes, the ordering message might not be delivered to its intended target successfully while the regular message might.

Furthermore, in the original sender-based message logging protocol [13] , the regular message and the ordering message must be retransmitted after a timeout before the expected acknowledgment message is received. With the use of reliable channels, such proactive retransmission becomes unnecessary because the only scenario in which a retransmission is necessary is when a process fails, in which case, the retransmission will be triggered by the recovery mechanism (more in section 2.3.2.3).

The use of a mature reliable communication protocol such as TCP in distributed applications is more desirable because the application developers can focus on the application logic and application-level messaging reliability without worrying about issues such as achieving high throughput and doing congestion control.

EXAMPLE 2.6

In the example shown in Figure 2.18, the distributed system consists of three processes. Both the seq counter and rsn counter are initialized to be 0, and the message log is empty at each process. Process P₀ first sends a regular message, <REGULAR,0,?,m₀>, to P₁. Upon sending the message, P₀ increments its seq counter to 1 and log the message in its volatile buffer. At this point, the rsn value for the message is unknown, hence it is denoted as a question mark.

On receiving the regular message <REGULAR,0,?,m₀>, P₁ assigns the current rsn counter value, which is 0, to this message indicating its receiving order, increments its rsn counter to 1, and sends P₀ an ORDER message <ORDER,[m₀],0>. When P₀ receives this ORDER message, it updates the entry in its message log to reflect the ordering number for message m₀, and sends an sc ack message, <ACK,[m₀]>, to P₁.

Once receiving the ACK message, P₁ is permitted to send a regular message, <REGULAR,0,?,m₁>, to P₂. The handling of the message and the corresponding ORDER and ACK messages are similar to the previous ones.

Schematic illustration of an example normal operation of the sender-based logging protocol.

Figure 2.18 An example normal operation of the sender-based logging protocol.

Subsequently, P₀ and P₂ send three regular messages m₂, m₃, m₄, nearly concurrently to P₀. P₁ assigns 1 as the rsn value for the first of the three messages (for m₂) and sends an ordering message to P₀, and assigns 2 and 3 for the two back-to-back regular messages (for m₃ and m₄) from P₂. For the two messages from P₂, P₁ can batch the ORDER messages and sends them together to P₂, and P₂ can batch the corresponding the ACK messages to P₁ too. Upon receiving the ACK messages for all three ORDER messages, P₁ sends another regular message containing m₅ with sequence number 1, updates the seq counter to 2, and log the message.

2.3.2.3 Recovery Mechanism.

On recovering from a failure, a process first restores its state using the latest local checkpoint, and then it must broadcast a request to all other processes in the system to retransmit all their logged messages that were sent to the process.

Because

Скачать книгу

From Traditional Fault Tolerance to Blockchain. Wenbing Zhao

Чтение книги онлайн.

Читать онлайн книгу From Traditional Fault Tolerance to Blockchain - Wenbing Zhao страница 27

Информация о книге:

2.3.2.2 Normal Operation of the Message Logging Protocol

EXAMPLE 2.6

2.3.2.3 Recovery Mechanism.