-
Notifications
You must be signed in to change notification settings - Fork 0
Expand file tree
/
Copy pathprotocol2pc.txt
More file actions
117 lines (70 loc) · 6.33 KB
/
protocol2pc.txt
File metadata and controls
117 lines (70 loc) · 6.33 KB
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
Data Replication
================
Overview
--------
The data replication takes place between multiple instances of bbserv. Each instance is introduced to all others by means of the 'peers' parameter. The data replication is triggered by a WRITE or REPLACE operation of any of the bbserv instances and aims to synchronize the content of all their database files.
Design
------
Command Objects
~~~~~~~~~~~~~~~
bbserv is designed to handle arbitrary commands sent to both the 'bport' and the 'sport'. Upon receiving, the Command Identifier, which is the first word in the text message, is used to deduce the command and passed on to a builder function, which instantiates a Command Object according to the Command Identifier. The builder function is aware of all the supported Command Identifiers and when creating the appropriate Command Objects, it forwards all needed information like the whole message text, the socket stream to send its reply and the reference to the current user name.
Command Objects are designed in a polymorphic way so they can be operated in any free worker thread. They support a common interface and encapsulate all dataneeded to complete the necessary tasks.
Thread Pool
~~~~~~~~~~~
The Thread Pool is created upon bbserv's startup. It is passed on the number of worker threads as defined in the configuration (i.e. command line, configuration file) and a reference to the Connection Queue.
This way the network connection to the clients or sibling bbserv instances is completely transparent to the Thread Pool. Only the information about the socket representing the incoming client connection or the broadcast command representing the request to be sent is given.
Connection Queue
~~~~~~~~~~~~~~~~
The Connection Queue is the link between the Input Connection and the Thread Pool. It is implemented in a thread-safe manner, so it can be safely used from all the agents (i.e. worker threads).
The Connection Queue is designed in a polymorphic manner, so it can relay local sockets for incoming client connections. But it can also contain broadcast commands, which are to be sent to peers for data replication.
Input Connection
~~~~~~~~~~~~~~~~
Instances of the Input Connection are created in order to listen on both ports ('bport' and 'sport'). Each instance creates a socket for incoming requests and listens on them. This can be done in blocking or non-blocking mode. Upon the arrival of a new network connection a corresponding socket is created and put into the Connection Queue. The Thread Pool is signaled as well and so a free agent can handle this client's commands.
Each Input Connection instance is run in its own thread context, which is not part of the Thread Pool. Hence the main-thread is kept free for user input on its local console as long as it is not run as a daemon.
Broadcast Commands
~~~~~~~~~~~~~~~~~~
Broadcast Commands are used in the use case of data replication. The Coordinator creates them in order to send multiple requests at the same time. This needs to be done when the Coordinator asks the Participants for acknowledging PRECOMMIT and COMMIT operations.
The Participants' replies are handled in ACK command objects in an asynchronous way. Every positive or negative ACK reply is collected in the Acknowledge Queue, which is, similarly to the Connection Queue, a thread-safe FIFO data structure.
Upon inserting an acknowledgment, the active Coordinator is signaled, which continues waiting if there are still acknowledgments pending and if the timeout has not elapsed yet. Else it continues with the next step according to the two-phase commit procedure.
Acknowledge Queue
~~~~~~~~~~~~~~~~~
The Acknowledge Queue is the link between the worker thread that represents the Coordinator and the ACK command, which is processing one Participant's reply.
Command Objects for data replication
------------------------------------
PRECOMMIT
~~~~~~~~~
Receiver:: Participant
Trigger:: The Coordinator is processing an WRITE or REPLACE command.
Reply:: ACK command including a success indicator.
Payload:: The server-generated identifier of the WRITE/REPLACE command, which is to be done.
COMMIT
~~~~~~
Receiver:: Participant
Trigger:: All Participants have positively acknowledged PRECOMMIT.
Payload:: A copy of the WRITE or REPLACE command prefixed by the user name of the original command and the server-generated identifier of the WRITE/REPLACE command, which is to be done.
Reply:: SUCCESSFULL or UNSUCCESSFULL command let the Coordinator know the result of requested operation.
ACK
~~~
Receiver:: Coordinator
Trigger:: Peer is up an running.
Payload:: This command carries the message identifier, which has been passed through the PRECOMMIT command and "0" or "1" to indicate failure or success respectively.
SUCCESSFULL
~~~~~~~~~~~
Receiver:: Coordinator
Trigger:: Participant has processed the WRITE or REPLACE command successfully.
Payload:: The server-generated identifier of the WRITE/REPLACE command, which has been completed successfully.
UNSUCCESSFULL
~~~~~~~~~~~~~
Receiver:: Coordinator, Participant
Trigger::
* Participant failed to process the WRITE or REPLACE command.
* Coordinator has received one or more UNSUCCESSFULL commands from
Participants or awaiting a Participant's reply timed out.
Payload:: The server-generated identifier of the WRITE/REPLACE command, which has failed to complete.
Multicasting
~~~~~~~~~~~~
All of these commands need to be broadcasted from the Coordinator to all the Participants. The initiating Command Object (e.g. the WRITE command initiates PRECOMMIT to be broadcasted) will enqueue the broadcast command (e.g. PRECOMMIT) together with all the peers in the Connection Queue. Hence sending the broadcast-command (e.g. PRECOMMIT) is dispatched to Thread Pool again and processed in parallel.
In addition, a globally accessible instance of Acknowledge Queue, which is used to collect all the replies from the peers belonging to the active WRITE/REPLACE operation is setup. The initiating Command Object uses it to determine if it can process further or needs to abort/undo the current command.
Data consistency
~~~~~~~~~~~~~~~~
In order to check for data consistency during all the stages in the two-phase commit protocol, the message identifier is passed along with the PRECOMMIT and COMMIT commands. This way we can detect intercepting WRITE operations, which have a different origin than the currently ongoing data replication.