Due date: March 4, 2010
In this project, you will take some of the ideas you have already worked with (mainly transferring an object over the network) and use TCP to build a distributed hash table (DHT). With the popularity of peer-to-peer networks, such as Gnutella, researchers proposed adding structure to the peer-to-peer network so that a search is guaranteed to find a resource if it exists on the network.
You will create a Chord-like peer-to-peer network. Chord is conceptually simple. Peers first pick a random 16-bit ID number and join the network with that ID. In the case of a collision (i.e., another peer already has that ID), they randomly pick a new ID. Peers are arranged in a circle (much like a token ring setup). The difference, and the reason that this type of P2P network is called Chord, is that peers also maintain a logarithmic number of connections at exponentially increasing distances across the ring (hence a chord, from a geometric perspective).
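As a rough sketch of what "exponentially increasing" can mean on a 16-bit ring, the code below computes chord target IDs as (id + 2^k) mod 2^16 for k = 0..15. This spacing follows the usual Chord convention and is an assumption here, since the assignment does not spell out the exact offsets; the names chord_target and chord_targets are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Number of bits in a peer ID; the assignment uses 16-bit IDs.
constexpr unsigned kIdBits = 16;

// Target ID of the k-th chord for a peer: (id + 2^k) mod 2^16.
// Assumption: standard Chord spacing. The cast to uint16_t performs
// the wrap-around at the top of the ring.
uint16_t chord_target(uint16_t id, unsigned k) {
    assert(k < kIdBits);
    return static_cast<uint16_t>(id + (1u << k));
}

// All 16 chord targets for a peer, nearest first.
std::vector<uint16_t> chord_targets(uint16_t id) {
    std::vector<uint16_t> targets;
    for (unsigned k = 0; k < kIdBits; ++k)
        targets.push_back(chord_target(id, k));
    return targets;
}
```

With 16-bit IDs this yields exactly 16 targets per peer, which is logarithmic in the size of the ID space.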
The goals of this project are as follows:
select and non-blocking sockets.
Peers have the following types of messages and all messages have a source ID and destination ID when they are routed on the network. For direct messages such as ChordDetach, GetFile, and FileObject, the destination ID isn't needed:
For each of these messages, you will create a C++ class that is serialized across the network. The class IDs (recall the need for these from Project 2) should just be a 16-bit number that increases as listed above such that Join has ID 1, Get has ID 2, and so forth. The hash (in Get, for example) is also a 16-bit number computed by calculating the checksum of the file name (please note that this is not a good method for hashing on a DHT--I'm just simplifying it for the project).
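One possible way to lay out the fields shared by every message is sketched below. The struct name MessageHeader, the field order, and the choice of network (big-endian) byte order are all assumptions for illustration; your serialization code from the earlier project may already dictate a different layout.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical common header; the field layout is an assumption.
struct MessageHeader {
    uint16_t class_id;   // 1 = Join, 2 = Get, ... (per the assignment)
    uint16_t source_id;  // ID of the sending peer
    uint16_t dest_id;    // ID the message is routed toward
};

// Append a 16-bit value in network (big-endian) byte order.
void put_u16(std::vector<uint8_t>& out, uint16_t v) {
    out.push_back(static_cast<uint8_t>(v >> 8));
    out.push_back(static_cast<uint8_t>(v & 0xff));
}

// Read a 16-bit big-endian value starting at offset pos.
uint16_t get_u16(const std::vector<uint8_t>& in, size_t pos) {
    assert(pos + 1 < in.size());
    return static_cast<uint16_t>((in[pos] << 8) | in[pos + 1]);
}

// Serialize the header into 6 bytes.
std::vector<uint8_t> serialize(const MessageHeader& h) {
    std::vector<uint8_t> out;
    put_u16(out, h.class_id);
    put_u16(out, h.source_id);
    put_u16(out, h.dest_id);
    return out;
}

// Rebuild a header from the first 6 bytes of a buffer.
MessageHeader deserialize(const std::vector<uint8_t>& in) {
    return MessageHeader{get_u16(in, 0), get_u16(in, 2), get_u16(in, 4)};
}
```

Writing the bytes explicitly, rather than copying the struct's memory, keeps the wire format independent of host byte order and struct padding.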
For details on checksumming, refer to your textbook or RFC 1071. I'm listing a possible method below--feel free to use it as needed.
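// An example of calculating the Internet checksum (see RFC 1071).
#include <cstddef>
#include <cstdint>

uint16_t checksum(uint16_t vals[], size_t len) {
    uint32_t sum = 0;
    for (size_t i = 0; i < len; ++i) {
        sum += vals[i];
        // Fold any carry out of the low 16 bits back into the sum.
        while (sum >> 16)
            sum = (sum & 0xffff) + (sum >> 16);
    }
    // The checksum is the one's complement of the folded sum.
    return static_cast<uint16_t>(~sum);
}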
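The checksum routine works on 16-bit words, while a file name is a string of bytes, so some packing step is needed before hashing. The sketch below shows one way to do it; the function name hash_filename, the big-endian packing of byte pairs, and the zero padding of an odd trailing byte are assumptions, not requirements of the assignment. The checksum function is repeated so the block compiles on its own.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Internet checksum over 16-bit words, as in the example above.
uint16_t checksum(const uint16_t vals[], size_t len) {
    uint32_t sum = 0;
    for (size_t i = 0; i < len; ++i) {
        sum += vals[i];
        while (sum >> 16)
            sum = (sum & 0xffff) + (sum >> 16);
    }
    return static_cast<uint16_t>(~sum);
}

// Hypothetical helper: pack the name's bytes into 16-bit words
// (first byte in the high half, zero padding if the length is odd)
// and return the checksum of those words as the 16-bit hash.
uint16_t hash_filename(const std::string& name) {
    std::vector<uint16_t> words;
    for (size_t i = 0; i < name.size(); i += 2) {
        uint16_t hi = static_cast<uint8_t>(name[i]);
        uint16_t lo = (i + 1 < name.size())
                          ? static_cast<uint8_t>(name[i + 1])
                          : 0;
        words.push_back(static_cast<uint16_t>((hi << 8) | lo));
    }
    return checksum(words.data(), words.size());
}
```

Whatever packing you choose, every peer must use the same one; otherwise a Get will compute a different hash than the Put that stored the file.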
Peers take three command line arguments: the port to bind to; the address of a peer already on the system, or 0 if it is the first peer; and the name of a file that lists a set of fictitious files in the peer-to-peer network. In other words:
peer 2222 0 <filename>
This file will need to contain at least 200 randomly generated file names. When a peer first joins, it will randomly pick 8 files and Put them on the network. After 5 seconds, it will begin to randomly query for files by sending Get requests over the network. This continues for 60 seconds, at which point each peer will simply close all of its connections, write its log file, and exit. These actions will be logged to standard out, so that they can be redirected to a file.
When you first start the peer (unless it's the first peer), you have to specify an IP address of a peer to connect to (which we call the bootstrapping peer). It's of course okay to always specify the same peer, because the joining process will still attach it to the right place in the DHT. With this address, the peer sends a Join message, which is then routed through the network.
When the message arrives at its destination, that peer responds with a Successor message that includes its own IP address and ID; this message is routed back to the bootstrapping peer. The bootstrapping peer then forwards the Successor message to the joining peer. The joining peer closes the connection and then sends another Join, this time to the IP address taken from the Successor message.
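Deciding whether a Join has "arrived at its destination" requires comparing IDs on a ring that wraps around at 2^16. The sketch below assumes the usual Chord convention, in which an ID belongs to the first peer at or after it going clockwise; the assignment does not state this rule explicitly, and the function name in_ring_interval is hypothetical.

```cpp
#include <cstdint>

// True if id lies in the ring interval (from, to], walking clockwise
// from `from`. When from == to the interval covers the whole ring,
// which is the case for a network with a single peer.
bool in_ring_interval(uint16_t id, uint16_t from, uint16_t to) {
    // Clockwise distances from `from`; the casts wrap mod 2^16.
    uint16_t d_id = static_cast<uint16_t>(id - from);
    uint16_t d_to = static_cast<uint16_t>(to - from);
    if (d_to == 0) return true;
    return d_id != 0 && d_id <= d_to;
}
```

Under this assumption, a peer that knows its predecessor's ID would treat itself as the destination of a Join for new_id when in_ring_interval(new_id, predecessor_id, my_id) holds, and would forward the message otherwise.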
Note that Successor messages are also sent in response to AttachChord to get the IP address of that node where the chord is to be attached.
Note that in place of an IP address, you can specify the canonical host name; getaddrinfo() will give you enough info to resolve it. The addrinfo structure gives you the ai_addr field, which points to a struct sockaddr. This can be further typecast into a sockaddr_in, which has a field for the IP address (the sin_addr field). This field can then be used with inet_ntop(), which will give you a suitable character string representation of the IP address.
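The lookup described above can be sketched as follows. The function name resolve_ipv4 is hypothetical, and the sketch assumes IPv4 only and takes the first result that getaddrinfo() returns.

```cpp
#include <arpa/inet.h>
#include <netdb.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <string>

// Resolve a host name (or dotted-quad string) to an IPv4 address
// string. Returns an empty string if resolution fails.
std::string resolve_ipv4(const std::string& host) {
    addrinfo hints{};
    hints.ai_family = AF_INET;        // IPv4 only (assumption)
    hints.ai_socktype = SOCK_STREAM;  // TCP

    addrinfo* res = nullptr;
    if (getaddrinfo(host.c_str(), nullptr, &hints, &res) != 0)
        return "";

    // ai_addr points to a generic sockaddr; for AF_INET results it is
    // really a sockaddr_in, whose sin_addr field holds the address.
    auto* sin = reinterpret_cast<sockaddr_in*>(res->ai_addr);
    char buf[INET_ADDRSTRLEN];
    const char* p = inet_ntop(AF_INET, &sin->sin_addr, buf, sizeof buf);
    std::string ip = p ? p : "";

    freeaddrinfo(res);  // the result list must always be released
    return ip;
}
```

For example, resolve_ipv4("127.0.0.1") returns "127.0.0.1" without any DNS lookup, since getaddrinfo() also accepts numeric address strings.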
ID Join: with ID <id>, successor is ID <id>
ID Get: <hash> from ID <id>
ID GetResponse: <hash> at IP
ID Put: <hash> from ID <id> at IP <ip address>
ID AttachChord: to ID <id>
ID ChordDetached: from ID <id>
ID Successor: sent <id> to ID <id>
ID GetFile: asking IP address <ip address> for file <filename>
ID FileObject: sent file <filename> to ID <id>
Each log message is prefaced with the node ID. Join is logged when you first join, together with the successor ID returned to you. AttachChord is sent at increasing ID intervals, and a peer has at most 16 chords. ChordDetached is logged when a chord socket is closed (i.e., a socket to one of your chords). When this happens, the peer generates a new AttachChord for that particular ID. Note that a chord is detached when the peer on the receiving side of the chord sees a new peer join close to its ID, and that new peer would be closer to the ID of the original AttachChord message. A Put log message is generated whenever a Put message is sent out; each node that receives the Put, and possibly forwards it on, also logs a Put message. Get and GetResponse are similarly logged by each node receiving them. Successor is sent in response to an AttachChord or Join message. GetFile and FileObject are sent when a file is requested and returned.
The first thing to do in this project is get TCP working. You must use select and non-blocking sockets to process the TCP messages. You may use your object serialization code from the previous class for messages. Initially, you should ensure that Join, Put, and Get work, with the appropriate responses. Once this works fully, add the AttachChord-related messages.
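A minimal sketch of the two building blocks mentioned here, non-blocking descriptors and a select() wait over several of them, is shown below. The helper names set_nonblocking and wait_readable are hypothetical, and a real peer would also watch its listening socket and handle partial reads and writes.

```cpp
#include <fcntl.h>
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>
#include <algorithm>
#include <vector>

// Put a descriptor into non-blocking mode. Returns true on success.
bool set_nonblocking(int fd) {
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags == -1) return false;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK) != -1;
}

// Wait up to timeout_ms for any of fds to become readable and return
// the ones that are. An empty result means timeout or error.
std::vector<int> wait_readable(const std::vector<int>& fds,
                               int timeout_ms) {
    fd_set readfds;
    FD_ZERO(&readfds);
    int maxfd = -1;
    for (int fd : fds) {
        FD_SET(fd, &readfds);
        maxfd = std::max(maxfd, fd);
    }

    timeval tv;
    tv.tv_sec = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;

    std::vector<int> ready;
    // select() rewrites readfds to contain only the ready descriptors.
    if (select(maxfd + 1, &readfds, nullptr, nullptr, &tv) > 0) {
        for (int fd : fds)
            if (FD_ISSET(fd, &readfds)) ready.push_back(fd);
    }
    return ready;
}
```

Because select() modifies the set it is given, the set is rebuilt on every call; a main loop would call wait_readable() repeatedly and read from each descriptor it returns.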
The tricky part of this project is that your peer acts as both a client and a server. All TCP connections attach to well-known ports, so each peer has to call accept() for incoming connections and connect() for outgoing ones. Unlike with UDP, where we just read messages off a single port, the TCP port is only for establishing connections--you still have to accept and connect them!
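The two roles can be sketched as a pair of helpers: one that sets up the listening socket, and one that opens an outgoing connection to another peer. The names make_listener and connect_to and the backlog of 16 are assumptions, and the sockets are left in blocking mode for brevity; in the project you would switch them to non-blocking mode and drive them with select.

```cpp
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>
#include <cstdint>

// Server side: create a TCP socket bound to `port` on all interfaces
// and start listening. Returns the descriptor, or -1 on failure.
int make_listener(uint16_t port) {
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0) return -1;
    int yes = 1;  // allow quick restarts on the same port
    setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &yes, sizeof yes);

    sockaddr_in addr{};
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(port);
    if (bind(fd, reinterpret_cast<sockaddr*>(&addr), sizeof addr) < 0 ||
        listen(fd, 16) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

// Client side: open a TCP connection to a peer's listening port.
// `ip` is a dotted-quad IPv4 string. Returns the descriptor, or -1.
int connect_to(const char* ip, uint16_t port) {
    sockaddr_in addr{};
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);
    if (inet_pton(AF_INET, ip, &addr.sin_addr) != 1) return -1;

    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0) return -1;
    if (connect(fd, reinterpret_cast<sockaddr*>(&addr), sizeof addr) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

Each call to accept() on the listening descriptor then yields a new descriptor for one specific peer, which is the one you read messages from.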
Adventurous undergraduates can receive extra-credit on this project by implementing a threaded version (like the graduates must do for their project).
As with the undergraduates, you should follow the same plan of action in terms of which features to implement first. However, you must implement one additional feature: threads. To specify threads, pass in an argument to your program:
peer -t 2222 r1m2cl.cs.du.edu <filename>
For threading, use pthreads. An example of using pthreads is here. Make sure that your listening socket is non-blocking, but spawn a thread for each socket received from accept().
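A thread-per-connection layout with pthreads might look roughly like the sketch below. The names handle_peer and spawn_handler are hypothetical, and the thread body simply echoes bytes back as a stand-in for real message handling. The sketch assumes the descriptor handed to the thread is in blocking mode, and that the program is built with -pthread.

```cpp
#include <pthread.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>
#include <cstddef>

// Per-connection thread body. Placeholder logic: echo every byte back
// until the other side closes the connection.
void* handle_peer(void* arg) {
    // Take ownership of the heap-allocated descriptor.
    int fd = *static_cast<int*>(arg);
    delete static_cast<int*>(arg);

    char buf[512];
    ssize_t n;
    while ((n = read(fd, buf, sizeof buf)) > 0) {
        if (write(fd, buf, static_cast<size_t>(n)) != n) break;
    }
    close(fd);
    return nullptr;
}

// Start a detached thread that services one accepted socket.
// Returns true if the thread was created.
bool spawn_handler(int fd) {
    int* arg = new int(fd);  // each thread gets its own copy of fd
    pthread_t tid;
    if (pthread_create(&tid, nullptr, handle_peer, arg) != 0) {
        delete arg;
        return false;
    }
    pthread_detach(tid);  // no join needed; resources free on exit
    return true;
}
```

Passing the descriptor through a heap allocation avoids the classic bug of handing every thread a pointer to the same loop variable, which the accept loop may overwrite before the thread reads it.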
Grading will be based primarily on the correctness of your DHT. We will use the log files to help grade. 8 peers will be started sequentially. For testing, you can use the Linux lab, which will give you up to 20 machines to test from (you can start each peer by simply ssh'ing into the machines and connecting them together). Each peer should run for 1 minute as described, with its output logged. Details on the breakdown of grading will be posted soon.