Introduction
What is Peer-to-peer?
This page walks through a basic introduction to developing peer-to-peer (P2P) applications. By the end of it, you should understand the concepts and programming constructs necessary to implement a P2P protocol and/or application program.
The first thing we need is to understand exactly what is meant by "P2P". The article What Is P2P... And What Isn't, written by Clay Shirky in 2000, gives a nice overview of the fundamental essence of P2P, along with a little history. Briefly, P2P applications organize and manage resources at the edges of the Internet in a decentralized manner, with little or no interaction with centralized servers. The resources may be storage or content (e.g. file sharing applications), processor cycles (e.g. SETI@home or other programs that use your computer, when it is idle, to perform part of some large, distributed task), or human presence (e.g. instant messaging). The 'edges' of the Internet are machines such as the average home PC, which has just a single line (dial-up, DSL, cable, etc.) connecting it to the vast Internet 'cloud.'
In a traditional client/server setup, your home PC acts as a client and sends requests to machines (servers) somewhere inside the cloud of the Internet to get things done, such as browsing the web or getting email. In a P2P paradigm, on the other hand, your home PC may connect, through the Internet, directly to tens or hundreds of other home PCs (or other machines at the 'edge') in order to share information and data. Because such resources at the edge of the Internet tend to be ephemeral (they may connect and disconnect many times in a day), P2P protocols have to operate in an "environment of unstable connectivity and unpredictable IP addresses" [Shirky].
P2P vs. Client/Server
So, unlike a client/server architecture, where you develop applications in two asymmetrical pieces (the server, which provides services and is assumed to be reliably available at a known Internet address, and the client, which connects to the server in order to request information), P2P applications are somewhat trickier to develop. In a P2P system, all machines (nodes) run the same program (this is somewhat of a generalization: some systems are organized so that groups of P2P nodes run similar but different programs). Some of the issues that need to be handled in such a situation include:
- Connectivity: how to find and connect to other P2P nodes that are running in the network (unlike traditional servers, they don't have a fixed, known IP address)
- Instability: nodes may join and leave the network at any time (unlike servers such as web or email servers, which we usually depend on to "be there")
- Message routing: how messages should be routed to get from one node to another (where the two nodes may not directly know about each other)
- Searching (somewhat related to routing): how to find desired information from the nodes connected to the network
- Security: a whole slew of issues including nodes being able to trust other nodes, preventing malicious nodes from doing bad things to the P2P network or the individual nodes, being able to send and receive data anonymously, etc.
The tutorial in this document will primarily focus on the first four issues -- the basics of getting a P2P system up and running. By the time you reach the end of the tutorial, you will be familiar with a library that will help you implement P2P protocols and applications. The library provides infrastructure-related routines to help you manage socket handling, threading, and sending messages between peers. To help you understand what is going on, the main part of this tutorial is a walk-through of the development of the library itself.
A P2P Framework
Overview
Before diving into the details of actual code, you should understand what we are trying to implement at a high level. The figure below illustrates how conversations happen between peers in a network. Each application running on a node provides an interface to the user (you and me), and simultaneously runs a "main loop" that listens for incoming connections from other peers.
In the figure, a scenario is diagrammed in which the user on Peer 1 clicks a button, for example a "Search" button, in the GUI (step 1). The interface somehow decides to send a "Query" message to another peer, in this case Peer 2. The main loop of Peer 2 detects the incoming connection request (step 2) and starts up a separate thread to handle the actual data of the request (step 3). (A thread is a task that a program runs simultaneously, or pseudo-simultaneously, with other running tasks: see Thread - Wikipedia if you are not familiar with the term. The purpose of using threads here is to allow a peer to handle multiple incoming connections simultaneously.)
Assuming message type "c" refers to a "Query" message, Peer 1 sends the actual message (step 4) once it has established a connection to Peer 2. In step 5, the "handle peer" task (thread) of Peer 2 receives the message, sends an acknowledgment back to Peer 1, closes the connection, and then calls an appropriate function/method to handle the message based on its type.
After processing the message, the "msg c handler" function decides that it needs to send a "Query Response" message back to Peer 1, so it attempts to connect (step 6). Peer 1's main loop, listening for such connections, accepts the connection and starts its separate handler thread (step 7) to receive the actual message data from Peer 2 (step 8). Having received the message, Peer 1 does what Peer 2 did in step 5, and the process continues...
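The accept-and-dispatch pattern in steps 2 through 5 can be sketched with Python's standard socket and threading modules. This is only an illustrative sketch, not the framework's actual code: the names handle_peer, main_loop and demo, the b"ACK" reply, and the fixed number of accepted connections are all assumptions made to keep the example short and self-terminating.

```python
import socket
import threading


def handle_peer(conn, received):
    """Runs in its own thread: read one message, acknowledge it, close."""
    with conn:
        data = conn.recv(1024)   # a short message is assumed to arrive in one read
        conn.sendall(b"ACK")     # acknowledgment format is an assumption
    # The connection is closed at this point; "handling" here is just recording.
    received.append(data)


def main_loop(server_sock, received, max_connections):
    """Accept connections and hand each one to a separate handler thread."""
    workers = []
    for _ in range(max_connections):  # a real main loop would run until shut down
        conn, _addr = server_sock.accept()
        worker = threading.Thread(target=handle_peer, args=(conn, received))
        worker.start()
        workers.append(worker)
    for worker in workers:
        worker.join()


def demo():
    # "Peer 2" listens on a loopback port chosen by the operating system.
    server_sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server_sock.bind(("127.0.0.1", 0))
    server_sock.listen(5)
    port = server_sock.getsockname()[1]

    received = []
    loop = threading.Thread(target=main_loop, args=(server_sock, received, 1))
    loop.start()

    # "Peer 1" connects, sends its query, and waits for the acknowledgment.
    with socket.create_connection(("127.0.0.1", port)) as client:
        client.sendall(b"QUERY")
        ack = client.recv(1024)

    loop.join()
    server_sock.close()
    return ack, received
```

Because each accepted connection gets its own thread, the main loop can return to accept() immediately, so a slow peer does not block other incoming connections.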
Components of a Peer
The implementation of the P2P protocol that each peer is running in the diagram above is built on a simple framework that you will be familiar with in detail by the end of this document. Here, I will give a high-level overview of the various modules (i.e. classes, in object-oriented terminology) that make up the framework. Note that the framework does not include anything to do with the user interface; that part of the program would be implemented separately and would interact in appropriate ways with the underlying framework described here.
The Peer module
The Peer module manages the overall operations of a single node in the P2P network. It contains a main loop that listens for incoming connections and creates separate threads to handle them. The programmer, building a P2P protocol on top of this generic framework, registers handlers (i.e. methods or functions) with the Peer module for various message types, and the main loop dispatches incoming requests to the appropriate handler. The Peer is initialized by providing a port on which to listen for incoming connections, and optionally a host address (i.e. IP address, which may be automatically determined) and node identifier.
A list of known peers, which may be accessed and modified by the programmer, is also maintained by the Peer module. The size of the list may be limited, and peers may be accessed using their identifiers or their sequential position in the list. Besides a list of handlers for various message types, the node also stores a programmer-supplied function for deciding how to route messages, and can be set up to run stabilization operations at specific intervals.
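To make the bookkeeping described above concrete, here is a minimal sketch of the state such a module might keep: a handler table, a bounded peer list, and a routing function. It leaves out all networking and stabilization, and every name in it (PeerSketch, add_handler, add_peer, get_peer_at, dispatch, route, direct_router) is hypothetical rather than taken from the framework itself.

```python
class PeerSketch:
    """Hypothetical, simplified stand-in for the Peer module (no networking)."""

    def __init__(self, server_port, max_peers=0, server_host="127.0.0.1", my_id=None):
        self.server_port = server_port
        self.server_host = server_host
        # Default identifier is "host:port"; this convention is an assumption.
        self.my_id = my_id or "%s:%d" % (server_host, server_port)
        self.max_peers = max_peers  # 0 means "no limit" in this sketch
        self.peers = {}             # peer id -> (host, port), in insertion order
        self.handlers = {}          # 4-character message type -> handler function
        self.router = None          # programmer-supplied routing function

    def add_handler(self, msg_type, handler):
        if len(msg_type) != 4:
            raise ValueError("message types must be exactly 4 characters")
        self.handlers[msg_type] = handler  # only one handler per type

    def add_router(self, router):
        self.router = router

    def add_peer(self, peer_id, host, port):
        """Add a peer unless it is already known or the list is full."""
        if peer_id in self.peers:
            return False
        if self.max_peers and len(self.peers) >= self.max_peers:
            return False
        self.peers[peer_id] = (host, port)
        return True

    def get_peer(self, peer_id):
        return self.peers[peer_id]

    def get_peer_at(self, index):
        """Look a peer up by its sequential position in the list."""
        peer_id = list(self.peers)[index]
        return peer_id, self.peers[peer_id]

    def dispatch(self, msg_type, conn, data):
        """Call the handler registered for msg_type; report whether one existed."""
        handler = self.handlers.get(msg_type)
        if handler is None:
            return False
        handler(conn, data)
        return True

    def route(self, peer_id):
        """Ask the router which known peer should receive a message for peer_id."""
        if self.router is None:
            return None
        return self.router(self, peer_id)


def direct_router(peer, peer_id):
    """Example router: deliver only to peers that are known directly."""
    if peer_id in peer.peers:
        return (peer_id,) + peer.peers[peer_id]
    return None
```

A protocol built on top of this would register one handler per message type and supply a router suited to its topology; direct_router is the simplest possible choice and gives up whenever the destination is not in the local peer list.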
The PeerConnection module
The PeerConnection module encapsulates a socket connected to a peer node. The framework currently uses TCP/IP sockets for communication between nodes. A PeerConnection object provides methods that make it easy for the programmer to send and receive messages and acknowledgments in the P2P algorithm. It ensures messages are encoded in the correct format and attempts to detect various error conditions.
Messages exchanged between nodes built on this framework are prefixed by a header composed of a 4-byte identifier for the type of the message and a 4-byte integer holding the size of the data in the message. The 4-byte message code can be viewed as a string, so the programmer may come up with appropriate strings of length 4 to identify the various types of messages exchanged in the system. When the main loop of a peer receives a message, it dispatches it to the appropriate handler based on the message type.
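The header layout described above maps naturally onto Python's struct module. The sketch below assumes network (big-endian) byte order, an unsigned length field, an ASCII message type, and UTF-8 message data; the text does not specify any of these, and the function names encode_message and decode_message are illustrative only.

```python
import struct

# Assumed layout: 4-byte type string, then a 4-byte unsigned big-endian length.
HEADER = struct.Struct("!4sI")


def encode_message(msg_type, data):
    """Prefix the data with its 8-byte header."""
    if len(msg_type) != 4:
        raise ValueError("message type must be exactly 4 characters")
    payload = data.encode("utf-8")
    return HEADER.pack(msg_type.encode("ascii"), len(payload)) + payload


def decode_message(raw):
    """Split a complete message back into its type and data."""
    msg_type, size = HEADER.unpack_from(raw)
    payload = raw[HEADER.size:HEADER.size + size]
    if len(payload) != size:
        raise ValueError("message is shorter than its header claims")
    return msg_type.decode("ascii"), payload.decode("utf-8")
```

For example, encode_message("QUER", "hello") produces the 13 bytes b"QUER\x00\x00\x00\x05hello": four bytes of type, four bytes holding the length 5, then the data itself. The explicit length lets the receiver know exactly how many bytes to read from the stream before treating the message as complete.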
A message handler is simply a function object in Python (or an object supporting the handler interface in Java) that receives a reference to an open PeerConnection and the message data. Handlers can be registered for any message type identified by a 4-byte string. Currently only one handler per type may be used. When the Peer module receives an incoming connection request, it sets up a PeerConnection object, reads in the message type and remainder of the message, and launches a separate thread to handle the data. The peer connection is automatically closed when the message handler completes its task.
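The handler contract described above (a function that receives an open connection plus the message data, with the connection closed once the handler returns) might look roughly like the following sketch. ConnectionSketch is a hypothetical stand-in for PeerConnection, the header byte order and text encodings are assumptions, and socket.socketpair() replaces a real network connection so that the example runs in a single process.

```python
import socket
import struct

HEADER = struct.Struct("!4sI")  # assumed layout: 4-byte type, 4-byte length


class ConnectionSketch:
    """Hypothetical stand-in for PeerConnection: wraps a connected socket."""

    def __init__(self, sock):
        self.sock = sock

    def send_data(self, msg_type, data):
        payload = data.encode("utf-8")
        header = HEADER.pack(msg_type.encode("ascii"), len(payload))
        self.sock.sendall(header + payload)

    def _recv_exactly(self, count):
        # recv() may return fewer bytes than requested, so loop until done.
        chunks = []
        while count > 0:
            chunk = self.sock.recv(count)
            if not chunk:
                raise ConnectionError("peer closed the connection early")
            chunks.append(chunk)
            count -= len(chunk)
        return b"".join(chunks)

    def recv_data(self):
        msg_type, size = HEADER.unpack(self._recv_exactly(HEADER.size))
        return msg_type.decode("ascii"), self._recv_exactly(size).decode("utf-8")

    def close(self):
        self.sock.close()


def handle_query(conn, data):
    """Example handler: replies on the still-open connection."""
    conn.send_data("RESP", "results for " + data)


def serve_one(conn, handlers):
    """Read one message, run its handler, then always close the connection."""
    try:
        msg_type, data = conn.recv_data()
        handler = handlers.get(msg_type)
        if handler is not None:
            handler(conn, data)
    finally:
        conn.close()


def demo():
    left, right = socket.socketpair()  # two already-connected sockets
    requester, responder = ConnectionSketch(left), ConnectionSketch(right)
    requester.send_data("QUER", "song.mp3")
    serve_one(responder, {"QUER": handle_query})
    reply = requester.recv_data()
    requester.close()
    return reply
```

Closing the connection in a finally clause means the handler never has to remember to do so, even if it raises an exception. In the framework, the equivalent of serve_one would run in the separate thread launched for each incoming connection.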
Further Reading
From here, you may wish to continue reading about one (or more) of the following topics:
- Framework Implementation - Python
A Python implementation of the underlying P2P framework described above.
- Framework Implementation - Java
A Java implementation of the underlying P2P framework described above.
- File Sharing Application - Python
An example of how to use the framework to implement a simple P2P protocol and GUI (in Python).
- File Sharing Application - Java
An example of how to use the framework to implement a simple P2P protocol and GUI (in Java).