
Distributed hash table:

A distributed hash table (DHT) is a distributed system that provides a lookup service similar to a hash table: key-value pairs are stored in a DHT, and any participating node can efficiently retrieve the value associated with a given key.

The main advantage of a DHT is that nodes can be added or removed with a minimum of work redistributing keys.

Keys are unique identifiers which map to particular values, which in turn can be anything from addresses, to documents, to arbitrary data.[1]

Responsibility for maintaining the mapping from keys to values is distributed among the nodes, in such a way that a change in the set of participants causes a minimal amount of disruption.

This allows a DHT to scale to extremely large numbers of nodes and to handle continual node arrivals, departures, and failures.
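
To make this abstraction concrete, the interface a DHT exposes is essentially that of an ordinary hash table, with the distribution hidden behind put and get. A minimal Python sketch (the key_of helper and the dict analogy are illustrative assumptions, not the API of any particular DHT):

    import hashlib

    def key_of(name: str) -> int:
        # Map an arbitrary string into a fixed-size keyspace
        # (here, 160 bits via SHA-1, as in the example under Structure).
        return int.from_bytes(hashlib.sha1(name.encode()).digest(), "big")

    # Conceptually, a DHT then behaves like a dict shared by many nodes:
    #   dht.put(key_of("song.mp3"), data)   # any node accepts the write
    #   dht.get(key_of("song.mp3"))         # any node can serve the lookup
    # Which physical node actually stores the pair is decided by the
    # keyspace partitioning described under Structure, not by the client.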

DHTs form an infrastructure that can be used to build more complex services, such as anycast, cooperative web caching, distributed file systems, domain name services, instant messaging, multicast, and also peer-to-peer file sharing and content distribution systems.

History:

DHT research was originally motivated, in part, by peer-to-peer (P2P) systems such as Freenet, Gnutella, BitTorrent and Napster, which took advantage of resources distributed across the Internet to provide a single useful application.

In particular, they took advantage of increased bandwidth and hard disk capacity to provide a file-sharing service.[2]

These systems differed in how they located the data offered by their peers.

Napster, the first large-scale P2P content delivery system, required a central index server: each node, upon joining, would send a list of locally held files to the server, which would perform searches and refer the queries to the nodes that held the results.

This central component left the system vulnerable to attacks and lawsuits.

Gnutella and similar networks moved to a query flooding model – in essence, each search would result in a message being broadcast to every other machine in the network.

While avoiding a single point of failure, this method was significantly less efficient than Napster.

Later versions of Gnutella clients moved to a dynamic querying model which vastly improved efficiency.[3]

Freenet is fully distributed, but employs a heuristic key-based routing in which each file is associated with a key, and files with similar keys tend to cluster on a similar set of nodes.

Queries are likely to be routed through the network to such a cluster without needing to visit many peers.[4]

However, Freenet does not guarantee that data will be found.

Distributed hash tables use a more structured key-based routing in order to attain both the decentralization of Freenet and Gnutella, and the efficiency and guaranteed results of Napster.[5]

In 2001, four systems (CAN,[6] Chord,[7] Pastry, and Tapestry) ignited DHTs as a popular research topic.

A project called the Infrastructure for Resilient Internet Systems (Iris) was funded by a $12 million grant from the United States National Science Foundation in 2002.[8]

Researchers included Sylvia Ratnasamy, Ion Stoica, Hari Balakrishnan and Scott Shenker.[9]

Outside academia, DHT technology has been adopted as a component of BitTorrent and in the Coral Content Distribution Network.

Properties:

DHTs characteristically emphasize the following properties:

Autonomy and decentralization: the nodes collectively form the system without any central coordination.

Fault tolerance: the system should be reliable (in some sense) even with nodes continuously joining, leaving, and failing.[10]

Scalability: the system should function efficiently even with thousands or millions of nodes.

A key technique used to achieve these goals is that any one node needs to coordinate with only a few other nodes in the system – most commonly, O(log n) of the n participants (see below) – so that only a limited amount of work needs to be done for each change in membership.
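
For a rough sense of scale (simple arithmetic, not a figure from the cited sources): if a design keeps about log2(n) neighbors per node, routing state grows very slowly with network size.

    import math

    # Approximate per-node routing-table size for an O(log n) design.
    for n in (1_000, 1_000_000, 1_000_000_000):
        print(f"{n:>13,} nodes -> about {math.ceil(math.log2(n))} neighbors each")
    # prints: 10 neighbors at a thousand nodes, 20 at a million, 30 at a billion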

Some DHT designs seek to be secure against malicious participants[11] and to allow participants to remain anonymous, though this is less common than in many other peer-to-peer (especially file sharing) systems; see anonymous P2P.

Finally, DHTs must deal with more traditional distributed systems issues such as load balancing, data integrity, and performance (in particular, ensuring that operations such as routing and data storage or retrieval complete quickly).

Structure:

The structure of a DHT can be decomposed into several main components.[12][13]

The foundation is an abstract keyspace, such as the set of 160-bit strings.

A keyspace partitioning scheme splits ownership of this keyspace among the participating nodes.
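
One common partitioning rule, used here purely as an illustration (a Chord-style "successor" scheme; other schemes exist), places node identifiers and keys on the same ring and assigns each key to the first node identifier that follows it:

    import bisect

    def owner(node_ids: list[int], key: int) -> int:
        # The owner of a key is the first node ID clockwise from the key,
        # wrapping around the top of the circular keyspace.
        ids = sorted(node_ids)
        i = bisect.bisect_left(ids, key)
        return ids[i % len(ids)]

    nodes = [20, 80, 140, 200]   # toy 8-bit keyspace: IDs in [0, 256)
    print(owner(nodes, 90))      # -> 140
    print(owner(nodes, 250))     # -> 20 (wraps around the ring)

Because only the arc between a joining or leaving node and its neighbor changes hands, a membership change moves only a small fraction of the keys, which is the "minimal disruption" property described above.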

An overlay network then connects the nodes, allowing them to find the owner of any given key in the keyspace.
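
Lookup over the overlay can then be sketched as greedy forwarding: each node knows a few neighbors, and a query repeatedly hops to the known node closest to the target key until no neighbor is closer. The neighbor sets and distance metric below are assumptions for illustration (Kademlia, for example, uses an XOR metric; Chord uses distance around the ring):

    def route(start, key, neighbors, dist):
        # Greedy overlay routing: hop to whichever neighbor is closest to
        # the key; stop when the current node is closer than all neighbors.
        node, path = start, [start]
        while True:
            best = min(neighbors[node], key=lambda n: dist(n, key), default=node)
            if dist(best, key) >= dist(node, key):
                return path   # no neighbor improves, so 'node' is responsible
            node = best
            path.append(node)

With well-chosen neighbor sets, such as "fingers" at exponentially increasing distances around the keyspace, each hop roughly halves the remaining distance, which yields the O(log n) hop counts mentioned under Properties.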

Once these components are in place, a typical use of the DHT for storage and retrieval might proceed as follows.

Suppose the keyspace is the set of 160-bit strings.

To index a file with given filename and data in the DHT, the SHA-1 hash of filename is generated, producing a 160-bit key k, and a message put(k, data) is sent to any node participating in the DHT.

The message is forwarded from node to node through the overlay network until it reaches the single node responsible for key k as specified by the keyspace partitioning.

That node then stores the key and the data.

Any other client can then retrieve the contents of the file by again hashing filename to produce k and asking any DHT node to find the data associated with k with a message get(k).

The message will again be routed through the overlay to the node responsible for k, which will reply with the stored data.
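
The put/get exchange just described can be sketched end to end. The following toy model keeps every "node" in one process and skips real message routing; the class and its successor rule are assumptions for illustration only:

    import hashlib

    class ToyDHT:
        """Toy model: successor-based partitioning, no real networking."""
        def __init__(self, node_ids):
            self.nodes = {nid: {} for nid in sorted(node_ids)}  # per-node storage

        def _owner(self, k: int) -> int:
            # First node ID at or above k, wrapping around the 160-bit ring.
            for nid in self.nodes:   # IDs were inserted in ascending order
                if nid >= k:
                    return nid
            return next(iter(self.nodes))

        def put(self, filename: str, data: bytes) -> None:
            k = int.from_bytes(hashlib.sha1(filename.encode()).digest(), "big")
            self.nodes[self._owner(k)][k] = data   # stored at the responsible node

        def get(self, filename: str) -> bytes:
            k = int.from_bytes(hashlib.sha1(filename.encode()).digest(), "big")
            return self.nodes[self._owner(k)][k]   # same hash reaches the same node

    dht = ToyDHT(node_ids=[2**i for i in (100, 120, 140, 159)])
    dht.put("song.mp3", b"some file data")
    print(dht.get("song.mp3"))   # -> b'some file data'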

The keyspace partitioning and overlay network components are described below with the goal of capturing the principal ideas common to most DHTs; many designs differ in the details.