How Dat Works (archive)

Dat is a protocol for sharing data between computers. Dat’s strengths are that data is hosted and distributed by many computers on the network, that it can work offline or with poor connectivity, that the original uploader can add or modify data while keeping a full history and that it can handle large amounts of data.

Dat is compelling because the people working on it have a dedication to user experience and ease-of-use. The software around Dat brings publishing within reach for people with a wide range of skills, not just technical. Although first designed with scientific data in mind, the Dat community is testing the waters and has begun to use it for websites, art, music releases, peer-to-peer chat programs and many other experiments.

This guide is an in-depth tour through the bits and bytes of the Dat protocol, starting from a blank slate and ending with being able to download and share files with other peers running Dat. There will be enough detail for readers who are considering writing their own implementation of Dat, but if you are just curious how it works or want to learn from Dat’s design then I hope you will find this guide useful too!

URLs

To fetch a file in Dat you need to know its URL. Here is an example:

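dat://778f8d955175c92e4ced5e4f5563f69bfec0c86cc6f670352c457943666fe639/dat_intro.gif

  • dat:// is the protocol identifier.
  • 778f8d955175c92e4ced5e4f5563f69bfec0c86cc6f670352c457943666fe639 is the ed25519 public key (hexadecimal).
  • /dat_intro.gif is an optional suffix: the path to data within the Dat.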

Discovery

Dat clients use several different methods for discovering peers who they can download data from. Each discovery method has strengths and weaknesses, but combined they form a reasonably robust way of finding peers.

Discovery keys

Discovery keys are used for finding other peers who are interested in the same Dat as you.

If you know a Dat’s public key then you can calculate the discovery key easily. However, if you only know a discovery key, you cannot work backwards to find the corresponding public key. This prevents eavesdroppers learning of Dat URLs (and therefore being able to read their contents) by observing network traffic.

However, eavesdroppers who already know a Dat’s public key can confirm that peers are talking about that Dat and read all communications between those peers. Eavesdroppers who do not know the public key can still get an idea of how many Dats are popular on the network, their approximate sizes, which IP addresses are interested in them and potentially the IP address of the creator by observing handshakes, traffic timing and volumes. Dat makes no attempt to hide IP addresses.

Calculate a Dat’s discovery key using the BLAKE2b hashing function, keyed with the public key (as 32 bytes, not 64 hexadecimal characters), to hash the word “hypercore”:
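A minimal sketch of this calculation in Python, using the keyed BLAKE2b from the standard library. The function name is illustrative, and the 32-byte output length is an assumption based on the 32-byte discovery key described in the feed message later in this guide.

```python
import hashlib


def discovery_key(public_key_hex: str) -> bytes:
    """Keyed BLAKE2b of the word "hypercore", keyed with the raw public key."""
    public_key = bytes.fromhex(public_key_hex)  # 64 hex characters -> 32 bytes
    if len(public_key) != 32:
        raise ValueError("expected a 32-byte ed25519 public key")
    # digest_size=32 is an assumption: discovery keys are 32 bytes elsewhere in this guide.
    return hashlib.blake2b(b"hypercore", key=public_key, digest_size=32).digest()


key = discovery_key("778f8d955175c92e4ced5e4f5563f69bfec0c86cc6f670352c457943666fe639")
print(key.hex())
```

Note that the key is the raw 32 bytes decoded from the URL, not the 64-character hexadecimal string itself.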

Local network discovery

Peers broadcast which Dats they are interested in via their local network.

  • Strengths. Fast, finds physically nearby peers, doesn’t need special infrastructure, works offline.
  • Weaknesses. Limited reach.
  • Deployment status. Currently in use, will be replaced by Hyperswarm in the future.

Local network discovery uses multicast DNS, which is like a regular DNS query except instead of sending queries to a nameserver they are broadcast to the local network with the hope that someone else on the network sees it and responds.

Client asking for peers

Multicast DNS packets are sent to the special multicast MAC and IP addresses reserved for mDNS (01:00:5e:00:00:fb and 224.0.0.251 on IPv4). Both the source and destination ports are 5353.

Essentially the computer is asking “Does anybody have any TXT records for the domain name 25a78aa81615847eba00995df29dd41d7ee30f3b.dat.local?” Other Dat clients on the network will recognize requests following this pattern and know that the client who sent it is looking for peers.
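As a rough sketch, the domain name in the query can be built from the discovery key. This assumes the label is the first 20 bytes of the discovery key written as hexadecimal, which matches the 40-character label in the example above; the function name is hypothetical.

```python
def mdns_query_name(discovery_key: bytes) -> str:
    """Build the .dat.local name a client would ask TXT records for."""
    if len(discovery_key) != 32:
        raise ValueError("expected a 32-byte discovery key")
    # Assumption: only the first 20 bytes (40 hex characters) are used in the name.
    return discovery_key[:20].hex() + ".dat.local"


# Placeholder key: the 20 bytes from the example above, padded with zeros.
example_key = bytes.fromhex("25a78aa81615847eba00995df29dd41d7ee30f3b") + bytes(12)
print(mdns_query_name(example_key))
# prints 25a78aa81615847eba00995df29dd41d7ee30f3b.dat.local
```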

Peer reporting that they are also interested in this Dat

Responses contain two TXT records:

The special IP address 0.0.0.0 means “use the address this mDNS response came from”. When discovering peers on the local network all mDNS responses will contain only one peer and will use the 0.0.0.0 address.

Centralized DNS discovery

Peers ask a server on the internet for other peers using a DNS-based protocol.

Currently the server running this is discovery1.datprotocol.com. If that goes offline then discovery2.datprotocol.com can be used as a fallback.

Here is a typical message flow between a Dat peer and the DNS discovery server:

To stay subscribed, peers should re-announce themselves every 60 seconds. The discovery server will also cycle its tokens periodically, so peers should remember the token they last received and update it when they receive a new one.

The peers record returned by the discovery server uses the same structure as in mDNS:

Following are three examples showing how these DNS requests appear as bytes sent over the network:

Peer announce request to discovery server
Discovery server response to announce
Discovery server SRV push notification

Wire protocol

Once a peer has discovered another peer’s IP address and port number it will open a TCP connection to the other peer. Each half of the conversation has this structure which repeats until the end of the connection:

Varints

The first two fields are encoded as variable-length integers and therefore do not have a fixed size. You must read each field starting from the beginning to determine how long the field is and where the next field starts.

The advantage of varints is that they only require a few bytes to represent small numbers, while still being able to represent large numbers by using more bytes. The disadvantage of varints is that they take more work to encode and decode compared to regular integers.

Here’s how to decode a varint:
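The sketch below assumes the common Protocol Buffers style of varint: each byte carries 7 bits of the number, least significant group first, and the top bit of a byte is set when more bytes follow. The function names are illustrative.

```python
from typing import Tuple


def decode_varint(data: bytes, offset: int = 0) -> Tuple[int, int]:
    """Return (value, offset of the byte after the varint)."""
    value = 0
    shift = 0
    while True:
        if offset >= len(data):
            raise ValueError("truncated varint")
        byte = data[offset]
        offset += 1
        value |= (byte & 0x7F) << shift  # low 7 bits carry the payload
        if (byte & 0x80) == 0:           # top bit clear: this was the last byte
            return value, offset
        shift += 7


def encode_varint(value: int) -> bytes:
    """Inverse of decode_varint, for non-negative integers."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)      # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


print(decode_varint(bytes([0xAC, 0x02])))  # prints (300, 2)
```

Returning the next offset alongside the value makes it easy to read one field after another from the same buffer.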

Keepalive

Keepalive messages are empty messages containing no channel number, type or body. They are discarded upon being received. Sending keepalives is necessary when there is a network middlebox that kills TCP connections which haven’t sent any data in a while. In these cases each peer periodically sends keepalive messages when no other data is being sent.

Here’s an example of several keepalive messages interleaved with messages containing actual data. Each keepalive message is a single byte of zero:
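A hedged sketch of splitting a (decrypted) byte stream into messages while discarding keepalives. It assumes each message is a varint length followed by that many bytes, and that the second varint packs the channel number and message type as `channel << 4 | type`; that packing is an assumption not stated in this guide's text.

```python
from typing import Iterator, Tuple


def decode_varint(data: bytes, offset: int = 0) -> Tuple[int, int]:
    value = shift = 0
    while True:
        if offset >= len(data):
            raise ValueError("truncated varint")
        byte = data[offset]
        offset += 1
        value |= (byte & 0x7F) << shift
        if (byte & 0x80) == 0:
            return value, offset
        shift += 7


def split_messages(stream: bytes) -> Iterator[Tuple[int, int, bytes]]:
    """Yield (channel, type, body) for each message, skipping keepalives."""
    offset = 0
    while offset < len(stream):
        length, offset = decode_varint(stream, offset)
        if length == 0:
            continue  # keepalive: a single zero byte with nothing after it
        end = offset + length
        if end > len(stream):
            raise ValueError("truncated message")
        header, body_start = decode_varint(stream, offset)
        # Assumed layout: upper bits are the channel, low 4 bits are the type.
        yield header >> 4, header & 0x0F, stream[body_start:end]
        offset = end


# Two keepalives, one 3-byte message (header 0x00, body AA BB), one keepalive.
stream = bytes([0x00, 0x00, 0x03, 0x00, 0xAA, 0xBB, 0x00])
print(list(split_messages(stream)))  # prints [(0, 0, b'\xaa\xbb')]
```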

Message structure

Within each message body is a series of field tags and values:

The field tag is a varint. The most significant bits indicate which field within the message this is, for example: 1 = discovery key, 2 = nonce. This is needed because messages can have missing or repeated fields. The 3 least significant bits are the type of field.

The two types of field are:
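Assuming the two types are the ones used by Protocol Buffers (0 for a varint, 2 for a length-prefixed run of bytes), which matches the "Varint" and "Length-prefixed" types in the tables of this guide, a field parser can be sketched as follows. The function names are illustrative.

```python
from typing import List, Tuple, Union


def decode_varint(data: bytes, offset: int = 0) -> Tuple[int, int]:
    value = shift = 0
    while True:
        if offset >= len(data):
            raise ValueError("truncated varint")
        byte = data[offset]
        offset += 1
        value |= (byte & 0x7F) << shift
        if (byte & 0x80) == 0:
            return value, offset
        shift += 7


def parse_fields(body: bytes) -> List[Tuple[int, Union[int, bytes]]]:
    """Return (field number, value) pairs in the order they appear."""
    fields = []
    offset = 0
    while offset < len(body):
        tag, offset = decode_varint(body, offset)
        number, field_type = tag >> 3, tag & 0x07  # 3 least significant bits = type
        if field_type == 0:    # assumed: varint
            value, offset = decode_varint(body, offset)
        elif field_type == 2:  # assumed: length-prefixed bytes
            length, offset = decode_varint(body, offset)
            if offset + length > len(body):
                raise ValueError("truncated field")
            value = body[offset:offset + length]
            offset += length
        else:
            raise ValueError("unsupported field type %d" % field_type)
        fields.append((number, value))
    return fields


print(parse_fields(b"\x0a\x03abc\x10\x01"))  # prints [(1, b'abc'), (2, 1)]
```

Returning a list rather than a dictionary keeps repeated fields, which the format allows.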

Feed message

After opening the TCP connection the first message is always a feed message.

Feed messages have two fields:

No. Name Type Description
1 Discovery key Length-prefixed 32-byte discovery key for this Dat.
2 Nonce Length-prefixed 24-byte random nonce generated for this TCP connection. Only present for the first feed message.

Putting everything together, this is how each side of the TCP connection begins:

Or, as the bytes actually sent over the wire:
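The byte layout of the opening feed message can be sketched in Python. Two assumptions are made that the text above does not spell out: field tags follow the Protocol Buffers convention (`number << 3 | 2` for length-prefixed fields), and the channel/type header for the first feed message is a single zero byte (channel 0, type 0). The all-zero discovery key is a placeholder.

```python
import os


def encode_varint(value: int) -> bytes:
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def length_prefixed(field_number: int, value: bytes) -> bytes:
    """Field tag (assumed type 2), then the length, then the bytes themselves."""
    return encode_varint((field_number << 3) | 2) + encode_varint(len(value)) + value


def first_feed_message(discovery_key: bytes, nonce: bytes) -> bytes:
    if len(discovery_key) != 32 or len(nonce) != 24:
        raise ValueError("need a 32-byte discovery key and a 24-byte nonce")
    body = length_prefixed(1, discovery_key) + length_prefixed(2, nonce)
    header = encode_varint((0 << 4) | 0)  # assumed: channel 0, type 0 (feed)
    payload = header + body
    return encode_varint(len(payload)) + payload


message = first_feed_message(bytes(32), os.urandom(24))
print(message[:4].hex())  # prints 3d000a20
```

The leading 0x3d (61) counts the header byte plus the 60-byte body; it does not count itself.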

Encryption

Each side of the TCP connection is encrypted starting from the second message and continuing until the end of the connection. This prevents network eavesdroppers from finding out what data a Dat contains unless they already know its public key.

XSalsa20 is the encryption cipher used. Given a 32-byte key and a 24-byte nonce, XSalsa20 produces a never-ending stream of pseudorandom bytes called the keystream.

The sender generates a random 24-byte value for the nonce and includes it in their first message (which is always a feed message). The Dat’s 32-byte public key is used as the XSalsa20 key. From the second message onwards all bytes they send are XORed with the keystream.

The receiver reads the nonce from the sender’s first message and uses this with their knowledge of the Dat’s public key to set up an identical XSalsa20 keystream. Then they XOR the keystream with the bytes received to decrypt the stream.
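To make the keystream idea concrete, here is an unoptimized pure-Python sketch of XSalsa20 written from the published Salsa20 and XSalsa20 descriptions. It is for illustration only; a real client would use a vetted library such as libsodium. The class and function names are hypothetical, and the key below is a stand-in rather than a real Dat public key.

```python
import struct

MASK = 0xFFFFFFFF
SIGMA = struct.unpack("<4I", b"expand 32-byte k")  # Salsa20 constants


def _rotl(v, n):
    return ((v << n) & MASK) | (v >> (32 - n))


def _rounds(state):
    """Apply the 20 Salsa20 rounds (10 column/row double rounds)."""
    x = list(state)

    def qr(a, b, c, d):
        x[b] ^= _rotl((x[a] + x[d]) & MASK, 7)
        x[c] ^= _rotl((x[b] + x[a]) & MASK, 9)
        x[d] ^= _rotl((x[c] + x[b]) & MASK, 13)
        x[a] ^= _rotl((x[d] + x[c]) & MASK, 18)

    for _ in range(10):
        qr(0, 4, 8, 12); qr(5, 9, 13, 1); qr(10, 14, 2, 6); qr(15, 3, 7, 11)
        qr(0, 1, 2, 3); qr(5, 6, 7, 4); qr(10, 11, 8, 9); qr(15, 12, 13, 14)
    return x


def _state(key_words, middle_words):
    k = key_words
    return [SIGMA[0], *k[0:4], SIGMA[1], *middle_words, SIGMA[2], *k[4:8], SIGMA[3]]


def _hsalsa20(key, nonce16):
    """Derive the Salsa20 subkey from the key and the first 16 nonce bytes."""
    z = _rounds(_state(struct.unpack("<8I", key), struct.unpack("<4I", nonce16)))
    return struct.pack("<8I", z[0], z[5], z[10], z[15], z[6], z[7], z[8], z[9])


class XSalsa20Stream:
    """Keeps its keystream position so it can be fed one message at a time."""

    def __init__(self, key: bytes, nonce: bytes):
        if len(key) != 32 or len(nonce) != 24:
            raise ValueError("need a 32-byte key and a 24-byte nonce")
        self._key = struct.unpack("<8I", _hsalsa20(key, nonce[:16]))
        self._nonce = struct.unpack("<2I", nonce[16:])
        self._counter = 0
        self._buffer = b""

    def _next_block(self) -> bytes:
        counter = (self._counter & MASK, self._counter >> 32)
        state = _state(self._key, (*self._nonce, *counter))
        mixed = _rounds(state)
        self._counter += 1
        return struct.pack("<16I", *((m + s) & MASK for m, s in zip(mixed, state)))

    def xor(self, data: bytes) -> bytes:
        """XOR data with the next len(data) keystream bytes."""
        while len(self._buffer) < len(data):
            self._buffer += self._next_block()
        ks, self._buffer = self._buffer[:len(data)], self._buffer[len(data):]
        return bytes(a ^ b for a, b in zip(data, ks))


public_key = bytes(range(32))  # stand-in for the Dat's public key
nonce = bytes(24)              # in practice 24 random bytes, e.g. os.urandom(24)
sender = XSalsa20Stream(public_key, nonce)
receiver = XSalsa20Stream(public_key, nonce)
wire = sender.xor(b"second message") + sender.xor(b"third message")
print(receiver.xor(wire))  # prints b'second messagethird message'
```

Because encryption and decryption are the same XOR, each direction of the connection needs just one such object per side, created once the first feed message has carried the nonce across.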

Handshake message

After the initial feed message, the second message sent on each side of the TCP connection is always a handshake message.

No. Name Type Description
1 ID Length-prefixed Random ID generated by this peer upon starting, 32 bytes long. Used to detect multiple connections to the same peer or accidental connections to itself.
2 Live Varint 0 = End the connection when neither peer is downloading, 1 = Keep the connection open indefinitely (only takes effect if both peers set to 1).
3 User data Length-prefixed Arbitrary bytes that can be used by higher-level applications for any purpose. Remember that any data put in here is not protected from tampering as it passes through the network. Additionally, any eavesdropper who knows the Dat’s public key can read this.
4 Extensions Length-prefixed Names of any extensions the peer wants to use, for example: “session-data”. This field can appear multiple times, one for each extension. Both peers need to request an extension in their handshake messages for it to become active.
5 Acknowledge Varint 0 = No need to acknowledge each chunk of data received, 1 = Must acknowledge each chunk of data received.

Data model

After completing the handshake peers begin requesting data from each other. Dats contain a list of variable-sized chunks of bytes. New chunks can be added to the end by the Dat’s author, but existing chunks can’t be deleted or modified.

Hashes are used to verify the integrity of data within a Dat. Each chunk of data has a corresponding hash. There are also parent hashes which verify the integrity of two other hashes. Parent hashes form a tree structure. In this example the integrity of all the data can be verified if you know hash number 3:

Each time the author adds new chunks they calculate a root hash and sign it with the Dat’s secret key. Downloaders can use the Dat’s public key to verify the signature, which in turn verifies the integrity of all the other hashes and chunks.

Depending on the number of chunks, the root hash can have more than one input. The root hash combines as many parent or chunk hashes as necessary to cover all the chunks. Here is how the hash tree looks with different numbers of chunks:
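The following sketch lists which hashes feed into the root hash for a given number of chunks. It assumes hashes are numbered in order across the tree, so that chunk n has hash number 2n and parent hashes take the odd numbers in between; this is consistent with hash number 3 covering four chunks in the example above, but the numbering scheme itself is an assumption here.

```python
from typing import List


def root_hash_numbers(chunk_count: int) -> List[int]:
    """Hash numbers of the subtree tops that together cover every chunk."""
    roots = []
    offset = 0  # hash number where the next subtree starts
    size = 1 << max(chunk_count.bit_length() - 1, 0)  # largest power of two to try
    while size >= 1:
        if chunk_count & size:
            # A complete subtree over `size` chunks spans 2*size - 1 hash
            # numbers, and its top sits in the middle of that span.
            roots.append(offset + size - 1)
            offset += 2 * size
        size >>= 1
    return roots


for count in (1, 2, 3, 4, 5, 7, 8):
    print(count, root_hash_numbers(count))
```

With 4 chunks this returns [3]; with 5 chunks it returns [3, 8], because the fifth chunk has no sibling yet and its own hash becomes a second input to the root hash.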

Hashes and signatures

The three types of hash seen in the hash tree are:

  • Chunk hashes, which hash the contents of a single chunk.
  • Parent hashes, which hash two other hashes forming a tree structure.
  • Root hashes, which sit at the root of the tree and are signed by the Dat’s author.

Each type of hash has a specific way to construct it:
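A sketch of the three constructions follows. The exact byte layouts here (a one-byte type prefix, 64-bit big-endian lengths, and 32-byte BLAKE2b output) are assumptions based on my understanding of the reference Hypercore implementation, not something stated in the text above, so check them against the protocol specification before relying on them. Signing the root hash with ed25519 is left out because it needs a third-party library.

```python
import hashlib
import struct
from typing import List, Tuple


def _blake2b_256(*parts: bytes) -> bytes:
    h = hashlib.blake2b(digest_size=32)
    for part in parts:
        h.update(part)
    return h.digest()


def _uint64(n: int) -> bytes:
    return struct.pack(">Q", n)  # assumed: 8 bytes, big-endian


def chunk_hash(data: bytes) -> bytes:
    """Assumed layout: 0x00, length of the chunk, chunk contents."""
    return _blake2b_256(b"\x00", _uint64(len(data)), data)


def parent_hash(left: bytes, left_size: int, right: bytes, right_size: int) -> bytes:
    """Assumed layout: 0x01, total bytes covered, left hash, right hash."""
    return _blake2b_256(b"\x01", _uint64(left_size + right_size), left, right)


def root_hash(roots: List[Tuple[bytes, int, int]]) -> bytes:
    """Assumed layout: 0x02, then (hash, hash number, size) for each input."""
    parts = [b"\x02"]
    for digest, number, size in roots:
        parts += [digest, _uint64(number), _uint64(size)]
    return _blake2b_256(*parts)


# Two chunks: hashes 0 and 2 are chunk hashes, hash 1 is their parent.
a, b = b"hello", b"world!"
h0, h2 = chunk_hash(a), chunk_hash(b)
h1 = parent_hash(h0, len(a), h2, len(b))
print(root_hash([(h1, 1, len(a) + len(b))]).hex())
```

The type prefix keeps the three kinds of hash from being confused with one another, for example a chunk whose contents happen to look like two concatenated hashes.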

Exchanging data

Peers exchange chunks in a multi-step process where the downloader and uploader negotiate what chunks they want and have:

Want and have

Each peer remembers which chunks the other peer wants and has.

As peers download (or even delete) data, the list of chunks they want and have will change. This state is communicated with four message types: want, unwant, have and unhave. Each of these four messages has the same structure which indicates a contiguous range of chunks:

No. Name Type Description
1 Start Varint Number of the first chunk you want/unwant/have/unhave. Chunk numbering starts at 0.
2 Length Varint 1 = Just the start chunk, 2 = The start chunk and the next one, and so on. Omit this field to select all following chunks to the end of the Dat, including new chunks as they are added.
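As an illustration, the body shared by these four messages can be encoded like this. The sketch assumes Protocol Buffers style field tags (`number << 3 | 0` for varint fields) and covers only the message body, not the length and channel/type prefix; the function names are hypothetical.

```python
from typing import Optional


def encode_varint(value: int) -> bytes:
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def encode_range(start: int, length: Optional[int] = None) -> bytes:
    """Body of a want, unwant, have or unhave message."""
    body = encode_varint((1 << 3) | 0) + encode_varint(start)
    if length is not None:
        body += encode_varint((2 << 3) | 0) + encode_varint(length)
    # Leaving out the length field selects every chunk from start onwards.
    return body


print(encode_range(0, 8).hex())  # prints 08001008
print(encode_range(300).hex())   # prints 08ac02
```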

Here is an example showing typical use of have and want messages between two peers:

Have bitfield

If you have lots of little, non-contiguous ranges of data it can take a lot of have messages to tell your peer exactly what you have. There is an alternate form of the have message for this purpose. It is efficient at representing both contiguous and non-contiguous ranges of data.

This form of the have message only has one field:

No. Name Type Description
3 Bitfield Length-prefixed A sequence of contiguous and non-contiguous chunk ranges.

For example, let’s look at the chunks this peer has. Normally this would take 11 messages to represent:

Instead, divide the chunks into ranges where each range is either contiguous (all chunks present or none present), or non-contiguous (some chunks present). The ranges must be multiples of 8 chunks long.

Ranges are encoded as:

Putting everything together, here are the bits used to encode the chunks this peer has:

And here is how the final have message would appear on the wire:
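A decoder for the bitfield can be sketched as follows. The encoding used here is an assumption based on my understanding of the run-length scheme used by the reference implementation: each range starts with a varint header; if its lowest bit is 1 the range is contiguous, the next bit says whether the chunks are present, and the remaining bits give the length in bytes (8 chunks per byte); if the lowest bit is 0 the remaining bits give the number of raw bitfield bytes that follow, most significant bit first.

```python
from typing import List, Tuple


def decode_varint(data: bytes, offset: int = 0) -> Tuple[int, int]:
    value = shift = 0
    while True:
        if offset >= len(data):
            raise ValueError("truncated varint")
        byte = data[offset]
        offset += 1
        value |= (byte & 0x7F) << shift
        if (byte & 0x80) == 0:
            return value, offset
        shift += 7


def decode_have_bitfield(bitfield: bytes) -> List[int]:
    """Return the chunk numbers marked as present."""
    present = []
    base = 0    # chunk number at the start of the current range
    offset = 0
    while offset < len(bitfield):
        header, offset = decode_varint(bitfield, offset)
        if header & 1:                      # contiguous range
            byte_count = header >> 2
            if header & 2:                  # all chunks present
                present.extend(range(base, base + 8 * byte_count))
        else:                               # non-contiguous: raw bits follow
            byte_count = header >> 1
            raw = bitfield[offset:offset + byte_count]
            if len(raw) != byte_count:
                raise ValueError("truncated bitfield")
            offset += byte_count
            for i, byte in enumerate(raw):
                for bit in range(8):
                    if byte & (0x80 >> bit):
                        present.append(base + 8 * i + bit)
        base += 8 * byte_count
    return present


# 16 present chunks, then one raw byte 10100000, then 8 absent chunks.
print(decode_have_bitfield(bytes([0x0B, 0x02, 0xA0, 0x05])))
```

For this input the result is chunks 0 to 15 followed by 16 and 18.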

Requesting data

Once your peer has told you that they have a chunk you want, send a request message to ask them for it:

No. Name Type Description
1 Index Varint Number of the chunk to send back. This field must be present even when using the bytes field below.
2 Bytes Varint If this field is present, ignore the index field and send back the chunk containing this byte. Useful if you don’t know how big each chunk is but you want to seek to a specific byte.
3 Hash Varint 0 = Send back the data in this chunk as well as hashes needed to verify it, 1 = Don’t send back the data in this chunk, only send the hashes.
4 Nodes Varint Used to request additional hashes needed to verify the integrity of this chunk. 0 = Send back all hashes needed to verify this chunk, 1 = Just send the data, no hashes. For other values that can be used to request specific hashes from the hash tree, see the Wire Protocol specification.

If you no longer want a chunk you requested, send a cancel message:

No. Name Type Description
1 Index Varint Number of the chunk to cancel. This field must be present even when using the bytes field below.
2 Bytes Varint If this field is present, ignore the index field and cancel the request for the chunk containing this byte.
3 Hash Varint Set to the same value as the hash field of the request you want to cancel.

Cancel messages can be used if you preemptively requested a chunk from multiple peers at the same time. Upon receiving the chunk from the fastest peer, send cancel messages to the others.

When a peer has requested a chunk from you, send it to them with a data message:

No. Name Type Description
1 Index Varint Chunk number.
2 Value Length-prefixed Contents of the chunk. Do not set this field if the request had hash = 1.
3 Nodes Length-prefixed This field is repeated for each hash that the requester needs to verify the chunk’s integrity. Each node has the sub-fields listed below.
4 Signature Length-prefixed 64-byte ed25519 signature of the root hash corresponding to this chunk.

Sub-fields of each node:

No. Name Type Description
1 Index Varint Hash number.
2 Hash Length-prefixed 32-byte chunk hash or parent hash.
3 Size Varint Total length of data in chunks covered by this hash.

Files and folders

Dat uses two coupled feeds to represent files and folders. The metadata feed contains the names, sizes and other metadata for each file, and it’s typically quite small even when the Dat contains a lot of data. The content feed contains the actual file contents. The metadata feed points to where in the content feed each file is located, so you only need to fetch the contents of files you are interested in.

The first chunk of the metadata feed is always an index chunk. Check that the type field contains the word “hyperdrive”. If so, the content field is the public key of the content feed.

No. Name Type Description
1 Type Length-prefixed What sort of data is contained in this Dat. For Dats using the concept of files and folders this is “hyperdrive”.
2 Content Length-prefixed 32-byte public key of the content feed.

After the index all following chunks in the metadata feed are nodes, which store file metadata. Nodes have these fields:

No. Name Type Description
1 Name Length-prefixed Slash-separated path and filename. Always begins with a slash. For example: “/src/main.c”
2 Value Length-prefixed If this field is present the file is being created or updated, and its sub-fields (listed below) give the details of the new or updated file. If the value field is absent (and therefore none of its sub-fields are set) the file previously existing with this name is now deleted.
3 Paths Length-prefixed Index that helps to traverse folders more efficiently. See below.

Sub-fields of the value field:

No. Name Type Description
1 Mode Varint Unix permissions. In practice this is one of two common values, depending on whether the file is executable or not. Security-sensitive bits such as setuid and setgid might also be set. When extracting files from a Dat to the filesystem you might consider not honoring these bits.
2 UID Varint Unix user ID. Alternatively, set to 0 to not expose the user’s ID.
3 GID Varint Unix group ID. Alternatively, set to 0 to not expose the user’s group ID.
4 Size Varint Size of the file in bytes.
5 Blocks Varint Number of chunks the file occupies in the content feed.
6 Offset Varint Chunk number of the first chunk in the content feed.
7 Byte offset Varint Size in bytes of all chunks in the content feed before this file. 0 for the first file.
8 Mtime Varint Time the file was last modified. Number of milliseconds since 1 January 1970 00:00:00 UTC.
9 Ctime Varint Time the file was created. Number of milliseconds since 1 January 1970 00:00:00 UTC.

To find the latest version of a file, start from the end of the metadata feed and work backwards until you find a node with that file’s name. Even though files can be modified and deleted, previous versions can still be retrieved by searching back in the metadata feed.
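The backwards search can be sketched like this, assuming the metadata nodes have already been downloaded and decoded into dictionaries. The dictionary shape and the function name are hypothetical.

```python
from typing import List, Optional, Tuple


def latest_entry(nodes: List[dict], name: str) -> Optional[Tuple[int, Optional[dict]]]:
    """Search from the newest node backwards for the given file name.

    Returns (position in the list, value), where value is None if the most
    recent node for that name records a deletion. Returns None if the file
    never appears in the feed.
    """
    for position in range(len(nodes) - 1, -1, -1):
        node = nodes[position]
        if node["name"] == name:
            return position, node.get("value")
    return None


# Decoded nodes in feed order (the index chunk is not included).
nodes = [
    {"name": "/a.txt", "value": {"size": 3, "offset": 0, "blocks": 1}},
    {"name": "/b.txt", "value": {"size": 5, "offset": 1, "blocks": 1}},
    {"name": "/a.txt"},  # no value field: /a.txt was deleted
]
print(latest_entry(nodes, "/b.txt"))
print(latest_entry(nodes, "/a.txt"))  # prints (2, None)
```

To retrieve an older version, continue the same search from one position before the match.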

Here is an example showing how the offset and blocks fields refer to chunks in the content feed:

Paths index

Scanning through the list of all files added, modified or deleted would be slow for Dats that contain lots of files or a long history. To make this faster, every node in the metadata feed contains extra information in the paths field to help traverse folders.

To calculate a paths field, start by constructing a file hierarchy from the metadata feed:

For each file in the hierarchy, find the most recent entry for it in the metadata feed and remember the node number. For each folder, remember the highest node number among its children:

Locate the file that was just added. Select that file and all its parent folders up to the root folder. Also select files and folders within those folders, but not their descendants. Ignore everything else.

Next, follow these steps to process the node numbers into bytes:

So node 8 would appear in the metadata feed as:

The process for calculating the paths field after deleting a file is mostly the same as when adding a file:

Future of Dat

Dat was first released in 2013, which in terms of internet infrastructure is very recent. Parts of the protocol are still changing today to enable Dat to handle bigger datasets, more hostile network conditions and support new types of applications.

This guide has described the Dat protocol as of January 2019. Here’s a brief summary of upcoming proposals to modify the Dat protocol in the near future:


This is the end of How Dat Works. We’ve seen all the steps necessary to download and share files using Dat. If you’d like to write an implementation then check out the Dat Protocol Book which offers guidance about implementation details.

The focus of this guide has been on storing files, but this is just one use of Dat. The protocol is flexible enough to store arbitrary data that does not use the concept of files or folders. One example is Hyperdb which is a key/value database, but Dat can be extended to support completely different use cases too. Take a look at the formal protocol specifications for more information.