How a P2P network guarantees data availability
November 24, 2022

The industry often computes the reliability of a system (e.g. its Mean Time Between Failures) by aggregating the reliability of its components: a system is no more reliable than its weakest component. Building a reliable service out of intermittent or unreliable elements therefore does not sound like a good idea. Yet critical systems that cannot afford to fail, such as airplanes or spacecraft, are built every day. They rely on the engineering concept of redundancy: the duplication of critical components to increase the reliability of the whole system.
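Under the simplifying assumption that components fail independently, both ideas fit in two formulas, where A denotes the fraction of time a component is available. A system that needs all n of its components (a series arrangement) works only when every component works, while a component duplicated k times (a redundant arrangement) is unavailable only when every copy is down:

```latex
A_{\text{series}} = \prod_{i=1}^{n} A_i \;\le\; \min_{i} A_i
\qquad\qquad
A_{\text{redundant}} = 1 - \prod_{j=1}^{k} \left(1 - A_j\right)
```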

Fortunately, peer-to-peer systems are made of thousands or millions of similar peers. Individual peers may be unreliable, but they are easy to duplicate. Basic duplication comes at a cost, though: to guarantee timely access to a file stored on intermittent peers (e.g. each available 90% of the time, independently of the others), the file may have to be stored on two, three or even four peers to reach an availability of 99%, 99.9% or 99.99%, which means 100% to 300% extra storage.
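The replication figures can be reproduced with a few lines of Python, assuming (as the percentages implicitly do) that peers go offline independently of each other. The function names are illustrative and not part of any Hive API.

```python
from fractions import Fraction


def replication_availability(p: Fraction, copies: int) -> Fraction:
    # The file is reachable unless every copy is offline at the same time;
    # this assumes peers go offline independently of each other.
    return 1 - (1 - p) ** copies


def copies_needed(p: Fraction, target: Fraction) -> int:
    # Smallest number of full copies whose combined availability reaches target.
    copies = 1
    while replication_availability(p, copies) < target:
        copies += 1
    return copies


peer = Fraction("0.9")  # 90% availability per peer, as in the example above
for target in ("0.99", "0.999", "0.9999"):
    k = copies_needed(peer, Fraction(target))
    print(f"{target}: {k} copies, {(k - 1) * 100}% extra storage")
```

Exact fractions are used so that a target such as 99% is met exactly by two copies instead of being missed because of a floating-point rounding error.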

Thankfully, there are more efficient strategies:

1- Using Forward Error Correction: error correction codes were invented in the 1950s to detect and repair errors over a noisy communication channel. This is achieved by adding extra information, aka “redundancy”, to the transmitted data. Reed-Solomon coding is one of the best-known examples of such codes and is used in CDs, DVDs and hard disk drives. For disk arrays, RAID-6 uses a similar redundancy strategy.
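A toy Reed-Solomon-style erasure code gives a feel for how this works. It is a sketch, not HiveDrive’s implementation: for simplicity it works over the prime field GF(257), whereas production codes usually use GF(2^8) so that every symbol fits in a byte. The k data symbols are read as the values of a polynomial of degree below k at x = 0 to k−1, extra shards are the values of the same polynomial at further points, and any k shards pin the polynomial down again through Lagrange interpolation.

```python
P = 257  # prime modulus: every byte value 0..255 is an element of GF(257)


def _interpolate_at(points: list[tuple[int, int]], x0: int) -> int:
    """Evaluate at x0 the unique polynomial of degree < len(points) that passes
    through every (x, y) in points, using Lagrange interpolation modulo P."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if j != i:
                num = num * (x0 - xj) % P
                den = den * (xi - xj) % P
        # pow(den, -1, P) is the modular inverse of den (Python 3.8+).
        total = (total + yi * num * pow(den, -1, P)) % P
    return total


def encode(data: list[int], n: int) -> list[tuple[int, int]]:
    """Systematic encoding: shard x = i carries data[i] for i < k, and the
    n - k extra shards carry the same polynomial evaluated at x = k .. n-1."""
    k = len(data)
    if not 0 < k <= n <= P:
        raise ValueError("need 0 < k <= n <= P")
    points = list(enumerate(data))
    return points + [(x, _interpolate_at(points, x)) for x in range(k, n)]


def decode(shards: list[tuple[int, int]], k: int) -> list[int]:
    """Rebuild the k data symbols from any k distinct shards."""
    if len(shards) < k:
        raise ValueError("not enough shards to rebuild the data")
    points = shards[:k]
    return [_interpolate_at(points, x) for x in range(k)]


data = list(b"peer-to-peer")  # k = 12 data symbols
shards = encode(data, n=16)   # 4 extra shards
survivors = [s for s in shards if s[0] not in (1, 5, 13, 14)]  # lose any 4
print(bytes(decode(survivors, k=len(data))))  # b'peer-to-peer'
```

A parity symbol here can take the value 256, which does not fit in a byte; that is one reason real implementations prefer binary extension fields.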

How is that used in Hive’s P2P file storage system, HiveDrive?

Files are split into shards of data spread across the P2P network. Additional shards are created to account for peers disappearing from Hive or content being destroyed by hardware failures. For example, let’s assume 100 encrypted shards are generated from your file and sent to 100 peers. They are generated in such a way that any 70 of them are enough to rebuild the original file. Missing shards are regenerated as soon as we discover that peers have left. With only 30 redundant shards out of 100 (about 43% more storage than the file itself, versus 100% for a single extra copy), the probability of not being able to access the content is several orders of magnitude lower than with simple replication.
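The gap between the two strategies can be estimated with the binomial distribution, assuming (hypothetically) that every peer is online 90% of the time independently of the others. Real peers are correlated, for instance when many of them are switched off at night, so the resulting numbers are only indicative.

```python
from math import comb


def prob_unrecoverable(n: int, k: int, p: float) -> float:
    """Probability that fewer than k of the n shards are online, so the file
    cannot be rebuilt, if each shard's peer is online with probability p
    independently of the others."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k))


def replication_unavailability(copies: int, p: float) -> float:
    """With plain replication the file is unreachable only if every copy is offline."""
    return (1 - p) ** copies


p = 0.9
for copies in (2, 3, 4):  # 2x, 3x and 4x the file size in storage
    print(f"{copies} full copies: {replication_unavailability(copies, p):.0e}")
# 100 shards, any 70 suffice: about 1.43x the file size in storage
print(f"100 shards, 70 needed: {prob_unrecoverable(100, 70, p):.1e}")
```

Under these assumptions, the coded layout ends up far less likely to be unavailable than even four full copies, while storing less data than two.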

2- Modeling node behaviors: peers in the Hive P2P network all have the same role, but they behave differently. The computer in your bedroom is turned off every night, while your NAS is on 24/7. Usage patterns and availability change over the course of the day and vary between peers and across geographies. Hive learns the behavior of each peer and places each shard where it best ensures that the data can be reconstructed whenever it is needed.
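One way such learning could work is sketched below. It is purely hypothetical: the observation format, `hourly_availability` and `choose_peers` are illustrative names, and the greedy rule (pick, one at a time, the peer that most raises the expected number of online shards in the worst hour of the day) is a simple heuristic, not Hive’s actual placement algorithm.

```python
from collections import defaultdict


def hourly_availability(observations):
    """observations: iterable of (peer_id, hour_of_day, was_online) samples.
    Returns {peer_id: [share of samples online for hours 0..23]}; hours with
    no samples are conservatively treated as offline (0.0)."""
    online = defaultdict(lambda: [0] * 24)
    seen = defaultdict(lambda: [0] * 24)
    for peer, hour, was_online in observations:
        seen[peer][hour] += 1
        online[peer][hour] += int(was_online)
    return {
        peer: [online[peer][h] / seen[peer][h] if seen[peer][h] else 0.0
               for h in range(24)]
        for peer in seen
    }


def choose_peers(profiles, n):
    """Greedily pick n peers, each time taking the one that most raises the
    expected number of online shards in the worst hour; ties are broken by
    the peer's total expected hours online."""
    chosen, coverage = [], [0.0] * 24
    candidates = set(profiles)
    for _ in range(min(n, len(candidates))):
        best = max(
            sorted(candidates),  # sorted for deterministic tie-breaking
            key=lambda peer: (
                min(c + a for c, a in zip(coverage, profiles[peer])),
                sum(profiles[peer]),
            ),
        )
        chosen.append(best)
        candidates.remove(best)
        coverage = [c + a for c, a in zip(coverage, profiles[best])]
    return chosen, coverage


# Hypothetical peers observed over one day.
obs = []
for hour in range(24):
    obs.append(("desktop", hour, 8 <= hour < 23))  # switched off at night
    obs.append(("laptop", hour, 9 <= hour < 18))   # office hours only
    obs.append(("nas", hour, True))                # on 24/7
profiles = hourly_availability(obs)
chosen, coverage = choose_peers(profiles, n=2)
print(chosen)  # ['nas', 'desktop']
```

Once the NAS is chosen, both remaining peers keep the night-time minimum at one expected online shard, so the desktop wins the tie because it is online for more hours than the office laptop.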

3- Data persistence: we explained how forward error correction mitigates temporarily unavailable peers. Peers may also suffer hardware failures and then remain unavailable forever. Peers that haven’t connected for a long time, or that fail to prove they still hold valid data, are marked as failed, and Hive’s P2P network starts reconstructing their data on other peers.
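These checks could look roughly like the following sketch. It assumes a simple precomputed challenge-response scheme: before handing a shard to a peer, the owner stores a few random nonces together with the SHA-256 digest of each nonce followed by the shard, and later asks the peer to recompute one digest. The 30-day threshold, the function names and the scheme itself are illustrative assumptions, not a description of Hive’s actual proof mechanism.

```python
import hashlib
import hmac
import os
import time

MAX_SILENCE = 30 * 24 * 3600  # hypothetical threshold: 30 days without contact


def make_challenges(shard: bytes, count: int) -> list[tuple[bytes, bytes]]:
    """Run by the data owner before uploading: keep (nonce, expected digest)
    pairs so the shard itself need not be kept locally. Each pair is used once."""
    challenges = []
    for _ in range(count):
        nonce = os.urandom(16)
        challenges.append((nonce, hashlib.sha256(nonce + shard).digest()))
    return challenges


def answer_challenge(stored_shard: bytes, nonce: bytes) -> bytes:
    """Run by the peer: the fresh nonce forces it to hash the actual data."""
    return hashlib.sha256(nonce + stored_shard).digest()


def is_failed(last_seen: float, proof_ok: bool, now: float) -> bool:
    """A peer is marked failed if it has been silent too long or failed a proof;
    its shards would then be regenerated on other peers."""
    return now - last_seen > MAX_SILENCE or not proof_ok


shard = b"encrypted shard bytes"
nonce, expected = make_challenges(shard, count=3).pop()
ok = hmac.compare_digest(answer_challenge(shard, nonce), expected)
bad = hmac.compare_digest(answer_challenge(b"bit-rotted bytes", nonce), expected)
now = time.time()
print(is_failed(now - 3600, ok, now), is_failed(now - 3600, bad, now))  # False True
```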

As you can see, a P2P network has a variety of ways to guarantee data availability across a heterogeneous, rapidly evolving group of peers, using the advantage of numbers to provide what is essentially statistical cloud storage.