How to secure data on unreliable nodes?
May 27, 2022


Is it possible to provide a secure and reliable service using insecure and unreliable nodes?

The industry often computes the reliability of a system (e.g. its Mean Time Between Failures) by aggregating those of its components, a system being no stronger than its weakest component. Relying on intermittent or unreliable elements does not seem like the best way to build a reliable service. Yet critical systems that cannot afford failure, such as airplanes or spacecraft, are built anyway. They rely on the engineering concept of redundancy: duplicating critical components to increase the system's overall reliability.

Fortunately, peer-to-peer systems are made of thousands or millions of similar nodes. Nodes may be unreliable, but they are easily duplicated. Basic duplication comes at a cost, though: to guarantee timely access to a file stored on intermittent nodes (e.g. 90% availability each), the file may need to be replicated on two, three or even four nodes to reach availabilities of 99%, 99.9% or 99.99%.
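As a back-of-the-envelope check, those numbers follow from the probability that all replicas are offline at once. The sketch below assumes node outages are independent, which real deployments cannot take for granted:

```python
# Availability of a file replicated on n nodes, each independently
# online with probability p (simplified model: independent outages).
def replicated_availability(p: float, n: int) -> float:
    # The file is unreachable only if all n replicas are offline.
    return 1 - (1 - p) ** n

for n in range(1, 5):
    print(f"{n} replica(s): {replicated_availability(0.9, n):.4%}")
```

Each extra replica divides the failure probability by ten here, but also multiplies the storage cost.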

Fortunately, there are more efficient strategies.

Using Forward Error Correction: error-correcting codes were invented in the 1950s to detect and repair errors over a noisy communication channel. This is achieved by adding extra information, aka “redundancy”, to the transmitted data. Reed-Solomon coding is one of the most famous examples of such a code, used in CDs, DVDs and hard disk drives. For disk arrays, RAID-6 uses similar redundancy strategies.
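To make the idea concrete, here is the simplest possible erasure code: a single XOR parity block, the principle behind RAID-4/5. This is a toy illustration, not Hive's actual coding scheme; any one lost block can be rebuilt from the remaining blocks plus the parity:

```python
# XOR several equal-length blocks together to produce a parity block.
def xor_blocks(blocks):
    out = bytes(len(blocks[0]))
    for b in blocks:
        out = bytes(x ^ y for x, y in zip(out, b))
    return out

data = [b"ABCD", b"EFGH", b"IJKL"]
parity = xor_blocks(data)

# Simulate losing data[1], then rebuild it from the survivors + parity.
rebuilt = xor_blocks([data[0], data[2], parity])
assert rebuilt == b"EFGH"
```

Reed-Solomon generalizes this: instead of tolerating one lost block, it tolerates as many losses as there are parity blocks.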

How is that used in Hive’s peer to peer file systems?

Files are split into shards of data spread across the peer-to-peer network. Additional shards are created to account for nodes disappearing from Hive or content being destroyed by hardware failures. For example, let’s assume 100 encrypted shards are generated from your file and sent to 100 peers. These are generated in such a way that only 70 of them are needed to rebuild the original file. Missing shards are regenerated as soon as we discover that nodes have left. With only 30% overhead, the probability of not being able to access the content is then several orders of magnitude lower than with the simple replication strategy.
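Under the same independence assumption as before, reaching the file means at least 70 of the 100 shards are online, which is a binomial tail probability. The sketch below is a simplified model; a real network must also account for correlated failures:

```python
from math import comb

# Probability that at least k of n shards are reachable, each node
# being independently online with probability p (binomial tail).
def erasure_availability(p: float, n: int, k: int) -> float:
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# 90%-available nodes, 100 shards, any 70 of which suffice:
p_ok = erasure_availability(0.9, 100, 70)
print(f"failure probability: {1 - p_ok:.2e}")
```

With 90 shards expected online on average, falling below 70 is a large deviation, which is why the failure probability collapses far below what replication achieves at comparable overhead.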

Modeling node behaviors: nodes in the Hive peer-to-peer network all play the same role, but behave differently. The computer in your bedroom is turned off every night, while your NAS is on 24/7. Usage patterns and availability vary over the day, between nodes and across geographies. HiveNet learns the behavior of each node and places data optimally to make sure enough nodes are available when the data is needed.

Data persistence: we explained how forward error correction mitigates temporarily unavailable nodes. But nodes may also suffer hardware failures and remain unavailable forever. A node that does not respond to proof-of-storage challenges for some time, or whose unavailability has been reported for too long, is marked as failed, and the peer-to-peer network starts reconstructing its data elsewhere.
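One way to picture that sweep, with hypothetical names and an illustrative one-week threshold rather than Hive's actual parameters:

```python
import time

PROOF_TIMEOUT = 7 * 24 * 3600  # illustrative threshold: one week

class NodeTracker:
    """Minimal sketch: mark nodes as failed when they stop answering
    proof-of-storage challenges (simplified model)."""

    def __init__(self):
        self.last_proof = {}  # node_id -> timestamp of last valid proof
        self.failed = set()

    def record_proof(self, node_id, now=None):
        self.last_proof[node_id] = time.time() if now is None else now
        self.failed.discard(node_id)  # a fresh proof clears failed status

    def sweep(self, now=None):
        # Return nodes newly considered failed; the caller would then
        # trigger shard reconstruction on other peers for their data.
        now = time.time() if now is None else now
        newly_failed = [
            n for n, t in self.last_proof.items()
            if now - t > PROOF_TIMEOUT and n not in self.failed
        ]
        self.failed.update(newly_failed)
        return newly_failed
```

The erasure coding described above is what makes this reconstruction cheap: any 70 surviving shards are enough to regenerate the ones held by a failed node.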

Beyond protecting persistence and accessibility, we also need to protect data privacy. Data is encrypted before it leaves the user’s device: neither Hive nor any of the nodes storing the data can decrypt it. A more detailed article will explain our encryption strategy.