Overview
Blockchains are often assumed to be fault tolerant due to their decentralized nature, yet real-world failures repeatedly challenge this assumption. Outages lasting days have been observed across major blockchain networks, and some networks could not recover even from transient, isolated failures. Unlike traditional distributed systems, where well-established fault tolerance mechanisms exist, blockchains vary significantly in their consensus protocols, network structures, and failure handling approaches, which makes dependability evaluation particularly challenging. Despite the critical importance of fault tolerance, most blockchain benchmarking efforts focus on performance metrics such as throughput and latency, often under ideal conditions that fail to account for real-world failure scenarios.
Evaluating fault tolerance requires a systematic approach to fault injection and measurement, enabling researchers and practitioners to analyze blockchain resilience, recoverability, and failure impact. This tutorial focuses on dependability assessment, covering practical methods for introducing faults, observing their effects, and quantifying blockchain sensitivity to failures. It offers a different perspective on blockchain performance evaluation from that of our tutorial on performance benchmarking with DIABLO.
In this half-day tutorial, we will evaluate the fault tolerance of one of six blockchains (Algorand, Avalanche, Aptos, Ethereum, Redbelly, and Solana) with the recent STABL benchmark suite. We will look at the sensitivity score and extend the observer node with another fault type.
Outline
The tutorial will have the following structure:
- Introduction (20 min). What a blockchain is, transaction throughput and latency, and the DIABLO and STABL benchmarking frameworks.
- Simple Demo (30 min). Using the virtual machine image, members of the audience will run a local experiment on their own machines.
- Fault Tolerance and Dependability (20 min). Explanation of crash faults, network partitioning, and packet loss.
- Fault Tolerance Demo (30 min). We will introduce crash faults in the scenario and observe the performance and sensitivity score under the new conditions.
- Benchmarking Details (20 min). Explanation of metrics, workload types, and emulating various network conditions.
- Advanced Demo (30 min). We will explain the fault scenario implementation in the STABL observer node and how to extend it with a new fault type.
- Discussion (30 min). Implications of design decisions, metrics and aspects to be evaluated.
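
To give a feel for what quantifying sensitivity to failures can look like, the sketch below compares transaction latencies from a fault-free baseline run with those from a run where faults were injected. It measures the area between the two empirical latency distributions, so a larger value means the faults shifted latencies further from the baseline. This is an illustrative assumption only: the function names `ecdf` and `ecdf_gap` and the sample data are hypothetical, and the metric shown here should not be read as the sensitivity score that STABL actually computes.

```python
from bisect import bisect_right


def ecdf(samples):
    """Return F(x), the fraction of samples that are <= x."""
    ordered = sorted(samples)
    n = len(ordered)
    return lambda x: bisect_right(ordered, x) / n


def ecdf_gap(baseline, faulty):
    """Area between the empirical CDFs of two latency samples.

    Hypothetical illustration metric (not the STABL definition).
    The result is in the same unit as the latencies.
    """
    f_base, f_fault = ecdf(baseline), ecdf(faulty)
    # Both CDFs are step functions that only change at observed values,
    # so the gap is constant between consecutive observed points.
    points = sorted(set(baseline) | set(faulty))
    area = 0.0
    for left, right in zip(points, points[1:]):
        area += abs(f_base(left) - f_fault(left)) * (right - left)
    return area


# Made-up latencies in seconds, for illustration only.
baseline_latencies = [1.0, 1.0, 2.0, 2.0]
faulty_latencies = [1.0, 2.0, 4.0, 6.0]
print(ecdf_gap(baseline_latencies, faulty_latencies))
```

Two identical samples give a gap of zero. Comparing whole distributions rather than averages alone also captures changes in the tail, which is where transient faults tend to show up.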