
Benchmarking Blockchain Data Availability: Polkadot ELVES vs Modular DA Layers

Comparative Study of Data Availability Schemes in Various Blockchains


  • Team Name: Chainscore Labs (Chainscore Private Limited)
  • Payment Details:
    • DOT: 15p3jWZaP4dHTkDTuKM5VXQL5XGfH9U5r6Ntu3UAv2K7vPb8
    • Payment: 15p3jWZaP4dHTkDTuKM5VXQL5XGfH9U5r6Ntu3UAv2K7vPb8 (USDC)
  • Level: 2

Project Overview 📄

Overview

This application is in response to the Research Task.

A comprehensive comparison of Polkadot's ELVES protocol against emerging data availability solutions (Celestia, Espresso's Tiramisu, NEAR's sharded DA, Polygon Avail, etc.).

  • Project Description: We will conduct an in-depth research study on how different blockchain ecosystems ensure data availability, i.e. that all block data is published and retrievable by the network. The study will qualitatively and quantitatively compare Polkadot's integrated parachain availability scheme to other approaches like Celestia's modular DA layer, Espresso's Tiramisu layered DA, NEAR's sharded data availability, Polygon's Avail chain, and any similar solutions. We will benchmark these schemes on metrics such as bandwidth overhead, proof sizes, data availability latency, and validator resource costs, using both theoretical analysis and production network data. This includes analyzing real telemetry from Polkadot/Kusama, Celestia's mainnet, Espresso's testnet, NEAR's shard metrics, Avail's testnet/mainnet, etc. The outcome will be a detailed report and open-source tools that illuminate the trade-offs in design and performance of each scheme.
  • Relation to Polkadot: Polkadot is directly at the center of this research – it provides a built-in data availability mechanism for parachains (via erasure-coded pieces and availability voting by validators). Our project will evaluate Polkadot's approach in the broader context of new modular blockchain architectures that decouple data availability (e.g. Celestia, Avail) as well as other layer-1s exploring novel DA designs (NEAR, Espresso). By benchmarking Polkadot's ELVES protocol against these, we aim to highlight Polkadot's strengths and potential areas of improvement. The findings could inform Polkadot core development (e.g. optimizations to networking or erasure coding) and guide parachain teams or bridge builders considering external DA solutions. Additionally, since Avail is built with the Polkadot SDK and NPoS consensus, this research reinforces Polkadot's role in the broader data availability landscape.
  • Interest & Motivation: Data availability is a critical component of blockchain scalability and security. As Polkadot builders and Web3 researchers, we are keenly interested in understanding how Polkadot's approach compares to the latest innovations. With the rise of modular blockchains and dedicated DA layers, it's important to objectively assess whether Polkadot's tightly coupled approach (where validators guarantee parachain data availability) offers efficiency or faces limitations compared to specialized systems. Our team's background in cryptography and blockchain infrastructure (e.g. implementing VRFs, erasure coding libraries) naturally leads us to investigate this problem. We are passionate about strengthening the Polkadot ecosystem and believe that sharing knowledge on data availability will benefit protocol designers and developers across Web3.
  • Academic or Technical: This project is research-focused. We intend to produce a rigorous technical report (and accompanying dataset/code) that will be published openly. Our goal is to meet academic standards of analysis (e.g. methodology, reproducibility) and we may submit the results to a peer-reviewed venue or publish as a Web3 Foundation research report. However, we will ensure the output is also accessible as a technical article for the broader community (e.g. on the Polkadot forum or our blog). In summary, the work leans toward an academic research style (grounded in empirical data and related work), with publication likely as an online technical paper or conference workshop submission.

Project Details

  • Problem Statement: As blockchain ecosystems scale, ensuring data availability (DA) is a fundamental challenge: all participants must be confident that block data is published and available, otherwise validators/full nodes cannot verify transactions and security is compromised. Different projects have proposed different DA schemes:

    • Polkadot's approach: When a parachain produces a block candidate, Polkadot's design splits the block's data (the Proof of Validity, or PoV) into erasure-coded chunks and assigns one chunk to each validator. Validators attest to the chunks they hold via signed availability bitfields, and once more than 2/3 of them signal success, the relay chain proceeds with that parachain block. This is followed by approval checking by a random subset of validators who retrieve the data and validate the state transition. Polkadot's protocol (ELVES) couples data availability tightly with its consensus process, aiming to optimize bandwidth by gossiping chunks primarily among the assigned validators (an illustrative chunk-size calculation follows this list). However, this raises questions: Is this approach more efficient or secure than emerging alternatives? Does reliance on full validator nodes limit scalability or light-client trust assumptions?
    • Emerging DA layers (Celestia, Avail, etc.): New "modular" blockchains separate data availability into a dedicated layer. Celestia (formerly LazyLedger) is a standalone DA network where blocks are published to ~100 validators (Tendermint consensus) and light clients use Data Availability Sampling (DAS) to verify blocks without downloading them fully. Celestia uses a 2D Reed-Solomon erasure coding and Namespaced Merkle Trees (NMT) to allow light nodes to sample random shares of each block; if enough random samples succeed, the block is deemed available with high probability. It also employs fraud proofs to handle any malformed encodings. Avail by Polygon is another DA chain built with Substrate: it uses Polkadot's NPoS (BABE + GRANDPA) for consensus (supporting up to 1000 validators). Avail similarly incorporates DAS (light clients sampling) but with an added twist: it uses KZG polynomial commitments on each block's data for verifiable encoding. Blocks are extended to 2n pieces so that any n pieces can reconstruct the data, and KZG commitments allow light nodes to check sampled chunks against the commitment in the header, providing cryptographic integrity proofs. These modular approaches claim to enable trust-minimized light clients (only a minority of honest nodes needed for security) and potentially offer more flexible scaling of block sizes. However, they introduce overhead in proof generation (e.g. computing KZG proofs) and rely on continuous sampling.
    • Layer-1 specific solutions (Espresso, NEAR, Ethereum): Espresso's "Tiramisu" DA layer combines ideas from both worlds. It implements a three-tier system: first an optimistic fast path where a small committee (or even a CDN-like broadcaster) serves the data quickly, and if they misbehave, it falls back to a full P2P gossip and retrieval similar to a base layer. Tiramisu uses Verifiable Information Dispersal (VID) (like polynomial commitments or other cryptographic proofs of availability) to ensure data can be retrieved and verified if any dispute arises. The design is analogous to Ethereum's planned Danksharding (which will also use polynomial commitments and DAS) but Espresso achieves it with restaked Ethereum security on a separate network. NEAR Protocol takes yet another approach: as a sharded L1, NEAR stores data across multiple shards. NEAR recently introduced a Blob Store smart contract and light client proofs to offer data availability for external rollups. Rollups can post blobs of data on NEAR; validators ensure the data is included on-chain, and the blobs are pruned later to save space (with archival nodes retaining full history). A NEAR light client can verify via Merkle proofs that the blob was published on-chain. This provides a cheaper alternative to Ethereum for rollup data, though it doesn't use erasure coding or sampling – essentially, NEAR provides on-chain data availability with off-chain retrieval after pruning. Meanwhile, Ethereum itself is adding proto-danksharding (EIP-4844 blobs) which provide ~2 MB extra data space per block with KZG commitments, but initially without sampling (full nodes must download blobs). Full Danksharding in the future will add DAS on Ethereum, aligning it with Celestia/Avail's model.
    • Why It's Important: Each scheme makes different trade-offs in complexity, security assumptions, and network overhead. Polkadot's integrated approach may optimize latency (since data distribution is part of the block production process) and avoid the need for additional light-client sampling protocols. However, it requires every validator to participate in data gossip for all parachains, which could be bandwidth-intensive at scale (imagine 100 parachains all hitting PoV size limits). Celestia/Avail allow light clients and external chains to trust data availability with minimal assumptions (only need ≥1 honest responder for sampling or ≥1 honest archival node), but they introduce new network actors and potential liveness assumptions (e.g. continuous sampling). There is currently no comprehensive head-to-head evaluation of these approaches using real network data – this project will fill that gap, helping to identify which approach works best under various conditions and workloads.
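To make the chunking arithmetic above concrete, here is a small illustrative calculation (our own sketch, not protocol code): Polkadot splits a PoV into f+1 source pieces and distributes n = 3f+1 erasure-coded chunks, one per validator, while Celestia extends a k×k share square to 2k×2k. The PoV size and validator count below are placeholder parameters.

```python
# Illustrative back-of-the-envelope arithmetic only; parameters are placeholders.

def polkadot_chunking(pov_bytes: int, n_validators: int):
    """1D erasure coding as in Polkadot availability: data is split into f+1 source
    pieces and expanded to n = 3f+1 chunks (one per validator); any f+1 chunks
    suffice to reconstruct the PoV."""
    f = (n_validators - 1) // 3
    chunk_bytes = pov_bytes / (f + 1)        # approximate size of each validator's chunk
    expansion = n_validators / (f + 1)       # roughly 3x the original data exists network-wide
    return chunk_bytes, expansion

def celestia_2d_extension(block_bytes: int) -> int:
    """2D Reed-Solomon as in Celestia: a k x k share square is extended to 2k x 2k,
    i.e. a 4x expansion of the original block data."""
    return block_bytes * 4

chunk, expansion = polkadot_chunking(pov_bytes=5 * 1024**2, n_validators=300)
print(f"Polkadot (placeholder params): ~{chunk / 1024:.0f} KiB per chunk, ~{expansion:.1f}x expansion")
print(f"Celestia (placeholder params): a 2 MiB block extends to ~{celestia_2d_extension(2 * 1024**2) / 1024**2:.0f} MiB of shares")
```

Note that the per-validator chunk size scales inversely with the validator count, which is one reason comparisons across networks with very different validator-set sizes need care.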
  • Research Questions: We aim to answer several key questions:

    1. Efficiency: How do the bandwidth and storage requirements compare? For example, how much data does a Polkadot validator transmit and store for parachain availability versus a Celestia validator or Avail validator for a similar throughput? We will measure metrics like average and peak PoV (parachain block) size in Polkadot (currently up to a 5 MB limit, though rarely reached) and average Celestia block size (Celestia's blocks can scale with demand, targeted by DAS probabilities). We will also evaluate proof sizes: Polkadot uses Merkle proofs for chunk inclusion (tens of KB per piece), Celestia uses Merkle proofs within its Namespaced Merkle Trees, and Avail additionally uses constant-size KZG openings (~48 bytes) against the commitments in its block headers.
    2. Performance and Latency: What is the latency from block production to data availability completion in each system? In Polkadot, parachain candidates must gather more than 2/3 availability votes (usually within one relay-chain slot, ~6 s) and then pass approval checks (adding a few seconds) before finality. Celestia achieves ~15 s block time and ~12 s finality with Tendermint; data is available to full nodes immediately, and light nodes achieve confidence after a sampling period, which we can simulate (the underlying sampling-confidence bound is sketched after this list). Espresso's HotShot/Tiramisu targets sub-5 s finality optimistically. We will measure how quickly data can be retrieved by a new node in each network (e.g. simulate a light client joining and trying to obtain missing pieces). We will also examine the failure cases: e.g. if some validators/committee members do not serve data, how does each protocol recover (Polkadot has back-and-forth requests for chunks; Celestia light nodes increase sample counts or fall back to full nodes; Tiramisu falls back to P2P)?
    3. Security and Assumptions: What are the trust assumptions in each scheme, and how do they affect real-world security? Polkadot's safety requires ≥2/3 of validators honest for availability and validity (otherwise a withheld chunk or invalid state transition could slip through, though there are slashing penalties for both). Celestia and Avail require an honest minority for data availability (as few as one honest node can ensure data gets out, via fraud proofs or sampling), which is a stronger guarantee for light clients. However, Polkadot's integrated approach may offer stronger synchrony/liveness – if Celestia's sampling clients go offline or committees in Espresso misbehave, data retrieval could be delayed. We will analyze historical data for any incidents (e.g. if any Polkadot parachain block ever failed availability or any Celestia block had suspected DA problems) to compare robustness.
    4. Validator Costs: We will quantify the computational and financial cost for validators in each network to support data availability. For Polkadot, each relay-chain validator must process erasure coding for incoming parachain blocks and store many chunks temporarily – how does this scale as parachains increase? We can use Kusama as a high-load scenario (with ~50 parachains active) to measure bandwidth usage per validator from telemetry logs. For Celestia/Avail, validators must handle larger blocks (potentially a couple of MB every 15–20 s) and perform encoding; we'll measure their node bandwidth and storage (Celestia validators store at least 30 days of data by default, Avail currently stores all history but may prune in the future). We will also consider the entry cost: Polkadot's design pushes complexity to the relay-chain validators (which are professionally operated and limited in number, roughly 300), whereas Celestia/Avail require many full nodes or a committee – we'll compare the decentralization (e.g. Polkadot ~300 validators vs Celestia ~150 active validators vs Avail targeting 1000 validators).

    Through these questions, we hope to validate or challenge the hypothesis that Polkadot's tightly-coupled DA is more bandwidth-efficient for its use case, while modular DA layers provide greater light client security and flexibility. We will not only answer which scheme performs better on each metric, but also why, providing insights into the design trade-offs.
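As a concrete companion to Questions 1–3, the sketch below computes the textbook sampling-confidence bound for a 2D-erasure-coded block: an unavailable block must be missing more than 25% of its extended shares, so each uniformly random sample hits a withheld share with probability at least 1/4, and s successful samples leave at most (3/4)^s probability of being fooled. This is an illustrative calculation of the standard bound, not a measurement.

```python
# Light-client confidence after s successful random samples, under the standard
# assumption that an unavailable 2D-coded block must withhold > 25% of extended shares.
def das_confidence(samples: int, min_withheld_fraction: float = 0.25) -> float:
    return 1.0 - (1.0 - min_withheld_fraction) ** samples

for s in (10, 20, 30):
    print(f"{s} samples -> {das_confidence(s):.4%} confidence")
# 30 samples gives ~99.98%, consistent with the "95%+ at ~30 samples" figure cited later.
```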

  • Methodology: Our approach consists of two main components – (1) literature & protocol analysis, and (2) empirical measurements & benchmarking.

    1. Literature and Protocol Review: We will begin by systematically studying the specifications and existing research for each DA scheme. This includes Polkadot's Host protocol spec (Chapter 8: Availability & Validity) and any design notes, the Celestia yellowpaper and relevant forum discussions on NMTs/DAS, Espresso's HotShot + Tiramisu technical paper, NEAR's sharding design (Nightshade paper and recent "chunk-only producers" updates), and Avail's documentation/Medium articles. We will also review related academic papers on verifiable dispersal and polynomial commitments (e.g. Kate et al.'s KZG commitments, data availability proofs in danksharding) to ensure our understanding is solid. From this, we will distill a summary of how each scheme works, and identify key points of comparison (e.g. Does it use erasure coding? If so, what rate? Does it use cryptographic proofs for data (KZG/NMT)? Does it support light client sampling? How many nodes must be honest? etc.). We will produce a reference matrix of features for all systems.

    2. Benchmark Design: Next, we will define the concrete metrics and experiments for quantitative comparison. This will involve writing a measurement plan for each target chain:

      • Polkadot/Kusama: We will use on-chain telemetry and node logs. Polkadot provides a telemetry service (telemetry.polkadot.io) with metrics such as PoV size, peer count, and network I/O. We will also instrument a Polkadot validator (in a controlled environment or using an archive node's logs) to capture data about chunk distribution, e.g. how many bytes of PoV data it downloads and uploads per parachain block under various loads (a minimal collection sketch follows this list). If possible, we might replay historical blocks (using an instrumented client or simulation) to measure worst-case bandwidth when the PoV reaches 5 MB and chunks are at maximum size. Kusama will serve as a "stress test" since it often pushes limits (e.g. Statemine and others nearing size limits). We will gather data over a period of at least several days' worth of blocks to account for variability.
      • Celestia: We will run a Celestia light node and a full node on the Celestia mainnet (and/or testnet if needed for experimentation) to measure block sizes, propagation time, and sampling performance. The plan is to subscribe to new blocks via RPC or a Celestia node API and log block data size and the time to receive it. For DAS, we will use Celestia's light client libraries to perform random sampling on new blocks and measure how many samples (and how long) are needed to reach high confidence (the protocol is probabilistic – by design, roughly 30 samples give 95%+ confidence for 2D-coded blocks of a given size). We will also monitor the network's behavior (e.g. does block propagation time increase with size, are there any failures). If available, Celestia's block explorer or research portal may have historical stats, which we will incorporate.
      • Espresso (Tiramisu): Espresso is currently in testnet (the "Cappuccino" testnet). We will join this testnet as both a sequencer node and a DA node to record data. Specifically, we'll measure the size of DA bundles (Espresso's blocks or rollup batches), how the hybrid networking (CDN + P2P) affects data distribution, and the latency to retrieve blocks in normal vs. fallback mode. Following Espresso's documentation, we will trigger or simulate a fallback (e.g. intentionally cause the small DA committee to fail) to observe worst-case performance. We will also consult any devnet metrics published by Espresso or its node operators (Figment, for example, has reported running DA nodes).
      • NEAR: For NEAR, we will set up a node on a testnet that produces blobs via the Blob Store and measure costs and throughput. NEAR's sharded nature means each validator only stores data for the shard it validates; however, for an external rollup scenario, the blob might be on a special shard. We will deploy sample blobs (of various sizes up to NEAR's limits) to the contract and measure the on-chain cost (which we expect to be ~8000x cheaper than Ethereum L1 per NEAR's claims) and how quickly those blobs become pruned. We'll also test retrieval: run a light client or use NEAR RPC to fetch pruned blobs via an archival node, measuring the overhead of proof generation. NEAR's metrics on chunk production and network will also be examined if available (NEAR telemetry or explorer for shard block sizes).
      • Avail: We will use the Avail testnet (and mainnet if launched during our project timeline) to gather data. As Avail is Substrate-based, we can instrument it similarly to Polkadot. We plan to run an Avail full node, and if possible, simulate some rollup clients posting data to inflate block sizes. We'll capture block size, block time (expected ~20 s), and test Avail's light client (the team provides an SDK for DAS queries). We'll measure the size of KZG commitments and any proof data included in blocks. If Avail's network parameters allow configurable chunk sizes or number of samples, we'll experiment to find the point of failure or performance drop (e.g. how does a 2 MB Avail block compare to a 2 MB Celestia block in terms of light client sample times).
      • Other/Comparative: If relevant, we may also measure EigenLayer's EigenDA on testnet, which uses restaked Ethereum validators to store data off-chain with KZG proofs. It's not a standalone chain but rather smart contracts and a committee, so we'd likely simulate the process or rely on published benchmarks. This can add context especially for the "committee vs chain" approach.
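As a minimal, hedged illustration of the collection step for the Polkadot/Kusama plan above, the script below polls a public Substrate JSON-RPC endpoint (the URL is an assumption) and records the encoded size of relay-chain extrinsics as a rough proxy. The real tooling would instead instrument a node or use the telemetry feed to capture PoV and chunk traffic directly.

```python
# Rough sketch only: relay-chain extrinsic size via standard Substrate JSON-RPC.
# It does NOT measure PoV/chunk traffic; that requires node instrumentation or telemetry.
import requests

RPC_URL = "https://rpc.polkadot.io"   # assumed public endpoint; may be rate-limited

def rpc(method: str, params=None):
    resp = requests.post(RPC_URL, json={"jsonrpc": "2.0", "id": 1,
                                        "method": method, "params": params or []})
    resp.raise_for_status()
    return resp.json()["result"]

block = rpc("chain_getBlock")["block"]              # latest (best) block
number = int(block["header"]["number"], 16)
# Extrinsics are 0x-prefixed hex strings: 2 hex characters per byte.
size_bytes = sum((len(xt) - 2) // 2 for xt in block["extrinsics"])
print(f"Relay block #{number}: {len(block['extrinsics'])} extrinsics, ~{size_bytes / 1024:.1f} KiB encoded")
```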

      For all these, we will ensure standardized scenarios as much as possible. For example, we might compare how each system handles ~2 MB of new data per 15 seconds (since Ethereum with blobs can do ~2 MB/12s, Celestia targets large blocks, Polkadot could hypothetically hit 5 MB/6s if fully saturated across parachains). We will use cloud instances and Docker containers to deploy our measurement nodes deterministically, and synchronize the data collection period across networks (e.g. one week of observation in parallel, if feasible) to account for similar real-world network conditions.
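To compare such standardized scenarios on a common footing, a trivial normalization to KiB per second can be used; the figures below simply restate the approximate numbers quoted in the preceding paragraph.

```python
# Normalize the illustrative scenarios quoted above to KiB/s; figures are approximate.
scenarios = {
    "Ethereum blobs":          (2 * 1024**2, 12),  # ~2 MB per ~12 s block (as quoted above)
    "Celestia":                (2 * 1024**2, 15),  # ~2 MB per ~15 s block
    "Polkadot, one parachain": (5 * 1024**2, 6),   # 5 MB PoV limit per 6 s relay slot (hypothetical saturation)
}
for name, (size_bytes, seconds) in scenarios.items():
    print(f"{name:<24} ~{size_bytes / seconds / 1024:.0f} KiB/s")
```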

    3. Data Collection & Analysis: Once the tools are in place, we will collect data over time and aggregate the results. We will produce time-series and statistical summaries for each metric (bandwidth usage, latency, etc.). For example, we'll graph the distribution of Polkadot parachain PoV sizes and Celestia block sizes, or the cumulative bandwidth per validator vs per Celestia node. We'll also compute per-transaction or per-byte costs (e.g. how many bytes of transmission are needed to make 1 byte of transaction data available in each system). The analysis will involve both qualitative comparison tables and quantitative charts. We will use Python (pandas) or R for data analysis, packaged in Jupyter notebooks to allow easy replication of the analysis. Statistical techniques (mean, variance, outlier detection) will be applied to ensure findings are robust. We will specifically look for any anomalies (e.g. if Celestia had any periods of unavailability or Polkadot's networking spiked) and cross-verify those with community reports or forum posts.
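A minimal sketch of this analysis step is below; the CSV path and column names (timestamp, chain, block_bytes, bytes_sent) are hypothetical stand-ins for whatever schema the collectors end up producing.

```python
# Sketch of the aggregation/analysis step; file name and columns are hypothetical.
import pandas as pd

df = pd.read_csv("data/block_sizes.csv", parse_dates=["timestamp"])

# Distribution of block/PoV sizes per chain.
summary = df.groupby("chain")["block_bytes"].describe(percentiles=[0.5, 0.95, 0.99])
print(summary[["mean", "50%", "95%", "99%", "max"]])

# Per-byte availability cost: bytes transmitted per byte of payload made available.
df["overhead_ratio"] = df["bytes_sent"] / df["block_bytes"]
print(df.groupby("chain")["overhead_ratio"].mean())
```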

    4. Validation & Reproducibility: Reproducibility is a core part of our methodology. We will publish all data sets and the code for data gathering/analysis under open licenses (see deliverables). The analysis procedures will be documented so that the W3F grants team (or any interested party) can rerun them. For example, we will provide scripts to replay a subset of telemetry or log data to verify our bandwidth calculations. We will also double-check results by comparing with known benchmarks (if the projects have published any). For instance, if Celestia's team claims light clients need roughly X samples for a block of size Y, we will ensure our measured data is in line with that claim, or explain any difference. The final report will explicitly state how someone can verify each result – whether by running our Dockerized environment or by independently collecting fresh data via the provided instructions.
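One way to make such cross-checks mechanical is a small verification script that recomputes a reported figure from the raw data and fails loudly on a mismatch; the file name, column, reported value, and tolerance below are placeholders.

```python
# Placeholder verification check: recompute a reported statistic from raw data.
import pandas as pd

REPORTED_MEAN_KIB = 1024.0                      # hypothetical value quoted in the report
df = pd.read_csv("data/celestia_blocks.csv")    # hypothetical raw data file
recomputed = df["block_bytes"].mean() / 1024
assert abs(recomputed - REPORTED_MEAN_KIB) / REPORTED_MEAN_KIB < 0.01, \
    f"mismatch: recomputed {recomputed:.1f} KiB vs reported {REPORTED_MEAN_KIB} KiB"
print("reported figure reproduced within 1%")
```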

  • Expected Results: We expect to produce:

    • A detailed comparative report that highlights how Polkadot's availability scheme stands relative to others. This will likely show that Polkadot's design is bandwidth-optimized for its specific use case (sharded parachains), as implied by Polkadot developers – for instance, validators only need to deal with chunks ~20 KB each for a 5 MB parachain block, whereas a Celestia node must handle the full block data (though fewer validators overall). On the other hand, we anticipate showing that light clients are much more empowered in Celestia/Avail: a Polkadot light client cannot independently verify parachain block availability without trusting the validator set (apart from full nodes, Polkadot today offers only GRANDPA-based light clients, which follow the validators' finality), whereas in Celestia or Avail, a light client can detect missing data with high probability. Quantitatively, we expect to find trade-offs such as: Polkadot's per-byte availability cost (in bandwidth) is lower at current scale, but Celestia's cost grows sub-linearly with more light clients (since the sampling burden is distributed). We will also likely document that Avail, by using a larger validator set and KZG, might achieve a middle ground: more validators (so more decentralization) than Celestia, with cryptographic guarantees, but at the cost of longer block time (~20 s) and higher complexity.

    • The benchmark figures (bandwidth, latency, etc.) for each solution under comparable conditions. For example, we might conclude:

      • Polkadot (45 parachains active, ~300 validators): each validator handles ~X MB/hr of parachain data on average, with peaks of Y MB/hr; availability confirmation adds ~1–2 s to block finalization.
      • Celestia (100 validators): full nodes handle ~A MB/hr (depending on usage); light nodes need ~B samples per block of size M MB to reach 99% certainty, taking ~C ms; block propagation time grows with size but finality remains 12 s.
      • Avail (50 validators on testnet or simulated 1000 validators): a block of size N MB roughly doubles after erasure-coding extension (blocks are extended to 2n pieces); light-client sampling shows success rates similar to Celestia's (since both use DAS), with a small added cost for KZG proof verification per sample (a few ms).
      • Espresso (HotShot+Tiramisu testnet): baseline latencies of D seconds when committee is honest vs D' seconds if fallback needed; bandwidth mostly borne by a few nodes in optimistic path.
      • NEAR: posting a 1 MB blob costs $Z (in gas), data available to NEAR validators within 1 block (~1 s) and pruned after T blocks; proving inclusion to Ethereum or another chain can be done via a light client proof of size P bytes.

      (These are illustrative; actual numbers will come from our measurements.) We will tabulate such results and also present ratios (e.g. Polkadot vs Celestia: which uses less bandwidth for the same data amount?).
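For the ratio comparisons mentioned above, the computation itself is trivial once measurements exist; the sketch below uses placeholder inputs purely to show the intended "bytes transmitted per byte made available" normalization.

```python
# Toy ratio calculation with placeholder inputs; real values come from our measurements.
measurements = {
    # chain: (bytes transmitted per node per hour, payload bytes made available per hour)
    "Polkadot": (3.0e9, 1.0e9),
    "Celestia": (4.5e9, 1.0e9),
}
ratios = {chain: sent / payload for chain, (sent, payload) in measurements.items()}
for chain, ratio in ratios.items():
    print(f"{chain}: {ratio:.1f} bytes transmitted per byte made available")
print(f"Polkadot vs Celestia: {ratios['Polkadot'] / ratios['Celestia']:.2f}x")
```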

    • We also anticipate uncovering any current bottlenecks or issues. For example, if our telemetry shows Polkadot's networking could become a bottleneck with e.g. 100 parachains all near PoV limit (we may extrapolate using the Kusama stress-tests), that's valuable insight. Or if Celestia's DAS exhibits any scaling issues (maybe for very large blocks the sampling needs to ramp up), we'll highlight that. The expectation is that the research will not only compare but also validate the robustness claims of each approach (for instance, demonstrating that Avail's KZG+sampling does indeed catch missing data, by perhaps staging a small experiment if allowed).

    • Ultimately, the expected result is a set of actionable findings: for Polkadot devs (e.g. Is pursuing a separate DA chain beneficial or not?), for parachain teams (e.g. Would using an external DA layer via XCMP make sense?), and for the broader community (understanding the state-of-the-art in DA). We expect Polkadot's approach to fare well in many respects (due to its mature implementation and focused scope), but the study will objectively show where newer designs like Celestia/Avail shine.

  • Reproducibility: To ensure the grants team can verify our analysis, we will provide:

    • All raw data collected (block traces, telemetry logs, etc.), as well as processed data sets, under an open license (likely CC BY 4.0 for data). This way, anyone can double-check the numbers or perform their own analysis. If data volume is large, we will offer a subset or a script to fetch it.
    • A step-by-step guide (and scripts) to reproduce each experiment. For example, how to set up a Polkadot node with logging enabled and run our parser to calculate bandwidth, or how to run our Celestia light client sampler code. We will use Docker images to encapsulate the environment (e.g. an image that has our analysis Jupyter notebook with data pre-loaded, so the evaluator can just run it to regenerate the graphs). We will also consider using Terraform for automating any cloud infrastructure needed (e.g. deploying nodes in a sandbox network to simulate heavy load).
    • The final report will have an appendix or section on "Verification" which maps each claim to either a reference (a citation to documentation) or a reproducible experiment in our code. For example, if we say "Polkadot parachain block limit is 5 MB", we will cite the runtime spec; if we say "in our tests Celestia light nodes detected missing data in under 5 seconds", we will point to the specific experiment configuration. The goal is that the Web3 Foundation or any external reviewer can independently confirm all results either by logic or by rerunning the provided scripts.
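A toy example of how such a claim-to-verification mapping could be kept machine-readable is sketched below; the claims, file paths, and rendering are illustrative only, not the final appendix format.

```python
# Illustrative claim-to-verification mapping; entries and paths are placeholders.
VERIFICATION_MAP = {
    "Polkadot parachain block limit is 5 MB":
        {"kind": "reference", "source": "Polkadot runtime/spec documentation"},
    "Celestia light nodes detected missing data in under 5 seconds":
        {"kind": "experiment", "source": "analysis/das_latency.ipynb"},
}

def render_appendix(mapping: dict) -> str:
    return "\n".join(
        f"- {claim}: verified via {how['kind']} ({how['source']})"
        for claim, how in mapping.items()
    )

print(render_appendix(VERIFICATION_MAP))
```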
  • Related Work: There have been a few comparative discussions in the community but no formal study combining theory and data. For instance, a recent Medium article by 0xemre compared Celestia, Avail, and EigenDA qualitatively, and another by Lachlan Todd outlined high-level differences (e.g. fraud proofs vs validity proofs, validator counts). The Ethereum research community has produced papers on data availability sampling and ProtoDanksharding (e.g. "Data Availability Sampling in Eth2" and Dankrad Feist's posts), and Espresso published a preprint on HotShot/Tiramisu. Polkadot's approach is documented in its spec and was also the subject of a past W3F research RFP on network topology for availability. We will build on these sources (citing and synthesizing them) but go further by bringing real network measurements. To our knowledge, no one has yet measured Polkadot's availability overhead in practice or compared it directly to Celestia's live network – our work will be the first to do so. We will maintain a bibliography of all relevant resources (whitepapers, forum posts, code specs) and include it with our deliverables for completeness.

  • Publication Venue & Timeline: We intend to submit the final report to the Web3 Foundation Research repository and also consider academic workshops or conferences (e.g. ACM Blockchain, IEEE ICBC, or a workshop at CCS/Financial Crypto) if the quality and findings merit it. The timeline for a conference submission would likely be post-grant (as CFP dates vary), but within a few months after completing the grant we aim to refine and submit a paper. Concurrently, we will publish a more accessible summary (e.g. on Medium or the Polkadot Forum) soon after the research is done, so the community can immediately benefit. Tentatively, if the project finishes by ~Month 4, we would publish the blog summary by Month 5 and aim for an academic submission by Month 6–7 (allowing time for peer review preparation).

  • What the Project is NOT: This project will not deliver any new blockchain protocol or software beyond the research artifacts (tools for measurement and analysis). We are not implementing a new data availability scheme or modifying Polkadot's code – rather, we are observing and analyzing existing systems. There will be no token or economic component; we are not building a product or service for end-users. This is strictly a research and benchmarking effort. It will also not provide any real-time monitoring service (though we will make data available for anyone to use, we won't be maintaining a long-running service beyond the project's scope). In short, to manage expectations: this grant will result in knowledge and open-source research outputs, not deployable network infrastructure or new protocol features.

Ecosystem Fit

  • Where does the project fit in the ecosystem? This research project is best described as ecosystem infrastructure knowledge. It doesn't fit into a typical dApp or pallet category, but it serves the Polkadot/Substrate ecosystem by evaluating a core aspect of scalability and security. Polkadot aims to be scalable via parachains – data availability is literally one pillar of that scalability. Our project will help locate Polkadot's approach within the broader multi-chain landscape (especially with the rise of modular blockchains). In practical terms, this study could inform Polkadot's future direction – for example, discussions on whether to introduce an optional external DA layer parachain, or improvements to networking protocols. It also strengthens Polkadot's position by rigorously showcasing its capabilities relative to other solutions.

  • Target audience: The immediate audience is protocol developers and researchers in the Polkadot ecosystem (Parity, W3F researchers) and beyond. Polkadot developers will get a detailed assessment of the relay chain's availability subsystem. Parachain developers and infrastructure teams (e.g. those building scaling solutions, bridges, or exploring off-chain computation) are another audience – they can use our findings to decide if they should rely solely on Polkadot's availability or consider hybrids (like posting data to Avail or Celestia for certain applications). The broader Web3 research community (incl. Ethereum researchers interested in sharding and DA) will also find value in an apples-to-apples comparison. We will present the results in a way that is accessible: from high-level summaries for less technical stakeholders (like ecosystem advisors, investors interested in scaling tech) to deep dives for engineers.

  • Needs addressed: Our project meets the need for clarity and data-driven insight in an area that is often discussed but not empirically compared. As the ecosystem evolves, many teams are deciding on their data availability strategy (for example, a new rollup project must decide to post data on Ethereum vs use Celestia vs use an EigenLayer DAC). Polkadot parachains, though they have built-in DA via the relay chain, might wonder how it stacks up – we answer that. Additionally, W3F has an interest in supporting research that keeps Polkadot at the cutting edge. By performing this study, we are effectively auditing and stress-testing a component of Polkadot's design in the context of new innovations, which is valuable for its long-term competitiveness.

  • Similar projects: To our knowledge, there is no direct analog within the Polkadot ecosystem – no parachain or existing grant is doing cross-chain DA comparisons. This is a relatively novel research initiative. In related ecosystems, Celestia's team itself produces research on DA (but obviously focusing on their protocol), and Ethereum researchers are working on proto-danksharding benchmarks (but those typically don't involve Polkadot). Some independent analysts (like the Medium articles cited) have compared Celestia and Avail qualitatively, but those are neither comprehensive nor backed by data. Our project is different in that it is holistic and vendor-neutral – we are not advocating one solution but evaluating all. If anything, one could loosely compare our effort to a "benchmark report" similar to how some groups benchmark layer-1 throughput or bridge performance. But in the context of data availability, we appear to be pioneers, especially in drawing Polkadot into the comparison.

    In summary, our project fills a gap by providing the Polkadot community (and the wider multi-chain community) with reliable, research-grade information on how Polkadot's data availability mechanism compares to others. This helps ensure Polkadot's design choices are well-understood and can be communicated factually (e.g. in response to claims like "Celestia solves DA better", the community will have concrete evidence of the differences). It aligns with Polkadot's ethos of scalability and interoperability by examining potential points of interoperability (Polkadot <> Celestia bridges or use of Avail for parachains, etc.) and by pushing the knowledge frontier on a core topic.

Team 👥

Team members

Contact

  • Registered Address: S.No. 320/1, D-16 Vignahar Sankul, Narayangoan, Pune, Maharashtra 410504, India
  • Registered Legal Entity: Chainscore Private Limited

Team's experience

Chainscore Labs is a Web3 R&D company with deep expertise in blockchain protocols, cryptography, and Polkadot ecosystem development. Our team members have strong academic and practical backgrounds, have contributed to the Polkadot ecosystem in various ways, and include Polkadot Blockchain Academy (PBA) alumni. We have a proven track record of delivering high-quality Web3 R&D initiatives.

Chainscore Labs has prior experience with the Web3 Foundation, having successfully delivered:

  • DotRing: A Ring VRF implementation based on W3F research, developed in Python. This project demonstrates our ability to fulfill W3F grant expectations, including writing comprehensive tests, benchmarks, thorough documentation, and delivering to specification.
  • Tessera: A clean-room JAM implementation in Python.

Our team's experience spans protocol research, cryptographic implementations, and practical blockchain development, making us well-positioned to conduct this comparative data availability study with the rigor and technical depth required.

Development Roadmap 🔩

Total Estimated Duration: 4 months

Full-Time Equivalent (FTE): 2 FTE

Total Costs: $30,000 USD

DOT %: 50%

Milestone 1 – Literature Review & Methodology Design

  • Estimated duration: 1 month (Month 1)
  • FTE: 3
  • Costs: $6,000 USD

The default deliverables 0a-0e below are mandatory for all milestones.

0a. Project Management & Licenses: The project will be managed in a public repository. We will provide access to our work-in-progress from the start. All code will be released under an open-source license (Apache 2.0 for software), and documentation, text, and data will be released under CC BY 4.0. We will include a LICENSE file for each.
0b. Documentation: We will deliver documentation for this milestone, including a Literature Review document and a Methodology & Experiment Plan. These will describe what has been done so far and guide how to reproduce/verify it. Also, a brief tutorial will explain how to access our repository, run any scripts prepared in this stage, etc. (For example, instructions on how to compile our list of references or how to run a sample telemetry query.)
0c. Methodology: A detailed explanation of the methodology designed in this milestone. This will cover how we plan to achieve the results in subsequent milestones and how those results can be reproduced or verified. In Milestone 1 specifically, the methodology deliverable will be the experimental design and metrics definition ("benchmarking playbook"). This ensures transparency of our approach from the beginning.
0d. Infrastructure: We will list all infrastructure and software requirements for our research. At this stage, this includes the tools and environments needed for data collection: e.g., versions of Polkadot, Celestia node software, required libraries (Polkadot JS API, near-cli, etc.), and any cloud resource considerations. We will also set up the initial Docker environment structure. By the end of M1, we will provide a Dockerfile (or configuration) for a base environment that can run our planned tools (this might be empty of functionality until M2, but will ensure reproducibility of the environment). If applicable, a basic Terraform script outline will be provided (e.g. setting up a VM that could run a Polkadot node).
0e. Article / Report (partial): We will produce the first part of the research article. For Milestone 1, this will include the introduction, background, and methodology sections of the final report. It will explain in accessible terms what we aim to do and why (targeted at the technical audience defined above). We will also include a summary of the literature review – effectively making this a standalone "related work" survey that could even be publishable as a blog post on its own. The article will be written in clear English, and will reflect our target audience (blockchain researchers and the Polkadot community). For a Level 2 grant, we will include the required acknowledgment in the final version of the article.
1. Comprehensive Literature Review
Deliverable: A LaTeX document listing and summarizing all relevant research papers, documentation, and specifications for each data availability scheme. This will include at least: Polkadot's relevant specs and research docs, the Celestia whitepaper/docs, Avail technical papers or Medium articles, NEAR sharding and blob proposals, the Espresso (HotShot/Tiramisu) paper, Ethereum's danksharding EIPs, and related academic work on data availability. Verification: The document will be included in the submission repo. It will contain web links or references to each source and a brief explanation of how each relates to our project. We expect 15+ references. For instance, we will cite Polkadot spec sections (with URLs) and summarize them, cite Celestia's DAS spec, etc., demonstrating we've captured the state of knowledge.
2. Benchmark Metrics & Criteria Definition
Deliverable: A methodology draft detailing all metrics we will measure, how we will measure them, and what tools we will build or use. It will map each research question to specific data points. For example: Metric: Bandwidth per validator (Polkadot) – Method: instrument a node with logging, measure bytes in/out; Metric: Light client sampling time (Celestia) – Method: run a light node with X queries, record response times, etc. We will also define success criteria or expectations for each metric (e.g., which differences would be significant). Verification: This will be part of a LaTeX document in the repo (possibly combined with the literature review as a full methodology doc). The grants evaluators can read this and see clearly each planned experiment. It should be sufficiently detailed that, in principle, one could execute the plan from it even before we deliver code.
3. Experiment Environment Setup Plan
Deliverable: A technical plan for setting up the needed environment, including identification of any dependencies or blockers. We will specify for each target chain what environment we need (node, RPC, etc.). We will also outline any custom development needed (e.g., a Substrate pallet to simulate large PoVs). If we need to coordinate with any external testnets, it will be noted. Verification: This will be a section in the methodology document or a standalone "engineering plan" text. It will include concrete steps, for example: "To measure Polkadot's bandwidth, we will modify a Kusama node to log networking stats. Step 1: clone polkadot vX.Y, Step 2: apply patch (which we will write and link here) to enable logging of metric Z, Step 3: run node on Kusama for N hours." We will also update the repo with any preliminary scripts or patches (even if not fully implemented) demonstrating progress towards these setups. The presence of these files (like a patch file or a placeholder Python script to connect to telemetry) will serve as verification.

Notes: By the end of Milestone 1, we expect to have no code results yet, but a thoroughly fleshed-out plan and background. We will likely also deliver an initial bibliography (with citation links as in this proposal) and possibly publish the literature review on our blog for community feedback (optional). This milestone sets the stage for implementation. The Web3 Foundation can verify that we have a clear and comprehensive plan before we proceed to heavy data collection.

Milestone 2 – Tooling Development & Initial Data Collection

  • Estimated duration: 1 month (Month 2)
  • FTE: 3
  • Costs: $8,000 USD
0a. Licenses: We will continue to abide by open-source licensing. All newly written source code in this milestone (data collection scripts, etc.) will include license headers (Apache 2.0). Any data collected will be documented and released under CC BY 4.0.
0b. Documentation: We will update project documentation to cover this milestone's outputs. This includes a Technical README in the repository explaining how to use the developed tools (e.g. how to run the data collectors for each chain). We will also document any deviations from the original plan and justify them. A tutorial will be provided, for example: "How to run a Celestia light node and use our script to sample a block." Screenshots or terminal logs may be included for clarity.
0c. Methodology: We will provide a detailed report on how we implemented the data collection and any intermediate results verification. Essentially, this is the execution of the methodology from M1. We will explain how we ensured the tools work correctly (e.g., testing that a sample of known size is measured accurately). This section also covers how others can reproduce the data collection: for each network, the steps to launch our tooling and gather similar data.
0d. Infrastructure: All infrastructure used will be listed and any setup scripts provided. By this milestone, we expect to have Dockerfiles for each major component (e.g., an image to run a Polkadot node with our patches, an image for a Celestia light node plus our script, etc.). We will also provide Terraform scripts or shell scripts to launch cloud instances if we used them (for example, a script to start an AWS instance and deploy a Celestia full node for a day). The idea is to allow verification: the evaluator could, with the provided instructions and scripts, replicate our environment.
0e. Article (update): We will produce an interim research article update covering what we achieved in Milestone 2. This will likely correspond to a "Methodology Implementation" or "Data Collection" section in the final paper. It will describe the tools built and perhaps highlight any interesting preliminary observations (without full analysis yet). The writing will ensure anyone reading understands how the data was gathered, adding credibility to the upcoming analysis.
1. Polkadot/Kusama Data Collection Tool & Data
Deliverable: Code and scripts for collecting Polkadot (and Kusama) availability data, plus an initial dataset. Specifically:
- A modified Polkadot node or an external monitoring script that captures parachain block sizes and network stats (we will choose the method with least friction; likely using the Telemetry WebSocket API or node RPC).
- Scripts to parse and aggregate the logged data.
- A sample of data collected (at least several hours of logs from Kusama or Polkadot).

Verification: We will include the source code for any Polkadot node modifications (if used) or our external script in the repo (polkadot_monitor/ directory). We will also provide the collected raw data in CSV/JSON format in the repo or via a link (if too large). For example, a CSV with columns: timestamp, parachain_id, block_size, chunks_sent, etc. We will provide evidence of functionality: e.g., a plot or console output showing the tool capturing data (maybe included in the documentation or as an example in README). The evaluator can run our script against a live node or telemetry endpoint to verify it indeed collects data.
2. Celestia/Avail Data Collection Tool & Data
Deliverable: Tools and initial data for Celestia and Avail:
- A script or program to run a Celestia light node (or interact with one) and perform DAS sampling on recent blocks, recording results (block height, size, sample success, time taken).
- A script to gather Celestia block metadata (e.g., via a full node RPC, capturing block sizes, number of shares).
- For Avail: since it's substrate-based, if available, a script to connect to Avail node RPC to get block sizes and any DA metrics. If Avail light client exists, a script to use it for sampling.
- Collected sample data: e.g., a log of Celestia blocks over a day with sizes and sampling outcomes; Avail testnet blocks with sizes.

Verification: We will publish the code (likely in a celestia_tools/ folder, in Go or Python using Celestia's light client API). We will show sample output – e.g., "Block 10000: 1.2 MB, 50 samples ok in 2.1 s" (an illustrative parser for output in this form appears after the note below). The initial dataset will be included (or a portion of it, if very large) for verification. Evaluators could run our light client script (with a Celestia node endpoint we provide, or their own) to reproduce the results on current blocks. For Avail, if its network is running, verification is similar: connect to the testnet and fetch data.
3. Espresso & NEAR Data Scripts & Data
Deliverable: Initial tooling and data for the remaining targets:
- For Espresso (Tiramisu): a script or notes on how we interacted with the Cappuccino testnet. Possibly a small program that subscribes to Espresso block events and logs DA-related info (since Espresso's code is not widely documented, this might be a combination of using their provided node software with flags, plus parsing logs or metrics). We will produce an initial dataset, e.g., "X blocks received via fast path, Y blocks via fallback, data size each, latency, etc.".
- For NEAR: a script using NEAR's JSON-RPC to submit and retrieve blobs. Also, if NEAR has telemetry for chunk sizes, a script to fetch those. Initial data could be: the time and cost it took to submit blobs of various sizes, and confirmation that we could retrieve them after pruning (if pruning can be simulated on a local node).

Verification: The repository will include these scripts (perhaps in near_da/ and espresso_da/ directories). We will provide instructions to use them. For example, to verify NEAR blob handling, we might include a unit test: the script uploads a test blob and then retrieves it, checking integrity. The output log or result (like "Blob stored and proof verified") will be shown. For Espresso, given it's a bit more closed, we might provide a recorded log from our node and a parser that extracts the relevant info (like block sizes and times). The evaluator can inspect this parser code and the log snippet to see that, for instance, "block 50 took 0.5 s in committee, block 51 fell back and took 5 s". If direct verification on Espresso testnet is tricky for the evaluator, our thorough documentation and possibly a reference to public stats (if Espresso publishes any) will back it up.

Note: By end of M2, we aim to have all data collection machinery in place and tested with short runs. The "initial data" is proof that everything works; extensive data will be gathered in Milestone 3. We will likely also provide visual proof like a few quick charts (for our own validation) and include those in docs, though the full analysis comes later. The Web3 Foundation can test any part of our toolchain – e.g., run the Polkadot script on a node for a minute and see data streaming, or use our Celestia tool on their own Celestia full node.
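As an illustration of the kind of helper this toolchain will include, the sketch below parses sampling-log lines of the form quoted in deliverable 2 ("Block 10000: 1.2 MB, 50 samples ok in 2.1 s") into a CSV; the exact log format is an assumption and will follow whatever our collector actually emits.

```python
# Illustrative log-to-CSV parser; the log line format is an assumption.
import csv
import re

LINE = re.compile(r"Block (\d+): ([\d.]+) MB, (\d+) samples ok in ([\d.]+) s")

def parse_log(log_path: str, out_csv: str) -> None:
    with open(log_path) as fin, open(out_csv, "w", newline="") as fout:
        writer = csv.writer(fout)
        writer.writerow(["height", "size_mb", "samples_ok", "seconds"])
        for line in fin:
            match = LINE.search(line)
            if match:
                writer.writerow(match.groups())

parse_log("celestia_sampling.log", "celestia_sampling.csv")
```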

Milestone 3 – Comprehensive Data Collection & Preliminary Analysis

  • Estimated duration: 1 month (Month 3)
  • FTE: 3
  • Costs: $7,000 USD
0a. Licenses: All continuing work remains under Apache 2.0 (code) and CC BY 4.0 (data). We will update any new code files with license headers and ensure the compiled dataset is labeled with the open license.
0b. Documentation: We will expand our documentation to include a Data Description and Preliminary Findings. This will describe the final data sets collected (their format, scope) and provide user-friendly instructions on how to load and examine them. A tutorial might cover, for example, "How to run our Jupyter notebook to generate the graphs." We will also document any challenges encountered during full data collection and how we overcame them (e.g., if a node crashed during collection, how we adjusted).
0c. Methodology: We will detail how we went from raw data to results. This includes data cleaning steps, analysis methods (e.g., statistical methods, any normalization). If we had to adjust our approach (say, collect over multiple shorter periods due to constraints), we will justify that and explain how it does not bias results. Essentially, this is the full "Data Analysis Procedure" methodology. It should allow an independent party to follow our reasoning and verify the computations.
0d. Infrastructure: By now, all relevant infrastructure should be fully available. We will finalize our Docker images (for analysis environment, etc.) and provide a list of all environments used. If any cloud compute was used for heavy data crunching, we'll describe it (though our analysis can likely be done on a personal computer given moderate data sizes). We will provide any scripts or config used to orchestrate the data collection (for instance, if we used Kubernetes or bash scripts to schedule tasks across the month). Essentially, infrastructure deliverables will ensure that the entire pipeline from data collection to analysis can be stood up by someone else if needed.
0e. Article (draft results): At this stage, we will produce a draft of the Results section of our research article. It will include initial graphs, tables, and interpretation of the data. This is not the final polished conclusion, but all core findings should be presented. We will write this in a clear manner, highlighting answers to the research questions. This draft will be nearly complete in content, though we might refine wording in the final milestone. It will contain all necessary figures (with captions, references to our data source). The article draft will be shared (likely as a PDF or Overleaf link, plus in the repo as Markdown or LaTeX).
1. Complete Data Sets for All Platforms
Deliverable: The full and finalized data sets gathered for Polkadot, Celestia, Avail, NEAR, Espresso (and any others). We anticipate data in the following forms:
- Polkadot/Kusama: A dataset covering at least multiple days (preferably a week or two) of parachain block info. Likely aggregated by parachain and by relay block. Fields might include timestamp, parachain ID, PoV size, number of validators assigned, number of availability votes, etc. Also possibly network throughput measurements (if available).
- Celestia: A dataset of block info for mainnet (or testnet if mainnet usage is low). Fields: block height, block size (in shares), number of samples we tried, success/fail of each sample (though likely we'll aggregate success probability per block), etc. Possibly covering thousands of blocks (~days of data).
- Avail: If mainnet is not live, testnet data. Fields: block height, size, any sampling result if applicable. If Avail wasn't stable enough to gather long data, we'll note that, but we aim for a significant sample.
- NEAR: Data on blobs submitted – for example, results of N test blobs (size vs cost vs time to inclusion) plus any relevant chain metrics over time if available.
- Espresso: Data from the testnet – block sequence with whether fast path or fallback, block sizes, latencies measured.

We will likely store these as CSV or JSON files in a structured directory. If file sizes are huge (several hundred MB), we will provide them via IPFS or cloud link with hashes, and include small excerpts in the repo for structure reference.

Verification: The evaluator will be able to inspect the data files. We will include README for data explaining how it was collected and any known gaps (e.g. "no data on day X due to node restart"). The data should reflect what our tools from M2 would produce. The evaluator can also attempt to reproduce a small portion (like run our collector for an hour and compare outputs to our data sample for that hour). The completeness of the data will show in our analysis, but this deliverable ensures raw evidence is accessible.
2. Analysis Scripts and Jupyter Notebooks
Deliverable: All scripts or notebooks used to analyze the data and generate results. This includes:
- Parsing scripts to combine or clean raw data (if needed).
- Jupyter Notebooks (or R scripts) that load the data and produce charts/tables. We plan to use Python (pandas, matplotlib) for analysis, so likely a notebook per main section (e.g., analysis_bandwidth.ipynb for bandwidth charts, analysis_latency.ipynb for latency).
- If any statistical analysis or calculations are needed (like calculating probabilities, or checking a hypothesis), the code for that as well.

Verification: The evaluator can run these notebooks (we'll ensure they run inside our Docker image for consistency). The notebooks will be documented so it's clear which figure or result they correspond to. For example, a cell might output "Average bandwidth: X MB/hr" that we cite in the report (a minimal sketch of such a cell appears after deliverable 3 below). We will cross-verify that running the notebook on the provided data reproduces the figures in our draft report exactly. The evaluator doing the same should see identical outputs. The presence of these scripts also allows code review – reviewers can see how we computed everything, ensuring no hidden steps.
3. Preliminary Findings & Visualizations
Deliverable: While the final polished conclusions come in Milestone 4, here we will deliver the concrete findings in a readily reviewable form. This includes the graphs, charts, and key quantitative results obtained:
- Graphs comparing metrics (e.g., a bar chart of average block size vs overhead for each network, a time-series plot of Polkadot vs Celestia bandwidth usage over a day, etc.).
- Tables of numerical comparisons (e.g., table of "Data availability performance metrics": Polkadot vs Celestia vs Avail with entries for bandwidth per MB, latency, proof size, etc.).
- Any computed values answering specific questions (for instance, "to achieve 99% confidence, Celestia light clients needed 60 samples on average for blocks of 2 MB", or "Polkadot validators on average transmitted 3x the parachain data size due to redundancy", etc.).

These will be delivered likely as part of the article draft (in 0e above) and also separately in the repository (like images saved in an analysis/figures/ directory, and a short PRELIMINARY_RESULTS.md summarizing them). We will highlight any surprising or notable insights even at this stage.

Verification: The evaluator can verify this by matching these results with the data and scripts. For example, if we claim in the findings "Celestia block throughput was ~2 MB/15s", the evaluator can look at our Celestia dataset and see the block sizes or run a quick average calculation themselves. Graphs will have source data references (either in caption or via a pointer to which dataset they used). The preliminary nature means we might still refine interpretation, but the numbers themselves are final and verifiable.
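A minimal sketch of the kind of notebook cell referred to in deliverable 2 is shown below; the file path and column names are placeholders for whatever schema the final data sets use.

```python
# Minimal notebook-cell sketch; paths and columns are placeholders.
import matplotlib.pyplot as plt
import pandas as pd

bw = pd.read_csv("data/bandwidth_per_node.csv")        # columns: chain, node_id, mb_per_hour
avg = bw.groupby("chain")["mb_per_hour"].mean()
print(avg.round(1).to_string())                        # "Average bandwidth: X MB/hr" per chain

ax = avg.plot(kind="bar", title="Average availability bandwidth per node")
ax.set_ylabel("MB per hour per node")
plt.tight_layout()
plt.savefig("analysis/figures/bandwidth_per_node.png")
```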

Overall, after Milestone 3, essentially all the heavy lifting will be done: data collected and analyzed. At this point the Web3 Foundation can see the outcome and gauge how well the project answers its research questions. The remaining work will be mostly refinement, optional deliverables, and the final write-up.

Milestone 4 – Final Report, Dissemination & Wrap-up

  • Estimated duration: 1 month (Month 4)
  • FTE: 3
  • Costs: $9,000 USD
Deliverables:
0a. Licenses & IP
Final confirmation that all project outputs are under the promised licenses (Apache 2.0 for code, CC BY 4.0 for documentation and data). We will do a final audit to ensure compliance (e.g., third-party libraries are appropriately cited, no proprietary data is included). Copyright for the report will be shared under CC BY or CC0 as required.

0b. Documentation
All documentation will be finalized. This includes a comprehensive project README that explains the repository structure and how to use everything. We will also produce user documentation for any interactive components (e.g. if we deliver a dashboard: how to access it and what each chart means). Any tutorial material not yet covered (such as a tutorial video link, if we produce one, or a demo of the dashboard usage) will be added. We will ensure the documentation is polished and understandable by others who may continue the work or verify it later.

0c. Methodology & Reproducibility
A final statement on reproducibility will be provided, summarizing how to reproduce the entire study's results. We will include any remaining methodological details or checks (for example, if we performed a sanity cross-check using an alternative approach to verify a result, we will document it here). If the grants team wishes to verify a specific deliverable, this section will guide them (for example: "To verify Figure 5, run analysis/latency_notebook.ipynb and see Section X in the output."). Essentially, it is a mapping between deliverables and verification steps, delivered as a short document or an appendix in the report.

0d. Infrastructure
Delivery of final infrastructure artifacts. We will publish our Docker images (e.g. on Docker Hub, or as build instructions), for example an image that contains all data and analysis code so that running it launches a ready-to-use Jupyter environment for easy verification. If a Terraform script was prepared to set up nodes for future experiments, it will be finalized. We will clearly list all environment parameters (such as software versions). In summary, anyone who wants to replicate the environment in the future will have everything needed.
0e. Final Research Article
We will deliver the completed research paper/report in a publication-ready format. It will incorporate feedback from any reviewers of the draft and include:
- Abstract, Introduction, Background (from earlier milestones, refined),
- Methodology (overview of our approach),
- Results (graphs and interpretation),
- Discussion (what the results mean, any limitations),
- Conclusion (key takeaways, maybe recommendations for Polkadot or others),
- Acknowledgments (including the required W3F grant acknowledgment sentence),
- References/bibliography.

The report will be written in a professional and concise manner, suitable for submission to an academic workshop or for posting on the W3F website. We will include citations to all related sources (as footnotes or references, in whichever format is appropriate). The length will likely be ~15-20 pages with figures. We will provide both the source (LaTeX/Markdown) and a PDF. The article will be the main artifact demonstrating our findings and will be understandable as a stand-alone document.

Verification: The content of the article should reflect the data and analysis delivered. The grants team can verify factual claims by tracing them to our data or code (which we will cite in the article as needed, e.g. via footnotes linking to data files or code). The required W3F grant acknowledgment will be present and easy to check. A complete and coherent article will itself demonstrate the successful culmination of the project.

In addition to the mandatory ones above, we propose the following additional deliverables to maximize impact (these are optional/outreach-oriented but we include them as part of our commitment):

1. Replication & Tutorials
Deliverable: We will create tutorials explaining our project and demonstrating how to reproduce a portion of it (for instance, showing how to run our data collection tool on a live network, or walking through the analysis notebook results). Additionally, we offer to host a live online session (webinar or community call) where we present the findings and answer questions. This deliverable ensures knowledge transfer beyond just writing.

Verification: We will provide a link to the video (uploaded on a platform like YouTube or IPFS) in our documentation. If a live session is done within the grant period, we will provide an announcement or summary; if scheduled just after, we will provide details for W3F to join/verify. The video content will clearly show our software in action, which indirectly verifies its functionality.
2. Publication
Deliverable: We will prepare at least one blog post or article for a wider audience summarizing our results (e.g. on the Polkadot Forum or Medium). We will also prepare our work for academic dissemination; this may include submitting the paper to arXiv or a conference. While acceptance is beyond our control, the deliverable is the submission itself.

Verification: The blog post will be linked in our repo (and likely posted by the time of final delivery). For the academic submission, we will provide proof (like a submission confirmation or the arXiv ID). These materials ensure our work reaches the broader community and continues to add value beyond the grant.
3. Results Dashboard
Deliverable: An interactive dashboard / visualisation where users can explore our results. For example, a Dune dashboard or a web page (perhaps using Plotly Dash or Observable) where one can toggle between networks and view the corresponding metric charts, or adjust assumptions (like "if Polkadot had N parachains, estimated bandwidth is ..."). This will make our findings more accessible. We have not yet settled on the best way to achieve this, so for now we are keeping this deliverable optional; one possible approach is sketched below.
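As one possible (still undecided) implementation, the sketch below uses Plotly Dash; the results file and metric names are placeholders that would be replaced by our actual analysis outputs.

```python
# Minimal sketch of a possible results dashboard using Plotly Dash
# (the results CSV and metric names are placeholders, not final outputs).
import pandas as pd
import plotly.express as px
from dash import Dash, Input, Output, dcc, html

# Assumed tidy format: network (str), metric (str), value (float)
df = pd.read_csv("analysis/results_summary.csv")
metrics = sorted(df["metric"].unique())

app = Dash(__name__)
app.layout = html.Div([
    html.H2("Data availability benchmark results (draft)"),
    dcc.Dropdown(id="metric", options=metrics, value=metrics[0]),
    dcc.Graph(id="chart"),
])

@app.callback(Output("chart", "figure"), Input("metric", "value"))
def update_chart(metric: str):
    # Show the selected metric for every network as a bar chart.
    subset = df[df["metric"] == metric]
    return px.bar(subset, x="network", y="value", title=metric)

if __name__ == "__main__":
    app.run(debug=True)
```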

(We include the optional items above to show our commitment, but we understand that only the core deliverables are tied to grant evaluation. The optional ones come at no additional cost and are in line with our long-term plans.)

After Milestone 4, the project will be fully delivered, with all code, data, and documentation available for anyone interested. We will address any feedback from the W3F team promptly. Given the comprehensive nature, we expect verification to be straightforward via the provided notebooks and the final report.

Future Plans

Short term (next 3–6 months): Immediately after completing the grant, we will focus on disseminating and leveraging the results:

  • We plan to open-source all our tooling in a maintenance-friendly way. That means we will clean up the repositories, write additional tests where needed, and perhaps combine them into a single toolkit. We will publicize this toolkit to other blockchain researchers; it could become a basis for benchmarking other aspects or for monitoring networks over the long term.
  • We will hold a community presentation. This could be a Polkadot meetup, an online seminar, or a talk at the next Polkadot Decoded conference (if the timing aligns). The goal is to explain our findings in person and encourage Q&A. We believe this is valuable for spreading the knowledge and possibly inspiring follow-up ideas.

Long term (beyond 6 months):

  • We aim to keep the project assets up-to-date. Data availability is a fast-evolving area; new solutions might emerge (e.g., Ethereum's full Danksharding, new L2 data layers). We plan to periodically (perhaps every 6–12 months) update the data or add new networks. Since all code is open, others can contribute too. In essence, we see this project as laying the groundwork for an ongoing "DA benchmark suite".
  • We will support and enhance the project if there is community interest. For example, if a parachain team wants to use our tools to simulate using Celestia as a backup DA for their chain, we could guide them on that (this might even spin out as a separate grant or collaboration, but with our results as a foundation).
  • Our team's long-term intention is to position Chainscore Labs as experts in blockchain performance and cryptography R&D. This project's results may lead us to new research questions (e.g., "how can DA bandwidth be reduced further?" or "can we integrate DAS into Polkadot light clients?"). We might pursue those as future grants or academic research. In particular, if our study shows a promising hybrid approach, we would be interested in prototyping it, perhaps as a Substrate pallet that enables Celestia-like sampling for parachains, or a scheme to use Avail as a Polkadot parachain.
  • We also plan to publish the work in a peer-reviewed setting (if not already done). A successful publication would increase the credibility and reach of Polkadot's research initiatives. It also opens doors for academic collaborations (e.g., with universities or other research labs).
  • Beyond the specific scope, our engagement with different communities (Polkadot, Celestia, NEAR, etc.) through this project sets the stage for multi-ecosystem collaboration. We intend to stay active in those communities, possibly contributing our knowledge or even patches (for example, if we identified a bug in Celestia's light node through our tests, we would report/fix it).

We will use the insights to advocate for evidence-based design in blockchain scaling. Ultimately, our long-term vision is to help drive Web3 towards a more scalable and interoperable future, and understanding data availability thoroughly is a key piece of that puzzle.

Additional Information

How did you hear about the Grants Program? Through the Web3 Foundation website, the PBA job board announcement for this research task, and previous direct involvement (we are past grantees). We have been following the W3F Grants Program as part of the Polkadot community.

Previous grants: Yes, our team has received a previous Polkadot Open Source Grant:

  • "DotRing - Ring VRF Implementation" under Chainscore Private Limited 2025, was a research/technical grant where we built a Ring VRF in Python as defined in W3F research specification. It is under work and about to be published and publicly available. We mention this to demonstrate our positive track record with W3F. Link

We have not applied for other W3F grants besides that. We have not received funding from other ecosystems for overlapping work either.

Other contributions: We want to note that Polygon Labs, Avail, NEAR and Celestia are external projects; we have no financial relationship with them, and we plan to keep our analysis impartial.

Finally, we'd like to acknowledge that this research aligns with the open spirit of Web3. Thus, beyond the formal outputs, we will engage openly: updating a research diary publicly (in our repo wiki or issues) so interested community members can follow progress, and inviting feedback along the way. We believe this transparency will enrich the quality of the final result and firmly ground it as a community asset.