Guide · Decentralized Storage
Decentralized Storage: IPFS, Filecoin, and Arweave Explained
Decentralized storage protocols replace location-based retrieval with content-based retrieval, and single points of failure with distributed networks of storage providers. Understanding the technical differences between IPFS, Filecoin, and Arweave determines which protocol fits a given use case.
Content-addressed vs location-addressed storage
Traditional storage systems are location-addressed. When you fetch a file from a URL — https://example.com/document.pdf — you are asking a specific server at a specific address for a file. The URL says nothing about what the file contains; it only specifies where to look. This creates several problems: the server can go offline, the file can be changed (the URL stays the same but the content is different), and access is dependent on the continuing availability and willingness of the server operator.
Content-addressed storage takes the opposite approach. A file's address is derived from its content — specifically, from a cryptographic hash of the file's bytes. If you know the hash, you can verify that any copy of the file you receive is identical to the original. You can ask any node that has the file for it, not just a specific server. The file cannot be silently modified — any change produces a different hash and therefore a different address. And the file's availability is not dependent on any single party.
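The core property is easy to demonstrate: hash the bytes, and any copy received from any peer can be checked against that hash. A minimal Python sketch:

```python
import hashlib

def content_address(data: bytes) -> str:
    """Derive an address from the content itself: its SHA-256 hash."""
    return hashlib.sha256(data).hexdigest()

original = b"hello, decentralized web"
addr = content_address(original)

# Any copy, fetched from any peer, can be verified against the address.
copy_from_peer = b"hello, decentralized web"
assert content_address(copy_from_peer) == addr

# A single changed byte produces an entirely different address.
tampered = b"hello, decentralized web!"
assert content_address(tampered) != addr
```

This is why a content address doubles as an integrity proof: the address and the verification key are the same value.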
Content addressing is the technical foundation of IPFS, Filecoin, and Arweave. The three protocols differ significantly in how they incentivize storage, how they guarantee availability over time, and what economic models they use — but they all share the fundamental property that content is identified by what it is, not where it is stored.
IPFS: how CIDs work, pinning, and gateways
The InterPlanetary File System (IPFS) is a peer-to-peer hypermedia protocol designed by Protocol Labs. It provides the content-addressed networking layer — a global distributed hash table (DHT) that routes requests for content to peers that have it — but does not by itself provide storage guarantees. Understanding this distinction is critical.
How CIDs work
A Content Identifier (CID) is IPFS's address format. It encodes the cryptographic hash of the content along with metadata about how the hash was computed — the hash function used (typically SHA-256 or BLAKE2b) and the codec (how the content was serialized, typically dag-pb or raw). CIDv0 uses Base58-encoded SHA-256 multihashes and starts with "Qm". CIDv1 is more flexible, supports multiple hash functions, and starts with "b" in Base32 encoding. All new development should use CIDv1.
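The "Qm" prefix is not decorative: it falls out of the encoding. A CIDv0 is the Base58btc encoding of a multihash whose first two bytes are 0x12 (sha2-256) and 0x20 (32-byte digest length), and those fixed bytes always encode to "Qm". The sketch below builds that framing over raw bytes; note that a real `ipfs add` hashes the dag-pb-encoded block, not the raw file bytes, so this will not reproduce the CID IPFS assigns to the same file:

```python
import hashlib

BASE58_ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"

def base58_encode(data: bytes) -> str:
    """Base58btc: big-endian integer conversion, one '1' per leading zero byte."""
    n = int.from_bytes(data, "big")
    out = ""
    while n:
        n, rem = divmod(n, 58)
        out = BASE58_ALPHABET[rem] + out
    pad = len(data) - len(data.lstrip(b"\x00"))
    return "1" * pad + out

def cidv0_style_multihash(block: bytes) -> str:
    digest = hashlib.sha256(block).digest()
    # Multihash framing: 0x12 = sha2-256 code, 0x20 = 32-byte digest length.
    multihash = bytes([0x12, 0x20]) + digest
    return base58_encode(multihash)

cid = cidv0_style_multihash(b"example block")
assert cid.startswith("Qm")  # the fixed 0x12 0x20 prefix guarantees this
```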
Files larger than the default IPFS chunk size (256 KiB) are split into blocks, each with its own CID. A Merkle DAG (Directed Acyclic Graph) is constructed with the block CIDs as leaves and a root CID that represents the complete file. This structure means you can verify the integrity of any individual block independently, and you can deduplicate storage across files that share identical blocks — relevant for versioned datasets where most blocks are unchanged between versions.
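Chunk-level deduplication can be shown with a simplified sketch (real IPFS builds a dag-pb node with named links rather than hashing a bare list of child hashes, so the root computed here is illustrative only):

```python
import hashlib

CHUNK_SIZE = 256 * 1024  # IPFS default chunker size

def chunk(data: bytes) -> list[bytes]:
    return [data[i:i + CHUNK_SIZE] for i in range(0, len(data), CHUNK_SIZE)]

def block_id(block: bytes) -> str:
    return hashlib.sha256(block).hexdigest()

def root_id(data: bytes) -> str:
    # Simplified Merkle root: hash over the ordered child block hashes.
    leaves = [block_id(c) for c in chunk(data)]
    return hashlib.sha256("".join(leaves).encode()).hexdigest()

# Two versions of a dataset that share most chunks deduplicate:
K = 1024
v1 = b"A" * 256 * K + b"B" * 256 * K + b"C" * 88 * K
v2 = b"A" * 256 * K + b"B" * 256 * K + b"D" * 88 * K  # only the tail changed
shared = set(map(block_id, chunk(v1))) & set(map(block_id, chunk(v2)))
assert len(shared) == 2  # the two unchanged 256 KiB blocks are stored once
assert root_id(v1) != root_id(v2)  # but the root CIDs still differ
```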
Pinning
IPFS nodes cache content they retrieve but do not guarantee indefinite storage. Garbage collection removes content that has not been recently accessed. To ensure a file remains available, it must be "pinned" — explicitly marked as content that should not be garbage collected. Pinning on your own IPFS node means the file is available as long as your node is online. For production availability, you use a pinning service.
Pinning services — Pinata, Web3.Storage (now Storacha), NFT.Storage, Infura IPFS — run IPFS nodes and accept pin requests via API. You pay (or have a free tier) to pin content, and the service's nodes keep the content available regardless of whether you are running your own node. Pinning services are how most production applications use IPFS — the application adds content to IPFS and pins it via the pinning service's API without needing to operate IPFS infrastructure.
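Many pinning services implement (or approximate) the vendor-neutral IPFS Pinning Service API, in which a pin is requested by POSTing the CID to a `/pins` endpoint with bearer-token authentication. The sketch below only builds the request; the base URL, CID, and token are placeholders, and individual providers may expose their own proprietary endpoints instead:

```python
import json

def build_pin_request(cid: str, name: str, token: str) -> dict:
    """Request shape per the IPFS Pinning Service API (POST /pins).
    The endpoint URL below is a hypothetical placeholder."""
    return {
        "url": "https://pinning.example.com/pins",
        "headers": {
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        "body": json.dumps({"cid": cid, "name": name}),
    }

req = build_pin_request("bafy-example-cid", "backup-2024", "API_TOKEN")
# To send it: requests.post(req["url"], headers=req["headers"], data=req["body"])
```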
Gateways
IPFS gateways are HTTP servers that translate standard HTTP requests into IPFS requests, allowing regular web browsers and applications to access IPFS content without running an IPFS node. The public gateway at ipfs.io is operated by Protocol Labs. Cloudflare and Pinata also operate public gateways. Gateway URLs take the form https://ipfs.io/ipfs/[CID] or the subdomain form https://[CID].ipfs.dweb.link, which better respects same-origin security policies.
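Both URL forms are simple string templates over the CID. A small helper, using ipfs.io and dweb.link as example gateways:

```python
def gateway_urls(cid: str) -> dict:
    """Path-style and subdomain-style gateway URLs for a CIDv1.
    Subdomain form gives each CID its own origin for browser sandboxing."""
    return {
        "path": f"https://ipfs.io/ipfs/{cid}",
        "subdomain": f"https://{cid}.ipfs.dweb.link",
    }

urls = gateway_urls("bafyexamplecid")
```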
Public gateways are rate-limited and unreliable for production use. Applications that need reliable IPFS access use dedicated gateway services from their pinning provider or operate their own. The emerging standard is the Trustless Gateway specification, which allows clients to verify content integrity from any gateway without trusting the gateway operator.
Filecoin: storage deals, retrieval markets, and verified deals
Filecoin is the incentive layer built on top of IPFS. It is a blockchain-based marketplace where clients pay storage providers (miners) to store data for a specified period. The blockchain enforces that storage providers actually store the data by requiring them to periodically submit cryptographic proofs that the data is in their possession.
Storage deals
A storage deal is a contract between a client and a storage provider recorded on the Filecoin blockchain. The client specifies the CID of the data to be stored, the duration of storage (minimum 180 days, maximum 540 days per deal), the replication factor (how many providers should store copies), and the price per GiB per epoch they are willing to pay. The storage provider accepts the deal, stores the data, and proves continued storage by submitting Proof of Spacetime (PoSt) proofs to the blockchain on a regular schedule. If a provider fails to submit proofs, their staked collateral is slashed.
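The cost arithmetic falls out of those deal parameters. A Filecoin epoch is 30 seconds, so there are 2,880 epochs per day; the price and replica count below are hypothetical, and real quotes come from individual provider asks:

```python
EPOCHS_PER_DAY = 2880  # one Filecoin epoch every 30 seconds

def deal_cost_fil(size_gib: float, days: int, price_per_gib_epoch: float,
                  replicas: int = 1) -> float:
    """Total cost in FIL across all replica deals (illustrative pricing;
    each replica is a separate deal with its own provider)."""
    epochs = days * EPOCHS_PER_DAY
    return size_gib * epochs * price_per_gib_epoch * replicas

# e.g. 32 GiB for the 180-day minimum, 3 replicas, at a hypothetical ask:
cost = deal_cost_fil(32, 180, 2e-9, replicas=3)
print(f"{cost:.6f} FIL")
```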
The Filecoin network currently stores over 1 exabyte of data. Most storage is done programmatically via the Lotus client, the Boost deal-making stack, or higher-level clients like Lighthouse and Estuary. Direct deal-making is complex; most applications use abstraction layers that handle provider discovery, deal negotiation, and deal monitoring.
Retrieval markets
Filecoin has separate markets for storage and retrieval. Retrieval deals involve a client requesting data from a retrieval provider — which may or may not be the same provider that stores it — and paying for the retrieval on a pay-per-byte basis using payment channels. In practice, retrieval from on-chain Filecoin storage deals can be slow (providers need to unseal sectors before retrieval in many cases) and expensive relative to traditional CDNs.
Most production deployments that use Filecoin for storage also use IPFS or a CDN for hot retrieval. The architecture is: store data in long-term Filecoin deals for durability and verifiability, cache frequently accessed data on IPFS nodes or a CDN for fast retrieval. This is the standard pattern for NFT metadata pipelines.
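A minimal sketch of that hot/cold pattern, with the cold path standing in for a real Filecoin retrieval client (the function names here are assumptions, not a real API):

```python
from typing import Callable

def fetch(cid: str, hot_cache: dict, cold_fetch: Callable[[str], bytes]) -> bytes:
    """Serve from the hot cache (IPFS node / CDN) when possible; fall back
    to slow cold retrieval (which may require sector unsealing) and
    repopulate the cache for the next reader."""
    if cid in hot_cache:
        return hot_cache[cid]
    data = cold_fetch(cid)   # slow path: Filecoin retrieval deal
    hot_cache[cid] = data    # warm the cache
    return data

cache: dict = {}
data = fetch("bafy-example", cache, lambda cid: b"metadata bytes")
assert cache["bafy-example"] == b"metadata bytes"
```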
Verified deals and DataCap
Filecoin's Fil+ program (Filecoin Plus) allows clients to apply for DataCap — a subsidy that makes storage effectively free. Clients with DataCap can make verified deals, which give storage providers a 10x quality-adjusted power multiplier — a strong incentive for providers to prioritize verified deals. For large datasets with public benefit (open scientific data, historical archives, public datasets), applying for DataCap via an allocator is the standard path to free long-term storage on Filecoin.
Arweave: permanent storage model and AR token economics
Arweave takes a fundamentally different approach to storage economics than Filecoin. Rather than periodic storage payments and renewable deals, Arweave charges a single one-time fee at upload and guarantees permanent storage — forever. This is a bold claim backed by a specific economic model.
The permanent storage model
Arweave's blockchain (the "blockweave") stores data directly in blocks. When you upload data, you pay a one-time fee in AR tokens. A portion of that fee is placed in a storage endowment, which is drawn down over time to pay for continued storage. The economic model assumes that storage costs will continue to decline (which has been empirically true for decades), so a fixed upfront payment can cover an indefinite stream of shrinking annual costs. If storage costs decline faster than anticipated, the endowment has more coverage than needed; if they decline more slowly, the model carries more risk.
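The endowment logic can be made concrete with a toy simulation. All numbers below are illustrative, not Arweave's actual parameters, and the model ignores any yield on the principal: it just asks how long a fixed upfront fee lasts against a storage cost that declines by a constant rate each year.

```python
def endowment_years(principal: float, annual_cost: float,
                    cost_decline: float) -> float:
    """Years the endowment lasts if storage cost shrinks by `cost_decline`
    per year. Returns infinity when the geometric series of future costs
    (annual_cost / cost_decline) stays below the principal."""
    total, cost, years = 0.0, annual_cost, 0
    while total + cost <= principal:
        total += cost
        cost *= (1 - cost_decline)
        years += 1
        if years > 10_000:
            return float("inf")  # costs converge below the principal
    return years

# Fee worth 250x current annual cost, 0.5% yearly cost decline: funded forever.
print(endowment_years(2.5, 0.01, 0.005))
# Fee worth only 100x current annual cost: runs out after a finite horizon.
print(endowment_years(1.0, 0.01, 0.005))
```

The crossover is exactly the geometric-series bound: with a 0.5% annual decline, total future cost converges to 200x the current annual cost, so any fee above that threshold covers storage permanently under the model's assumptions.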
AR token and miner incentives
Arweave miners are rewarded with block rewards drawn from the endowment plus transaction fees. They compete to produce new blocks, but Arweave's proof-of-access consensus requires them to prove they have access to randomly selected historical blocks — creating an incentive to store the full history of the chain rather than just the latest blocks. This is the core mechanism that makes the permanent storage claim credible: miners are economically incentivized to maintain all historical data in order to participate in consensus.
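The recall mechanism can be sketched in a few lines. This is a deliberately simplified model, not Arweave's actual consensus implementation; the function name and the choice of seed material are assumptions for illustration:

```python
import hashlib

def recall_index(history: list[bytes], seed: bytes) -> int:
    """Derive a deterministic but unpredictable index from seed material
    (e.g. the previous block), pointing at the historical block a miner
    must prove access to this round."""
    digest = hashlib.sha256(seed).digest()
    return int.from_bytes(digest, "big") % len(history)

history = [b"block-%d" % i for i in range(100)]
idx = recall_index(history, b"previous-block-material")
# A miner that discarded history[idx] cannot produce a valid candidate
# block this round, so storing the full history maximizes expected rewards.
assert 0 <= idx < len(history)
```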
Bundlr / Irys and the developer experience
Direct uploads to Arweave via the base protocol are slow and require AR tokens. Bundlr Network (now rebranded as Irys) is a Layer 2 service that bundles transactions together, dramatically increasing throughput and accepting payment in multiple currencies (ETH, MATIC, SOL, USDC). For developers, Irys provides a familiar SDK and confirms uploads in seconds rather than waiting for Arweave block finality. The ArDrive application provides a Google Drive-like interface for Arweave storage. Most NFT metadata and web archiving use cases access Arweave through these higher-level services rather than the base protocol.
Use cases: where decentralized storage fits
NFT metadata and images stored on centralized servers can be changed or deleted by the server operator — making the NFT's claimed properties mutable or inaccessible. The standard for serious NFT projects is to store metadata and images on IPFS or Arweave so that the CID stored in the smart contract provably maps to immutable content. Arweave is preferred for high-value NFTs where the permanence guarantee matters more than the cost savings.
Financial services, healthcare, and legal organizations required to retain records for regulatory purposes face the challenge of proving that records have not been altered. Content-addressed storage provides a cryptographic proof of integrity at the protocol level. Arweave's permanence model is particularly compelling for compliance archiving — records uploaded once are provably retrievable indefinitely without ongoing storage management.
Ethereum Layer 2 rollups need to make transaction data available for verification. Dedicated data availability layers — such as Celestia and EigenDA — use content-addressed storage principles to store transaction data cheaply and verifiably. Filecoin is exploring its role in the modular blockchain data availability stack, where its large storage capacity and cryptographic proofs of storage could complement blockchain-native DA solutions.
Research institutions and data organizations use Filecoin's Fil+ program to store large open datasets long-term at minimal cost. The Internet Archive, Wikipedia snapshots, scientific datasets, and climate data archives are among the types of data that benefit from decentralized storage — both for cost and for the resilience that comes from distributing data across many independent storage providers.
When to use decentralized vs centralized storage
Decentralized storage is not a replacement for centralized storage in all cases. The right choice depends on your requirements for latency, mutability, cost, and the trust assumptions you are making.
Choose decentralized storage when
- You need content integrity guarantees — the ability to prove a file has not changed.
- You need censorship resistance — content that cannot be taken down by any single party.
- You need long-term archival without ongoing operational overhead.
- You are building on a blockchain and need your off-chain data to match the trust model of your on-chain data.
- You are storing NFT metadata, smart contract audit reports, or compliance records where mutability is unacceptable.
Choose centralized storage when
- You need low-latency retrieval at scale — traditional CDNs (Cloudflare R2, AWS CloudFront, Fastly) consistently outperform decentralized retrieval for high-traffic applications.
- You need mutable content — files that change frequently, user-uploaded content with moderation requirements, or any data that must be deletable to comply with privacy regulations.
- Your cost model is read-heavy and access patterns are predictable — centralized object storage (S3, R2, GCS) has well-understood pricing that is often cheaper for high-access workloads.
- You need fine-grained access control — decentralized storage has limited support for access permissions without encryption layers.
The hybrid architecture
Most production Web3 applications use a hybrid model: decentralized storage for the data that needs integrity and permanence guarantees (NFT metadata, audit logs, published documents), centralized object storage or CDN for the data that needs fast retrieval or mutability (user avatars, application assets, hot cache of frequently accessed content). The architectural question is not "decentralized or centralized" but "which data has which requirements and which storage layer serves each requirement best."
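That decision logic can be expressed as a small routing function. The labels and rules below are illustrative defaults, not prescriptions:

```python
def storage_layer(needs_integrity: bool, needs_permanence: bool,
                  mutable: bool, hot_path: bool) -> str:
    """Toy router for the hybrid pattern: one layer per requirement set."""
    if mutable or hot_path:
        # Mutability and low-latency reads favor centralized storage/CDN.
        layer = "object-storage/CDN"
        if needs_integrity:
            # Integrity can still be anchored by hashing the content on-chain.
            layer += " + content hash anchored on-chain"
        return layer
    if needs_permanence:
        return "Arweave"          # one-time fee, permanence guarantee
    if needs_integrity:
        return "IPFS + Filecoin deals"  # verifiable, renewable storage
    return "object-storage/CDN"

assert storage_layer(True, True, False, False) == "Arweave"
```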
We build decentralized storage integrations.
IPFS pipelines, Filecoin storage deal automation, Arweave archiving infrastructure, and hybrid storage architectures. We have shipped decentralized storage systems at Protocol Labs and for clients building in the Web3 ecosystem.
