Lightweight and Flexible Data Access for Algorand

Algorand has released a new tool for blockchain data access: Conduit. Conduit is a modular plugin-based tool and a powerful upgrade to the one-size-fits-all Indexer. Conduit allows dapps to get exactly the data they need in an affordable deployment.

Useful, but bulky: the Indexer

The Indexer is a ready-to-go open-source tool that pulls the data from the blockchain, stores it in a database, and offers an API to serve that data. The existence of the Indexer has been a significant boon for the Algorand ecosystem, allowing anybody to easily read the Algorand blockchain.

However, the Indexer has historically had one major drawback: it is expensive to run. There are two main reasons for this:

Running an Indexer requires also running an archival node that stores every block since the beginning of the blockchain.
The Indexer collects the entire blockchain history (every transaction since block zero) in a Postgres database.

Together, these two requirements make the Indexer a multi-terabyte deployment. A typical Indexer requires a number of expensive resources, and the cost multiplies for production deployments that need redundancy, load balancing, and coverage of multiple regions.

The scale of the Indexer also makes it slow to initialize, and only capable of serving the specific queries for which it is indexed. As the Algorand blockchain has grown, it has become impractical for smaller projects to maintain their own Indexers.

Consequently, the ecosystem mostly relies on a few API/data providers. These providers run Indexers and charge dapps for their API calls. This is more economical and practical than each group running their own Indexer, but it presents other inflexibilities.

Dapps should have an accessible option to own their own data access infrastructure. This is what Conduit was built for.

Conduit, the Basics

Conduit is a new solution with several major advantages:

Conduit does not require running an archival algod node.
Conduit lets users filter incoming blockchain data, allowing them to collect strictly the data they need for their applications.
Conduit offers an optional data pruning feature that, when enabled, automatically deletes old transactions.
With Conduit, users can build custom data exporters that use the data destination of their choice.
Conduit is designed as an extensible plugin architecture. Any community-contributed plugin can be integrated by anyone.
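
To make the pruning idea concrete, here is a minimal sketch in Python. It is not Conduit's implementation or configuration: the `StoredTxn` type, the `prune` function, and the `keep_rounds` parameter are hypothetical names chosen for illustration, and the sketch assumes pruning means "keep only the most recent N rounds".

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class StoredTxn:
    round: int  # round in which the transaction was confirmed
    txid: str   # hypothetical transaction identifier


def prune(store: list[StoredTxn], latest_round: int, keep_rounds: int) -> list[StoredTxn]:
    """Keep only transactions from the most recent `keep_rounds` rounds."""
    cutoff = latest_round - keep_rounds  # rounds at or below the cutoff are dropped
    return [t for t in store if t.round > cutoff]


# Ten rounds of stored data, pruned down to the latest three rounds.
store = [StoredTxn(r, f"tx-{r}") for r in range(1, 11)]
pruned = prune(store, latest_round=10, keep_rounds=3)
print([t.round for t in pruned])  # [8, 9, 10]
```

A dapp that only needs recent activity can then hold its storage roughly constant instead of growing with the chain.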

Conduit allows users to configure their own data pipelines for filtering, aggregation, and storage of transactions and accounts on any Algorand network.

A Conduit pipeline is composed of an importer, optional processor(s), and exporter plugins. Along with the Conduit release, the following noteworthy plugins are made available.

Algod importer — fetches blocks from an algod REST API.
Filter processor — filters data based on transaction fields.
Postgres exporter — writes the data to a Postgres database.
File writer exporter — writes the data to a file.

Configuring a Conduit pipeline requires defining which plugins to use, and if necessary, configuring the plugins. For example, the filter processor requires a definition of what to filter.
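
The importer, processor, and exporter roles described above can be sketched in a few lines of Python. This is only a model of the data flow, not Conduit's plugin interface or configuration format: the block shape and the names `list_importer`, `payment_filter`, `json_lines_exporter`, and `run_pipeline` are all assumptions made for the example.

```python
import io
import json
from typing import Callable, Iterable, Iterator

Block = dict  # hypothetical block shape: {"round": int, "txns": [dict, ...]}


def list_importer(blocks: Iterable[Block]) -> Iterator[Block]:
    """Stand-in for an importer: yields blocks in round order."""
    yield from blocks


def payment_filter(block: Block) -> Block:
    """Stand-in for a processor: keeps only payment transactions."""
    kept = [t for t in block["txns"] if t["type"] == "pay"]
    return {"round": block["round"], "txns": kept}


def json_lines_exporter(out: io.TextIOBase) -> Callable[[Block], None]:
    """Stand-in for an exporter: writes one JSON line per block."""
    def export(block: Block) -> None:
        out.write(json.dumps(block, sort_keys=True) + "\n")
    return export


def run_pipeline(importer, processors, exporter) -> None:
    """Pass every imported block through each processor, then export it."""
    for block in importer:
        for process in processors:
            block = process(block)
        exporter(block)


chain = [
    {"round": 1, "txns": [{"type": "pay", "amt": 5}, {"type": "axfer", "amt": 1}]},
    {"round": 2, "txns": [{"type": "appl"}]},
]
sink = io.StringIO()
run_pipeline(list_importer(chain), [payment_filter], json_lines_exporter(sink))
print(sink.getvalue())
```

Swapping the exporter for one that writes to a database or a message queue leaves the importer and processors unchanged, which is the point of the plugin split.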

This is best demonstrated with an example. See a basic walkthrough here.

Conduit’s Filter Processor

The filter processor is a key new feature introduced with Conduit. It allows users to filter the transaction data based on any transaction field — transaction type, app ID, asset ID, sender, receiver, amount, etc. These filters can also be combined.

Since many transactions are submitted as grouped transactions, the filter processor allows users to choose whether or not to include the entire transaction group when the filter conditions are met.

The filter processor will always include inner transactions for transactions that match the specified filter conditions.
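
The three behaviors just described (combined field filters, optional group inclusion, and retained inner transactions) can be illustrated with a small Python model. The transaction layout and the helper names `field_equals`, `field_at_least`, `all_of`, and `filter_block` are hypothetical; the real filter processor is configured declaratively and its syntax is not reproduced here.

```python
from typing import Callable

Txn = dict  # hypothetical layout: fields plus optional "group" and "inner" keys
Predicate = Callable[[Txn], bool]


def field_equals(name: str, value) -> Predicate:
    return lambda t: t.get(name) == value


def field_at_least(name: str, minimum: int) -> Predicate:
    return lambda t: t.get(name, 0) >= minimum


def all_of(*preds: Predicate) -> Predicate:
    """Combine filters: a transaction must satisfy every one of them."""
    return lambda t: all(p(t) for p in preds)


def filter_block(txns: list, pred: Predicate, include_group: bool) -> list:
    # Groups that contain at least one matching transaction.
    matched_groups = {t["group"] for t in txns if t.get("group") and pred(t)}
    kept = []
    for t in txns:
        in_matched_group = include_group and t.get("group") in matched_groups
        if pred(t) or in_matched_group:
            kept.append(t)  # the whole record is kept, including any "inner" list
    return kept


txns = [
    {"id": "a", "type": "pay", "amt": 50, "group": "g1"},
    {"id": "b", "type": "appl", "group": "g1",
     "inner": [{"id": "b.1", "type": "pay", "amt": 1}]},
    {"id": "c", "type": "pay", "amt": 2},
]
large_payments = all_of(field_equals("type", "pay"), field_at_least("amt", 10))

print([t["id"] for t in filter_block(txns, large_payments, include_group=False)])  # ['a']
print([t["id"] for t in filter_block(txns, large_payments, include_group=True)])   # ['a', 'b']
```

With group inclusion on, transaction `b` is kept because it shares a group with the matching payment, and its inner transaction travels with it.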

Full details on the filter processor are here.

A New Node Configuration for Conduit: Follow Mode

Conduit is used to track data from the blockchain and make it available to the off-chain world. Every time a new block is created on-chain, Conduit is informed about every change to every piece of state since the prior block, such as new accounts created, app states updated, boxes deleted, etc.

Some dapps use an object called the ApplyData to track some kinds of state changes; however, this approach is technically limited. Not all changes are reflected in this object, and ApplyData is only cached for 4 rounds on non-archival nodes. A consumer that falls more than 15 or so seconds behind in handling ApplyData updates will therefore hit an unrecoverable state error.
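
The retention limit can be modeled in a few lines of Python. The sketch assumes "cached for 4 rounds" means the node keeps the latest round and the three before it; that reading, and the names `available_rounds` and `can_resume`, are assumptions for illustration. The per-round time is simply what the two figures in the text imply.

```python
RETAINED_ROUNDS = 4        # from the text: ApplyData is cached for 4 rounds
APPROX_WINDOW_SECONDS = 15  # from the text: "15 or so seconds"


def available_rounds(latest_round: int) -> range:
    """Rounds still retrievable from a hypothetical non-archival node."""
    oldest = max(1, latest_round - RETAINED_ROUNDS + 1)
    return range(oldest, latest_round + 1)


def can_resume(next_needed: int, latest_round: int) -> bool:
    """True if a consumer can still fetch the next round it has not handled."""
    return next_needed in available_rounds(latest_round)


# The two figures together imply roughly this many seconds per round.
print(APPROX_WINDOW_SECONDS / RETAINED_ROUNDS)  # 3.75

# A consumer that has handled round 100 needs round 101 next.
print(can_resume(101, latest_round=104))  # True: rounds 101..104 are cached
print(can_resume(101, latest_round=105))  # False: round 101 has been evicted
```

Once the needed round has been evicted there is nothing left to fetch, which is why the gap cannot be repaired without an archival source.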

The old Indexer architecture solved these challenges by requiring access to an archival algod node. Indexer used a “local ledger” to track the state changes from round to round, and thus avoided the incomplete ApplyData object. The drawback of this design is the need for an expensive archival node.

Conduit instead requires access to a node in a new lightweight “follow mode” configuration which replaces the need for the archival configuration. Conduit can pause and unpause this node’s round updates as required. The pause functionality ensures that the Conduit process will not miss out on any blockchain state updates. Conduit also makes use of a new “state delta” endpoint introduced in the node to eliminate the requirement for a large local ledger.
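
The pause and unpause mechanism can be pictured with a toy simulation in Python. This is a simplified model, not the algod API: the `FollowerNode` class, its method names, and the `max_ahead` limit are hypothetical, and the real node's rules for how far it may run ahead are not reproduced here. The point is only that a node which waits for its consumer never discards a round the consumer still needs.

```python
class FollowerNode:
    """Toy model of a follow-mode node; names and rules are assumptions."""

    def __init__(self, max_ahead: int = 4):
        self.sync_round = 1         # lowest round the consumer still needs
        self.latest = 0             # last round the node has caught up to
        self.max_ahead = max_ahead  # how far the node may run ahead
        self.deltas: dict = {}      # round -> state delta

    def try_advance(self) -> bool:
        """The network produced a block; pause if too far ahead of the consumer."""
        if self.latest + 1 >= self.sync_round + self.max_ahead:
            return False  # paused: waiting for the consumer to catch up
        self.latest += 1
        self.deltas[self.latest] = f"state delta for round {self.latest}"
        return True

    def get_delta(self, rnd: int) -> str:
        return self.deltas[rnd]

    def set_sync_round(self, rnd: int) -> None:
        """The consumer needs `rnd` onward; older deltas can be dropped."""
        for old in [r for r in self.deltas if r < rnd]:
            del self.deltas[old]
        self.sync_round = rnd


node = FollowerNode(max_ahead=4)
processed = []
next_round = 1
for tick in range(20):  # one network block per tick
    node.try_advance()
    # A deliberately slow consumer: it handles one round every third tick.
    if tick % 3 == 0 and next_round <= node.latest:
        node.get_delta(next_round)  # would raise KeyError if the round were gone
        processed.append(next_round)
        next_round += 1
        node.set_sync_round(next_round)  # unpause the node by one round

print(processed)  # [1, 2, 3, 4, 5, 6, 7]
```

Even though the consumer is three times slower than the network, it sees every round in order, because the node stalls instead of evicting data.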

A node with follow mode enabled cannot participate in consensus, as votes based on paused state information would be rejected. Similarly, submitting transactions to such a node is not possible, as acceptance based on paused, outdated state information might be judged invalid by the rest of the blockchain.

Conduit as an Extensible Tool

Focusing on open-source principles and decentralization, Conduit’s design encourages custom-built solutions, setting it apart from the Indexer. In our initial release, we encourage new plugin submissions via PRs to the Conduit repository. We aim for the plugin framework to inspire community involvement, allowing everyone to benefit from shared efforts. Currently, we’re engaging the community to work out how best to manage externally supported plugins in the long term (join the conversation in the #conduit channel on Discord!).

We have already seen the development of a Kafka plugin by a community member (Iridium#4127 on Discord), who has this to say about Conduit:

“… it [Conduit] allows [you] to choose your … targeted product (e.g. Kafka) to quickly build a plugin and let the data flow. Mainly it’s just importing the correct library — configure your connection and use the library to send messages to your system. Receiving is already handled by Conduit.”

Comparing Deployments: Legacy Indexer vs. Conduit Architecture

Indexer, legacy architecture

Requires an archival algod node, which requires at least 1.1 TB of storage.
Requires a Postgres database with full historical data, which requires 1.5 TB of storage.

Source for the above: howbigisalgorand.com

Conduit architecture

Requires a node with “follow mode” enabled, which requires 40 GB of storage (like other non-archival nodes).
Conduit can use a Postgres database, or a different data store. The user can store full historical data, or a subset. This is at most 1.5 TB if storing the full history, and could be as little as a few GB.

The costs of these deployments will vary depending on whether users are self-hosted or using cloud providers (and vary greatly by provider). However, the storage costs will be strictly less for a Conduit-backed deployment.

Note that storage will likely be the major cost factor, and bandwidth and compute requirements are similar across both architectures.
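
The storage figures quoted above can be added up directly. The short Python calculation below uses only the numbers from this section, with two labeled assumptions: 1 TB is treated as 1000 GB, and "a few GB" is represented by a hypothetical 5 GB.

```python
GB_PER_TB = 1000  # assumption: decimal terabytes

# Figures quoted in this section.
archival_node_gb = 1.1 * GB_PER_TB
full_history_db_gb = 1.5 * GB_PER_TB
follower_node_gb = 40
filtered_db_gb = 5  # assumption: stands in for "a few GB"

legacy_total = archival_node_gb + full_history_db_gb
conduit_full_history = follower_node_gb + full_history_db_gb
conduit_filtered = follower_node_gb + filtered_db_gb

print(f"Legacy Indexer:          {legacy_total:.0f} GB")
print(f"Conduit, full history:   {conduit_full_history:.0f} GB")
print(f"Conduit, filtered data:  {conduit_filtered:.0f} GB")
print(f"Minimum saving:          {legacy_total - conduit_full_history:.0f} GB")
```

Even when storing the full history, the Conduit deployment saves the difference between an archival node and a follow-mode node, about 1060 GB under these assumptions.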

Continued Indexer Support

For now, we are continuing to support the existing releases of Indexer that run its old architecture (using the archival node). Users who would like to keep using the Indexer but also want to save costs by removing the need for an archival node have the option to run an Indexer backed by Conduit. The Indexer interface remains the same. See our migration guide here.

Conduit Builds Better Apps

Conduit was designed to be flexible and extensible, intended to allow developers to build whatever data solution fits their needs. As such, Conduit has countless applications.

Want to run Conduit to support your dapp reporting needs?

Want to extend the Indexer API?

Want to power an event-driven system based on on-chain events?

Want to scale your API Provider service by using CockroachDB?

Want to dump everything to S3 and just query that?

The limitations imposed by the Indexer’s rigidity no longer apply. While Conduit doesn’t provide everything for free, it offers users the flexibility to build what they need.

Lightweight and Flexible Data Access for Algorand was originally published in Algorand on Medium, where people are continuing the conversation by highlighting and responding to this story.
