Indexer

Having the raw analytics data in a decentralized data store is great. Like blockchain data, it's an open, permissionless, user-owned community asset available to everyone.

But, like blockchain data, the raw format makes it difficult to extract insights or build dashboards. You need a way to load it into traditional datastores for processing.

To address this, we created an indexer for the data. It's an automated pipeline that uses the Airbyte open-source ELT platform and pushes normalized data directly into an S3 data lake. Our source connector continuously monitors the blockchain and Ceramic for new apps, users and data for indexing.
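To make "normalized" more concrete, here is a minimal sketch of one common normalization step: flattening a nested raw event into a single row of columns that a columnar store can hold. It is only an illustration and not the indexer's actual code. The event shape, the field names, and the flatten_record helper are all hypothetical.

```python
from typing import Any, Dict


def flatten_record(record: Dict[str, Any], parent_key: str = "", sep: str = "_") -> Dict[str, Any]:
    """Flatten nested dicts into one level, joining key paths with `sep`."""
    flat: Dict[str, Any] = {}
    for key, value in record.items():
        column = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            # Recurse into nested objects so each leaf becomes its own column.
            flat.update(flatten_record(value, column, sep))
        else:
            flat[column] = value
    return flat


# Hypothetical raw analytics event; the real schema may differ.
raw_event = {
    "app_id": "example-app",
    "user": {"id": "did:example:123"},
    "event": {"type": "page_view", "properties": {"path": "/home"}},
}

flat_event = flatten_record(raw_event)
```

Each leaf value ends up under a column name built from its key path (for example, event_type), which is the kind of flat layout that maps cleanly onto a table.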

For now, data is stored in an S3 data lake in Apache Parquet format and queried through AWS Athena. Apache Spark can also read Parquet data from S3 and is another option for us as we scale. That said, Ceramic is working on a GraphQL interface. Once it offers robust indexing and sufficient performance to support analytics queries, we will pull data directly from Ceramic instead of using S3.
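As a rough illustration of how Parquet files in S3 become queryable from Athena, the sketch below builds the kind of CREATE EXTERNAL TABLE statement Athena accepts for Parquet data. The database, table, column names, and bucket path are hypothetical placeholders, not the indexer's real schema, and the helper only produces the SQL text rather than running it.

```python
from typing import Dict


def parquet_table_ddl(database: str, table: str, columns: Dict[str, str], s3_location: str) -> str:
    """Build an Athena DDL statement for an external table backed by Parquet files."""
    column_sql = ",\n  ".join(f"{name} {sql_type}" for name, sql_type in columns.items())
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS {database}.{table} (\n"
        f"  {column_sql}\n"
        ")\n"
        "STORED AS PARQUET\n"
        f"LOCATION '{s3_location}'"
    )


# Hypothetical names; substitute the real database, columns, and bucket.
ddl = parquet_table_ddl(
    database="analytics",
    table="events",
    columns={"app_id": "string", "user_id": "string", "event_type": "string"},
    s3_location="s3://example-bucket/events/",
)

# Example analytics query one might run once the table exists.
query = "SELECT event_type, COUNT(*) AS event_count FROM analytics.events GROUP BY event_type"
```

Once a table like this is registered, dashboards can issue ordinary SQL against it, which is the main reason for loading the raw data into a traditional query layer.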
