Having the raw analytics data in a decentralized data store is great. It's an open, permissionless, user-owned community asset available to everyone, much like blockchain data.
But, like blockchain data, raw data in this form is hard to query, analyze, or build dashboards on. You need a way to load it into traditional data stores for processing.
For now, the data is stored in an S3 data lake in Apache Parquet format and queried with AWS Athena. Apache Spark can also read Parquet files from S3, so it is another option as we scale. Meanwhile, Ceramic is working on a GraphQL interface. Once it offers robust indexing and enough performance to support analytics queries, we will pull data directly from Ceramic instead of going through S3.
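To make the S3 + Athena setup more concrete, here is a minimal sketch of how Parquet files in S3 can be exposed to Athena and queried. It is not the pipeline described in this post. It assumes a hypothetical bucket (`s3://example-analytics-lake/events/`), a hypothetical `analytics.events` table, and a hypothetical event schema, because the actual layout isn't given here. `external_table_ddl` builds a standard Athena `CREATE EXTERNAL TABLE ... STORED AS PARQUET` statement. `run_athena_query` uses the real boto3 Athena client calls (`start_query_execution`, `get_query_execution`, `get_query_results`) and needs AWS credentials plus an S3 output location for query results.

```python
import time
from typing import Sequence, Tuple

# Hypothetical event schema: the real column set is not described in the post.
EVENT_COLUMNS: Sequence[Tuple[str, str]] = (
    ("event_id", "string"),
    ("app_id", "string"),
    ("event_name", "string"),
    ("user_did", "string"),
    ("created_at", "timestamp"),
)

# Example analytics query against the hypothetical table.
EXAMPLE_QUERY = (
    "SELECT event_name, count(*) AS n "
    "FROM analytics.events GROUP BY event_name ORDER BY n DESC"
)


def external_table_ddl(
    database: str,
    table: str,
    columns: Sequence[Tuple[str, str]],
    s3_location: str,
) -> str:
    """Build Athena DDL that maps Parquet files under s3_location to a table."""
    if not s3_location.startswith("s3://"):
        raise ValueError("s3_location must be an s3:// URI")
    if not s3_location.endswith("/"):
        s3_location += "/"  # Athena expects the LOCATION to point at a folder
    cols = ",\n".join(f"  {name} {type_}" for name, type_ in columns)
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS {database}.{table} (\n"
        f"{cols}\n"
        ")\n"
        "STORED AS PARQUET\n"
        f"LOCATION '{s3_location}'"
    )


def run_athena_query(sql: str, database: str, output_location: str,
                     poll_seconds: float = 1.0) -> dict:
    """Run a query on Athena and return the first page of results.

    Requires boto3 and AWS credentials; output_location is an s3:// prefix
    where Athena writes result files.
    """
    import boto3  # imported lazily so the DDL helper works without boto3

    client = boto3.client("athena")
    query_id = client.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},
    )["QueryExecutionId"]

    # Poll until the query reaches a terminal state.
    while True:
        status = client.get_query_execution(QueryExecutionId=query_id)
        state = status["QueryExecution"]["Status"]["State"]
        if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
            break
        time.sleep(poll_seconds)

    if state != "SUCCEEDED":
        raise RuntimeError(f"Athena query {query_id} ended in state {state}")
    # Only the first page is returned; use NextToken or a paginator for more rows.
    return client.get_query_results(QueryExecutionId=query_id)
```

Because the table is defined over a folder of Parquet files, new files written under that prefix become queryable without reloading anything. Spark could read the same prefix directly, which is part of why this layout leaves room to switch engines later.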