Data streaming is a powerful business intelligence technology for every data-driven organisation. However, storage and capacity constraints mean that organisations often stream just ~1% of their data for real-time analytics, while the other 99% remains untapped. Organisations also typically run a laborious duplication and ELT process to make that data ‘operational’ for streaming.

This is why we are excited to back Streambased, a single home for data warehousing and streaming that stores data and operationalises it as real-time streams. Founded by Tom Scott (Cloudera, Confluent, and Conduktor) and Leo Delmouly (Confluent, Cockroach Labs), Streambased productises a recent step-change in Apache Kafka: tiered storage, introduced through KIP-405. (Kafka is the underlying protocol on which streaming products such as Confluent and Redpanda are built.)

Event streaming and Kafka are core to Streambased’s values and technology stack. With far more storage available in the stream and infinite retention on Kafka topics, a topic can become a changelog of everything that has happened in a business. That makes it a very real proposition for organisations to use the same platform both to warehouse data and to run analytics instantaneously, without duplication or ELT.

Aiming to provide a powerful resource for analysts, AI/ML engineers, and data scientists, Streambased uses indexing, statistics, and other techniques to optimise non-sequential access to data (tl;dr: ‘select * from topic where X’ no longer has to read the topic from beginning to end). By keeping data at its source (Kafka) rather than duplicating it purely for accessibility, Streambased challenges the accepted practice of maintaining a datalake separate from Kafka. Instead, it uses Kafka’s storage and operational capabilities and layers datalake-style access on top.
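To make the indexing idea more concrete, here is a minimal sketch of one generic technique: per-segment min/max statistics (often called zone maps), which let a ‘where X’ filter skip whole parts of a topic instead of scanning it end to end. This is an illustrative assumption, not Streambased’s implementation, which this post does not describe. The in-memory list-of-segments model of a topic and the names `SegmentStats`, `build_stats` and `select_where_eq` are hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Tuple

# Hypothetical model: a topic is a list of segments, each a list of records.
Record = Dict[str, Any]


@dataclass
class SegmentStats:
    """Min/max of one field across one segment (a 'zone map')."""
    segment_id: int
    min_val: Any
    max_val: Any


def build_stats(segments: List[List[Record]], key: str) -> List[SegmentStats]:
    """Record the value range of `key` for each non-empty segment."""
    stats = []
    for i, seg in enumerate(segments):
        if not seg:
            continue  # nothing to index in an empty segment
        values = [r[key] for r in seg]
        stats.append(SegmentStats(i, min(values), max(values)))
    return stats


def select_where_eq(
    segments: List[List[Record]], stats: List[SegmentStats], key: str, value: Any
) -> Tuple[List[Record], int]:
    """Rough equivalent of `SELECT * FROM topic WHERE key = value`.

    Returns the matching records and how many segments were actually scanned.
    """
    results: List[Record] = []
    scanned = 0
    for s in stats:
        # Skip any segment whose [min, max] range cannot contain `value`.
        if not (s.min_val <= value <= s.max_val):
            continue
        scanned += 1
        results.extend(r for r in segments[s.segment_id] if r[key] == value)
    return results, scanned


# Usage: three segments of a toy topic, filtered on a "customer" field.
segments = [
    [{"offset": 0, "customer": 3}, {"offset": 1, "customer": 7}],
    [{"offset": 2, "customer": 12}, {"offset": 3, "customer": 15}],
    [{"offset": 4, "customer": 21}, {"offset": 5, "customer": 7}],
]
stats = build_stats(segments, "customer")
# Segment 0 (range 3..7) cannot hold customer 12, so it is never read.
rows, scanned = select_where_eq(segments, stats, "customer", 12)
print(rows, f"({scanned} of {len(segments)} segments scanned)")
```

Pruning of this kind works best when a field’s values cluster within segments (for example, time-ordered keys); for widely scattered values, a finer-grained index would likely be needed.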

Tom Scott, CEO of Streambased, highlights:
“Streambased is pioneering the streaming data lake, a transformational concept that breaks down barriers between operational and analytical data systems. We thank Seedcamp for their trust in this vision and the exciting new possibilities it creates.”

On why we invested, Will Bennett from our investment team comments:

“We feel incredibly lucky to partner with Tom and Leo at the start of their mission to build the future of data engineering and analytics. In merging data streams and warehousing into a single platform, a streaming datalake has the potential to avoid mountains of ELT work. This has been a longstanding pipe dream for analysts, and we are excited by the timing of Kafka upgrades in the form of KIP-405 to make that possible. We can’t wait to work together with such an outstanding team that earned their chevrons at the likes of Confluent, Cockroach Labs, and Conduktor.”

We are excited to lead Streambased’s pre-seed round of $800K and to be on this journey alongside First Momentum Ventures. This first investment will bolster Streambased’s product, enabling the team to strengthen its capabilities for market launch. It will also fuel awareness campaigns spotlighting the advantages of a streaming datalake and help the team establish its presence in the analytics market for event streaming data.

For more information, visit streambased.io.