Earlier this year, we invested in Rick and Yannick, founders of Orchest, a new open source tool for creating data science pipelines. The new wave of open source companies is something we’ve been following closely at Seedcamp and is a subject area we recently delved into over on the podcast. Kyran, from our Investment team, adds:
“From e-commerce through to financial services, healthcare and logistics, data science is becoming ever more mission-critical for businesses. Yet data scientists spend much of their time and energy on repetitive, largely infrastructure-related engineering tasks as opposed to what should be their core focus: training models and deriving valuable insights from them.
When we first met Rick and Yannick, we were massively excited by their mission to unburden data scientists from these tasks. As data scientists and engineers themselves, they had great empathy for the particular pain points here; what’s more, in their previous projects like Grid Studio, they showed a canny ability to foster early developer interest around what they were building. This convinced us they were a particularly well-equipped team when it came to open source strategy, and we’re delighted to support them on their journey.”
We sat down with CEO and co-founder, Rick Lamers, to explore more about Orchest; from ideation to what the future of data science looks like and why they decided open source was key in developing this new tool. Over to you Rick!
Where did the inspiration for Orchest come from and what specific problems are you trying to solve?
The ideas for Orchest developed while we were still at university. We were taking Computer Science courses about machine learning, distributed systems, and statistics, while at the same time applying these topics as part-time data scientists for large companies.
We noticed how many distracting engineering challenges came up while trying to do real-world data projects, such as training predictive models and making results and data available to data scientist colleagues and clients.
Ultimately, we concluded that much of data scientists’ workload should be offloaded by tools that take care of the standard, mundane and technical tasks, which they currently have to do themselves. Today, data scientists are often left reinventing the wheel, and from what we observed in practice these weren’t the best wheels either.
Can you walk us through your backstory and how you met?
In August 2019 I open-sourced a project that I had been working on over the previous summer called Grid Studio, a browser-based spreadsheet application with the ability to easily make use of the Python programming language. To my surprise the project grew in popularity rapidly, gaining over 5000 GitHub stars in the first 21 days. The growth prompted interest from investors such as a16z, Redpoint Ventures, and other prominent technology investors about what was next with the project, even though behind the scenes I was already working on a new and different project which would form the basis of Orchest.
I then visited the Bay Area and spoke to multiple investors, including angel investor Anthony Goldbloom (founder of Kaggle) about our new plans to start Orchest. After getting many supportive and encouraging reactions, Yannick and I decided it was the right time to drop out of the 2nd year of our master’s to start working on Orchest full time.
We come from different backgrounds but ended up meeting at TU Delft. Yannick got his BSc in Mathematics from TU Delft. However, he always had a knack for programming, as he was already teaching Python to company employees during his undergrad. Before coming to Delft, I studied at Erasmus University in Rotterdam, where I pursued a master’s in Entrepreneurship & New Business Venturing. I had been programming since age 14, and I always knew I wanted to deepen my CS fundamentals. After getting my university master’s degree I saw the opportunity to do a master’s in CS at TU Delft, where I ended up getting accepted after having to plead my case for admittance because, from their perspective, I came from a rather unusual background.
At TU Delft, Yannick and I often talked about math, CS, software, and startups. During our usual canteen lunch break we chatted about things we were up to, opportunities we saw and we generally started identifying areas of interest around software and starting a company.
Can you tell us more about your decision to open source the product and technology?
If you look at the landscape of data science tools today, it’s predominantly open source. We believe that’s for a very good reason. Data science is firmly rooted in the scientific communities around the world. With the open mindset of academia and its proponents it makes sense that many great tools, libraries, and frameworks are made available open source. Many projects get their start in publicly funded research labs that are working on state of the art techniques. Take for example the terrific scikit-learn package. It was created by researchers in France working at Inria, a French research institute.
With Orchest, we wanted to respect the tradition of open source and we had great examples of companies that showed that you could combine open source software with a for-profit company. Influential examples are companies like HashiCorp, GitLab, Elastic, and Confluent. In addition, we are big fans of open source ourselves. From a developer perspective, it’s awesome to be able to quickly download and play with a piece of software to understand its use cases, design decisions and applicability. That’s the kind of experience we want to give to our users and customers too.
How have you geared up to officially launch Orchest to the world?
We prepared by first deciding which elements were critical to make sure we could launch. In the end, we figured that we’d need at least a website that would show professionalism and communicate the key features of the MVP. In addition, we needed those key features working in the MVP so that, combined, they would already provide a useful tool to the early users of the product.
Looking back we think this worked out well because a few weeks ago we launched Orchest on the infamous Hacker News platform and received some great initial reactions. We even found out that someone had hunted us on Product Hunt unexpectedly and that we had reached the front page on there too.
Although the launch was really helpful in getting some early feedback from potential users and finding companies that are interested in being pilot users of the managed cloud version, we very much feel like we’re just getting started. The open-source MVP is starting to take shape, and more and more valuable use cases are being unlocked every time we commit our latest changes to the GitHub repository.
What is next on the horizon for Orchest?
We are looking for awesome people to join us on our mission to build the world’s best tools for data scientists and their teams. At the same time, we want to engage companies that are looking to leverage data and the PyData/R stack to accelerate our product development in exactly the right direction. They will benefit from feature development directly aimed at their needs, while we learn which valuable use cases are most prevalent and how we can support those even better with Orchest. If you are interested in trying out the managed cloud version you can sign up for early access through this Typeform.
What do you think the future of data science looks like?
Data science will move from laptops to the cloud. No company wants their sensitive data stored directly on employees’ laptops. Cloud computing is just a vastly superior mode of operation: it takes non-core activities off companies’ hands and is inherently more secure and scalable.
Furthermore, we believe we will continue to see a lot of innovation happening in open source tools & frameworks, and data science teams need to be able to leverage the collective innovation that is being spearheaded by great initiatives such as Apache Arrow/Kafka/Cassandra/Hadoop/Spark, PyTorch/TensorFlow, the PyData/R ecosystem, etc.
In addition, we are big believers in the power of interactive computing and the direct feedback model that enables an experimental workflow that leads to great discoveries in data and solutions that actually contribute to the bottom line. The phenomenal success of projects like Jupyter and their interactive notebooks reaffirms this. Hence, we decided to make notebooks first-class citizens in Orchest’s data science pipelines, and we will continue to develop features that make building and running data pipelines simpler for data scientists.
What brought you to Seedcamp?
Seedcamp has been a leading seed investor in Europe for quite a while now. You were on my radar many years before we ever had any contact. I just checked my inbox and saw I subscribed to Seedcamp’s newsletter back in April 2015. I’ve always looked at the companies, events, and blog posts from a distance, trying to understand startups and venture capital, observing all the awesome technology companies you are investing in, understanding what these companies are doing, and playing with their cool products of course.
Serendipitously, we connected during our recent fundraise through a great introduction that was made by another investor who saw a potential match. Our initial call with Kyran was great: he asked all the right questions and really got what we were trying to do with Orchest.
Any advice for founders starting their business now?
Keep believing in what you’re doing. For years on end, you and perhaps a handful of others are the only ones that are seeing the opportunity and have the belief in what you’re trying to do. It’s critical to not get discouraged. Over time, people will start seeing merit once you’ve had time to develop your ideas and implement them in one form or another.
Try to find the balance between your own convictions and what the market is telling you. Potential users won’t be able to tell you what to build or how to build your product. But they surely will tell you whether they like it or not, and whether it solves their problems for them. That being said, it’s really critical to ‘find your crowd’. Identify the communities that are most interested in what you’re doing because they care deeply about the topics involved. They can be great evangelists, early users, customers, etc. A great marketing perspective on this comes from Seth Godin talking about the concept of Otaku and the value of making something remarkable.
Lastly, look for people that can help you. Building a scalable company is incredibly difficult. It requires skill, patience, luck, capital, and a small army of phenomenal individuals. Even when you have all of that, things can still go wrong. There’s no reason not to accept all the help you can get, as people have been in similar situations before and their advice can help a lot. It’s also great to have people rooting for you when times are tough.
Just do it!