CapitalG

04 May 2021

Derek Zanutto: The Rise of Data Lakes

Image credits: Donald Iain Smith / Getty Images (via TechCrunch)

While data warehouses have evolved over the years as a way to store large amounts of structured data, more and more enterprises needed a way to store large amounts of unstructured data to fuel machine learning applications.

Enter data lakes.

Now, cloud vendors like Amazon and newer startups like CapitalG portfolio company Databricks have taken the data lake concept to a new level, creating new opportunities for startups and investors in the quickly maturing data lake ecosystem.

TechCrunch’s Ron Miller spoke with CapitalG general partner Derek Zanutto about the impact of cloud vendors in enterprise data lakes, how investors are approaching the space, and where the new opportunities and challenges lie.

Read the original article (paywalled) on TechCrunch

TechCrunch: Where are the opportunities for startups in the data lakes space with players like Snowflake and the cloud infrastructure vendors so firmly established?

Derek: The rise of the data lake model is creating a large, rapidly growing market opportunity adjacent to the more established data warehouse model typified by Snowflake and the big cloud vendors. The data lake model enables enterprises to unlock insights from a broader array of data (structured and unstructured) for a broader array of use cases (historical financial reporting and predictive AI analytics).

While the data lake offers many benefits for data-driven organizations, there are certain emerging challenges that will need to be addressed for the data lake model to further accelerate in the enterprise (for example, data reliability and highly performant querying). Companies that can build solutions to address these emerging pain points will have the opportunity to capture a large share of the significant profit pool being created around the data lake model.

TechCrunch: What are the biggest challenges for startups entering the data lake market right now and how do they overcome them?

Derek: Startups must help enterprises navigate what is essentially a market creation story by convincing buyers, who have spent the last 40 years procuring data warehouse technologies, to rethink their approach to data storage and analytics. Furthermore, the data lake category is intensely competitive: startups will often compete against well-entrenched data platforms that own the underlying object storage infrastructure.

To succeed in the data lake market, startups will need to focus on product depth, not breadth. Companies that focus on building the “best of breed” technology for a core use case will have a better shot at scaling. From a go-to-market perspective, successful startups often sell solutions that address specific, tangible, real-world pain points (as opposed to broad technology solutions). Successful companies aim to secure small, paid pilots with real business impact and leverage those successful implementations as opportunities for future expansion.

Finally, open sourcing the underlying technology can be a highly successful way to gain both broad, bottom-up distribution and mind share. Open sourcing also gives startups another opportunity for differentiation: It enables them to sell a multi-cloud and hybrid-cloud story. This appeals to chief data officers who are increasingly looking to standardize on open formats that give them the flexibility to leave the “walled garden” of one cloud data platform for another.

TechCrunch: What impact do the big cloud vendors have on the data lake market with their offerings?

Derek: The big cloud vendors are developing and selling end-to-end ecosystems around their data lake offerings. Today they primarily offer the underlying object storage infrastructure on top of which data lakes are built, as well as data integration tools to move data into the data lake. They also offer a variety of data services, such as data science notebooks and federated SQL query engines, in order to make data stored in the data lake accessible for data consumers. Through their comprehensive platform approach, cloud players offer “one-stop shops” for all the data services enterprises need, often at compelling price points, since they’re able to bundle data services with core infrastructure spend.

Additionally, their services integrate seamlessly with other technologies in their cloud portfolios, providing what can be a highly compelling value proposition to many enterprises. As a result, the cloud vendors have had a tremendous impact on the data lake market. In order to compete with the cloud players, data lake companies must drive real product innovation and differentiation. In some categories, this could take the form of a verticalized solution that better addresses the pain points of particular industries and buyers.

TechCrunch: Beyond data lakes, there are lots of adjacent services around data governance, preparation, management and getting data in and out of the data lake. What kind of startup opportunity do you see in these adjacent markets?

Derek: There is tremendous opportunity for startups solving data governance and management challenges, whether or not they deal with data lakes. We’ve found through extensive conversations with chief data officers that their largest pain points center around data quality, data governance and time-to-insight.

In the area of data quality, one in three business users and consumers of data surveyed didn’t trust the data their teams were using. This is due to widespread inconsistency in data usage and terminology across teams, as well as a lack of a single source of truth. The lack of standardization creates inconsistent and inaccurate data models. These in turn lead to inconsistent and inaccurate model outputs that lead to imperfect business decisions that ultimately hurt the bottom line. Startups that help enterprises solve data quality challenges will be uniquely positioned to capture both interest and budget from investors and potential customers alike.

With regard to data governance, regulatory pressures from increasingly stringent legislation (e.g., GDPR and CCPA) have made the protection and privacy of consumer data a top priority within most global enterprises. Despite the prioritization, 80% of the chief data officers we surveyed do not consider themselves to be sufficiently GDPR-compliant today. In fact, many enterprises lack the systems necessary to answer even the most basic questions about their data: what data they have, where it came from, where it has been, what it is being used for and how it may be impacted in the event of a data breach. Startups that help enterprises better understand their data history and potential impact are becoming increasingly mission-critical in most corporate board rooms.

Finally, with regard to time-to-insight, most enterprises are experiencing a massive proliferation of data coupled with significant fragmentation of that data across silos. These challenges make it difficult for consumers of data, such as data analysts and data scientists, to fully utilize the data they have and turn it into actionable insights. In fact, the business analysts we have surveyed spend on average 50% of their time simply looking for the right data for their analyses. We believe there’s a tremendous market opportunity for startups that significantly reduce time-to-insight for enterprises; the winning startups will achieve this by democratizing self-serve access to data for all users, augmenting existing business intelligence workflows using no-code and NLP, and automatically surfacing insights to business users.

Special thanks to Mo Jomaa
