Editor’s note: Vivek Bhaskaran is the co-founder of QuestionPro, with more than 20 years of experience in enterprise software. Find Bhaskaran on LinkedIn.
Enterprise software has long operated on a straightforward bargain. Companies entrust vendors with their business processes and data so that those vendors can deliver value back to them. Customers expect their data to be used to improve their own operations, their own decisions and their own outcomes.
What customers do not expect is for that same data – or assets derived from it – to quietly become part of a product, model or capability that creates value somewhere else.
That is the principle at the center of the current debate around synthetic data practices.
What if your data provider used synthetic data trained on responses from people who never consented to that use?
Synthetic data capabilities are often described as being built on large-scale datasets, sometimes in the tens of millions of data points. That immediately raises a basic customer question: Where did the data points come from, and what rights govern their use?
In my view, customers reasonably expect the data they collect through a platform to be used on their behalf, not repurposed beyond that relationship without clear permission as defined in the underlying agreement. Thus, the debate is not whether synthetic data is useful. The debate is whether customer-contributed data can be used to build capabilities that create value beyond the original customer relationship and whether that use is clearly addressed in contract language.
This isn’t a narrow debate about boilerplate Terms of Service. It asks a simpler, more practical question: Did customers clearly understand and affirmatively agree to this kind of secondary use?
Contract language matters. So do notice, clarity and informed consent. If customers did not clearly understand that their data could contribute to capabilities offered beyond their own account, then this is not just a technical or legal issue. It is a trust and governance issue.
And that matters even more now because, in the AI market, proprietary data has become a strategic asset.
First-party data increasingly determines how differentiated, defensible and commercially valuable AI offerings can become. That is why companies spend heavily to collect, manage, store and analyze customer data in the first place. The value is not just in reporting. It is in the models, benchmarks, decisions and products that data can power.
That is also why this issue should not be minimized.
Most customers accept that vendors will use aggregated or anonymized data to maintain, secure and improve the services those customers themselves receive. That is part of the normal SaaS bargain. But many customers do not assume that the same data will be transformed into a separate commercial asset.
I have had concerns about this issue for some time, but the recent public discussion pushed me to state them plainly. In the AI era, trust is no longer only about uptime, security, scalability and features. It is also about whether your vendor uses your data to work for you – or to create value somewhere else.
That is why this matters. And that is why every enterprise customer should demand a clear answer.