
Image: An adult female mosquito is seen under a microscope. (Getty Images)
Imagine pricing a product that spreads on its own and benefits people who never even touch it. That might sound like science fiction, but it’s remarkably similar to the story of health data and, strangely enough, to a bacterium called Wolbachia.
Wolbachia is a bacterium carried by insects that, when introduced into mosquito populations, reduces the mosquitoes’ ability to transmit diseases like dengue, Zika, and chikungunya. What’s remarkable is that once enough mosquitoes carry it, the bacteria spread on their own through reproduction, protecting entire communities without the need for continual intervention. The Wolbachia program is often cited as a real-world example of positive externalities: situations where a decision benefits people who didn’t make the decision or pay for the intervention.
In many ways, health data, especially when collected through electronic health records (EHRs), functions similarly. When the federal government incentivized the adoption of EHRs (via the HITECH Act of 2009) in the wake of the 2008 financial crisis, the goal was to digitize medicine and create a “learning healthcare system.” The goal of digitizing medical records has largely been achieved. But the byproduct has been the emergence of a secondary market for deidentified health data, one where hospitals, health systems, and data aggregators now find themselves stewards of a valuable asset they never intended to commercialize.
Robust deidentified clinical data holds the promise to dramatically improve the way researchers conduct studies and the way life sciences companies recruit for and run clinical trials. It may even speed regulatory approval of new indications for existing therapies such as GLP-1 drugs.
And yet, despite the excitement surrounding this rapidly expanding market, few people can explain how health data is actually priced. The absence of such market signals may inhibit the type of investment required to build the infrastructure that will enable a brighter future.
This article aims to provide clarity for providers and other data holders looking to understand what their data is worth and, perhaps more importantly, what the revenue potential is. Not in theory, but in practice.
The Difference Between Value and Price
A common misstep in the discourse about health data is the conflation of value and price. News articles often cite astronomical per-record figures, but these typically reflect the valuation of an entire business, not the transactional price of a data record.
Consider Roche’s $1.9 billion acquisition of Flatiron Health in 2018. Some analysts at the time estimated the deal implied a value of $950 per oncology patient record. But that figure is misleading if interpreted as a direct price for data. The valuation took into account Flatiron’s technology stack, intellectual property, customer contracts, management team, and brand. It also reflected the company’s ability to generate revenue over time by licensing its data repeatedly across multiple clients and use cases.
A similar misunderstanding occurred when Blackstone acquired Ancestry, a genealogy company with a large database of genotype records, for $4.7 billion in 2020. Based on the number of users, some calculated a value of $1,733 per record. But again, that ‘price’ factored in the company’s platform, brand equity, infrastructure, and growth potential, not the standalone value of an individual data point.
These figures illustrate how high valuations reflect complex business dynamics, not just raw data. Mistaking them for prices can skew expectations for health systems or data aggregators attempting to gauge the worth of their records.
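The arithmetic behind those headline figures can be sketched in a few lines. This is only an illustration: the deal values and per-record figures come from the examples above, while the function names and the $100 license price are hypothetical, chosen to show how a valuation ratio differs from a transactional price.

```python
def implied_record_count(deal_value_usd: float, value_per_record_usd: float) -> float:
    """Back out how many records a quoted per-record valuation assumes."""
    return deal_value_usd / value_per_record_usd


def licenses_to_match(value_per_record_usd: float, license_price_usd: float) -> float:
    """How many times one record would have to be licensed, at a given
    transactional price, to add up to the per-record valuation."""
    return value_per_record_usd / license_price_usd


# Figures quoted in the article.
flatiron_records = implied_record_count(1.9e9, 950)
ancestry_records = implied_record_count(4.7e9, 1733)

# Hypothetical transactional price per record per license (an assumption,
# not a reported figure).
HYPOTHETICAL_LICENSE_PRICE_USD = 100.0
flatiron_licenses = licenses_to_match(950, HYPOTHETICAL_LICENSE_PRICE_USD)

print(f"Flatiron: about {flatiron_records:,.0f} records implied")
print(f"Ancestry: about {ancestry_records:,.0f} records implied")
print(f"Licenses needed at ${HYPOTHETICAL_LICENSE_PRICE_USD:,.0f} each: {flatiron_licenses:.1f}")
```

Under that assumed license price, a record would need to be licensed more than nine times over to account for its share of the deal value, which is one way to see why a whole-company valuation overstates what a single transaction pays.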
Demand-Side Drivers of Price
To understand real-world pricing, it’s critical to examine demand. Not all buyers are equal, and the price they’re willing to pay varies dramatically based on who they are and what they need the data for.
According to research by L.E.K. Consulting, pharmaceutical companies tend to have the highest willingness to pay for clinical data, sometimes reaching upwards of $1,000 per patient record. That’s because they use real-world data to support regulatory filings, evaluate treatment outcomes, and identify patient cohorts for clinical trials. These are all high-stakes applications with substantial ROI for companies whose commercial products can achieve gross margins of around 80%.
Medical device companies follow, with reported willingness to pay of around $800 per record, especially when conducting post-market surveillance or comparative effectiveness research. Health insurers, by contrast, typically operate on thinner margins and lower per-project value, and thus tend to pay much less, often closer to $80 per record.
Even within a given buyer category, the specific use case can significantly influence price. A pharmaceutical company exploring a niche oncology treatment may be willing to pay a premium for highly curated data, while another conducting early-stage research may look for lower-cost, exploratory datasets. The expected utility—that is, the insights and advantages a buyer hopes to extract—largely dictates the price ceiling.
Further, willingness to pay can depend on data type. EHR data alone may be valued at around $70 per record, while genomic data may be valued at $1,000 or more per record (both according to the L.E.K. research).
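To make the spread concrete, the reported figures can be laid out as a simple lookup. This is a rough sketch, not a pricing model: the dollar amounts are the approximate L.E.K. figures cited above, while the dictionary names, the `rough_ceiling` helper, and the 50,000-record dataset are assumptions made for illustration.

```python
# Approximate per-record willingness to pay cited above (USD).
REPORTED_WTP_BY_BUYER = {
    "pharmaceutical": 1000,   # "upwards of $1,000"
    "medical_device": 800,
    "health_insurer": 80,
}
REPORTED_WTP_BY_DATA_TYPE = {
    "ehr": 70,
    "genomic": 1000,          # "$1,000 or more"
}


def rough_ceiling(record_count: int, per_record_usd: float) -> float:
    """Upper-bound estimate: record count times the buyer's reported
    per-record willingness to pay. Negotiated prices may land well below."""
    return record_count * per_record_usd


records = 50_000  # hypothetical dataset size
for buyer, wtp in REPORTED_WTP_BY_BUYER.items():
    print(f"{buyer:>15}: up to ${rough_ceiling(records, wtp):,.0f}")

# The same dataset could be worth 12.5x more to a pharmaceutical buyer
# than to a health insurer, based on these reported figures.
spread = REPORTED_WTP_BY_BUYER["pharmaceutical"] / REPORTED_WTP_BY_BUYER["health_insurer"]
print(f"Pharma-to-insurer spread: {spread}x")
```

The point of the sketch is the spread rather than the totals: who is buying can matter more than how many records are on offer.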
Supply-Side Considerations for Pricing
On the supply side, health systems and aggregators should understand that several structural factors influence how much their data is worth to buyers.
The first is population mix. A community hospital serving a broad population will have a high volume of data, but much of it may relate to common conditions with relatively lower demand. In contrast, a specialty cancer center or academic hospital with a focus on rare diseases will likely possess records that are more valuable for targeted research, especially in areas like precision oncology.
Second, the longitudinality of the data is important. Health systems that adopted EHRs early and maintained consistent digital records over many years can offer richer, more continuous datasets. These are especially attractive for understanding long-term outcomes or disease progression.
Third, participation in health information exchanges and information-sharing arrangements can enhance dataset value. Systems that integrate data from pharmacies, labs, imaging centers, and outside providers can offer more complete pictures of patient health, making their data more useful and reliable for researchers and analytics firms. This is especially the case with emerging technologies that enable the digitization of genomic, pathology and other data types.
A fourth consideration is the extent to which data is structured or unstructured. “Nearly 80% of data relevant to research is hidden in unstructured notes,” explains Truveta CEO Terry Myerson. Truveta is a health data platform spun out of Providence Health System that has grown its membership to more than 900 hospitals. The company points to its Truveta Language Model as unique in its ability to extract relevant insights from unstructured data.
Myerson also pointed to the importance of linking EHR and clinical data with “closed claims data.” Closed claims data makes it possible to track a patient’s journey over time, based on the more comprehensive (if less detailed) view that health insurance claims provide across the providers, procedures, and treatments the health plan has paid for.
Finally, the presence of robust data governance and analytics infrastructure can boost both the quality and perceived value of a dataset. Health systems that invest in standardizing, cleaning, and curating their data—and that have in-house teams capable of supporting research partnerships—are better positioned to command higher prices.
How Pricing Happens in Practice
Despite growing demand and increasing sophistication on both sides of the market, the process of pricing health data in the real world remains surprisingly informal. The answer to the question “What is a health record worth?” is easy: it depends on many, many factors.
Beyond the considerations above, the reality is that both sides of the market (demand and supply) are still messy and emerging. Different hospital systems that contribute data to the same data platform not only have different population mixes, but will have different approaches to clinical systems and managing data quality. One may participate in a health information exchange, so it has data from outside its own providers, while the other does not.
Not all data is equal.
Nor are the capabilities of the data platforms themselves. Briya, a health data platform founded in 2020, touts its AI and natural language processing as key to enabling research that requires investigating physician notes. The company also has a novel approach to data security, as cofounder and CEO David Lazerson explains: “Our approach to data governance ensures that all our users retain control and transparency of the use and management of their data. Healthcare organizations can keep their data on-site, maintaining full control to grant access to researchers as required.”
The upside of Briya’s approach is clear for provider customers: control of decisions regarding who they share their data with, under what conditions, and for what purposes. One can imagine how this may create complications for a data platform like Briya: if providers grant approval on a case-by-case basis, this could introduce uncertainty around data availability and may lengthen or inhibit the data platform’s sales cycles with life sciences companies.
As a result of these nuances, most deals for deidentified data are structured on a case-by-case basis, negotiated individually between data providers and data consumers.
Some data brokers and providers establish minimum purchase thresholds or charge “platform fees” that account for the technology and services they bundle with data access, such as analytics dashboards, onboarding support, or API integrations. But the heart of most deals comes down to a negotiated price based on multiple inputs, including the number of records involved, the types of data being shared (e.g., labs, imaging, prescriptions, clinical notes), and the disease or condition focus of the dataset.
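One way to picture how these pieces could fit together is the sketch below. It is purely illustrative: no standard formula exists, and the function name, the way the minimum purchase and platform fee are combined, and all dollar amounts other than the $70 EHR figure cited earlier are assumptions.

```python
def illustrative_deal_total(
    record_count: int,
    negotiated_price_per_record_usd: float,
    platform_fee_usd: float = 0.0,
    minimum_purchase_usd: float = 0.0,
) -> float:
    """Hypothetical deal structure: the data charge is the negotiated
    per-record price times volume, floored at a minimum purchase
    threshold, with a flat platform fee added on top."""
    data_charge = record_count * negotiated_price_per_record_usd
    return max(data_charge, minimum_purchase_usd) + platform_fee_usd


# A small request falls below the (hypothetical) minimum purchase,
# so the floor applies: max(700_000, 1_000_000) + 50_000.
small_deal = illustrative_deal_total(
    record_count=10_000,
    negotiated_price_per_record_usd=70,
    platform_fee_usd=50_000,
    minimum_purchase_usd=1_000_000,
)

# A larger request clears the floor: 30_000 * 70 + 50_000.
large_deal = illustrative_deal_total(
    record_count=30_000,
    negotiated_price_per_record_usd=70,
    platform_fee_usd=50_000,
    minimum_purchase_usd=1_000_000,
)

print(f"Small deal total: ${small_deal:,.0f}")
print(f"Large deal total: ${large_deal:,.0f}")
```

In a real negotiation the per-record price would itself vary with data types and disease focus, so a single number per deal is a simplification.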
However, intangible considerations often play an outsized role. The strength of the relationship between buyer and seller can substantially sway pricing, as can both sides’ awareness of alternative options for the same or similar data. Industry experts routinely note that many deals are shaped as much by a “gut sense” of the buyer’s willingness to pay as by any formal valuation model.
The result is a highly variable and opaque pricing environment. While buyers may have internal thresholds for acceptable cost-per-record, and sellers may track historical benchmarks, there is no widely accepted method or marketplace standard for valuing a single patient record. Until such norms emerge, pricing will likely continue to reflect a complex blend of tangible metrics and relationship dynamics.
Trends Reshaping the Health Data Market
The market for health data is not static. Both supply and demand are being reshaped by broader policy and technology trends.
On the supply side, regulations like the 21st Century Cures Act and the ONC’s information blocking rules have made it easier for patients to access and share their health data. APIs and interoperability standards like FHIR are making data more portable, while state laws like the California Consumer Privacy Act (CCPA) and global frameworks like GDPR are imposing new requirements on data use and disclosure.
These regulatory changes are expanding the available supply of data, but also raising the bar for compliance, especially when it comes to de-identification and patient consent. At the same time, hospitals and other providers are increasingly exploring ways to monetize their data in alignment with these new rules.
On the demand side, the appetite for clinical data is growing. Pharmaceutical and life sciences companies now incorporate real-world data into virtually every stage of product development, from discovery to post-market evaluation. Researchers and AI companies are seeking large-scale datasets to train machine learning models. And as healthcare continues to shift toward value-based care, stakeholders need better data to measure outcomes, allocate resources, and predict risk.
Precision medicine is another major driver. The more granular and personalized our understanding of disease becomes, the more valuable curated clinical datasets will be. This will likely increase the price of well-structured datasets from specialty providers.
Conclusion: Pricing the Byproduct
Much like Wolbachia spreading through a mosquito population, health data has expanded beyond its original purpose. What began as a digital record to support and coordinate clinical care has become a monetizable asset with growing strategic importance. But unlike most commodities, its price is anything but transparent.
For health systems and aggregators, understanding how health data is priced requires a clear-eyed view of both sides of the market. Value and price are not synonymous. Demand is driven by project-specific use cases and organizational priorities. Supply-side factors such as data completeness, quality, and population focus all shape the perceived value of a dataset. And deidentified health data appears subject to normal rules of economics: as supply increases, price seems to decrease.
In practice, most deals come together through one-on-one negotiations, shaped by relationships, project urgency, perceived alternatives, and educated guesses. There is no universally accepted price tag for a patient record. Pricing health data remains as much art as science.
As the market matures, pricing will likely become more standardized and transparent. Explicit pricing models will emerge, helping all sides (data brokers, sources, and buyers) understand the inputs and levers that influence both price and revenue generation. But for now, the best approach for sellers is to understand what buyers actually need, evaluate the quality and uniqueness of their own data, and align internal governance and analytics capabilities to support those needs.
The result isn’t just a higher price—it’s a more strategic role in shaping the future of healthcare.


