Forbes | May 23, 2019
By David Shaywitz

While pharma C-suite executives find themselves increasingly seduced by the promise of “digital transformation,” and especially by the idea of leveraging AI, the lived, on-the-ground reality within virtually all pharma R&D organizations couldn’t be further removed from that vision.


Data: Asset or Liability?

As Eric Perakslis, a health data guru and Rubenstein Fellow at Duke (his Tech Tonics episode is here), explains, data in both health systems and clinical trials (and, I’d add, other biopharma research) are generally collected for extremely specific purposes. In clinical encounters, the data are logged into an EHR and used for “building a longitudinal history and tallying up procedures for billing.” In clinical trials, “you are filling in boxes in the physical implementation of a clinical protocol, a statistics database.” Both, Perakslis says, are forms of data management where the goal is efficiency, compliance and effective sausage making.

The challenge, he continues, is:

“Learning anything secondary, also known as knowledge management, from either is usually an afterthought and requires data to be extracted, reformatted and re-homed into an additional structure such as a data warehouse/mart/lake.  This requires additional labor and additional risk as duplicate data amplifies cost as well as compliance, security and privacy risks.  For these reasons, it is seldom done unless funded and prioritized via leadership.  This can be messy and expensive to bean counters but, I’d argue, should be prioritized, because if done correctly, data should be the second most valuable asset, after talent, in any scientific organization.”

Like Abernethy, Perakslis acknowledges the effort involved in actually arriving at usable data:

“Given the archaic infrastructure of most large institutions, data curation, cleaning and transformation amounts to manual hand-to-hand combat between highly educated humans and text interfaces. It takes forever, costs a fortune and simply should be avoided. People should think things through from day one and know that it makes more sense to lay plumbing and conduit for the anticipated addition when you design the house, not after.”

Adds Perakslis, “randomized control trials cost anywhere from $30,000–$50,000 or more per patient…. For less than $1,000 more (per patient), you could create data files in a modern lake ready for AI, ML or human mining. Why aren’t we all spending the extra two cents per sausage even if our primary job is sausage making? Treating data as an asset versus a liability should be the key.” (I’ve also discussed the contrast between the positive optionality tech companies see in data and the negative optionality many biopharma companies perceive; see here.)