COO, President and Co-Founder,

Unstructured data – such as medical images, scans and lab reports – opens the door to a wide range of exciting new possibilities in the world of health care. By deriving insights from unstructured data, health care organizations can deliver better patient care, as well as unlock powerful capabilities – like using AI models to streamline medical note-taking, reshaping hospital billing and improving medical coding.

To do these things, however, you need to be able to track and access all your data. This can be challenging, given that unstructured data comes in many forms and is often scattered across disparate locations. But with the right strategy and tools in place, leveraging unstructured data to power health care innovations is possible.

Unstructured Data and Why It’s Important in Health Care

Unstructured data consists of any data or files that may be organized using folders and directories but are not housed in a database where they can be systematically tracked, secured and protected. Common examples of unstructured data in health care include:

  • X-rays, MRIs and CT scan images.
  • Text documents containing medical notes or patient evaluations.
  • Lab reports.
  • Audio recordings taken during evaluations of patients.

Health care providers routinely need to access data assets like these to deliver quality care to their patients. In addition, researchers may analyze unstructured health care data to help answer questions like how effective a given procedure is in mitigating the impact of an illness, or how many patients respond effectively to a new medication.

The ongoing AI revolution is also opening new opportunities to use unstructured health care data to train AI tools and services. Most AI models learn by processing large volumes of data. The more data you feed them, and the more representative that data is of the real-world conditions you want your AI solutions to understand, the more effective they will be. To take full advantage of AI, you need to be able to access and train on all relevant unstructured data.

The Challenges of Managing Unstructured Health Care Data

Unfortunately, ensuring that providers and researchers can access data quickly is often deeply challenging, for several reasons.

One is the sheer size of some types of health care data. Text files are typically small, but files such as X-ray and CT images can consume as much as 30 megabytes each. If a facility takes just a few dozen images each day, those images quickly add up to many gigabytes of storage each month. Specialized data, such as genomic sequencing results, is larger still. Sequencing just one person’s genome can require as much as 200 gigabytes of storage.
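
To make the imaging arithmetic concrete, here is a back-of-the-envelope sketch in Python. The 30-megabyte figure comes from the text above; the three dozen images per day and the 30-day month are illustrative assumptions, not measurements from any facility.

```python
# Rough storage estimate. Every input below is an illustrative assumption.
MB_PER_IMAGE = 30      # upper bound cited above for X-ray and CT images
IMAGES_PER_DAY = 36    # "a few dozen" images, assumed here to be three dozen
DAYS_PER_MONTH = 30    # assumed month length


def monthly_storage_gb(images_per_day, mb_per_image, days=DAYS_PER_MONTH):
    """Return storage consumed in one month, in gigabytes (1 GB = 1,000 MB)."""
    return images_per_day * mb_per_image * days / 1000


monthly_gb = monthly_storage_gb(IMAGES_PER_DAY, MB_PER_IMAGE)
print(f"Imaging data per month: {monthly_gb:.1f} GB")
print(f"Imaging data per year:  {monthly_gb * 12:.1f} GB")
```

Under these assumptions, a single imaging workflow generates roughly 32 gigabytes a month, before counting backups or any other data type.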

The large size of unstructured health care data presents a challenge because the more data you have, the more it costs to store. Costs climb further if you keep data in the same location where it was generated instead of taking advantage of lower-cost storage options where feasible. Health care organizations may also be reluctant to back up the data given the high costs of storing backups. And they may be tempted to delete data earlier than they would like in a bid to save on storage costs.

Health care regulations present another challenge. Different types of data are subject to different rules and regulations, such as retention mandates that require providers to store data for a certain period. When you have a large volume of unstructured files to work with, keeping track of which regulations apply to which ones can become quite difficult.

The fact that health care data is often generated by complex, siloed systems makes efficient data management even harder. Within a single organization, there may be dozens of different software systems and platforms collecting health care data. This distributed environment makes it hard to track, secure and protect data in a centralized way.

The Cost of Poor Unstructured Data Management

Failing to address challenges like these doesn’t only mean that health care providers and researchers will struggle to work efficiently. It also has serious consequences from a business perspective.

For one, unstructured data that is scattered across too many locations to track effectively can bloat storage budgets. Instead of consolidating the data into centralized storage repositories – such as “cold” data storage tiers in the cloud – where monthly storage fees are just fractions of a penny per gigabyte, enterprises may be left paying many times that amount because they cannot easily move data to a more cost-effective storage solution.
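
A minimal sketch of that cost gap follows. Both per-gigabyte prices and the 50-terabyte archive size are hypothetical placeholders chosen for illustration, not quotes from any storage provider, and the calculation ignores the retrieval and transfer fees that cold tiers often charge.

```python
# Hypothetical monthly prices per gigabyte; not quotes from any real provider.
HOT_PRICE_PER_GB = 0.02    # assumed price for primary storage
COLD_PRICE_PER_GB = 0.002  # assumed "fraction of a penny" cold-tier price


def monthly_cost(size_gb, price_per_gb):
    """Return the monthly storage bill for size_gb at a flat per-GB price."""
    return size_gb * price_per_gb


def monthly_savings(size_gb, hot_price=HOT_PRICE_PER_GB, cold_price=COLD_PRICE_PER_GB):
    """Return how much moving size_gb from the hot tier to the cold tier saves per month."""
    return monthly_cost(size_gb, hot_price) - monthly_cost(size_gb, cold_price)


archive_gb = 50_000  # assumed 50 TB of rarely accessed files
print(f"Hot tier:  ${monthly_cost(archive_gb, HOT_PRICE_PER_GB):,.2f} per month")
print(f"Cold tier: ${monthly_cost(archive_gb, COLD_PRICE_PER_GB):,.2f} per month")
print(f"Savings:   ${monthly_savings(archive_gb):,.2f} per month")
```

The absolute numbers matter less than the ratio: with a tenfold price difference, every gigabyte left on primary storage unnecessarily costs ten times what it needs to.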

Inadequate data management can also lead to compliance violations. For example, sensitive data may end up residing in locations without the access controls mandated by frameworks like HIPAA because the business simply didn’t know it was storing sensitive information in the wrong place. Given that a single HIPAA violation can cost as much as $68,000, the financial impact of noncompliance becomes onerous quickly.

On top of this, inefficient management of unstructured data slows down innovation and time to value. Being able to access and process data quickly is important for initiatives like AI-powered analytics. Health care organizations that can’t centrally track or manage their data will struggle to move faster than competitors.

Putting Unstructured Health Care Data to Better Use

The key to avoiding these risks is to implement a comprehensive strategy for managing unstructured health care data. Your strategy should allow you to:

  • Locate all data assets, no matter which system produced them or where they reside.
  • Tag and label unstructured data so that you know where it originated, its purpose and which compliance or other special requirements apply to it. Tagging can also accelerate the process of locating data that is relevant to AI training for a particular use case.
  • Implement effective access controls, backup routines and other data security and protection measures based on each asset’s requirements.
  • Identify data that could be moved to a different storage location or tier to reduce costs.
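
As a rough illustration of the locating, tagging and tiering steps above, the following Python sketch walks a directory tree, records each file with a few tags, and flags files that have not been modified in a year as candidates for cheaper storage. The extension-to-category mapping, the tag names and the one-year threshold are all hypothetical choices made for this example. It does not cover access controls or backups, and a real deployment would more likely rely on a dedicated data management platform than on a script like this.

```python
import os
import time
from dataclasses import dataclass, field
from pathlib import Path

# Hypothetical mapping from file extension to a data category tag.
CATEGORY_BY_EXTENSION = {
    ".dcm": "imaging",
    ".txt": "clinical-note",
    ".pdf": "lab-report",
    ".wav": "audio-recording",
}


@dataclass
class Asset:
    path: Path
    size_bytes: int
    modified: float  # last-modified time, in seconds since the epoch
    tags: dict = field(default_factory=dict)


def build_inventory(root, source_system):
    """Walk a directory tree and record every file along with basic tags."""
    assets = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            info = path.stat()
            tags = {
                "source": source_system,  # which system produced the file
                "category": CATEGORY_BY_EXTENSION.get(path.suffix.lower(), "unknown"),
            }
            assets.append(Asset(path, info.st_size, info.st_mtime, tags))
    return assets


def cold_tier_candidates(assets, min_age_days=365, now=None):
    """Return assets that have not been modified within min_age_days."""
    now = time.time() if now is None else now
    cutoff = now - min_age_days * 86400  # 86,400 seconds per day
    return [asset for asset in assets if asset.modified < cutoff]
```

Calling build_inventory on a file share, with a label naming the system that owns it, yields a list that can be filtered by tag (for example, every asset whose category is "imaging") or passed to cold_tier_candidates to find files worth moving to a cheaper tier.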

When you can do these things, you turn unstructured health care data from a cost and compliance liability into an asset that supports long-term value and AI initiatives. In turn, you improve outcomes for patients, medical research and business stakeholders, and you inform the creation of new value-added products and services that patients need.