Gaining Better Insights From Unstructured Data in Health Care

An opinion piece by an advisory board member of Persistent Systems, an IT services and consulting firm in India

Werner Boeing

October 20, 2022

There is almost universal agreement that the current health care system needs transforming. Health care costs are rising, the population is getting older, people’s needs and expectations are changing, and it is becoming increasingly difficult to employ qualified health care staff.

When speaking to stakeholders about this necessary transformation, common elements arise. For example, the concept of P4 Medicine (Predictive, Preventive, Personalized and Participatory) is a helpful framework to articulate a better health care system.

Artificial intelligence and machine learning are key to this transformation, and data at both the population and individual levels are needed to prevent illness through early detection of diseases and create individualized treatments that elevate the patient experience.

Providers need to implement AI/ML technologies to automate the management of this data and to analyze it via NLP (Natural Language Processing), paving the way for predictive analytics that identify conditions and support the prescription of personalized treatments.

However, when it comes to collecting, curating and collaborating on the necessary health care datasets, many issues come up, including:

  • Data privacy and patient consent management

  • Bias and explainability of health care AI/ML solutions

  • Data ecosystems and the need to operationalize federated learning for AI/ML solutions

While these topics are important, a fundamental issue that needs to be resolved first is the actual provisioning of quality health care data.

Starting point

The health care sector vitally needs to establish a way to automate the management of unstructured data – to ensure the data available is of the best quality.

Data within radiology images, text files, and physicians' notes – all factors that speak to a person’s health – are not easily organized using pre-defined tabular structures within institutions, let alone using shared health care ontologies across various providers.

This unstructured data makes up roughly 80% of health care information. Entries are very difficult to clean and curate, which means providers cannot afford to work with large corpuses of disorganized records.

When seeking to solve the unstructured data problem, there is a temptation to clean data only when it is needed for a specific use case. However, without automated data management processes, this can lead to an increase in the number of entries that are incorrectly formatted, duplicated or incomplete, causing further complications.

Changing the approach to unstructured data

Health care providers have yet to leverage the potential of information within Electronic Health Records (EHRs) and similar documents. For example, data within pathology, radiology, and patient self-reporting can help providers better understand individual patient journeys and patient populations, and assess clinical risks.

Manually sifting through increasing numbers of documents is overly labor-intensive, ultimately leading to significant costs and reduced productivity. Moreover, manual extraction of data tends to yield smaller data sets, which do not scale to the demands of large patient populations.

Unstructured data from these sources requires successful data cleansing in compliance with regulatory standards. This involves parsing documents to detect errors and identify key information, correcting those errors, standardizing records so terminology is used consistently, and consolidating raw data for further use.
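
To make the cleansing steps above more concrete, here is a minimal Python sketch of parsing, standardizing and consolidating records. Everything in it is an assumption made for illustration: the "patient_id | condition" line format, the TERM_MAP synonym table and the function names are hypothetical, and a real system would rely on a shared health care ontology and would also have to handle the regulatory requirements, which this sketch does not attempt.

```python
import re
from dataclasses import dataclass

# Hypothetical synonym table; a real system would use a shared ontology.
TERM_MAP = {
    "htn": "hypertension",
    "high blood pressure": "hypertension",
    "t2dm": "type 2 diabetes",
    "dm2": "type 2 diabetes",
}


@dataclass(frozen=True)
class Record:
    patient_id: str
    condition: str


def parse(line):
    """Parse 'patient_id | condition'; return None if malformed or incomplete."""
    parts = [p.strip() for p in line.split("|")]
    if len(parts) != 2 or not all(parts):
        return None
    return parts[0], parts[1]


def standardize(term):
    """Normalize case and whitespace, then map known synonyms to one term."""
    cleaned = re.sub(r"\s+", " ", term.strip().lower())
    return TERM_MAP.get(cleaned, cleaned)


def cleanse(lines):
    """Return (clean records without duplicates, rejected raw lines)."""
    seen = set()
    records, rejected = [], []
    for line in lines:
        parsed = parse(line)
        if parsed is None:
            rejected.append(line)  # keep for manual review rather than dropping
            continue
        patient_id, condition = parsed
        record = Record(patient_id.upper(), standardize(condition))
        if record not in seen:  # consolidate exact duplicates
            seen.add(record)
            records.append(record)
    return records, rejected


raw = [
    "p001 | HTN",
    "P001 | High  Blood Pressure",
    "p002 | T2DM",
    "p003 |",
    "garbage",
]
records, rejected = cleanse(raw)
print(records)
print(rejected)
```

In this sketch, malformed lines are set aside for review instead of being silently discarded, so that the cleaning step itself does not introduce new gaps in the data.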

Some organizations might opt to clean data based on perceptions of necessity, but storing significant volumes of unstructured data means providers will be overwhelmed if they do not continually manage this information.

A proper data culture is essential and starts with assisting frontline staff. When moving away from purely manual data cleansing, AI/ML solutions will assist in streamlining the cleaning process, freeing up medical staff to spend more time on patient care. Treatments will be improved through data and greater personalization, with both staff and patients benefiting from the results.

Managing unstructured data

Advancements in NLP have enabled providers to access insights not immediately available from medical documents. NLP algorithms can be trained via deep learning techniques to recognize the specifics of medical terminology and complicated health issues.

These algorithms can discern trends from relevant statements within the text of documents, allowing a range of use cases. For example, they can be used in operational hospital decision support systems to optimize patient flow, improving both resource utilization (such as average length of stay) and patient experience (such as waiting time).

Unstructured data can be processed and automatically codified by analytics platforms and clinical decision support systems (CDSS). Once converted, health systems will be able to classify patients and even provide a summary of their condition upon arrival to appointments, or upon hospitalization.
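
As a rough illustration of what codifying a free-text note can mean, the Python sketch below turns a short note into structured findings. It is a deliberately simple rule-based stand-in, not the deep learning NLP discussed in this article: the PATTERNS table, its labels and the NEGATION rule are hypothetical and do not correspond to any real clinical coding standard.

```python
import re

# Hypothetical patterns and internal labels, not a real coding standard.
PATTERNS = {
    "hypertension": re.compile(r"\b(hypertension|high blood pressure)\b", re.I),
    "diabetes": re.compile(r"\bdiabet(es|ic)\b", re.I),
    "chest_pain": re.compile(r"\bchest pain\b", re.I),
}

# Very rough negation cue: a negating word earlier in the same sentence.
NEGATION = re.compile(r"\b(no|denies|without)\b[^.]*$", re.I)


def codify(note):
    """Map a free-text note to {label: 'present' | 'absent'}."""
    findings = {}
    for sentence in re.split(r"(?<=[.!?])\s+", note.strip()):
        for label, pattern in PATTERNS.items():
            match = pattern.search(sentence)
            if match is None:
                continue
            # Look for a negation cue in the text before the matched term.
            negated = NEGATION.search(sentence[:match.start()]) is not None
            status = "absent" if negated else "present"
            # A positive mention anywhere in the note wins over a negated one.
            if findings.get(label) != "present":
                findings[label] = status
    return findings


note = (
    "Patient has a history of high blood pressure. "
    "Denies chest pain. Diabetic since 2015."
)
summary = codify(note)
print(summary)
```

Keyword rules like these miss context that trained models can capture, which is part of why the article points to NLP; the sketch only shows the shape of the structured output that a patient summary could be built from.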

Health care providers can even move toward proactively addressing issues through predictive analysis. Multiple data feeds, including EHRs, can be combined to identify patients at increased risk of developing adverse conditions.

In addition, updating EHRs can be made easier with speech recognition technology used to dictate notes, making it possible to transcribe treatment information in real time and update patient records much more quickly, which helps to reduce the burden on health care professionals.

When properly harnessed, AI/ML applications can create new and much improved services, such as the effective and compliant provisioning of telehealth services, with NLP extracting data and ultimately letting health care professionals provide the best standards of patient care.

These transformative capabilities are relevant not only at the health care provider level but also, specifically, for the pharma development process, from drug discovery to clinical trials and product launch.

Access to meaningful, real-world evidence at scale becomes a competitive advantage, for instance enabling better drug target identification, reducing the cost of clinical trial execution, or implementing new outcome-oriented business models.

Indeed, the health care landscape is shifting dramatically given the applications of AI/ML technologies. Unstructured data was once largely unmanageable, but now automated data management and NLP algorithms are finally letting practitioners tap more of its potential.
