Monday, December 8, 2014

Healthcare Analytics: Concepts & Assumptions



“Data is a precious thing and will last longer than the systems themselves.” – Tim Berners-Lee, inventor of the World Wide Web.







In the past several years, we’ve heard an immense amount about data, big data, data analytics & every possible topic related to data. We know that 90% of all currently available data has been generated in the past two years![1] We also know that every business publication has had articles on data (Business Week, May 2013; Harvard Business Review, December 2013; Forbes, February 2014, to name just a few), & that every major business consultancy, such as Accenture, Deloitte & Gartner, has a practice or advisory in this area.

Closer to home, many large healthcare organizations are developing analytic systems utilizing very large amounts of data to provide diagnostic, treatment planning & operational guidance. Examples would be the point-of-care recommendation systems currently used by Kaiser Permanente & the Mayo Clinic, among others, that provide near-real-time diagnosis & treatment planning guidance to providers at a patient’s bedside. Dr. Watson (IBM) is another well-known example.[2] These systems use millions of patient records, often recorded over long periods of time, as well as thousands (or more) of journal articles & physicians’ notes to provide their analysis & recommendations. Not many healthcare organizations have this amount of patient data available, so what are the implications of analytics for most hospitals, clinics & practices, & how can they take advantage of analytics to make better clinical & operational decisions?

First, let’s define what we mean by data & analytics. Data, in this sense, is a set of qualitative or quantitative values. Simply restated, pieces of data are individual pieces of information[3]. They may be numeric (quantitative), or words or sets of words (qualitative), or even hybrids such as addresses (77 Massachusetts Avenue, E40-248). Analytics, in general, is the discovery & communication of meaningful patterns in data[4]. Contemporary analytics has taken on a more specific meaning, especially in contrast to statistical analysis of data (the application of statistical hypothesis testing methods to data). Analytics today is a set of methods for data organization & analysis that are applied when data have (some of) the following characteristics:
  • Volume: management of multiple petabytes of data
  • Velocity: management of data values that are changing rapidly (e.g. NASA’s launch sensor net of >1M sensors of various types sampled 3x/second)
  • Variety: many different types of data in different formats & from different sources

In healthcare, data variety is most often the issue. This type of data is very difficult to organize & analyze in a conventional sense.

What are the differences between analytics & conventional analysis? They can be summarized as follows:
  • Contemporary analytics is the empirical characterization of data & information. An example: a physician at Kaiser is using their point-of-care recommendation system to confirm a diagnosis & develop an optimal treatment plan. The physician enters patient parameters while doing a bedside examination. The point-of-care recommendation system evaluates 4 PB of patient data against the set of parameters entered for this specific patient, & it finds 9,372 cases similar enough to use for comparison. That is not a statistical prediction of similarity, but an exact empirical characterization. In the same sense, if that system classifies the treatment plans of those 9,372 cases according to outcome, that is not a statistical prediction of outcome, but an exact characterization of the outcomes present in the data. This changes how we think about results: we are looking at exact characterizations, not predictions with associated probabilities. This is true of even smaller sets of data. (A brief sketch of this kind of characterization follows this list.)
  • Contemporary analytics does not require extensive data transformation & normalization. Analytic systems such as Hadoop-based analytic stacks aggregate data in many different forms (alphanumeric, text, image, other media) & from many different sources (EHR, financial systems, practice management, public health systems, other private & public data sources), & perform analysis across all of these types (e.g. cost/service/location/provider or number of patient interactions vs. macro-demographic & population trends). It does require an understanding of the normalized definitions of common terms (encounters, providers etc.), especially if cross-organizational comparisons are to be made.
  • In general, hypotheses & informational relationships are informed by the analysis, not by a priori assumptions. This means that empirical characterization is carried out by performing inquiry developed by consensus of the healthcare organization’s staff (or designees; all parts of the organization should be represented), aligned with strategy. Hypotheses are then formed (& relationships defined) based on empirical results, & analysis may continue.
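To make the idea of empirical characterization concrete, here is a minimal sketch in Python. The record structure, parameters & similarity thresholds are all hypothetical; a real point-of-care system operates over millions of records with far richer matching logic.

```python
from dataclasses import dataclass

@dataclass
class PatientRecord:
    age: int
    systolic_bp: int
    a1c: float
    treatment: str
    outcome: str          # e.g. "improved" / "unchanged" / "worse"

def is_similar(record, query, age_tol=5, bp_tol=10, a1c_tol=0.5):
    """Crude similarity test on a few hypothetical parameters."""
    return (abs(record.age - query.age) <= age_tol
            and abs(record.systolic_bp - query.systolic_bp) <= bp_tol
            and abs(record.a1c - query.a1c) <= a1c_tol)

def characterize(records, query):
    """Empirical characterization: count the similar cases & tabulate
    their outcomes by treatment -- no statistical model involved."""
    similar = [r for r in records if is_similar(r, query)]
    outcomes = {}
    for r in similar:
        outcomes.setdefault(r.treatment, {}).setdefault(r.outcome, 0)
        outcomes[r.treatment][r.outcome] += 1
    return len(similar), outcomes

records = [PatientRecord(64, 142, 7.1, "regimen A", "improved"),
           PatientRecord(66, 138, 7.3, "regimen B", "unchanged"),
           PatientRecord(41, 118, 5.2, "regimen A", "improved")]
query = PatientRecord(65, 140, 7.2, "", "")
print(characterize(records, query))
# (2, {'regimen A': {'improved': 1}, 'regimen B': {'unchanged': 1}})
```

The counts returned are exact facts about the data at hand, not estimates; that is the distinction drawn above.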

OK – so we know something about data & analytics, but what does this actually mean for healthcare organizations? As a technologist, I have to say that as interesting as the technology of analytics is, it’s not the point. The point is a way of thinking about data & analysis. I use the phrase “data as an asset” as shorthand for this way of thinking. Thinking of data as an asset means that you (& your team) look at data in a larger context than just the clinical &/or operational data that you have. You think about data in relation to the strategy of your organization & in relation to the kinds of strategic decisions that are required to keep your organization healthy. Thinking of data just as facts is no longer enough to create the largest amount of value from that data; you must think of data strategically. This means having an awareness of data, your own as well as external data… data from city, county, state & federal programs… data from other organizations… as much relevant data as you can discover & access.

Once you start thinking about data as an asset, there are some things you can do to utilize data strategically.
  1. First is to review (or develop) your organization’s strategy & identify what decisions are embedded in it. 
  2. Next is to identify what data you have access to that is relevant to those decisions. This may, in fact, not be entirely straightforward. You may include data that is not immediately apparent as relevant. Remember, one of the characteristics of analytics is that the relationships in the data are defined empirically by inquiry, not a priori.
  3. Third is to convene heterogeneous groups of stakeholders to develop areas of inquiry to be addressed by analysis. These can be quite general (e.g. the relationship of the provision of specific enabling services to outcome or cost), but they must be related to the organization’s strategy & to the decisions that need to be made to carry out that strategy.
  4. Fourth, detailed analytic queries are developed to address the areas of inquiry & are carried out (a simple sketch follows this list).
  5. Finally, results are interpreted & presented in support of data-driven decision-making. Queries can also be redesigned, modified or enhanced at this point & rerun.
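As an illustration of step 4, here is a hedged sketch of one “detailed analytic query” using Python & pandas. The data, column names & the question asked (cost per encounter by site & month) are invented for illustration; real queries would come out of the consensus process described above.

```python
import pandas as pd

# Hypothetical encounter extract; the columns are assumptions for illustration.
encounters = pd.DataFrame({
    "site": ["north", "north", "south", "south"],
    "date": pd.to_datetime(["2014-01-07", "2014-01-21",
                            "2014-01-09", "2014-02-04"]),
    "cost": [180.0, 220.0, 150.0, 410.0],
})

# One concrete analytic query: cost per encounter by site & month.
encounters["month"] = encounters["date"].dt.to_period("M")
summary = (encounters.groupby(["site", "month"])["cost"]
           .agg(encounters="count", total_cost="sum", cost_per_encounter="mean"))
print(summary)
```

Step 5 then happens around the output: results are interpreted with stakeholders, & the query is refined & rerun.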

Recent conversations with CIOs & other healthcare executives at conferences & other meetings have focused on several areas of inquiry that are strategic to the continued growth & success of these organizations. These areas have included:
  • Classifying patients according to risk & cost: This requires defining a set of classes (such as healthy patients, patients with chronic conditions, patients with multiple chronic conditions, patients with chronic conditions & behavioral health issues, etc.) & then analyzing the patient population with respect to these classes. Additional analysis is often done to determine the cost of care for each patient & each class. This allows the top 1%, top 5%, bottom 5%, etc. of patients to be identified with respect to cost, & may lead to interventions once causes & similarities in these classes are also analyzed. (A brief sketch follows this list.)
  • Determining the cost of providing specific clinical & non-clinical services (where data is available): This can be done along various axes such as per location, per time period or per provider, all of which may provide insight into costs &, with additional analysis, into the relationship of services to outcomes.
  • Analyzing population trends utilizing both internal clinical & demographic data as well as publicly available data (such as state-provided population trend data per location, time period, etc.): This can provide insight into encounter trends as well as revenue trends.
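A minimal sketch of the first area, classifying patients by risk & cost, might look like the following. The class definitions, thresholds & field names are assumptions for illustration only.

```python
import pandas as pd

# Hypothetical patient summary; the fields & thresholds are invented.
patients = pd.DataFrame({
    "patient_id":         [1, 2, 3, 4, 5],
    "chronic_conditions": [0, 1, 3, 2, 0],
    "behavioral_health":  [False, False, True, False, False],
    "annual_cost":        [300, 4200, 38000, 11500, 650],
})

def risk_class(row):
    """Assign one of the illustrative classes named above."""
    if row["chronic_conditions"] == 0:
        return "healthy"
    if row["behavioral_health"]:
        return "chronic + behavioral health"
    if row["chronic_conditions"] > 1:
        return "multiple chronic"
    return "single chronic"

patients["risk_class"] = patients.apply(risk_class, axis=1)

# Cost profile per class, & the highest-cost patients overall.
print(patients.groupby("risk_class")["annual_cost"].agg(["count", "mean", "sum"]))
print(patients[patients["annual_cost"]
               >= patients["annual_cost"].quantile(0.95)])
```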

Many other areas of inquiry are possible, but need to be aligned with the organization’s strategy in order to be productive & to enable data-driven decision-making.

As I mentioned above, the technology of contemporary analytics is also interesting, & it will be covered in my next post.

[Please Note: A version of this post appears as my column for Technology in Focus on the RCHN Community Health Foundation website (www.rchnfoundation.org)]


[1] http://www.sciencedaily.com/releases/2013/05/130522085217.htm
[2] http://www-03.ibm.com/innovation/ca/en/watson/watson_in_healthcare.shtml
[3] http://en.wikipedia.org/wiki/Data
[4] http://en.wikipedia.org/wiki/Analytics

Friday, July 18, 2014

The Learning Healthcare System

The ONC has proposed a ten-year vision[1] for interoperability in healthcare information technology that divides this time into three periods. Years 1-3 are devoted to achieving technical interoperability & sharing of healthcare information, while years 4-6 focus on using this shared information to improve quality & lower cost. Years 7-10 are labeled the “learning health system” & described as “Individuals, care providers, public health (officials) and researchers contribute information and learn from information shared across the health IT ecosystem, with rapid advancement in methods for deriving meaning from data without sharing PHI.”[2] What “learn” might mean in this context is an interesting question… First, let’s look at what the ONC appears to mean by it, & then we’ll look more broadly.

The ONC lists a number of characteristics of the healthcare system during this timeframe (2021-2024) in no particular order:
  1. Enhanced healthcare information contribution & sharing across clinical (provider & patient), public health & research areas
  2. More functional technical tools available to apply to this data: search & visualization are examples
  3. General availability of “patient-centered” outcomes research results
  4. Continuous learning through predictive & retrospective analysis of aggregated data
  5. Availability of patient-specific clinical decision support taking into account the patient’s genetic profile, clinical history, local public health trends & relevant socio-cultural trends (social determinants)
  6. Improved public health surveillance integrated with point-of-care decision support

This is actually quite a good list, but much of it is either already available or will be available in the near future (18-24 months). Let’s review where (I think) we are with this list, & then let’s explore some of the possibilities for a learning healthcare system in the ONC’s 10-year timeframe. The issue is, as William Gibson famously observed, “The future is already here, it’s just not evenly distributed.”[3]
  • Point 1 – Enhanced information contribution & sharing across different healthcare contexts is what the ONC’s years 1-3 are about. A high level of data interoperability will allow contribution of information from a variety of sources, including patients, for a variety of purposes. Interoperability is a frustrating issue today, as many vendors can’t effectively share data across their own product lines. Hopefully this will change in the next three years. We have achieved high levels of data interoperability in other industries, & notwithstanding that many people working in healthcare believe its data to be substantially more complex & sensitive to error than data in, say, banking or aerospace design, I think that with a pragmatic approach, not just to standards & certification but also to vendor architecture, API development & in-the-field data sharing, we can achieve appropriate levels of interoperability in this timeframe.

  • Point 2 – We have many advanced functional tools available today; we’re just not using them. This is often because HIT vendors are loath to integrate their systems (practice management, EHR, lab reporting, etc.) with external tools, preferring to develop tools themselves. This doesn’t always work, for several reasons: the vendor may not have the necessary skills &/or resources to develop such tools, the vendor’s business model may not include such development, the vendor may have allocated this development to partners who have their own agendas & business models, & many other factors. There is another, much larger issue: most of these HIT systems are architected on an enterprise model that is not as scalable or flexible as contemporary designs. As HIT products migrate to contemporary infrastructure (Hadoop, NoSQL, etc.), interoperability & integration will become possible at larger & larger scale.

  • Point 3 – The American Health Information Management Association provides a good introduction to the variety of healthcare research data already available[4]. In addition, the Agency for Healthcare Research & Quality (AHRQ, HHS) & the Healthcare Information Management Systems Society (HIMSS) both make a good deal of data available. The Centers for Medicare & Medicaid Services has also recently made a substantial amount of claims & provider payment data available[5]. This trend will continue, especially as large healthcare organizations begin making public the results of analyses of ultra-large data sets (see immediately below).

  • Points 4-5 – These points are linked, especially at the point-of-care. Continuous learning, in this context, is the ability to develop new knowledge & strategies for using that knowledge based on an understanding of current & previous results & information. Many systems currently perform retrospective (& in some cases predictive) analysis of large amounts of healthcare data to determine patterns in both clinical & operational areas for healthcare organizations (Point 4). When this type of analysis is done based on specific patient characteristics at the point-of-care, diagnosis & treatment planning can be based on the empirical data & learning is brought forward with each analysis (Point 5). Examples include:

    ◦ Mayo Clinic – AWARE “bedside consulting” system (5M patient records over 15 years)
    ◦ Beth Israel Deaconess Medical Center (Boston) – Clinical Query system (2.2M patient records)
    ◦ Kaiser Permanente – Natural language query system (9.1M patient records over 10 years)
    ◦ Partners Healthcare (MA) – Queriable Inference Patient Dossier
    ◦ IBM/Wellpoint – “Dr. Watson”, deep understanding system applied to healthcare information (cancer diagnosis)
  • Point 6 – Systems today use analysis of regional to hyperlocal trends in disease patterns to characterize the public health context of specific locations. These analytic results can be combined with point-of-care recommendation systems to improve diagnosis & treatment. An example would be Google Flu Trends, although there are many apps, such as Healthify[6], that provide hyperlocal services recommendations based on EHR encounter information. (A toy sketch of this kind of integration follows.)
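To illustrate how local surveillance data might feed point-of-care decision support, here is a toy sketch. The prevalence figures, test characteristics & condition are invented; a real system would draw them from live public health feeds & clinical evidence.

```python
def posterior(prior, sensitivity, false_positive_rate):
    """Bayes' rule for a single binary finding."""
    evidence = sensitivity * prior + false_positive_rate * (1 - prior)
    return sensitivity * prior / evidence

# Hypothetical numbers: local surveillance raises the flu prior
# from a 2% baseline to 8% during a local outbreak.
baseline_prior, outbreak_prior = 0.02, 0.08
sens, fpr = 0.90, 0.20          # assumed characteristics of the finding

print(posterior(baseline_prior, sens, fpr))   # ~0.084
print(posterior(outbreak_prior, sens, fpr))   # ~0.281
```

The same positive finding is more than three times as suggestive during the outbreak; that is the value of integrating surveillance with bedside decision support.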

To summarize: current & near-future HIT systems can provide appropriate levels of data interoperability; new architectures & tools are already making HIT & the analysis of HIT data much more scalable & performant; large amounts of research data are already available, even to patients & consumers; ultra-large-scale pattern matching in healthcare data sets can provide the basis for continuous learning by both systems & their human users; this learning is already being applied in point-of-care recommendation systems that draw on millions of patient records; & finally, current & near-future HIT systems are reporting large amounts of public health data, which is being analyzed to provide better understanding of large-scale health phenomena & will eventually be integrated with point-of-care recommendation systems.

OK – so what isn’t being done? & what could be done? One major thing is that the more information you include in these analyses, the better the results are, so a broader range of inputs should be included. Such information streams as public social media, data on social determinants, even online & conventional shopping data can be important in understanding a person’s health profile. A recent story in Bloomberg Businessweek[7] described the use of credit card purchasing data to supplement providers’ information about patient behavior – are you actually picking up your prescriptions, buying a lot of junk food, shopping at Big & Tall, etc.? Marketers use this kind of data routinely in other industries, so why not in healthcare[8]? There are almost infinitely many information sources that could be used productively, once the sociocultural issues are understood & ameliorated.

There are also new kinds of analysis being developed. An example would be work at Oxford University[9] where an algorithm analyzes ordinary photographs & can predict genetic anomalies & diseases. Hundreds of such new uses of information are being developed & will be available (& more evenly distributed) in the near future.

But what about learning, I hear you say… A recent issue of Health Affairs was devoted to the theme of “big data”. One of the articles reviewed work on a learning health system, talked about impediments & made some predictions[10]. This work used the following definition of a rapid learning healthcare system: “a health system that learns as quickly as possible about the best treatment for each patient—and delivers it. This kind of system draws on a much faster knowledge production process: from discovery science, to new therapies and clinical science that can inform personalized medical care, to better-informed physicians and patients.” This idea of a rapid learning health system was first proposed in 2007[11], & the Rapid Learning Project & others have done a good deal of work, mostly workshops & policy papers. As we have seen, however, this vision of deep analytics applied at the point-of-care to diagnosis & treatment of individual patients is already in place in a number of settings. This is a lot, but a learning healthcare system has to be more than this.

As already stated, learning can be thought of as “the ability to develop new knowledge & strategies for using that knowledge based on an understanding of current & previous results & information”. This ability is continuous & ongoing; the implication for a healthcare system is that whenever an actor (provider, patient, caregiver &c.) is using a part of the system, the system is monitoring the user’s context (usage) & anticipating what information & analysis may be relevant. The system may then give the user the opportunity to request this information, which can be diagnosis, treatment suggestions, data on treatment, analysis of alternatives, public health implications, information & recommendations on amelioration of social determinants & many other possibilities. In order to do this, the system would have to have access to a great many data sources, as well as deep understanding, hypothesis testing & recommendation capability (in the Dr. Watson mode) & an interface that allowed substantial interaction with the user in a manner that was non-threatening & productive. In addition, the system would serve as an information source & liaison for public health & social systems, as well as for healthcare systems at other organizations (that the user might be associated with). It might communicate with the user through a variety of devices & in a context (app or portal) that they were used to. We’re obviously not there yet, but the basic interaction loop can be sketched, purely illustratively, as follows.
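In this sketch every type & method name is an assumption, not a real API; the anticipate function stands in for capabilities (deep understanding, hypothesis testing, recommendation) that do not yet exist at this scope.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class UserContext:
    """Hypothetical snapshot of what a user is doing right now."""
    role: str                     # "provider", "patient", "caregiver", ...
    activity: str                 # e.g. "reviewing labs", "writing a note"
    patient_id: Optional[str] = None

@dataclass
class Suggestion:
    kind: str                     # "diagnosis", "treatment", "public health", ...
    summary: str

def anticipate(ctx: UserContext) -> List[Suggestion]:
    """Stand-in for the hard part: deep understanding, hypothesis testing
    & recommendation across many data sources."""
    if ctx.activity == "reviewing labs" and ctx.patient_id:
        return [Suggestion("diagnosis", "outcomes of similar historical cases"),
                Suggestion("public health", "local prevalence trends")]
    return []

def on_user_action(ctx: UserContext) -> None:
    """The loop: observe context, anticipate, offer -- never push."""
    for s in anticipate(ctx):
        print(f"Offer to {ctx.role}: {s.kind} -> {s.summary}")

on_user_action(UserContext("provider", "reviewing labs", patient_id="p-123"))
```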

Can we get there? I believe that we can, but we have to focus. The we, here, is not only the producers of software & systems, but providers, patients, caregivers & healthcare organizations (if corporations can be people, so can healthcare organizations)[12].  Here is my (partial) list of what’s important:
  • Facilitate real interoperability for healthcare systems – The development & adoption of standards does not automatically convey interoperability[13]. A lot of really hard work has to be done to ensure that even things like standard documents (like C-CDA) can be assimilated by multiple systems & that the data, once imported, makes sense. This could easily take more than the three years the ONC has allowed.
  • Develop learning in the healthcare context – Learning is not just analyzing ultra-large information “lakes” to do pattern matching & make diagnosis & treatment recommendations. It is creating new knowledge & new strategies for developing & using knowledge. In this sense, it is more like IBM’s Watson, which attempts a semantic understanding of material & then forms & tests hypotheses to answer questions about that material, than it is like most of the point-of-care recommendation systems currently in use or under development. These systems do some form of pattern matching to an initial set of data about a patient, &, if their information source is large enough, may discern patterns that can be translated into recommendations with a very high “probability” of relevance (if not correctness, based on the analyzed data).

An aside is relevant here. The current point-of-care systems we are talking about are not conventional rule-based systems. They do not have domain-specific heuristics about cancer diagnosis & therapy (as an example). The heuristics that they have are about semantic normalization, general pattern matching, visualization, etc. They operate by taking input on a patient’s condition & comparing that to (potentially) millions of patient records to determine what the most effective diagnoses & treatment plans have been for those specific inputs. Earlier “expert” systems operated quite differently, taking the input on patient condition & executing a set (sometimes as large as tens of thousands) of domain-specific rules. These systems often had relatively high percentages of effectiveness – Mycin,[14] an expert system (with approximately 600 rules) that made recommendations for treatment of bacterial infections, developed at Stanford University in the 1970s, had an effectiveness of 69%, which was higher than that of the medical experts surveyed. Current point-of-care systems have an effectiveness of (close to) 100% relative to their information base. This is a quite different kind of effectiveness than that of a rule-based system (& discussion of the causes of this difference is beyond the scope of this current blog). The contrast can be sketched in a few lines of code, below.
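A hedged sketch of the contrast; both the single rule & the records below are invented for illustration & carry no clinical authority.

```python
# Rule-based (Mycin-style): domain-specific rules map findings to advice.
RULES = [
    (lambda p: p["gram_stain"] == "negative" and p["site"] == "blood",
     "consider coverage for gram-negative bacteremia"),
]

def rule_based(patient):
    return [advice for applies, advice in RULES if applies(patient)]

# Record-based (contemporary point-of-care): no disease rules at all --
# just retrieve similar historical cases & report their treatments & outcomes.
def record_based(patient, records, similar):
    return {(r["treatment"], r["outcome"])
            for r in records if similar(r, patient)}

patient = {"gram_stain": "negative", "site": "blood"}
records = [{"gram_stain": "negative", "site": "blood",
            "treatment": "regimen X", "outcome": "improved"}]
same_findings = lambda r, p: (r["gram_stain"] == p["gram_stain"]
                              and r["site"] == p["site"])

print(rule_based(patient))                            # advice from encoded expertise
print(record_based(patient, records, same_findings))  # facts retrieved from the data
```

The first encodes expertise; the second retrieves it from data, which is why its effectiveness “relative to its information base” is a different kind of claim.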

Learning in healthcare systems won’t come about by itself. It will have to be facilitated by government-public-private partnerships & specifically funded. Real prototypes & production systems will have to be subsidized & deployed for testing & feedback. A project similar to that which produced the NwHIN (originally NHIN) needs to be planned & quickly started, so that working groups can begin describing the functionality of healthcare learning & companies can be selected to begin prototyping. Standards will not be as important initially in this effort as they were in NHIN development; innovation will matter more. The companies should not just include the usual suspects (IBM, Google, Microsoft, etc.), although they are important, but should also include some smaller organizations with different ideas that may (or may not) be layered on the infrastructure provided by their larger brethren.

A learning healthcare system is a great goal, but it won’t happen without a lot of support (funding) & leadership. Let’s start now.



[1] Connecting Health and Care for the Nation: A 10-Year Vision to Achieve an Interoperable Health IT Infrastructure. ONC. June 2014. http://healthit.gov/sites/default/files/ONC10yearInteroperabilityConceptPaper.pdf, accessed 25 June 2014.
[2] ONC. 2014. P.8
[3] William Gibson, interview in The Economist, 4 December 2003.
[4] http://library.ahima.org/xpedio/groups/public/documents/ahima/bok1_050345.hcsp?dDocName=bok1_050345
[5] http://www.cms.gov/Research-Statistics-Data-and-Systems/Statistics-Trends-and-Reports/Medicare-Provider-Charge-Data/
[6] https://www.healthify.us/en
[7] http://www.businessweek.com/articles/2014-07-03/hospitals-are-mining-patients-credit-card-data-to-predict-who-will-get-sick
[8] Privacy issues are the first reason to think twice about it, but we have already ceded our privacy when Amazon or Google makes purchasing suggestions for us.
[9] Ferry, Q. et al. 2014. Diagnostically relevant facial gestalt information from ordinary photos. eLife 3:e02020. http://elifesciences.org/content/3/e02020
[10] Etheredge, L.M. 2014. Rapid Learning: A Breakthrough Agenda. Health Affairs 33(7):1155-1162. July 2014.
[11] Etheredge, L.M. A rapid-learning health system. Health Aff (Millwood). 2007;26(2):w107–18. DOI: 10.1377/hlthaff.26.2.w107.
[12] The doctrine of corporations as people has been established in the U.S. as early as 1819 (Dartmouth College vs. Woodward, 17 U.S. 518 (1819)) & as recently as Burwell vs. Hobby Lobby (573 U.S. ___ (2014)).
[13] As I stated in my last post, during the development of interoperability standards for CORBA, a rep from one of the other vendors (I was representing the Digital Equipment Corporation) told me he would be compliant if my system sent his system a message & his system sent my system back an error message!
[14] http://en.wikipedia.org/wiki/Mycin