Wednesday, January 15, 2014

Search, Analytics & Healthcare

I’ve written several posts now about the importance of analytics in healthcare & how healthcare organizations, especially those with constrained resources, can move towards the use of analytics without a large amount of expense (except in effort). I’ve also written a little about what the analysis of available &/or large-scale data sets might consist of & how it can be used to provide leverage for such organizations. Now I’d like to write about the evolution of analytics & how search has also evolved in parallel, & what the implications of this might be for healthcare.

Some time ago, I wrote an essay titled “If Search is the Answer”[1]. In it, I proposed that search was not only an important functional capability in our current & near-future work lives, but that it actually was the principle around which our work was organized. Now it appears that our use of constantly connected devices is resulting in our work lives & our lives increasingly merging & that search has become an important, if not the important, organizing principle in general for us. Search is much more than typing some keywords into Google or Bing, etc. It really spans a range of capabilities that includes not only naïve searches, but also semantic searches of all kinds. The endpoint of the search range, at this time, is analytic query, that is, the posing of questions that require quantitative or semantic analysis, or both, of a body of information. This body of information has grown so that we might be talking about gigabytes (10^9 bytes) to petabytes (10^15 bytes) of things such as healthcare records, financial models, academic publications etc.

Let’s look at two different examples of search evolution – the first is Facebook Graph Search. Facebook has always provided search for people based on names, profiles etc. Graph Search is different in two ways: first, it utilizes a semantic engine that allows natural language queries & evaluates these queries so that it can use both the exact meanings & interpretations of the meanings of the words used, & second, it uses the structure of the semantic graph built by the underlying Facebook engine so that it understands not only the content of a user’s profile, but the relationships of that profile with other users’ profiles. It returns results from both within Facebook & from the web, based on results from Bing (Microsoft) & now also from Russian search engine Yandex (http://www.yandex.com/). Of course, it only has a semantic graph (today) from Facebook content. Sample queries could be such requests as “find the pictures of all of my friends who visited San Francisco this year” or “find people who liked the movie Fruitvale Station & live in Oakland”. Semantic search is not new; the concepts were first developed by Allan Collins & M. Ross Quillian (both then at BBN Technologies) & enhanced by many people mainly working in advanced database query. What’s different about Facebook Graph Search is the reach that it has; Facebook has 1.2B monthly users.
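To make the idea of a relationship-based query concrete, here is a minimal Python sketch of the “liked Fruitvale Station & live in Oakland” example. It is not how Facebook implements Graph Search; the people, the edge names (likes, lives_in) & the helper people_where are all invented for illustration, & the natural language parsing step is skipped entirely. It only shows that once relationships are stored as typed edges, such a query reduces to intersecting sets.

```python
from collections import defaultdict

# Toy social graph. Every name, relation and record here is invented.
edges = [
    ("ana", "likes", "Fruitvale Station"),
    ("ana", "lives_in", "Oakland"),
    ("ben", "likes", "Fruitvale Station"),
    ("ben", "lives_in", "San Francisco"),
    ("cy", "likes", "Gravity"),
    ("cy", "lives_in", "Oakland"),
]

# Index: (relation, target) -> set of people who have that edge.
index = defaultdict(set)
for person, relation, target in edges:
    index[(relation, target)].add(person)


def people_where(*conditions):
    """Return the people who satisfy every (relation, target) condition."""
    matches = [index.get(condition, set()) for condition in conditions]
    return set.intersection(*matches) if matches else set()


result = people_where(("likes", "Fruitvale Station"), ("lives_in", "Oakland"))
print(sorted(result))
```

In this toy graph only one person has both edges, so the query returns a single name. A real system would add the hard parts: interpreting the question & ranking the results.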

The second example is IBM Watson. Watson is a cognitive system that is a good deal more than what was exhibited on Jeopardy. Watson is a reasoning system that performs not only semantic analysis of natural language, but also hypothesis generation for answering questions, evaluation of potential responses & synthesis of a “best” response. It uses large amounts of information & is designed to be able to evaluate petabyte-level information sources in order to generate hypotheses & potential solutions. It ranks these solutions for presentation, & it remembers the hypotheses it previously generated & how successful they were for specific queries. It uses this information to optimize how it answers similar queries, thereby “learning” from experience. One relevant example query might be “Find all the patients with similar medical profiles & diagnoses & rank the success of the treatment they received from most to least successful”.
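The ranking part of that example query can be sketched in a few lines of Python. This is emphatically not how Watson works: there is no language understanding, hypothesis generation or learning here, “similar profile” is reduced to an exact match on a diagnosis code, & the records, codes & the function rank_treatments are all made up for illustration. It only shows what “rank the treatments from most to least successful” means as a computation.

```python
from collections import defaultdict

# Invented records: (patient_id, diagnosis, treatment, succeeded).
records = [
    (1, "dx-1", "treatment A", True),
    (2, "dx-1", "treatment A", False),
    (3, "dx-1", "treatment B", True),
    (4, "dx-1", "treatment B", True),
    (5, "dx-2", "treatment A", True),
]


def rank_treatments(rows, diagnosis):
    """Rank treatments for one diagnosis by observed success rate, best first."""
    outcomes = defaultdict(list)
    for _patient_id, dx, treatment, succeeded in rows:
        if dx == diagnosis:
            outcomes[treatment].append(succeeded)
    # True counts as 1 and False as 0, so sum/len is the success rate.
    rates = {t: sum(flags) / len(flags) for t, flags in outcomes.items()}
    return sorted(rates.items(), key=lambda item: item[1], reverse=True)


ranking = rank_treatments(records, "dx-1")
for treatment, rate in ranking:
    print(f"{treatment}: {rate:.0%}")
```

With only two patients per treatment, a ranking like this means very little, which is exactly why the number of records available matters.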

OK – so what about healthcare? Search will continue to evolve toward more & more connected search; that is, search organized in some way such as relationships in a social network or relationships in a collection of medical records etc. Whether that connection is defined by parameterized graphs (as in Facebook Graph Search) or by semantic query interpretation with hypothesis generation & experience-based learning (as in IBM Watson), near-future search will provide a way of using our own concepts & needs to organize & generate knowledge from large bodies of information. Healthcare analytics can be thought of as a kind of search. I have recently been involved with a project that sought to determine the cost per medical encounter classified by service category (medical, dental, behavioral, enabling, ancillary, etc.) at a number of Community Health Centers. This analysis could be expressed as an analytic query; in fact most analyses could be expressed as analytic queries & could be posed to systems such as Watson, ParAccel Analytics Platform or any of the Hadoop-based analytic packages. The accuracy & validity of the answers would depend on a number of factors including (at least): the quantity of the information available, the quality of the information available, the ability to express the query appropriately in the system, the ability of the system to interpret the query appropriately & the ability of the system to present the results in an understandable way. If we specialize the query we specified earlier to “find all the patients with the diagnosis of non-Hodgkin’s lymphoma expressed in the skull, characterize their symptoms for similarity & rank the success of their treatment from most to least successful”, we’ll understand that the results might be different if we had 750,000 patient records (a Health Center Controlled Network) to analyze than if we had 9,000,000 patient records (Kaiser Southern California). What if we could analyze even larger numbers of records?
How good could our results be? Let’s remember that quantity does not always result in quality & the results that we get are only as good as the questions we ask. For specific clinical queries, though, we can get very good results, good enough that we can find treatments that would not be obvious or even identifiable by other means except serendipitously. Good enough, also, that we can determine that we’re asking the wrong questions. This type of diagnosis & treatment planning is in the future (the relatively near future) for most clinicians, but somewhat less ambitious queries can be done today in administrative & financial as well as clinical areas.
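The cost-per-encounter analysis mentioned above is one of those less ambitious queries, & it is easy to show what it looks like when written down. The Python below is a sketch only: the encounter records & dollar amounts are invented, the function name cost_per_encounter is my own, & a real analysis would have to deal with how costs are allocated to encounters in the first place, which is where most of the effort goes.

```python
from collections import defaultdict

# Hypothetical encounter records: (service_category, cost_in_dollars).
encounters = [
    ("medical", 120.0),
    ("medical", 180.0),
    ("dental", 90.0),
    ("behavioral", 150.0),
    ("dental", 110.0),
]


def cost_per_encounter(records):
    """Return the average cost per encounter for each service category."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for category, cost in records:
        totals[category] += cost
        counts[category] += 1
    return {category: totals[category] / counts[category] for category in totals}


averages = cost_per_encounter(encounters)
for category, average in sorted(averages.items()):
    print(f"{category}: {average:.2f}")
```

The same question could be posed as a single grouped query in a database or an analytic platform; the point is that the analysis & the query are the same thing.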

The evolution of search in terms of the types of systems that can be queried is leading to an evolution in how we use administrative, financial & clinical information in healthcare. As search is increasingly organized around concepts that reflect relationships in the real world, it will become possible to ask questions that provide answers to some of the most complex issues we face, such as improving clinical diagnosis & treatment. In parallel, as the tools we use for search become more powerful, but with easier-to-use “query interfaces”, asking these questions & productively applying the results will become easier & easier. Search, & the attendant concept of discovery, is increasingly becoming the organizing principle for much of our work in healthcare.




[1] The title is a homage to Danny Bobrow’s 1985 paper “If Prolog is the Answer, What’s the Question”, IEEE Trans. Softw. Eng. 11(11) – perhaps the most insightful paper on the logic of AI languages ever published, with the possible exception of Doug Lenat’s paper on why AM worked (Lenat, D.B. & J.S. Brown. 1984. Why AM and Eurisko appear to work. AI 23(3):269-294.) My essay is at http://posttechnical.com/?page_id=58
