Friday, September 4, 2015

Turing Tests, Search & Current AI



[H]ow many different automata or moving machines can be made by the industry of man [...] For we can easily understand a machine's being constituted so that it can utter words, and even emit some responses to action on it of a corporeal kind, which brings about a change in its organs; for instance, if touched in a particular part it may ask what we wish to say to it; if in another part it may exclaim that it is being hurt, and so on.

Rene Descartes. 1637. Discourse on Method[1],[2]







I recently saw the film Ex Machina. Leaving aside what I thought about the story, the characters etc. (although there was a lot to like & dislike in both), there were two aspects of the movie that were very interesting (IMHO) & which bear further thought & comment. The first was an updated idea of a Turing Test, & the second was the idea that the infrastructure & function of a general search engine could serve as the basis for a functioning artificial intelligence. An examination of these topics will cover quite a bit of current work on various AI-related topics.

First, the Turing Test[3]… In 1950, Alan Turing proposed an “imitation game” for determining a machine’s (computer’s) ability to exhibit intelligent behavior equivalent to, or at least indistinguishable from, that of a human being. There are many versions of this test, but the essence of it is that the test is conducted in natural language & that a human interrogator, conversing by text, cannot tell whether the conversation is being generated by a human or a machine. Turing predicted that by around the year 2000 an average interrogator would have no more than a 70% chance of correctly identifying which player was the human & which was the machine after five minutes of “conversation”; this is commonly read as a pass criterion, i.e. a machine that fools 30% of interrogators has passed the test. There have been many competitions to determine if a machine could pass this test, & in fact the Loebner Prize competition has been held every year since 1991. No machine has yet won the full version of this prize.

In 2014, a competition organized by the University of Reading (UK) was said to have been won by the Russian chatbot Eugene Goostman. This result was highly controversial (surprise, surprise), as the chatbot was engineered to represent a 13-year-old Ukrainian boy who was not a native English speaker. Nevertheless, Eugene convinced 33% of the contest’s judges that it was human in a series of five-minute conversations.

Controversial or not, the vast majority of these contests were conducted under a model in which the interrogator did not necessarily know that one of the players was a machine. In one of the few competitions to test this assumption, the 2008 Loebner Prize (Reading University, UK) examined the interrogators’ ability to distinguish between machine & human even in machine/machine & human/human pairs. No significant difference was found between interrogators who knew that a machine was part of the test & those who did not. Perhaps more interesting was that interrogators used criteria such as spelling errors to identify humans (the machines did not make such errors) & speed & length of response to identify machines.

What about the movie, you say? Oh yes, the movie. The primary difference in the movie is that a human is tasked with performing a Turing Test on a humanoid robot that is clearly not human. The interrogator & robot meet face-to-face for the test, & even though the interrogator knows the subject is a robot, he winds up becoming emotionally attached, with disastrous consequences. My point is not the moral & real world consequences of emotional attachments to robots, but that this type of “Turing Test” will be upon us sooner rather than later, & that we’ll have to have not only a technical but also a cultural context for dealing with intelligence exhibited by non-humans. Oh, & incidentally, not all of these non-humans will be attractive lifelike robots – most will be entities such as massive networks of devices, or intelligent programmatic agents… How will we deal with these entities? Entities that, in their ability to communicate & analyze information, may not be distinguishable from human beings. Perhaps our only clue will be that these entities are much better at communicating & analyzing information. What kind of sociocultural adaptations will we have to make in order to function in this “brave new world”? How will people, humans, work & live alongside these entities? That brings me to search & back to the movie.

As I have already said, one of the most interesting (to me) aspects of the movie was the conceit that a massive, general search engine, optimized for certain types of intelligence relevant to natural language, personalization, deep learning & search, could serve as the basis for an artificial intelligence that could pass the modified Turing Test described above. About ten years ago (1/2006), I wrote a paper titled “If Search is the Answer, What’s the Question?”[4]. In this paper I predicted that in the 2012 timeframe, people’s work process would be primarily knowledge & model based, & that “search” would provide the overall structure for this work process, which would emphasize information curation & problem solving. I wrote a number of these predictive papers in that time period & I generally got the direction right & the timeframe wrong – I always estimated that change would happen faster than it actually did. That’s the case here too. Now in 2015, we are not yet at the point where models & problem solving are the primary work context for knowledge workers, but there are several trends taking us in this direction. In addition, there is the overhyped, but nevertheless immensely important trend of ultra-large data set analysis (big data) that is also restructuring the concept of work & work process. How has search changed since 2006 & where are we in this evolution? Of course, the question raised by the movie is: does search provide an adequate basis for general artificial intelligence? Lots of stuff to address here…

A lot has happened in search since 2006, or has it? This is true technically, but from an end-user perspective the most visible occurrence has been the emergence of Google as the world’s preeminent search utility. A recent report from AYTM Market Research found that 74.3% of worldwide consumers use Google as their primary search engine. Statistic Brain[5] reports that Google performed 2.1 trillion (that’s with a T) searches in 2014 – almost 6 billion a day! The number of searches not performed on Google, Bing or Yahoo is trivial. At the end-user level, all three engines have deployed almost all of their new functions over a relatively short period of time. New capabilities such as local search &/or vertical search are available on the “big three”, but also through specialized apps such as Yelp or Delicious. Perhaps the most interesting new feature, & the most relevant for this article, is natural language search, as offered by Google (Google Voice Search), Bing (Smart Search, Microsoft) as well as by such personal assistants as Siri (Apple) & Cortana (Microsoft) & a growing number of others. Natural language provides the underlying interaction context not only for a modified Turing Test, but also for our long-term adoption of & adaptation to intelligent systems.

What about the other dimensions along which I predicted search would have to progress in order to serve as the primary structuring agent for people’s work, work based on model interpretation & problem solving? In 2006 I said:[6]

“During this transition, search will have to evolve itself from the ubiquitous web & enterprise engines of today that still mainly operate on key word & page rank algorithms to much more deeply focused tools. These tools will be able to refine their operation by using coarse & fine-grained models, not just of business, but also of more general knowledge categories. They will initially be able to structure work process because of their interactions with such models & eventually, in the 2010-2012 timeframe, facilitate work organization & problem solving as well as location of general or specialized knowledge in the context of a person’s work (or personal) process. This will require the integration with search of such areas as classification, ontology-based & advanced metadata (currently RDF & geospatial but also evolving quickly) modeling, rule-based reasoning & non-deductive reasoning of various forms – this is just the beginning, but we are already seeing some of these advances in products such as Mooter, Clusty or Grokker or in the integration of rule-based reasoning with business process management &/or ontology-based modeling (Protégé, Swoop). This work is just at its beginning.”

Most of this has not happened. Search engines, at least the ones used by the vast majority of people, do not currently use models or rule bases or any other type of reasoning to facilitate work organization & problem solving. Classification is provided by a very few engines that are hardly used (compared to the Big 3). Mooter & Grokker no longer exist except as SourceForge downloads last updated in 2007. Clusty is currently called Yippy (www.yippy.com). It appears to do the same thing that Clusty did in 2007 – provide a sidebar of topics that the search results can be sorted into. There are certainly specialized geospatial search engines (GEOSS, GSE etc.), & geospatial search has been integrated into apps such as Google Maps. The larger picture is that search has progressed a lot, but its primary direction has been the facilitation of different types of monetization, thus emphasizing the transactional nature of search engine use.

That’s not to say there hasn’t been some movement in this direction. Two areas do stand out: semantic search & big data. Semantic search has been in development for a long time & can be described as follows:

“Semantic search seeks to improve search accuracy by understanding searcher intent and the contextual meaning of terms as they appear in the searchable dataspace, whether on the Web or within a closed system, to generate more relevant results. Semantic search systems consider various points including context of search, location, intent, variation of words, synonyms, generalized and specialized queries, concept matching and natural language queries to provide relevant search results.”[7],[8]

The big 3 search engines all use some aspects of semantic search. Apart from attempting to use language-related techniques such as synonym mapping, semantic search engines use additional techniques including, but not limited to: RDF search & RDF path traversal, keyword-to-concept mapping, analysis of graph patterns to identify relationships, analysis of other (more complex) patterns, use of ontology-based inference (OWL), & use of other nonstandard logics[9].
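To make two of these techniques a little more concrete, here is a minimal Python sketch of keyword-to-concept mapping combined with a simple path traversal over subject/predicate/object triples. It is a toy built on stated assumptions: the triples, the concept names & the `semantic_search` function are all hypothetical, plain tuples stand in for a real RDF store, & no ontology reasoning (OWL) is involved. It shows how a query for “big cat” can find documents that never contain those words, & how a context concept can disambiguate “jaguar”.

```python
# Toy knowledge graph as (subject, predicate, object) triples, standing in for RDF.
# All identifiers are hypothetical & for illustration only.
TRIPLES = [
    ("doc:1", "mentions", "concept:JaguarCat"),
    ("doc:2", "mentions", "concept:JaguarCar"),
    ("doc:3", "mentions", "concept:Leopard"),
    ("concept:JaguarCat", "broader", "concept:BigCat"),
    ("concept:Leopard", "broader", "concept:BigCat"),
    ("concept:JaguarCar", "broader", "concept:Automobile"),
]

# Keyword-to-concept mapping: one surface word may name several concepts.
KEYWORD_TO_CONCEPTS = {
    "jaguar": {"concept:JaguarCat", "concept:JaguarCar"},
    "leopard": {"concept:Leopard"},
    "big cat": {"concept:BigCat"},
}


def objects(subject, predicate):
    """All o such that (subject, predicate, o) is in the graph."""
    return {o for s, p, o in TRIPLES if s == subject and p == predicate}


def subjects(predicate, obj):
    """All s such that (s, predicate, obj) is in the graph."""
    return {s for s, p, o in TRIPLES if p == predicate and o == obj}


def narrower_closure(concept):
    """Walk 'broader' links backwards to collect a concept & everything beneath it."""
    found, frontier = {concept}, [concept]
    while frontier:
        current = frontier.pop()
        for child in subjects("broader", current):
            if child not in found:
                found.add(child)
                frontier.append(child)
    return found


def semantic_search(query, context=None):
    """Map the query to concepts, optionally disambiguate, expand, then find documents."""
    concepts = set(KEYWORD_TO_CONCEPTS.get(query.lower(), set()))
    if context is not None:
        # Keep only concepts whose direct 'broader' parent is the context concept.
        concepts = {c for c in concepts if context in objects(c, "broader")}
    expanded = set()
    for concept in concepts:
        expanded |= narrower_closure(concept)
    return sorted({doc for c in expanded for doc in subjects("mentions", c)})


print(semantic_search("big cat"))                           # ['doc:1', 'doc:3']
print(semantic_search("jaguar"))                            # ['doc:1', 'doc:2']
print(semantic_search("jaguar", context="concept:BigCat"))  # ['doc:1']
```

A real engine would keep the triples in an RDF store & express the traversal in a query language such as SPARQL; the sketch only shows the shape of the idea.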

The following figure is a relationship matrix generated by the semantic search engine SenseBot for the query “semantic search”. It represents important references by text size & provides hyperlinks for each topic.

[Figure: Relationship Matrix (SenseBot) for Semantic Search, accessed 15 July 2015. Terms shown include Barbara Starr, Bing, context, disambiguation, Erin Everhart, Google, intent, keywords, Knowledge Graph, markup, meaning, Microsoft, ontologies, queries, search engines, semantic search, semantics, SEO, social media, standards, structured data & understanding.]

Other semantic search engines represent results differently, some as sidebar topics, some as graphics, but all represent relationships or semantic groupings in some way. The user can choose which semantic dimension to explore. In the above example, the user could investigate companies, data topics or even languages. Search is optimized in the sense that the user gets to choose which aspect of the information they were looking for is explored. Each of the mainstream search companies has semantic search projects, & each introduces some aspect of semantic search in updates to its product. I’ll discuss other AI directions next.

Big data – I already hear you rolling your eyes… I’ve been doing a project based on “big data” analysis with healthcare safety-net organizations, & I have a much better appreciation of the strengths & weaknesses of this approach now that I’ve been using various aspects of it for the past year & a half. Why, you are asking, are we talking about it here – this is supposed to be about search. Yes – correct… the underpinnings of current big data analysis were initially all about search. Sometime in 2002, Doug Cutting & Mike Cafarella (then at the University of Washington) were working on search in the context of Lucene, the Apache open source search library Cutting had created. They developed a web indexer called Nutch that was eventually able to run on 4-5 nodes & index hundreds of millions of web pages, but still was not operating at “web-scale”, even for the 2003-2004 timeframe. Engineers at Google published several seminal papers around this time[10],[11] on the Google File System & MapReduce, a programming model & implementation for processing very large data sets. Cutting & Cafarella decided to use this set of technologies as the basis for an improved indexer & rewrote their systems in Java (Google had implemented them in C++). Cutting then joined Yahoo &, over time, Hadoop, the system that evolved from the Nutch project, became the basis for all search & transactional interaction at Yahoo. By 2011 it was running on 42,000 servers with hundreds of petabytes of storage. Yahoo spun out the distributed file system & MapReduce as open source projects under Apache, & many other companies, research groups & universities started developing tools & applications forming the Hadoop ecosystem. Several companies developing the Hadoop ecosystem were also spun out, either directly or as engineers left Yahoo, including Cloudera & Hortonworks.
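For readers who haven’t seen the MapReduce programming model, here is a minimal single-process Python sketch of it, building an inverted index (term to the pages that contain it), the kind of structure a search indexer produces. The function names are hypothetical & this is not the Hadoop API: a real framework runs many map & reduce tasks in parallel across a cluster, performs the shuffle over the network & reads & writes a distributed file system.

```python
from collections import defaultdict
from itertools import chain


def map_phase(doc_id, text):
    """Map: emit a (term, doc_id) pair for each distinct term in one document."""
    for term in set(text.lower().split()):
        yield term, doc_id


def shuffle(pairs):
    """Shuffle: group intermediate values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(term, doc_ids):
    """Reduce: collapse all values for one term into a sorted posting list."""
    return term, sorted(set(doc_ids))


def build_index(docs):
    intermediate = chain.from_iterable(map_phase(d, t) for d, t in docs.items())
    return dict(reduce_phase(term, ids) for term, ids in shuffle(intermediate).items())


docs = {
    "page1": "hadoop grew out of nutch",
    "page2": "nutch was a web search engine",
    "page3": "mapreduce and a distributed file system",
}
index = build_index(docs)
print(index["nutch"])  # ['page1', 'page2']
```

Because each map call looks at one document & each reduce call looks at one term, both phases can be spread over as many machines as are available, which is what made the model attractive for web-scale indexing.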

OK – so back to today, when most ultra-large scale projects, whether they are directly search based or analytic, are layered on some flavor of Hadoop (or some flavor of Hadoop-inspired software such as Apache Spark). The point, however, is not that Hadoop is the ultimate answer for search or for analytic processing in general[12] (it’s not…). It is that we have moved from enterprise distributed environments that include relational databases to shared-nothing clusters with massively parallel file & analysis systems. Those systems may be Hadoop based or Spark[13] based, or use Dremel[14] for interactive querying of very large data sets, or visualization tools for presentation & visual analysis. We are now in an era of massively parallel storage & analysis architectures, & these architectures enable a type of processing not previously possible.

So what else is going on that is driving this larger vision of search? Like big data, deep learning has also run up the hype cycle, to the extent that mainstream media have already informed the non-technogeek public about the wonders of the technology. The NY Times, the paper of record, ran stories as early as June 2012 & The Economist as early as February 2014[15]. What most people know about deep learning is that Watson, a system designed & built by IBM, beat two former champions at Jeopardy in February of 2011. While Watson is not purely a deep learning system (it uses a broad variety of techniques), it was the first example of this technology broadly visible outside of the AI community. More recently, companies including Yahoo, Facebook & particularly Google have continued the development of this learning technology. So what is it? & why is it important for search & many other areas going forward?

[Figure: Relationship Matrix (SenseBot) for Deep Learning, accessed 20 July 2015. Terms shown include algorithms, artificial intelligence, brain, deep learning, Facebook, Google, Hinton, image recognition, language, machine learning, Microsoft, MIT Technology Review, neural networks, patterns, representation & speech.]

There is no agreed upon, single definition of deep learning, but most people working on its development would agree that it is a type of machine learning characterized by:[16] 1) the use of multiple layers of nonlinear processing units (usually a neural net), & 2) the supervised (through examples) or unsupervised learning of feature sets (patterns) in each layer, with the layers organized in a hierarchy from low- to high-level features, where the output of one layer serves as the input for the next higher layer. OK – that didn’t mean a whole lot… Actually, what this technology is about is the recognition of patterns or features as information is fed through a neural net, with each layer of the net building a more & more abstract description out of the features found by the layer below it, until an interpretation of the information can be made. A good example would be automated facial recognition. The lowest layers of the net respond to simple patterns such as edges & contrasts, intermediate layers combine these into “features” or “patterns” such as the mouth, the nose etc., & the highest layers resolve the overall pattern as a face. At this point there is a detailed digital record of the patterns, so that this face could potentially be identified from a repository of faces & additional faces could be resolved, as the abstract patterns that make up a face have been defined.
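The layered idea can be shown at toy scale. The Python sketch below is a two-layer network of nonlinear (step) units in which the output of the lower layer is the input to the higher one. It is an illustration under a loud assumption: the weights are set by hand, whereas a deep learning system would learn them from examples. The lower layer detects two simple features of its inputs (“at least one on”, “both on”) & the higher layer combines them into a feature neither captures alone, “exactly one on” (XOR), which a single layer of such units cannot represent.

```python
def step(x):
    """Nonlinear activation: 1 if the weighted input is positive, else 0."""
    return 1 if x > 0 else 0


def layer(inputs, weights, biases):
    """One layer of units: each sums its weighted inputs, adds a bias & applies step()."""
    return [step(sum(w * x for w, x in zip(unit_weights, inputs)) + bias)
            for unit_weights, bias in zip(weights, biases)]


# Hand-set weights (an assumption for illustration; a real network learns these).
# Lower layer: unit 1 = "at least one input on" (OR), unit 2 = "both on" (AND).
LOW_WEIGHTS, LOW_BIASES = [[1, 1], [1, 1]], [-0.5, -1.5]
# Higher layer: "OR but not AND", i.e. exactly one input on (XOR).
HIGH_WEIGHTS, HIGH_BIASES = [[1, -1]], [-0.5]


def network(x1, x2):
    low_features = layer([x1, x2], LOW_WEIGHTS, LOW_BIASES)  # output of layer 1...
    return layer(low_features, HIGH_WEIGHTS, HIGH_BIASES)[0]  # ...is input to layer 2


for a in (0, 1):
    for b in (0, 1):
        print(a, b, network(a, b))
# 0 0 0
# 0 1 1
# 1 0 1
# 1 1 0
```

Real deep networks have many more layers & units, use smooth activations so that training by gradient descent works, & learn their features (edges, parts, faces) rather than having them written in.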

There are specialized architectures for deep learning networks & many different mathematical models & associated algorithms for training & learning. To date, much of the focus in this area has been on image recognition (classification) & speech processing. An example from image recognition is the Google Brain project in which a deep learning network learned to recognize human faces & cats from entirely unlabeled data[17].

What does this mean for search? Imagine a search that, instead of keying on specific words or terms, was able to determine abstract patterns in a request & then respond by specializing that abstraction in a particular context, returning information (in whatever form) relevant to the overall pattern or context instead of just what it was able to match syntactically. Some time ago, I was the lead for a project that built a system[18] that used much more naïve pattern recognition to return information that it determined was “analogous” to a description given by the requester. While this system was not very powerful compared to today’s learning systems, it did often surprise the requester with an analogy that it had identified. The system provided an explanation of why it had suggested the analogy, & often the requester would be puzzled at first but then would agree with the system once they saw the explanation. Wouldn’t it be nice to be surprised like this by a search engine? Deep learning will provide that capability & much more.
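As a rough illustration of analogy retrieval with an explanation, here is a small Python sketch. It is not how GNOsys worked; it swaps in a deliberately simple technique, Jaccard similarity over hand-assigned feature sets, & the case library, feature names & `find_analogies` function are hypothetical. The “explanation” is just the list of features the request & the suggested case have in common.

```python
# Hypothetical case library; each case is described by a hand-assigned set of
# abstract features. (A learning system would discover such features itself.)
CASES = {
    "supply-chain bottleneck": {"flow", "capacity limit", "queue", "upstream delay"},
    "hospital emergency room": {"flow", "capacity limit", "queue", "triage"},
    "city traffic jam": {"flow", "capacity limit", "upstream delay", "routing"},
    "library cataloguing": {"classification", "metadata", "retrieval"},
}


def jaccard(a, b):
    """Similarity of two feature sets: |shared| / |combined|, from 0.0 to 1.0."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def find_analogies(description, cases=CASES, top_n=2):
    """Rank cases by feature overlap; explain each one by the features it shares."""
    ranked = sorted(cases.items(),
                    key=lambda item: jaccard(description, item[1]),
                    reverse=True)
    return [(name, round(jaccard(description, features), 2),
             sorted(description & features))
            for name, features in ranked[:top_n]]


# A request about network congestion, phrased in the system's feature vocabulary.
request = {"flow", "upstream delay", "routing", "packets"}
for name, score, shared in find_analogies(request):
    print(f"{name}: {score} (shared: {', '.join(shared)})")
# city traffic jam: 0.6 (shared: flow, routing, upstream delay)
# supply-chain bottleneck: 0.33 (shared: flow, upstream delay)
```

A request about network congestion surfaces a traffic jam as its closest analogue, & the shared features say why.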

I’ve talked about semantic search, big data & machine learning with reference to where search is going. It’s currently mid-2015 – I’m only three years late with respect to the context for search I predicted back in 2006. Search is currently on a cusp – a cusp that could easily push it toward a broad-scale service, based on patterns, models & analogies, that assists in structuring our work (inquiry) & facilitates problem solving in ways we might not have developed or thought of. The ability of search to use deep learning capabilities means that pattern recognition of all sorts will lead to models that allow hypothesis testing (as already done by IBM Watson) & the facilitation of problem solving in context. The latter will require frameworks (ontologies, rule sets & trained networks) that allow problems to be represented & reasoned about. An additional approach will be the recognition of patterns & models in ultra-large scale data sets & subsequent data characterization or reasoning about the empirical data. Search using these mechanisms will be different than it is today; I’d say it will be better than it is today: better at finding results that match not only the content of our requests, but also match & potentially expand their context. At that point, 3-5 years from now, will it be capable of supporting an independent artificial intelligence? My very strong feeling is that unless such programs are allowed to evolve on their own, this will not happen, but then again, I guess we’ll just have to ask it.

Up next:
·      Intelligent search, big data, deep learning in healthcare information technology (& healthcare in general)
·      Design as a model for the evolution of work… What will future knowledge workers be like?
·      & further in the future… a “meditation” on the evolution of information technology using the “cluster of terms” model[19]






[1] Image originally of the Orion Nebula (NGC 1976), (Hubble) Space Telescope Science Institute, postprocessed by deepdreamr.com (16 July 2015).
[2] Rene Descartes. 1637. Discourse on the Method of Rightly Conducting One's Reason and of Seeking Truth in the Sciences. Leiden, Netherlands.
[3] Turing, Alan (October 1950), "Computing Machinery and Intelligence", Mind LIX (236): 433–460, doi:10.1093/mind/LIX.236.433, ISSN 0026-4423, retrieved 2015-07-05
[4] With respect to Danny Bobrow for his 1985 paper “If Prolog is the Answer, What’s the Question?”, IEEE Trans. Softw. Eng. 11(11) – perhaps the most insightful paper on the logic of AI languages ever published (with the possible exception of Doug Lenat’s paper on why AM worked…). Presented to MIT ESD Seminar Series, 3/2006.
[5] http://www.statisticbrain.com/Google-searches/
[6] I also started a piece of work, initially in 1994, looking at design practice as a model for knowledge work. Much of my thinking on how work would evolve came from this study.  c.f. Knowledge Work as Design: A Description of a Post-Convergence Work Paradigm. PostTechnical Research Strategist. May 2002.
[7] https://en.wikipedia.org/wiki/Semantic_search
[8] John, Tony (March 15, 2012). "What is Semantic Search?". Techulator. Retrieved 9 July 2015
[9] Mäkelä, Eetu. "Survey of Semantic Search Research" (PDF). Retrieved 9 July 2015.
[10] Ghemawat, S., H. Gobioff & S-T Leung. 2003. The Google File System. ACM 1-58113-757-5/03/0010.
[11] Dean, J. & S. Ghemawat. 2004. MapReduce: Simplified Data Processing on Large Clusters. 6th Symposium on Operating Systems Design & Implementation (OSDI). 137-149. San Francisco, CA.
[13] http://www.computerworld.com/article/2856063/enterprise-software/hadoop-successor-sparks-a-data-analysis-evolution.html
[15] NY Times: 21 May 2015, 12 June 2012; The Economist: 1 February 2014, 13 May 2015
[16] summarized from: https://en.wikipedia.org/wiki/Deep_learning, accessed 15 July 2015
[17] Le, Quoc V., et al. (including Andrew Ng). 2012. “Building High-level Features Using Large Scale Unsupervised Learning”. 29th International Conference on Machine Learning. Edinburgh, Scotland.
[18] GNOsys, Digital Equipment Corporation, also see:
Hartzband, D.J. 1987. The provision of inductive problem solving and (some) analogic learning in model-based systems. Group for Artificial Intelligence and Learning (GRAIL), Knowledge Systems Laboratory. Stanford University. Stanford, CA, USA. 6/87.
Hartzband, D.J. 1987. A discussion of inference and problem solving in the GNOsys knowledge model. Problem Solving Systems Group. Artificial Intelligence Technology Group. Digital Equipment Corporation. 5/87.
Hartzband, D.J., L. Holly, and F.J. Maryanski. 1987. The provision of induction in data-model systems: I. Analogy. International Journal of Approximate Reasoning (IJAR) 1(1):1-17.
[19] Foster, Hal. 2015. Bad New Days: Art, Criticism, Emergency. Verso. NYC. 208 pp.

Thursday, September 3, 2015

New Focus, Same Me...












 Newcomb Hollow Beach, Cape Cod National Seashore, Wellfleet, MA, 9 August 2015, 11:03, © David Hartzband


I have not posted to this blog in about nine months. I’ve been busy working on a large analytics project in the healthcare safety net (Federally Qualified Community Health Centers) & writing on technology topics other than in healthcare.

The next posts will be a mixture of thoughts & predictions mainly in the area still called artificial intelligence. I’ve done a good deal of work in this area in the (far distant) past, but my focus has always been on developing systems to enhance the reasoning & problem solving abilities of people. This is different from the focus of many of my colleagues, who have attempted to develop systems that reproduce some aspect of human capability, up to & including developing systems that are indistinguishable from people, not just in the linguistic sense of a Turing test, an “imitation game”, but in a holistic sense. I have no opinion about the integrity & usefulness of this work, other than that I have chosen to emphasize other aspects of machine abilities in my own work, but the recent spate of very visible & knowledgeable scientists & technologists who have commented on the direction of current AI work[1] gives some idea of how concerned people are.

My next few posts will focus on this general topic with an emphasis on the evolution of search, the implications of ultra-large data set analytics (Big Data) & deep learning for the development of AI as well as the implications of these technologies for healthcare information technology & healthcare in general. My next planned posts should be (remember, planning is for guidance):
·      Turing Tests, Search & the development of Artificial Intelligence – A Meditation on the Movie Ex Machina
·      The Concept of Data as an Asset & Its Relationship to Healthcare Decision Making
·      Design Process & the Evolution of Knowledge Work – What Will Future Knowledge Workers Really be Like? & What Kinds of Systems Will They Use?
·      & further out… General Thoughts on the Evolution of Information Technology based on a “Cluster of Terms” Model[2]

Monday, December 8, 2014

Healthcare Analytics: Concepts & Assumptions



“Data is a precious thing and will last longer than the systems themselves.” – Tim Berners-Lee, inventor of the World Wide Web.







In the past several years, we’ve heard an immense amount about data, big data, data analytics & every possible topic related to data. We know that 90% of all currently available data has been generated in the past two (2) years![1] We also know that every business publication has had articles on data (Business Week, May 2013; Harvard Business Review, December 2013; Forbes February 2014 to name just a few), & that every business consultant such as Accenture, Deloitte, Gartner etc. has a practice or advisory in this area.

Closer to home, many large healthcare organizations are developing analytic systems utilizing very large amounts of data to provide diagnostic, treatment planning & operational guidance. Examples would be the point-of-care recommendation systems currently used by Kaiser Permanente & the Mayo Clinic, among others, that provide near-real-time diagnosis & treatment planning guidance to providers at a patient’s bedside. IBM’s Watson is another well-known example.[2] These systems use millions of patient records, often recorded over long periods of time, as well as thousands (or more) of journal articles & physicians’ notes to provide their analysis & recommendations. Not many healthcare organizations have this amount of patient data available, so what are the implications of analytics for most hospitals, clinics & practices, & how can they take advantage of analytics to make better clinical & operational decisions?

First, let’s define what we mean by data & analytics. Data, in this sense, is a set of qualitative or quantitative values. Simply restated, pieces of data are individual pieces of information[3]. They may be numeric (quantitative), or words or sets of words (qualitative) or even hybrids such as addresses (77 Massachusetts Avenue, E40-248). Analytics, in general, is the discovery & communication of meaningful patterns in data[4]. Contemporary analytics has taken on a more specific meaning, especially in contrast to statistical analysis of data (the application of statistical hypothesis testing methods to data). Analytics today are a set of methods for data organization & analysis that are applied when data have (some of) the following characteristics:
  • Volume: management of multiple petabytes of data
  • Velocity: management of data values that are changing rapidly (e.g. NASA’s launch sensor net of >1M sensors of various types sampled 3x/second)
  • Variety: many different types of data in different formats & from different sources

In healthcare, data variety is most often the issue. This type of data is very difficult to organize & analyze in a conventional sense.

What are the differences between analytics & conventional analysis? They can be summarized as follows:
  • Contemporary analytics is the empirical characterization of data & information. An example would be: A physician at Kaiser is using their point-of-care recommendation system to confirm a diagnosis & develop an optimal treatment plan. The physician is entering patient parameters while doing a bedside examination. The point-of-care recommendation system evaluates 4 PB of patient data against a set of patient parameters entered at the point-of-care for a specific patient, & it finds 9,372 cases similar enough to use for comparison with the patient. That is not a statistical prediction of similarity, but an exact empirical characterization. In the same sense, if that system classifies treatment plans of those 9,372 cases according to outcome, that is not a statistical prediction of outcome, but an exact characterization of the outcomes present in the data. This changes how we think about results in that we are looking at exact characterizations, not predictions with associated probabilities. This is true of even smaller sets of data.
  • Contemporary analytics does not require extensive data transformation & normalization. Analytic systems such as Hadoop-based analytic stacks aggregate data in many different forms (alphanumeric, text, image, other media) & from many different sources (EHR, financial systems, practice management, public health systems, other private & public data sources), & perform analysis across all of these types (e.g. cost/service/location/provider or number of patient interactions vs. macro-demographic & population trends). It does require an understanding of the normalized definitions of common terms (encounters, providers etc.), especially if cross-organizational comparisons are to be made.
  • In general, hypotheses & informational relationships are informed by the analysis, not by a priori assumptions. This means that empirical characterization is carried out by performing inquiry developed by consensus of the healthcare organization’s staff (or designees; all parts of the organization should be represented) & aligned with strategy. Then hypotheses are formed (& relationships defined) based on empirical results, & analysis may continue.
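The point-of-care example can be sketched at toy scale to show what “empirical characterization” means in practice: find the past cases similar to the current patient, then count their outcomes by treatment plan. The Python below is a minimal sketch under loud assumptions: the records are made up, the field names are hypothetical, & “similar” is assumed to mean every parameter falls within a fixed tolerance, which is far simpler than what a production system would use. The result is a count of what is in the data, not a probability.

```python
from collections import Counter

# Made-up illustrative records; field names & values are hypothetical.
RECORDS = [
    {"age": 67, "a1c": 8.1, "bmi": 31, "plan": "A", "outcome": "improved"},
    {"age": 70, "a1c": 7.9, "bmi": 29, "plan": "A", "outcome": "improved"},
    {"age": 64, "a1c": 8.4, "bmi": 33, "plan": "B", "outcome": "no change"},
    {"age": 69, "a1c": 8.0, "bmi": 30, "plan": "B", "outcome": "improved"},
    {"age": 45, "a1c": 6.2, "bmi": 24, "plan": "A", "outcome": "improved"},
    {"age": 66, "a1c": 9.5, "bmi": 35, "plan": "B", "outcome": "worse"},
]

# Assumed similarity tolerances per parameter (hypothetical values).
TOLERANCES = {"age": 5, "a1c": 0.5, "bmi": 3}


def is_similar(record, patient, tolerances=TOLERANCES):
    """A past case counts as similar if every parameter is within its tolerance."""
    return all(abs(record[key] - patient[key]) <= tol for key, tol in tolerances.items())


def characterize(records, patient):
    """Return the similar cases & an exact count of outcomes per treatment plan."""
    similar = [r for r in records if is_similar(r, patient)]
    outcomes = Counter((r["plan"], r["outcome"]) for r in similar)
    return similar, outcomes


patient = {"age": 67, "a1c": 8.2, "bmi": 31}
similar, outcomes = characterize(RECORDS, patient)
print(len(similar))  # 4
for (plan, outcome), count in sorted(outcomes.items()):
    print(plan, outcome, count)
# A improved 2
# B improved 1
# B no change 1
```

The same comparison only works across organizations if terms such as “outcome” or “encounter” are defined the same way in every source, which is the normalization caveat raised above.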

OK – so we know something about data & analytics, but what does this actually mean for healthcare organizations? As a technologist, I have to say that as interesting as the technology of analytics is, it’s not the point. The point is a way of thinking about data & analysis. I use the phrase “data as an asset” as shorthand for this way of thinking. Thinking of data as an asset means that you (& your team) look at data in a larger context than just the clinical &/or operational data that you have. You think about data in relation to the strategy of your organization & in relation to the kinds of strategic decisions that are required to keep your organization healthy. Thinking of data just as facts is no longer enough to create the largest amount of value from that data; you must think of data strategically. This means having an awareness of data, your own as well as external data… data from city, county, state & federal programs… data from other organizations… as much relevant data as you can discover & access.

Once you start thinking about data as an asset, there are some things you can do to utilize data strategically.
  1. First is to review (or develop) your organization’s strategy & identify what decisions are embedded in it. 
  2. Next is to identify what data you have access to that is relevant to those decisions. This may, in fact, not be entirely straightforward. You may include data that is not immediately apparent as relevant. Remember, one of the characteristics of analytics is that the relationships in the data are defined empirically by inquiry, not a priori.
  3. Third is to convene heterogeneous groups of stakeholders to develop areas of inquiry to be addressed by analysis. These can be quite general (e.g. the relationship of the provision of specific enabling services to outcome or cost), but they must be related to the organization’s strategy & to the decisions that need to be made to carry out that strategy.
  4. Fourth, detailed analytic queries are developed to address the areas of inquiry & carried out. 
  5. Finally, results are interpreted & presented in support of data-driven decision-making. Queries can also be redesigned, modified or enhanced at this point & rerun.

Recent conversations with CIOs & other healthcare executives at conferences & other meetings have focused on several areas of inquiry that are strategic to the continued growth & success of these organizations. These areas have included:
  • Classifying patients according to risk & cost: This requires defining a set of classes (such as healthy patients, patients with chronic conditions, patients with multiple chronic conditions, patients with chronic conditions & behavioral health issues, etc.) & then analyzing the patient population with respect to these classes. It often also includes analysis to determine the cost of care for each patient & each class. This allows the top 1%, 5% & bottom 5% etc. of patients to be identified with respect to cost & may lead to interventions once causes & similarities in these classes are also analyzed.
  • Determining the cost of providing specific clinical & non-clinical services (where data is available): This can be done along various axes such as per location, per time period, per provider; all of which may provide insight into costs & with additional analysis into the relationship of services to outcomes.
  • Analyzing population trends utilizing both internal clinical & demographic data as well as publicly available data (such as State provided population trend data per location, time period etc.): This can provide insight into encounter trends as well as revenue trends.
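The first area of inquiry above can be sketched in a few lines of Python. This is a toy under stated assumptions: the patient records are made up, the class rules in `risk_class` are a hypothetical simplification of the classes listed above (behavioral health issues are only considered for patients who also have a chronic condition), & `annual_cost` stands in for whatever cost measure an organization actually uses. It classifies patients, totals cost per class & picks out the highest-cost patients by percentage of the population.

```python
import math

# Made-up illustrative patients; class rules & costs are hypothetical.
PATIENTS = [
    {"id": "p1", "chronic": 0, "behavioral": False, "annual_cost": 900},
    {"id": "p2", "chronic": 0, "behavioral": False, "annual_cost": 1200},
    {"id": "p3", "chronic": 0, "behavioral": False, "annual_cost": 600},
    {"id": "p4", "chronic": 1, "behavioral": False, "annual_cost": 4000},
    {"id": "p5", "chronic": 1, "behavioral": False, "annual_cost": 5200},
    {"id": "p6", "chronic": 2, "behavioral": False, "annual_cost": 11000},
    {"id": "p7", "chronic": 3, "behavioral": False, "annual_cost": 18500},
    {"id": "p8", "chronic": 1, "behavioral": True, "annual_cost": 9800},
    {"id": "p9", "chronic": 2, "behavioral": True, "annual_cost": 42000},
    {"id": "p10", "chronic": 0, "behavioral": False, "annual_cost": 300},
]


def risk_class(patient):
    """Assign one of the (assumed) classes based on chronic & behavioral conditions."""
    if patient["chronic"] == 0:
        return "healthy"
    if patient["behavioral"]:
        return "chronic + behavioral"
    return "multiple chronic" if patient["chronic"] > 1 else "single chronic"


def cost_by_class(patients):
    """Number of patients, total cost & mean cost for each class."""
    totals = {}
    for p in patients:
        cls = risk_class(p)
        count, total = totals.get(cls, (0, 0))
        totals[cls] = (count + 1, total + p["annual_cost"])
    return {cls: {"patients": n, "total": t, "mean": t / n}
            for cls, (n, t) in totals.items()}


def top_percent(patients, percent):
    """Highest-cost patients making up the top `percent` of the population (at least one)."""
    k = max(1, math.ceil(len(patients) * percent / 100))
    return sorted(patients, key=lambda p: p["annual_cost"], reverse=True)[:k]


for cls, summary in cost_by_class(PATIENTS).items():
    print(cls, summary)
print([p["id"] for p in top_percent(PATIENTS, 5)])   # ['p9']
print([p["id"] for p in top_percent(PATIENTS, 20)])  # ['p9', 'p7']
```

The same grouping pattern, keyed on location, time period or provider instead of risk class, covers the cost-of-services inquiry as well.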

Many other areas of inquiry are possible, but need to be aligned with the organization’s strategy in order to be productive & to enable data-driven decision-making.

As I mentioned above, the technology of contemporary analytics is also interesting, & it will be covered in my next post.

[Please Note: A version of this post appears as my column for Technology in Focus on the RCHN Community Health Foundation website (www.rchnfoundation.org)]


[1] http://www.sciencedaily.com/releases/2013/05/130522085217.htm
[2] http://www-03.ibm.com/innovation/ca/en/watson/watson_in_healthcare.shtml
[3] http://en.wikipedia.org/wiki/Data
[4] http://en.wikipedia.org/wiki/Analytics