Thursday, September 6, 2018

Machine Learning & AI: A Personal Perspective, 2018









 




“You asked the impossible of a machine and the machine complied.”


1.    Introduction

It’s Summer 2018 – You can barely pick up a newspaper or a magazine without finding an article about how machine learning (ML) or artificial intelligence (AI) is either going to substantially change business, science, healthcare &c. or how it already has[3]. You can also find, with little difficulty, many articles that document a “difference of opinion” among businessmen, computer scientists, government officials & a whole series of other random people as to whether ML-AI will over time destroy civilization as we know it, a la Skynet in the Terminator, or whether it will enhance our lives beyond our current ability to predict[4]. Quite the spectrum of outcomes. The truth is we won’t know where this will fall until sometime in the future. For now, it’s just a debate. There are, however, points of interest that are relevant & that I’ll try to highlight in this piece. I am orienting it towards healthcare for two reasons: 1) healthcare information technology (HIT) is where I am currently spending a good deal of my effort, & 2) ML-AI have (IMHO) the possibility of transforming healthcare & HIT, again in ways that are hard to predict & with the same difference of opinion expressed as in the more general debate... Of course, in healthcare this usually means either improvement in outcomes for patients, the ability to decrease costs &/or the improvement in the experience of healthcare for both patients & the people who provide it. The use of new technologies in healthcare has real consequences for both patients & providers as well as potentially changing the entire system – for better or worse.

The first thing we’ll need is an appreciation of what we mean when we say machine learning & artificial intelligence – that is what people not in the field understand these technologies to be, & a little bit of what the people developing the technologies think they are. For this section I have used my own experience as well as the Stanford University “100 years of AI” study[5],[6].

OK, so AI is the more general category, with ML being a subcategory of AI, albeit currently an important one.

AI has always had a complicated definition – this definition has divided AI researchers & structured the type of work they do. Merriam-Webster[7] defines AI as:

“1. a branch of computer science dealing with the simulation of intelligent behavior in computers. 2. the capability of a machine to imitate intelligent human behavior.”

Notice that there is nothing in this definition about how this simulation is to be achieved. The Stanford “100 Year Study” defines AI as:

“Artificial Intelligence (AI) is a science and a set of computational technologies that are inspired by—but typically operate quite differently from—the ways people use their nervous systems and bodies to sense, learn, reason, and take action.”

The simulation here is intended to operate in the way that people “use their nervous systems”, even if the mechanisms of operation are quite different.

John McCarthy[8], who coined the term at the 1956 Dartmouth Conference that he organized, defined AI as:

“the science & engineering of making intelligent machines”

People in the field have almost always differentiated (as in the dictionary definition) between 1) the simulation of intelligent behavior in machines, & 2) the imitation of human behavior. Many current AI researchers & developers believe that these are related, i.e. the imitation of human behavior (& human problem solving & information organization) capabilities will lead to the simulation of intelligent behavior in machines.

2.    Examples of Expert & Knowledge-Based Systems

AI has been characterized by many approaches since 1956. Several of the main ones have been expert or rule-based systems & knowledge representation systems. More recently machine learning (neural-net based) systems have been the focus. Following is a (very) quick history.

MYCIN[9] was one of several medical AI systems developed at Stanford in the early to mid-1970s. It was written in Lisp & used a base of about 600 rules to perform diagnosis & suggest therapies for bacterial infections. It used a backward-chaining inference engine[10] to evaluate symptoms. It was never used in actual practice, but testing indicated that it performed better than internists at Stanford University Medical Center.

The next 20 years or so saw the development of many so-called expert systems. These were reasoning systems that operated like MYCIN in that they combined an inference engine of various types (backward-chaining, forward-chaining, nondeterministic, etc.) with a set of information coded as if-then rules. They were deductive in nature (operating mainly by first-order predicate calculus) & limited by hardware & software capabilities to do this type of reasoning in “reasonable” amounts of time. These systems were written both in special purpose languages such as Lisp & Prolog as well as general-purpose languages such as C.
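To make the combination of an inference engine with if-then rules concrete, here is a minimal sketch of backward chaining in Python. The rule base and fact names are invented for illustration; they are not MYCIN's actual rules, & real systems of the period also handled certainty factors, which this sketch leaves out.

```python
# Hypothetical toy rule base: each rule is (conclusion, [premises]).
# Fact and rule names are invented for illustration; they are not MYCIN's.
RULES = [
    ("gram_negative_rod", ["stain_gram_negative", "morphology_rod"]),
    ("likely_enterobacteriaceae", ["gram_negative_rod", "aerobic"]),
]


def backward_chain(goal, facts, rules, seen=None):
    """Return True if `goal` can be established from `facts` using `rules`.

    Works backward from the goal: a goal holds if it is a known fact, or if
    some rule concludes it and every premise of that rule can be established.
    """
    seen = set() if seen is None else seen
    if goal in facts:
        return True
    if goal in seen:  # guard against circular rules
        return False
    seen = seen | {goal}
    for conclusion, premises in rules:
        if conclusion == goal and all(
            backward_chain(p, facts, rules, seen) for p in premises
        ):
            return True
    return False


facts = {"stain_gram_negative", "morphology_rod", "aerobic"}
print(backward_chain("likely_enterobacteriaceae", facts, RULES))
```

The engine never looks at rules that are irrelevant to the goal, which is why backward chaining suited diagnostic questions of the form “is this hypothesis supported?”.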

One such system was AM, the Automated Mathematician developed by Doug Lenat at Stanford University. AM generated short Lisp phrases that were interpreted as mathematical concepts. It had a Lisp-based reasoning engine consisting of rules primarily about arithmetic. Lenat claimed that AM had independently rediscovered the Goldbach Conjecture[11] as well as several other fundamental theorems of arithmetic. Many computer scientists at the time thought that Lenat over-interpreted the success of AM (see below).

In any case, Lenat next wrote a system called Eurisko. It was intended to serve as a general discovery & learning system (where AM only functioned in the realm of arithmetic). It was architected differently than AM & written in a knowledge representation language called RLL-1 (itself written in Lisp). AM, & many other systems, had shown the importance of making representations of knowledge, i.e. facts & relationships, available to their inference engines. Such knowledge provided context for rule application. Eurisko was tested on several types of problems in several areas, but its biggest success was in the U.S. Traveler Trillion Credit Squadron Tournament, a civilian wargame competition held in southern California. The competition had an extensive set of rules of engagement about how virtual fleets of ships would battle each other. Competitors designed a fleet & then were paired against another team. A battle fought according to the ROE was then simulated & a winner determined. In 1981, Lenat entered the rules for that year’s competition into Eurisko & the system designed an innovative & atypical fleet consisting of a very large number of small, heavily armed vessels that were immobile. The Eurisko fleet won the competition, even though all of the other fleets were conventional in nature having large & small vessels & specific offensive & defensive tactics. The Eurisko fleet allowed competitors to expend their ammunition & sink many of its vessels, but because there were so many of them, they eventually were able to sink all of the enemy’s fleet. Lenat also competed & won, under a different set of rules, in 1982. After this, the organizers banned Eurisko from the competition. The system was not so successful in most of its other tests & was generally considered to be an interesting but mostly unsuccessful experiment. 
Lenat wrote a very interesting paper in which he opined that each system was more interesting than given credit for & outlined directions for future research[12]. Lenat is currently the CEO of Cycorp, an AI research & services company that is developing the Cyc Knowledge Base. This is perhaps the ultimate expression of the idea that human-like reasoning (strong AI) requires a repository of structured knowledge. The Cyc KB consists of 500,000 terms, 17,000 types of relations & some 7,000,000 assertions relating these terms. These are associated into contexts or “micro-theories” which structure & guide reasoning in the system. Cyc KB is one endpoint of the knowledge-based reasoning approach to machine intelligence.

There are two other examples of this type of AI that I’ll give before switching to explore more contemporary machine learning systems. My very strong belief is that the lessons we learned in designing, developing, deploying & using these systems are relevant for the same functions in machine learning systems – more on that later.

R1 was a “production system”, that is a rule-based system based on If-Then rule execution, developed in the late 1970s by John McDermott (& others) at Carnegie-Mellon University. Its goal was to evaluate customer orders for Digital Equipment Corporation VAX 11/780 computer systems, determine that all necessary components were on the order, add missing components & produce a set of diagrams showing the three-dimensional relationships of all components. These diagrams were to be used by technicians installing the systems. The system was written in OPS-4, a language specialized for production-type expert systems.

By the early 1980s, the system, renamed XCON, had been brought in-house to Digital Equipment Corporation & several groups had been established to both improve & maintain the system & to do additional research on artificial intelligence[13]. XCON was in general use & proved to be quite successful, except that as new hardware configurations were added to the inventory, more & more productions (rules) had to be added. By the time I was associated with AI at DEC, the system had grown to well past 10,000 rules. Execution of OPS-4 (& later OPS-5) was nondeterministic so that any different execution of the system, even with identical input (customer order) might have a very different path through the rule base, that is the order that the rules fired in might be quite different &/or different rules might be used.

At one point in the mid-1980s, DEC hired John McDermott, & research into the control of production systems was needed so that consistent results could be guaranteed. This included partitioning the rules to make rule guidance more efficient. XCON was by any measure a great success, but it required a large, specialized staff to run & maintain. To be fair, most enterprise-level systems have the same characteristic.
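The control problem can be sketched with a toy forward-chaining production system in Python. This is only an illustration under stated assumptions: the component names & rules are invented (they are not XCON's), & the two conflict-resolution strategies are simplified stand-ins rather than the actual OPS strategies.

```python
import random

# Hypothetical order-completion rules: (name, condition, action).
# Component names are invented; this is not the real XCON rule base.
RULES = [
    ("add_cable", lambda wm: "disk" in wm and "cable" not in wm,
     lambda wm: wm.add("cable")),
    ("add_controller", lambda wm: "disk" in wm and "controller" not in wm,
     lambda wm: wm.add("controller")),
    ("add_cabinet", lambda wm: "cpu" in wm and "cabinet" not in wm,
     lambda wm: wm.add("cabinet")),
]


def run(order, choose):
    """Fire rules until none match; return (working memory, firing order)."""
    wm = set(order)          # working memory starts as the customer order
    fired = []
    while True:
        # The conflict set holds every rule whose condition currently matches.
        conflict_set = [r for r in RULES if r[1](wm)]
        if not conflict_set:
            return wm, fired
        name, _, action = choose(conflict_set)  # conflict resolution
        action(wm)
        fired.append(name)


def first_match(conflict_set):
    """Deterministic control: always fire the first matching rule."""
    return conflict_set[0]


def random_match(conflict_set):
    """Nondeterministic control: fire any matching rule."""
    return random.choice(conflict_set)


print(run({"cpu", "disk"}, first_match))
print(run({"cpu", "disk"}, random_match))
```

In this toy case both strategies complete the order identically & only the firing order differs. With thousands of interacting rules, different firing orders can lead to different results, which is what made control a research topic.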

Finally, one of the advanced development projects[14] that I led during this time was aimed at producing a commercially reliable knowledge-based system that searched for & identified analogies in a set of knowledge & then reasoned about those analogies. This project was called KNOVAX – “the only VAX that knows what it’s doing”. The motivation was my opinion that much of the reasoning that we do as humans is based on analogies (similarity-difference reasoning) & that a system that identified analogies in a set of knowledge or information would be quite interesting & potentially productive in certain pragmatic situations[15]. As we had learned from rule-based systems, providing both knowledge, usually domain-specific knowledge, & context to an inference engine greatly improved its execution & predictive ability. In KB systems, knowledge was represented in several ways. In KNOVAX it took the form of frames. Frames were program constructs that organized knowledge about an object & provided it as values in “slots”. Frames were composed of slots that had identical organization. Slots could be named, so that their values were identified with a concept or construct. Slots could also contain relations (such as IS_A) or process attachments (programs). The system had a set of rules (inference engine) for identifying similar frames, comparing them in detail & proposing a set of similarity relations among frames. It also had a module that produced a human-readable (& hopefully human-understandable) report of why it created the similarity relations. The following figure is a schematic of this type of system.

The KNOVAX system scanned a frame-based KB, determined similarities among objects & formed groups of similar objects. It also provided explanations for why it related objects. One interesting feature of the system was that in testing it occasionally formed similarity groups that were not immediately understandable by human reviewers. In almost all such cases however, after reading the explanation, the reviewer understood the similarity & “learned” from the system.
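The general idea of frames, slot comparison & explained similarity relations can be sketched in a few lines of Python. This is a hypothetical illustration, not the KNOVAX implementation: the frames, the slot-overlap score & the 0.6 threshold are all assumptions chosen to keep the example small.

```python
from itertools import combinations

# Hypothetical frames: each maps slot names to values. IS_A is a relation slot.
FRAMES = {
    "wing_spar": {"IS_A": "structural_part", "material": "aluminum", "load": "bending"},
    "floor_beam": {"IS_A": "structural_part", "material": "aluminum", "load": "shear"},
    "seat_cover": {"IS_A": "interior_part", "material": "fabric", "load": "none"},
}


def compare(a, b):
    """Return (score, shared): the fraction of slots whose values agree."""
    slots = sorted(set(a) | set(b))
    shared = [s for s in slots if s in a and s in b and a[s] == b[s]]
    return len(shared) / len(slots), shared


def propose_similarities(frames, threshold=0.6):
    """Propose similarity relations, each with a human-readable explanation."""
    relations = []
    for (n1, f1), (n2, f2) in combinations(sorted(frames.items()), 2):
        score, shared = compare(f1, f2)
        if score >= threshold:
            why = ", ".join(f"{s}={f1[s]}" for s in shared)
            relations.append((n1, n2, score, f"{n1} ~ {n2} because {why}"))
    return relations


for relation in propose_similarities(FRAMES):
    print(relation[3])
```

The explanation string plays the role of the report module described above: a reviewer can see which slots drove the proposed relation.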

The KNOVAX system was never shipped as a commercial product, but the Boeing Commercial Airplane Company entered a substantial amount of product development knowledge for the 777 program & used it to look for unexpected relations & anomalies in the development cycle (BA777 first flight 6/12/1994).

3.    Lessons Learned from Expert & Knowledge-Based Systems

Of course, any “lessons learned” are mainly the lessons I learned related to the larger context of increasing knowledge about intelligent systems. It’s worth noting that the informal “motto” of the Knowledge Systems Laboratory (KSL) at Stanford University was “knowledge is power”, but that at the 10th anniversary of its founding the assembled luminaries thought that the motto “knowledge is knowledge” better represented the state of their knowledge after 10 years & the lessons learned during that time…

In any case here’s my list (in no particular order):
·       Domain knowledge & the representation of knowledge are key to the conception & performance of intelligent systems. This corresponds to the so-called knowledge principle: “If a program is to perform a complex task well, it must know a great deal about the world in which it operates. In the absence of knowledge, all you have left is search & reasoning, & that is not enough[16].” Doug Lenat was on the AI staff of MCC[17] at this time (1987), as was I (as a representative of Digital Equipment Corporation, one of the funders), & we spent many hours debating the importance of knowledge representation & the role of knowledge in machine reasoning. Doug went on to found Cycorp[18] & I went on to work on reasoning by analogy[19].

·       The type of reasoning matters – some types of reasoning are better suited to specific types of problems. Production (rule) systems perform a type of deduction (by substitution of concepts or facts). This reasoning is optimal for systems that are structured according to set-theoretic principles such as arithmetic. Some languages, such as LISP, are also optimal for reasoning about these structured systems, as shown by AM & Eurisko. Similarity & difference reasoning (analogy) is better suited to comparison & classification problems (provided enough information is available) as shown by KNOVAX. Constraint-based reasoning, that is reasoning based on relationships among variables (facts), is effective for problems that can be formulated as sets of requirements such as scheduling, sequencing or parsing problems. Description-based or ontological reasoning uses descriptions (ontologies) that describe individual entities in terms of concepts & roles. It is applied to a large number of classification problems. It overlaps substantially with other types of reasoning such as analogy-based methods.
·       Idiosyncratic &/or “nonsensical” results must be explored as they are often insightful, just not in the way you might imagine.

4.    Big Data & Machine Learning

In 2003, I was a Technology Vice President at the EMC Corporation responsible for document management & collaboration software technology. I had been a VP at Documentum & a member of its CTO Group when it was purchased by EMC. I was present at a meeting in early 1993 with Merck & Co., one of Documentum’s premier customers, where they told us that their next FDA submission would have at least 1 million discrete elements (documents, lab & research results, reports, graphics & figures etc.). They believed that our system could store this amount of data (~500TB) but wondered if we could successfully search & locate specific data in that volume & diversity of material. So did we…

This seems like almost a modest amount of data today, when some healthcare organizations have in the range of 45-50 PB of patient data &, at the other end, several projects at NASA generate about 100 TB of data per day. The fact that we can talk about exabytes (10^18 bytes) & zettabytes (10^21 bytes) is actually scary given that the Library of Congress collection of printed material (not images, voice, etc.) contains about 10-15 terabytes (1 terabyte = 10^12 bytes), or roughly 0.00001 exabytes.

About the time I was at Merck for EMC, people started working on technologies for dealing with this volume & variety of data. Sometime in 2002, Doug Cutting & Mike Cafarella were working on an Apache search project called Lucene at the University of Washington. They developed a web indexer called Nutch that eventually was able to run on up to 4-5 nodes & was indexing hundreds of millions of web pages, but still was not operating at “web-scale”, even for the 2003-2004 timeframe. Engineers at Google published several seminal papers around this time[20] on the Google File System & MapReduce, a programming model & implementation for processing very large data sets. Cutting & Cafarella decided to use this set of technologies as the basis for an improved indexer & rewrote their systems in Java (Google had implemented them in C++). Cutting then joined Yahoo & over time Hadoop, the system that evolved from the Nutch project, became the basis for all search & transactional interaction for Yahoo. By 2011 it was running on 42,000 servers with hundreds of petabytes of storage. Yahoo spun out the distributed file system & MapReduce as open source projects under Apache, & many other companies, research groups & universities started developing tools & applications forming the Hadoop ecosystem. Several companies developing the Hadoop ecosystem were also spun out, either directly or as engineers left Yahoo, including Cloudera & Hortonworks.
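For readers who have not seen the MapReduce programming model, the classic word-count example can be sketched in plain Python. This is a single-process illustration of the three phases (map, shuffle, reduce) & makes no use of Hadoop itself; the function names are my own.

```python
from collections import defaultdict


def map_phase(doc_id, text):
    """Map: emit a (word, 1) pair for every word in one document."""
    for word in text.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(word, counts):
    """Reduce: combine the values for one key into a single result."""
    return word, sum(counts)


def word_count(docs):
    pairs = (kv for doc_id, text in docs.items() for kv in map_phase(doc_id, text))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


print(word_count({"doc1": "big data big", "doc2": "data"}))
```

On a cluster, the map calls run in parallel on the nodes that hold each document & the reduce calls run in parallel per key, which is what lets the same small program scale to very large data sets.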

Today, most ultra-large-scale projects, whether they are directly search based or analytic, are layered on some flavor of Hadoop (or some flavor of Hadoop-inspired software such as Apache Spark). The point, however, is not that Hadoop is the ultimate answer for search & analytic processing in general[21] (hint... it's not). It is that we have moved from enterprise distributed environments that include relational databases to shared-nothing clusters with massively parallel file & analysis systems. Those systems may be Hadoop based or Spark[22] based or use Dremel for stream processing or visualization tools for presentation & visual analysis. We are now in an era of massively parallel storage & analysis architectures, & these architectures enable a type of processing not previously possible (except with insanely expensive supercomputers). Analytics at this level are a separate topic, & I'll cover them in a separate briefing, but see my blog[23].


So, what does this have to do with machine learning? Well… not much until recently. Let’s turn back a few pages. Many types of systems could be said to be machine learning systems… KNOVAX could have been called a machine learning system, as CYC could be now. Most KB systems use algorithms or models to explore a set of information or knowledge & develop models or make relationships on this basis. Much of this function can be reduced to some form of pattern matching. Rule-based or production systems use rules to facilitate knowledge structure, relationship building & reasoning. KNOVAX, for instance, had a knowledge structure (frames) & a set of rules for reasoning about frames. At its core was a set of rules for comparing the information in separate frames & determining how similar (or dissimilar) it was (~1000 frames, 15,000 relations, 250 rules). In this way, it proposed “analogies” & was able to do limited reasoning about them. CYC operates quite similarly, but at a very different scale (500,000 terms, 17,000 relations, 7,000,000 assertions/rules). Machine learning systems operate, in general, by doing pattern matching on very large data sets (petabytes of data, 10^15 bytes). Machine learning is a field of computer science that gives computers the ability to learn without being explicitly programmed. “Evolved from the study of pattern recognition and computational learning theory in artificial intelligence, machine learning explores the study and construction of algorithms that can learn from and make predictions on data. Such algorithms overcome following strictly static program instructions by making data-driven predictions or decisions, through building a model from sample inputs”[24].

It is not my intent to give a comprehensive description of machine learning here; there are many, many references that can do a better job than I can. Here are several of them:

A fairly comprehensive description from Wikipedia:

Another description from tech emergence:

A description of “deep learning”:

Another perspective from MIT Technology Review:

I would, however like to summarize some general information that will help in making meaning of the examples I’ll use in a minute. When we talk about machine learning today, we are typically talking about neural network systems. These are networks of computational nodes that consist of an activator, an activation function, optionally a threshold for activation & a computational function. Each node takes a (numerical) input, executes if it is activated, & propagates an output to the next set of nodes which function similarly[25]. Neural nets are designed to perform several types of functions:

  •       Classification – sorting entities into classes
  •       Clustering – sorting entities into affinity groups
  •       Regression – locating entities along a continuous functional gradient (criteria)
In addition, they perform several types of learning (list not inclusive):

  •       Supervised – trained on a specific, unambiguously labeled set of data
  •       Unsupervised – trained by executing function with a large amount of data that is not labeled or organized in any way
  •       Reinforcement – learning is guided by reward or feedback supplied by human (or machine) agency
  •       Adversarial – learning is through competition with other networks.
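The node description given earlier (numerical input, activation function, optional threshold, output propagated to the next set of nodes) can be sketched as follows. This is a minimal illustration, not any particular framework's API; the weights, bias & the choice of a sigmoid activation are assumptions added to make the example runnable.

```python
import math


def sigmoid(x):
    """A common activation function that squashes any input into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def node_output(inputs, weights, bias, threshold=None):
    """Compute one node: weighted sum -> activation -> optional threshold."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    a = sigmoid(z)
    if threshold is not None and a < threshold:
        return 0.0  # the node does not fire
    return a


def layer_output(inputs, layer):
    """Propagate inputs through a layer given as a list of (weights, bias)."""
    return [node_output(inputs, w, b) for w, b in layer]


# Hypothetical two-layer network with hand-picked weights (assumed values).
hidden = [([1.0, -1.0], 0.0), ([0.5, 0.5], -0.25)]
output = [([1.0, 1.0], -1.0)]
x = [0.2, 0.7]
print(layer_output(layer_output(x, hidden), output))
```

Training, in any of the styles listed above, amounts to adjusting the weights & biases so that the final output moves closer to what is wanted.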
This should be a good start. Next though, I’ll describe several recent accomplishments/issues in machine learning & discuss what they might tell us about this technology. In addition, I’ll also cover lessons learned (up until now) & similarities of these lessons with our earlier insights from expert & knowledge-based systems. The two topics I’ll discuss are adversarial networks & advanced game playing.

Researchers at Facebook AI Research (FAIR) recently developed a set of generative adversarial networks[26], that is networks that are not supervised (trained) other than by experience, & that are conditioned to interact with each other in the context of a zero-sum game (i.e. as competitors). The FAIR networks “negotiated” with each other to optimize their possession of a set of objects according to object values that they were given. Two very interesting (IMHO) things happened. The networks negotiated in chat, but they were not restricted to standard, or human-understandable, English. They proceeded to invent their own optimized English language variant that is, at best, minimally human interpretable (see figure above).

The second interesting thing about these networks is that they independently developed a strategy where they negotiated in such a way as to give the impression that they valued a specific object highly, when in fact they placed a low value on it. They would later give up this “low-value” (to them) object in order to acquire the object they actually did value. This is a very sophisticated strategy to develop in an unsupervised process[27].

Google DeepMind (Alphabet) developed a program to play the board game Go (known as WeiQi (围棋) in Chinese), which is said to have been invented around 2300 B.C.E. Go is substantially more difficult to play than western chess, which computers play by brute force. Western chess has a median of ~40 moves per game with a very restricted number of options per move. Go or Weiqi is estimated to have approximately 400 moves per game (although some are much longer). Each position offers about 100 possible moves. The theoretical bound[28] on the number of moves per game on a 19x19 board is 10^48. Such a game would last long past the heat death of the Sun.
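A rough back-of-the-envelope calculation shows why brute force does not carry over from chess to Go. The sketch below treats a game as a fixed number of moves with a fixed number of options per move, which is a simplification. The Go figures are the ones quoted above; the chess figure of 35 options per move is an assumption of mine (a commonly cited average), not something stated in this piece.

```python
import math


def log10_move_sequences(options_per_move, moves_per_game):
    """Return log10 of options**moves, the number of possible move sequences."""
    return moves_per_game * math.log10(options_per_move)


# Go: ~100 options per move over ~400 moves (figures from the text above).
go_exponent = log10_move_sequences(100, 400)

# Chess: ~40 moves per game (from the text); 35 options per move is ASSUMED.
chess_exponent = log10_move_sequences(35, 40)

print(f"Go:    about 10^{go_exponent:.0f} move sequences")
print(f"Chess: about 10^{chess_exponent:.0f} move sequences")
```

With the quoted figures Go comes out at about 10^800 possible move sequences, hundreds of orders of magnitude beyond the chess estimate, so exhaustive search is not an option.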

In March of 2016, DeepMind’s program, AlphaGo, played Lee Sedol, considered the #4 player in the world. AlphaGo beat Lee Sedol 4-1. Sometime after the match, a researcher at DeepMind presented a “fix” for the issue that caused AlphaGo to be confused in Game 4 of the competition (the game Lee Sedol won) & showed by simulation that the program would have beaten its opponent in almost all possible replays of the match. In May of 2017, AlphaGo played a three-game match against Ke Jie, then the #1 ranked player in the world, at the Future of Go Summit. AlphaGo won all three games & then was retired.

AlphaGo uses a combination of deep learning & tree-search algorithms with multiple networks performing different functions. It was trained using a database of ~30M moves from historical human games & then set to play itself during a period of reinforcement learning. AlphaGo ran on 48 distributed “tensor processing units” (Google proprietary). Several other versions of the system have been developed since 2016, most notably:

  •       AlphaGo Zero, October 2017[29] - AlphaGo Zero used no human game input for training but played itself using improved algorithms. It achieved superhuman play levels within three days & beat the version (AlphaGo) that beat Lee Sedol 100:0.
  •       AlphaZero, December 2017[30] – AlphaZero was a generalized version of the system. It used a single algorithm & achieved superhuman levels of play in Go, chess & shogi within 24 hours!
The extremely rapid development of these systems without training or human intervention (currently dubbed hyperlearning) has led some AI researchers to speculate about the real possibility of a general artificial intelligence.

5.    Preliminary Lessons (I have) Learned from Big Data & Machine Learning

  •           Big Data is different than AI but can do some similar things – Big Data is at its base statistical pattern matching in ultra-large data sets in order to perform functions such as classification, clustering & regression. In a sense, however, this type of analysis is not “statistical” at all. If a “point-of-care” recommendation system has millions of patient records over multiple years, this could be in the range of 10s of petabytes of data (starting to get big). If the system is processing data while a provider is entering patient data at the point-of-care, & it comes back & indicates that it has located 4,271 cases that match the current input, this is not a statistical statement. There is no sampling involved, the system has processed the entire universe of data & has found a specific number of cases. This, of course, leaves aside the implications of the fact that even millions of patient records over multiple years is not the “entire universe” of patient data. If, additionally, the system indicates the outcomes in all cases & the treatment used & then ranks the outcomes/treatment plans from most to least effective, again this is not a statistical result in the strict sense.  If the system then goes ahead & uses a statistical modeling technique to predict the number of identical cases expected over a future time period, that is a statistical result. There are both interpretive & epistemological implications of this. I’ll discuss both in my follow-up on Big Data analysis.

Big data operates either by applying models to characterize a very large data set, or by “discovering” empirical patterns in the data. The example above is just such a discovery operation. The doctor enters data relevant to her patient & the system finds records with the (close to) identical pattern. It is then possible to determine things such as most effective treatment options from the set of records matching the current patient. Please note that this is not predictive, as mentioned above.
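The discovery operation described above can be sketched in a few lines of Python. Everything here is hypothetical: the records, findings & treatment names are invented, & exact set equality stands in for whatever “close to identical” matching a real system would use. The point is only that the result is a count over all stored records, not an estimate from a sample.

```python
from collections import defaultdict

# Hypothetical records: (findings, treatment, outcome_was_good). Invented data.
RECORDS = [
    ({"fever", "cough"}, "drug_a", True),
    ({"fever", "cough"}, "drug_a", True),
    ({"fever", "cough"}, "drug_b", False),
    ({"fever", "cough"}, "drug_b", True),
    ({"rash"}, "drug_c", True),
]


def matching_cases(records, findings):
    """Return every stored record whose findings match the current patient."""
    return [r for r in records if r[0] == findings]


def rank_treatments(cases):
    """Rank treatments by observed share of good outcomes, best first."""
    tally = defaultdict(lambda: [0, 0])  # treatment -> [good, total]
    for _, treatment, good in cases:
        tally[treatment][0] += int(good)
        tally[treatment][1] += 1
    rates = {t: g / n for t, (g, n) in tally.items()}
    return sorted(rates.items(), key=lambda kv: kv[1], reverse=True)


cases = matching_cases(RECORDS, {"fever", "cough"})
print(len(cases), "matching cases")
print(rank_treatments(cases))
```

Both outputs are descriptive counts over the stored data. Forecasting how many such cases will appear next year would require a statistical model on top of this.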

  •  Big Data is not Machine Learning -
a.     Machine learning is hard – The idea of it is relatively simple, but the design, development & deployment of ML systems based on neural networks & other modules is complex & requires substantial computing resources & a good deal of specialized knowledge about statistical modeling & learning theory. This can be ameliorated by using any of the available ML engines & cloud-based services, such as:
i.      (Google) TensorFlow – Currently available as open source software, TensorFlow is a data flow graph application where nodes are computational & edges are multidimensional data arrays (broadly tensors) that are computed on & communicated among nodes.
ii.     Microsoft Azure – a cloud-based set of AI tools that includes data storage, ML tools, a “workbench” & integration with MS SQL Server.
iii.    AWS (Amazon) – AWS offers a broad variety of ML & associated services including both its own modeling & analytic tools & the ability to run external tools (like TensorFlow) in its cloud.
iv.     SAS & many others.

Even with these systems, you have to decide what type of model (network structure & weighting strategy) to use & how to train the network. There are more than 30 different types of networks currently in use, ranging from simple perceptrons to deep convolutional & adversarial networks[31]. Each type of network represents a specific type of model tied to an execution strategy. Selection & training of these models requires a good deal of expertise.

  • The details of ML are different, but many concepts & lessons learned are similar to earlier systems:
a.     ML networks appear to be more effective using pre &/or post-processing of data/results enhanced by various types of search (AlphaZero uses tree-based)
b.     ML networks appear to be more effective using pre &/or post-processing of data/results with rules or productions
c.     Representation matters – The way training data is structured can make training much more or less effective
d.     Hyperlearning is a game changer – Hyperlearning, such as AlphaZero learning chess well enough in four hours with no supervision or training set to beat the strongest chess programs, will change the way we think about & use machine learning.

6.    Some Final (not really) Thoughts

a.     Will AI mean the end of humanity? – It’s hard not to say something about this when such luminaries as (the late) Stephen Hawking, Elon Musk, Bill Gates & others, including some prominent AI researchers, are very visibly of the opinion that AI in some form represents an existential threat to the human race. Most people of this opinion do not believe that a SkyNet[32]-like entity will actively wage war against humanity in order to eliminate us. No, it will be subtler than that… first will be the loss of jobs & the changes to social & cultural institutions that accompany this & other changes. Then the subtle (& some not so subtle) biases in our intelligent systems will continue to cause the further evolution of sociocultural & economic systems. Next will come the consequences of the social & cultural changes as people’s motivation & ambition change,… then the long decline…

I don’t believe this. I don’t believe that AI is inherently biased toward any specific set of outcomes, positive or negative, other than those that we initially program into it. AI is, after all, not some aggregated & integrated SkyNet-like entity, at least not yet. It’s a set (still a relatively small set) of programs & systems directed at various types of analysis & problem-solving. It is not developed in some pristine & culturally neutral background. Like all technology, it is developed in a social & cultural context that is partly the context of technology & technology development (male-dominated, quasi-egalitarian, etc.) & partly the national & regional contexts of the location(s) where it is developed.


What I do believe is that the potential development of a threat, perhaps not an existential one, but a serious one, is both possible & feasible. We must therefore first be aware of this possibility & second actively work to develop this technology in such a way that we tend to minimize the threat. Is this easy? – No. The culture of technology makes it harder (a topic for another working paper). Do we fully understand what it would mean to develop AI in this way? – No. Some years ago (1989), I had been invited to be on a panel (on the practice of futurism) at a conference. One of the other panelists was Syd Mead, the “visual futurist” who was responsible for the look of the 1982 movie Blade Runner. At that point in his career, he was working mostly for Japanese companies. He was not so much designing near-future products as envisioning what their medium to longer term products might look like & what the environment they would be used in would look like. He & I got into an argument after he stated that design & technology in general were socially neutral, in the sense that they had no direct consequences. I stated pretty strongly that design & all technology were not only socially & culturally situated, but also socially & culturally active in ways that technologists had to take into account. The moderator finally stopped us, but not before we had agitated the room. At that time, I estimated that ~75% of the people present (almost all deep techies) agreed with Syd Mead. I thought at the time (& still do) that this was aspirational. Technologists would like to think they have no social responsibilities, but they do.

b.     Ethical Development – There is a lot to this, but I’ll be as brief as possible (kinda)

i.      Simplicity & Understandability – This really is (for me) the core of everything. “It can scarcely be denied that the supreme goal of all theory is to make the irreducible basic elements as simple and as few as possible without having to surrender the adequate representation of a single datum of experience.”[33] Simplicity, except in the most formal sense such as in model theory[34], is subjective. If something takes 47 closely-spaced pages to explain, you might suspect that it could be simplified.

Very closely related to simplicity is understandability. One motivation for emphasizing simplicity is to improve the understandability of a model or analysis. Lack of understandability is one of the biggest criticisms of AI & machine learning methods today – the fact that they are for the most part black boxes & the reasoning (in AI systems) &/or modeling & pattern recognition (in machine learning systems) is so complex or random-appearing as to be not understandable to mere humans. This does raise the question, though: if mere humans can’t understand the modeling or analytic process, how are they supposed to understand & believe the results produced by this process? Good question…

 ii.     Bias – There are two major dimensions to the problem of bias & the use of AI & machine learning. The first is data bias, & the second is algorithm bias. Both of these problems are related to the fact that the collection & use of data as well as the development & application of algorithms are ultimately human activities that are embedded in social, organizational & cultural contexts.

Data bias is probably pervasive & can greatly affect the operation & results of applying machine learning in particular to real-world problems. As detailed previously in this essay, most machine learning systems are still “trained” with sets of test or training data. The selection of these training sets determines how the system initially responds to problem data that it is exposed to. Bias in training data usually takes the form of the data set only partially representing the universe of discourse of the problem. In healthcare, for instance, almost all large clinical data sets greatly underrepresent minorities. This influences the machine learning system such that the results it presents, diagnosis of specific syndromes for instance, are inaccurate with respect to the underrepresented group(s). The footnote provides links to two recent articles on this topic[35]. Of course, this kind of bias is nothing new in healthcare & has been an issue since long before machine learning became the next shiny object in clinical care[36]. This is an extremely important issue that is only just beginning to be addressed.
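
To make the idea of a data set “only partially representing the universe of discourse” a little more concrete, here is a minimal sketch (in Python) of one very simple check: comparing each group’s share of a training set with its share of a reference population. The function name, the 0.8 tolerance, the group labels & all of the numbers are hypothetical & chosen only for illustration; they do not come from any real clinical data set, & a check like this would be at most a first step in looking for data bias.

```python
from collections import Counter


def representation_gaps(train_groups, population_shares, tolerance=0.8):
    """Flag groups whose share of the training data falls below
    tolerance * their share of the reference population.

    Returns {group: (training_share, population_share)} for flagged groups.
    """
    counts = Counter(train_groups)
    total = len(train_groups)
    gaps = {}
    for group, pop_share in population_shares.items():
        # Share of training records that belong to this group
        train_share = counts.get(group, 0) / total if total else 0.0
        if train_share < tolerance * pop_share:
            gaps[group] = (train_share, pop_share)
    return gaps


# Hypothetical, made-up numbers for illustration only
train_groups = ["A"] * 90 + ["B"] * 8 + ["C"] * 2
population_shares = {"A": 0.60, "B": 0.25, "C": 0.15}

gaps = representation_gaps(train_groups, population_shares)
for group, (train_share, pop_share) in sorted(gaps.items()):
    print(f"{group}: {train_share:.0%} of training data vs {pop_share:.0%} of population")
```

With these made-up numbers, groups B & C are flagged as underrepresented while group A is not. A check like this says nothing about why the gap exists or how much it degrades results for the affected groups; it only shows where to start looking.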

The other side of the coin, algorithm bias, is just as important. This is also inevitable unless very deep steps are taken to prevent it. Algorithms are developed in social, cultural & even organizational contexts, which ensures that the biases inherent in these institutions are represented in the machine learning system’s underlying logic[37]. This is quite difficult to detect &/or to correct. Knowing that it happens is an essential first step, but over time development processes will have to be adopted that help to ameliorate these biases. Independent review will need to be a core part of these processes.

iii. Don’t be Creepy – I recently attended a conference on the ethical use of “big data” in healthcare[38]. One of the keynote speakers was Farzad Mostashari, a former National Coordinator for Health Information Technology at the Department of Health & Human Services. Farzad is a favorite of mine – he can always be counted on to express important issues in his own style. He was speaking about his guidelines for doing research with healthcare data. His primary admonition was “Don’t be creepy”. The work you are doing should not make people’s skin crawl… The example he gave was that while he was at the ONC, a proposal was made to make people’s healthcare records available to them only if they passed a credit check! This is truly creepy, unnecessary & contrary to the whole spirit of providing care, especially in the safety-net where I primarily work. Use of AI & machine learning in any segment should not creep people out (leaving aside the situation where they are “creeped out” because they don’t agree with the results). The sensibilities of the groups to whom the results refer must be taken into account in the design of AI & machine learning studies & the promulgation of their results.

       Finally, the end…

I have been working on the development of artificial intelligence, in one form or another, for about 40 years (seriously?). If you had asked me in 1988 or even in 1998 whether some of the most interesting & important advances in computer science & real-world problem solving would be coming from this area, I would have told you that the time was past for AI to have that kind of general influence – that there were areas where it would continue to be developed & deployed, but that it would not become a major force in everything from marketing strategy to chip development. I was wrong! I did not anticipate – & would not have anticipated – the importance & influence that machine learning would have. I have been working with a good number of ML start-ups & some well-established companies developing ML in the past five years or so & I’m struck by four things about this development:
  •     The depth & breadth of the development & the potential it has to improve our understanding of many, many fields of inquiry
  •     The similarity in the foundations & even in many of the methods of design & reasoning of current ML systems with earlier AI systems of various types (as detailed in this essay)
  •     The enthusiasm of the people working on ML. This very much reminds me of the attitudes of people in the late 1970s & early 1980s when we really thought that a general AI could be developed & applied to a wide range of problem solving
  •     The amount of resistance & pushback that accompanies technology developments that challenge the status quo both intellectually & culturally in established fields (e.g. healthcare…)
All of this seems quite normal to me. If anything, the pace of development seems to have slowed, although this may be a symptom of so much of this work being done in corporate contexts, where we do not have a total view of the progress that is made. In any case, progress is being made quickly enough that 5-8 years from now much of the application of this technology will seem “postechnical”[39] – that is, not visible as a separate technology, but simply part of how we do “stuff”, whether that stuff is shopping or clinical research.



[2] Disclaimer: This work is a “personal perspective”. The opinions are my own, but so are any errors of fact, which are primarily the result of my fallible memory. DJH
[3] See among many others this interview with Ray Kurzweil: https://www.cfr.org/event/future-artificial-intelligence-and-its-impact-society
[4] Again, one of very many, this from the World Economic Forum: https://www.weforum.org/agenda/2016/10/top-10-ethical-issues-in-artificial-intelligence/
[5] My dissertation work in model theory was very relevant to the foundations of AI in mathematics & epistemology. I started working specifically on AI & ML as a Research Fellow at Stanford University in the 1970s. I have continued this part of my work until the present day, as a Visiting Scholar at Stanford (in 1987-88, while on leave as Chief Scientist for AI at the Digital Equipment Corporation) & as a Lecturer/Research Scholar at MIT (1998-99 & 2004-present).
[6] Stone, P. et al. 2016. "Artificial Intelligence and Life in 2030." One Hundred Year Study on Artificial Intelligence: Report of the 2015-2016 Study Panel, Stanford University, Stanford, CA, September 2016. Doc: http://ai100.stanford.edu/2016-report. Accessed: September 6, 2016.
[8] 1927-2011, Computer Scientist, winner of the Turing Award, U.S. National Medal of Science & the Kyoto Prize, developer of the Lisp programming language & influential in the development of early AI systems. Taught at Dartmouth, MIT & Stanford.
[9] Shortliffe, E.H.; Buchanan, B.G. (1975). "A model of inexact reasoning in medicine". Mathematical Biosciences. 23 (3–4): 351–379. MR 381762. doi:10.1016/0025-5564(75)90047-4.
[10] https://en.wikipedia.org/wiki/Backward_chaining
[11] C. Goldbach wrote in a letter to L. Euler in June of 1742 that “every number greater than 2 is the sum of 3 primes”. This was problematic as Goldbach considered 1 a prime number (no longer taken as correct). Euler re-expressed the conjecture as “all positive, even integers can be expressed as the sum of 2 primes”. This “conjecture” has still not been proved.
[12] Lenat, D. B., and Brown, J. S. (August 1984). "Why AM and EURISKO appear to work." Artificial Intelligence 23(3):269–294.
[13] The author (DJH) was Chief Scientist for Artificial Intelligence at DEC from 1986-1989 & was responsible for research in expert & knowledge based systems.
[14] DEC had three categories for development projects: 1) product development expected 100% of projects to result in commercial products, 2) advanced development projects expected >50% of projects to result in commercial products & 3) research expected <50% of projects to result in commercial products.
[15] Hartzband, D.J. & L. Holly. 1988. The provision of induction in data-model systems: II. Symmetric comparison. IJAR. 2(1):5-25.
Hartzband, D.J. 1987. The provision of inductive problem solving and (some) analogic learning in model-based systems. Group for Artificial Intelligence and Learning (GRAIL), Knowledge Systems Laboratory. Stanford University. Stanford, CA, USA. 6/87.
[16] D.B. Lenat & E.A. Feigenbaum. 1987. On the Thresholds of Knowledge. MCC Technical Report AI-126-87. http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.107.4196&rep=rep1&type=pdf
[17] Microelectronics & Computer Technology Corporation. Founded in Austin, TX in 1982 & funded by a number of American computer technology companies, MCC did R&D work on systems architecture, hardware design, environmentally friendly tech & AI. It was disbanded in 2000.
[18] http://www.cyc.com/, https://en.wikipedia.org/wiki/Cyc
[19] Hartzband, D.J. & L. Holly. 1988. The provision of induction in data-model systems: II. Symmetric comparison. IJAR. 2(1):5-25.
Hartzband, D.J. 1987a. The provision of inductive problem solving and (some) analogic learning in model-based systems. Group for Artificial Intelligence and Learning (GRAIL), Knowledge Systems Laboratory. Stanford University. Stanford, CA, USA. 6/87.
[20] Ghemawat, S., H. Gobioff & S-T Leung. 2003. The Google File System. ACM 1-58113-757-5/03/0010; Dean, J. & S. Ghemawat. 2004. MapReduce: Simplified Processing on Large Clusters. 6th Symposium on Operating Systems Design & Implementation. 2004. 137-149. San Francisco, CA.
[22] http://www.computerworld.com/article/2856063/enterprise-software/hadoop-successor-sparks-a-data-analysis-evolution.html
[24] https://en.wikipedia.org/wiki/Machine_learning
[25] https://en.wikipedia.org/wiki/Artificial_neural_network
[26] https://en.wikipedia.org/wiki/Generative_adversarial_network
[28] https://senseis.xmp.net/?NumberOfPossibleGoGames
[29] Silver, David; Huang, Aja; Maddison, Chris J.; Guez, Arthur; Sifre, Laurent; Driessche, George van den; Schrittwieser, Julian; Antonoglou, Ioannis; Panneershelvam, Veda; Lanctot, Marc; Dieleman, Sander; Grewe, Dominik; Nham, John; Kalchbrenner, Nal; Sutskever, Ilya; Lillicrap, Timothy; Leach, Madeleine; Kavukcuoglu, Koray; Graepel, Thore; Hassabis, Demis (28 January 2016). "Mastering the game of Go with deep neural networks and tree search". Nature. 529 (7587): 484–489. Bibcode:2016Natur.529..484S. doi:10.1038/nature16961. ISSN 0028-0836. PMID 26819042
[30] Silver, David; Hubert, Thomas; Schrittwieser, Julian; Antonoglou, Ioannis; Lai, Matthew; Guez, Arthur; Lanctot, Marc; Sifre, Laurent; Kumaran, Dharshan; Graepel, Thore; Lillicrap, Timothy; Simonyan, Karen; Hassabis, Demis (5 December 2017). "Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm". arXiv:1712.01815
[31] https://en.wikipedia.org/wiki/Types_of_artificial_neural_networks/
[32] The Terminator – 1984 (!) movie directed by James Cameron, starring Arnold Schwarzenegger in which a national defense AI becomes “aware”, decides that humans are a threat to its existence & wages war to eliminate them
[33] A. Einstein, On the Method of Theoretical Physics, Herbert Spencer Lecture, Oxford University, June 10, 1931. Most probably the origin of the aphorism, also attributed to Einstein, “Everything should be as simple as possible, but no simpler.”
[34] cf. Hartzband, D.J. 1972. Eine Logik für das Ableiten der minimalen grundlegenden Annahmen für mehrfache Modelle. Dissertation. Universität Hamburg. DFR.
[38] Health And… Data Science and Public Action. NYU Langone School of Public Health. 5/21/2018
[39] It is not by chance that my consultancy is named PostTechnical Research – The idea is that a postechnical context is one where you do not notice your use of any specific technology… you just do what you want/need to do & technology transparently supports you. I intend no value judgement about this (that’s the topic for another white paper). It is just a fact that this will be the case for most people in the next 5-8 years – at least IMNSHO. Welcome to the “past-informed” future…
