Machine Learning & AI: A Personal Perspective, 2018
“You asked the
impossible of a machine and the machine complied.”
1. Introduction
It’s summer 2018 – you can barely pick up a newspaper or a magazine without finding
an article about how machine learning (ML) or artificial intelligence (AI) is
either going to substantially change business, science, healthcare &c. or
how it already has[3].
You can also find, with little difficulty, many articles that document a
“difference of opinion” among businessmen, computer scientists, government
officials & a whole series of other random people as to whether ML-AI will
over time destroy civilization as we know it, a la Skynet in the Terminator, or whether it will enhance our lives
beyond our current ability to predict[4].
Quite the spectrum of outcomes. The truth is we won’t know where this will fall
until sometime in the future. For now, it’s just a debate. There are, however,
points of interest that are relevant & that I’ll try to highlight in this
piece. I am orienting it towards healthcare for two reasons: 1) healthcare
information technology (HIT) is where I am currently spending a good deal of my
effort, & 2) ML-AI have (IMHO) the possibility of transforming healthcare
& HIT, again in ways that are hard to predict & with the same
difference of opinion as in the more general debate. In healthcare, of course,
transformation usually means improved outcomes for patients, decreased costs
&/or a better experience of healthcare for both patients & the people
who provide it. The use of new
technologies in healthcare has real consequences for both patients &
providers as well as potentially changing the entire system – for better or
worse.
The
first thing we’ll need is an appreciation of what we mean when we say machine
learning & artificial intelligence – that is what people not in the field
understand these technologies to be, & a little bit of what the people
developing the technologies think they are. For this section I have used my own
experience as well as the Stanford University “100 years of AI” study[5],[6].
OK,
so AI is the more general category, with ML being a subcategory of AI, albeit
currently an important one.
AI
has always had a complicated definition – this definition has divided AI
researchers & structured the type of work they do. Merriam-Webster[7]
defines AI as:
“1. a branch of computer science dealing with the simulation
of intelligent behavior in computers. 2. the capability of a machine to imitate
intelligent human behavior.”
Notice that there is nothing in this definition about how
this simulation is to be achieved. The Stanford “100 Year Study” defines AI as:
“Artificial Intelligence (AI) is a science and a set of computational technologies that are inspired by—but typically operate quite differently from—the ways people use their nervous systems and bodies to sense, learn, reason, and take action.”
The simulation here is intended to operate in the way that people “use their nervous systems,” even if the mechanisms of operation are quite different.
John McCarthy[8], who coined the term at the 1956 Dartmouth Conference that he organized, defined AI as:
“the science & engineering of making intelligent
machines”
People in
the field have almost always differentiated (as in the dictionary definition)
between 1) the simulation of intelligent behavior in machines, & 2) the
imitation of human behavior. Many current AI researchers & developers
believe that these are related, i.e. that imitating human behavior (& human
problem-solving & information-organization capabilities) will lead to the
simulation of intelligent behavior in machines.
2.
Examples of Expert &
Knowledge-Based Systems
AI has been
characterized by many approaches since 1956. Several of the main ones have been
expert or rule-based systems & knowledge representation systems. More
recently machine learning (neural-net based) systems have been the focus.
Following is a (very) quick history.
MYCIN[9]
was one of several medical AI systems developed at Stanford in the early to mid-1970s. It was written in Lisp & used a base of about 600 rules to perform diagnosis & suggest therapies for bacterial infections. A backward-chaining inference engine[10], implemented in Lisp, evaluated symptoms against these rules. It was never used in actual practice, but testing indicated that it performed better than internists at Stanford University Medical Center.
The next 20 years or so saw the development of many so-called
expert systems. These were reasoning systems that operated like MYCIN in that
they combined an inference engine of various types (backward-chaining, forward-chaining,
nondeterministic, etc.) with a set of information coded as if-then rules. They
were deductive in nature (operating mainly by first-order predicate calculus)
& were limited by hardware & software capabilities in doing this type of
reasoning in “reasonable” amounts of time. These systems were written both in
special purpose languages such as Lisp & Prolog as well as general,
Turing-complete languages such as C.
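To make the mechanics concrete, here is a minimal, hypothetical sketch of forward chaining, one of the inference strategies mentioned above (MYCIN itself used backward chaining & was written in Lisp; the sketch below is Python & the rules & facts are invented for illustration):

```python
# A minimal sketch (not any particular historical system) of forward chaining:
# the inference engine repeatedly applies if-then rules to a working memory of
# facts until nothing new can be derived. Rules & facts are toy examples.
rules = [
    ({"fever", "cough"}, "possible_infection"),
    ({"possible_infection", "positive_culture"}, "bacterial_infection"),
    ({"bacterial_infection"}, "recommend_antibiotic"),
]

def forward_chain(facts):
    facts = set(facts)
    changed = True
    while changed:                      # keep firing rules until a fixed point
        changed = False
        for conditions, conclusion in rules:
            if conditions <= facts and conclusion not in facts:
                facts.add(conclusion)   # the rule "fires" & asserts its conclusion
                changed = True
    return facts

print(forward_chain({"fever", "cough", "positive_culture"}))
```

Backward chaining runs the same rules in the other direction, starting from a goal & working back to the facts that would support it.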
One such system was AM, the Automated Mathematician developed by
Doug Lenat at Stanford University. AM generated short Lisp phrases that were
interpreted as mathematical concepts. It had a Lisp-based reasoning engine
consisting of rules primarily about arithmetic. Lenat claimed that AM had
independently rediscovered the Goldbach Conjecture[11]
as well as several other fundamental concepts & conjectures of arithmetic. Many computer
scientists at the time thought that Lenat over-interpreted the success of AM
(see below).
In any case, Lenat next wrote a system called Eurisko. It was
intended to serve as a general discovery & learning system (where AM only
functioned in the realm of arithmetic). It was architected differently than AM
& written in a knowledge representation language called RLL-1 (itself
written in Lisp). AM, & many other systems, had shown the importance of
making representations of knowledge, i.e.
facts & relationships, available to their inference engines. Such knowledge
provided context for rule application. Eurisko was tested on several types of
problems in several areas, but its biggest success was in the Traveller
Trillion Credit Squadron tournament, a civilian wargame competition held in
California. The competition had an extensive set of rules of
southern California. The competition had an extensive set of rules of
engagement about how virtual fleets of ships would battle each other. Competitors
designed a fleet & then were paired against another team. A battle fought
according to the ROE was then simulated & a winner determined. In 1981,
Lenat entered the rules for that year’s competition into Eurisko & the
system designed an innovative & atypical fleet consisting of a very large
number of small, heavily armed vessels that were immobile. The Eurisko fleet
won the competition, even though all of the other fleets were conventional in
nature having large & small vessels & specific offensive &
defensive tactics. The Eurisko fleet let its opponents expend their
ammunition sinking many of its vessels, but because it had so many, the
survivors were eventually able to sink the entire opposing fleet. Lenat also
competed & won, under a different set of rules, in 1982. After this, the
organizers banned Eurisko from the competition. The system was not so successful
in most of its other tests & was generally considered to be an interesting
but mostly unsuccessful experiment. Lenat wrote a very interesting paper in
which he opined that each system was more interesting than given credit for
& outlined directions for future research[12].
Lenat is currently the CEO of Cycorp, an AI research & services company
that is developing the Cyc Knowledge Base. This is, perhaps, the ultimate
expression of the idea that human-like reasoning (strong AI) requires a
repository of structured knowledge. The Cyc KB consists of 500,000 terms,
17,000 types of relations & some 7,000,000 assertions relating these terms.
These are associated into contexts or “micro-theories” which structure &
guide reasoning in the system. Cyc KB is one endpoint of the knowledge-based
reasoning approach to machine intelligence.
There are two other examples of this type of AI that I’ll give
before switching to explore more contemporary machine learning systems. My very
strong belief is that the lessons we learned in designing, developing,
deploying & using these systems are relevant for the same functions in
machine learning systems – more on that later.
R1 was a “production system”, that is, a rule-based system built on
if-then rule execution, developed in the late 1970s by John McDermott (&
others) at Carnegie-Mellon University. Its goal was to evaluate customer orders
for Digital Equipment Corporation VAX 11/780 computer systems, determine that
all necessary components were on the order, add missing components &
produce a set of diagrams showing the three-dimensional relationships of all
components. These diagrams were to be used by technicians installing the
systems. The system was written in OPS-4, a language specialized for
production-type expert systems.
By the early 1980s, the system, renamed XCON, had been brought
in-house to Digital Equipment Corporation & several groups had been
established to both improve & maintain the system & to do additional
research on artificial intelligence[13]. XCON was in general use & proved to be quite successful, except that as new hardware configurations were added to the inventory, more & more productions (rules) had to be added. By the time I was associated with AI at DEC, the system had grown to well past 10,000 rules. Execution of OPS-4 (& later OPS-5) was nondeterministic so that any different execution of the system, even with identical input (customer order) might have a very different path through the rule base, that is the order that the rules fired in might be quite different &/or different rules might be used.
At one point in the mid-1980s, DEC hired John McDermott, & research was undertaken into the control of production systems so that consistent results could be guaranteed. This included partitioning the rule base to make rule guidance more efficient. XCON was by any measure a great success, but it required a large, specialized staff to run & maintain. To be fair, most enterprise-level systems have the same characteristic.
Finally, one of the advanced development projects[14]
that I led during this time was aimed at producing a commercially reliable
knowledge-based system that searched for & identified analogies in a set of
knowledge & then reasoned about those analogies. This project was called
KNOVAX – “the only VAX that knows what it’s doing”. The motivation was my
opinion that much of the reasoning that we do as humans is based on analogies
(similarity-difference reasoning) & that a system that identified analogies
in a set of knowledge or information would be quite interesting &
potentially productive in certain pragmatic situations[15].
As we had learned from rule-based systems, providing both knowledge, usually
domain-specific knowledge, & context to an inference engine greatly
improved its execution & predictive ability. In KB systems, knowledge was represented
in several ways. In KNOVAX it took the form of frames. Frames were program
constructs that organized knowledge about an object & presented it as values
in “slots”. Each frame was composed of slots with an identical structure.
Slots were named, so that their values were identified with a concept or
construct. Slots could also contain relations (such as IS_A) or process
attachments (programs). The system had a set of rules (inference engine) for
identifying similar frames, comparing them in detail & proposing a set of
similarity relations among frames. It also had a module that produced a
human-readable (& hopefully human-understandable) report of why it created
the similarity relations. The following figure is a schematic of this type of
system.
The KNOVAX system scanned a frame-based KB, determined
similarities among objects & formed groups of similar objects. It also
provided explanations for why it related objects. One interesting feature of
the system was that in testing it occasionally formed similarity groups that
were not immediately understandable by human reviewers. In almost all such
cases however, after reading the explanation, the reviewer understood the
similarity & “learned” from the system.
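As a rough, hypothetical illustration of the frame-&-slot idea (not the actual KNOVAX code, which was built on RLL-style frames on VAX systems), the sketch below represents frames as named slot/value collections & scores pairwise similarity by slot-value overlap, returning the matched slots as a crude “explanation”:

```python
# Hypothetical sketch of frames with named slots & a naive similarity measure.
# KNOVAX's actual comparison rules were far richer; this only shows the shape
# of similarity-difference (analogy) reasoning over frame representations.
frames = {
    "ProjectA": {"IS_A": "development_project", "domain": "avionics",
                 "schedule_risk": "high", "team_size": "large"},
    "ProjectB": {"IS_A": "development_project", "domain": "hydraulics",
                 "schedule_risk": "high", "team_size": "large"},
    "ProjectC": {"IS_A": "research_project", "domain": "materials",
                 "schedule_risk": "low", "team_size": "small"},
}

def similarity(f1, f2):
    shared = set(f1) & set(f2)
    matches = [slot for slot in shared if f1[slot] == f2[slot]]
    return len(matches) / len(shared), matches   # score plus an "explanation"

for a in frames:
    for b in frames:
        if a < b:
            score, why = similarity(frames[a], frames[b])
            print(f"{a} ~ {b}: {score:.2f} (shared values: {why})")
```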
The KNOVAX system was never shipped as a commercial product, but
the Boeing Commercial Airplane Company entered a substantial amount of product
development knowledge for the 777 program & used it to look for unexpected
relations & anomalies in the development cycle (BA777 first flight 6/12/1994).
3.
Lessons
Learned from Expert & Knowledge-Based Systems
Of course, any “lessons learned” are mainly the lessons I
learned related to the larger context of increasing knowledge about intelligent
systems. It’s worth noting that the informal “motto” of the Knowledge Systems
Laboratory (KSL) at Stanford University was “knowledge is power”, but that at
the 10th anniversary of its founding the assembled luminaries
thought that the motto “knowledge is knowledge” better represented the state of
their knowledge after 10 years & the lessons learned during that time…
In any case here’s my list (in no particular order):
· The type of reasoning matters – some types of reasoning are better suited to specific types of problems. Production (rule) systems perform a type of deduction (by substitution of concepts or facts). This reasoning is optimal for systems that are structured according to set-theoretic principles such as arithmetic. Some languages, such as Lisp, are also optimal for reasoning about these structured systems, as shown by AM & Eurisko. Similarity & difference reasoning (analogy) is better suited to comparison & classification problems (providing enough information is available), as shown by KNOVAX. Constraint-based reasoning, reasoning based on relationships among variables (facts), is effective for problems that can be formulated as sets of requirements, such as scheduling, sequencing or parsing problems (a small sketch of this follows the list). Description-based or ontological reasoning uses descriptions (ontologies) that describe individual entities in terms of concepts & roles. It is applied to a large number of classification problems & overlaps substantially with other types of reasoning such as analogy-based methods.
· Idiosyncratic &/or “nonsensical” results must be explored, as they are often insightful, just not in the way you might imagine.
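To make the constraint-based item above concrete, here is a minimal, hypothetical sketch: a brute-force search for an assignment of tasks to time slots that satisfies a small set of requirements (the tasks & constraints are invented, & real constraint solvers are far smarter than exhaustive enumeration):

```python
# Minimal sketch of constraint-based reasoning: find assignments of tasks to
# time slots that satisfy a set of requirements, by checking every candidate.
from itertools import permutations

tasks = ["draw_blood", "run_assay", "report"]
slots = [1, 2, 3]

def satisfies(assignment):
    # Requirements: the assay needs the sample first; the report comes last.
    return (assignment["draw_blood"] < assignment["run_assay"]
            and assignment["report"] == max(slots))

solutions = [dict(zip(tasks, order))
             for order in permutations(slots)
             if satisfies(dict(zip(tasks, order)))]
print(solutions)   # [{'draw_blood': 1, 'run_assay': 2, 'report': 3}]
```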
4.
Big Data
& Machine Learning
In 2003, I was a Technology Vice President at the EMC
Corporation responsible for document management & collaboration software
technology. I had been a VP at Documentum & a member of its CTO Group when
it was purchased by EMC. I was present at a meeting in early 2003 with Merck
& Co., one of Documentum’s premier customers, where they told us that their
next FDA submission would have at least 1 million discrete elements (documents,
lab & research results, reports, graphics & figures etc.). They
believed that our system could store this amount of data (~500TB) but wondered
if we could successfully search & locate specific data in that volume &
diversity of material. So did we…
This seems like almost a modest amount of data today, when some healthcare organizations have in the range of 45-50 PB of patient data &, at the other end, several projects at NASA generate about 100 TB of data per day. The fact that we can talk about exabytes (10^18 bytes) & zettabytes (10^21 bytes) is actually scary given that the Library of Congress collection of printed material (not images, voice, etc.) contains about 10-15 terabytes (10^12 bytes), or 0.00001 exabytes.
About the time I was at Merck for EMC, people started working on
technologies for dealing with this volume & variety of data. Sometime in 2002, Doug
Cutting & Mike Cafarella were working on an Apache search project called
Lucene. They developed a web indexer called Nutch that eventually was able to
run on up to 4-5 nodes & was indexing hundreds of millions of web pages, but
it still was not operating at “web scale,” even for the 2003-2004 timeframe.
Engineers at Google published several seminal
papers around this time[20]
on the Google File System & MapReduce, a programming model &
implementation for processing very large data sets. Cutting & Cafarella
decided to use this set of technologies as the basis for an improved indexer
& rewrote their systems in Java (Google had implemented them in C++).
Cutting then joined Yahoo, & over time Hadoop, the system that evolved from
the Nutch project, became the basis for all search & transactional
interaction at Yahoo. By 2011 it was running on 42,000 servers with hundreds
of petabytes of storage. Yahoo spun out the distributed file system &
MapReduce as open source projects under Apache, & many other companies,
research groups & universities started developing tools, apps &
applications forming the Hadoop ecosystem. Several companies developing the
Hadoop ecosystem were also spun out, either directly or as engineers left Yahoo
including Cloudera & Hortonworks.
Today, most ultra-large-scale projects, whether they are directly search-based or analytic, are layered on some flavor of Hadoop (or of Hadoop-inspired software such as Apache Spark). The point, however, is not that Hadoop is the ultimate answer for search & analytic processing in general[21] (hint... it's not). It is that we have moved from enterprise distributed environments that include relational databases to shared-nothing clusters with massively parallel file & analysis systems. Those systems may be Hadoop-based or Spark-based[22], or use Dremel for interactive query processing, or visualization tools for presentation & visual analysis. We are now in an era of massively parallel storage & analysis architectures, & these architectures enable a type of processing not previously possible (except with insanely expensive supercomputers). Analytics at this level are a separate topic, & I'll cover them in a separate briefing, but see my blog[23].
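Since MapReduce is doing a lot of work in this story, here is a toy, single-machine sketch of the programming model (word count, the canonical example). Real implementations, Hadoop’s or Google’s, distribute the map, shuffle & reduce phases across thousands of machines; the sketch only shows the shape of the model:

```python
# Toy illustration of the MapReduce programming model: a word count expressed
# as a map step, a shuffle (group-by-key) step & a reduce step.
from collections import defaultdict

documents = ["the quick brown fox", "the lazy dog", "the quick dog"]

# Map: emit (word, 1) pairs for every word in every document.
mapped = [(word, 1) for doc in documents for word in doc.split()]

# Shuffle: group intermediate pairs by key.
grouped = defaultdict(list)
for word, count in mapped:
    grouped[word].append(count)

# Reduce: sum the counts for each word.
counts = {word: sum(values) for word, values in grouped.items()}
print(counts)   # {'the': 3, 'quick': 2, ...}
```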
So, what does this have to do with machine learning? Well… not much until recently. Let’s turn back a few pages. Many types of
systems could be said to be machine learning systems… KNOVAX could have been
called a machine learning system as CYC could be now. Most KB systems use
algorithms or models to explore a set of information or knowledge & develop
models or make relationships on this basis. Much of this function can be
reduced to some form of pattern matching. Rule-based or production systems use
rules to facilitate knowledge structure, relationship building & reasoning.
KNOVAX, for instance, had a knowledge structure (frames) & a set of rules
for reasoning about frames. At its core was a set of rules for comparing the
information in separate frames & determining how similar (or dissimilar) it
was (~1000 frames, 15,000 relations, 250 rules). In this way, it proposed
“analogies” & was able to do limited reasoning about them. CYC operates
quite similarly, but at a very different scale (500,000 terms, 17,000
relations, 7,000,000 assertions/rules). Machine learning systems operate, in
general, by doing pattern matching on very large data sets (petabytes of data,
10^15 bytes). Machine learning is a field of computer science that gives
computers the ability to learn without being explicitly programmed. It
“evolved from the study of pattern recognition and computational learning
theory in artificial intelligence; machine learning explores the study and
construction of algorithms that can learn from and make predictions on data.
Such algorithms overcome following strictly static program instructions by
making data-driven predictions or decisions, through building a model from
sample inputs”[24].
It is not my intent to give a comprehensive description of machine learning here; there are many, many references that can do a better job than I can. Here are several of them:
- A fairly comprehensive description from Wikipedia
- Another description from TechEmergence
- A description of “deep learning”
- Another perspective from MIT Technology Review
I would, however, like to summarize some general information that will help in
making meaning of the examples I’ll use in a minute. When we talk about machine
learning today, we are typically talking about neural network systems. These
are networks of computational nodes that consist of an activator, an activation
function, optionally a threshold for activation & a computational function.
Each node takes a (numerical) input, executes if it is activated, &
propagates an output to the next set of nodes which function similarly[25].
Neural nets are designed to perform several types of functions:
- Classification – sorting entities into classes
- Clustering – sorting entities into affinity groups
- Regression – locating entities along a continuous functional gradient (criteria)
In addition, they perform several types of learning (list not exhaustive):
- Supervised – trained on a specific, unambiguous set of data
- Unsupervised – trained by executing function with a large amount of data that is not organized in any way
- Reinforcement – learning is confirmed by human (or machine) agency
- Adversarial – learning is through competition with other networks.
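To ground the description of nodes, activations & propagation above, here is a minimal sketch of a single forward pass through a tiny network; the weights, biases & inputs are made up, nothing is trained, & the activation function is a standard sigmoid:

```python
# Minimal sketch of forward propagation through a tiny neural network.
# Each node sums its weighted inputs, applies an activation function &
# passes the result on to the next layer. No learning (backpropagation) shown.
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def layer(inputs, weights, biases):
    # One output per node: activation(weighted sum of inputs + bias)
    return [sigmoid(sum(w * i for w, i in zip(node_w, inputs)) + b)
            for node_w, b in zip(weights, biases)]

inputs = [0.5, -1.2, 3.0]                       # numerical inputs to the net
hidden = layer(inputs, weights=[[0.2, -0.4, 0.1],
                                [0.7, 0.3, -0.6]], biases=[0.0, 0.1])
output = layer(hidden, weights=[[1.5, -2.0]], biases=[0.2])
print(output)   # a single score that could be thresholded for classification
```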
This
should be a good start. Next though, I’ll describe several recent
accomplishments/issues in machine learning & discuss what they might tell
us about this technology. In addition, I’ll also cover lessons learned (up
until now) & similarities of these lessons with our earlier insights from
expert & knowledge-based systems. The two topics I’ll discuss are
adversarial networks & advanced game playing.
The networks in question here are the negotiation agents that Facebook AI Research trained to bargain with each other over sets of objects (Lewis et al.[27]). The first interesting thing, widely reported at the time, was that the agents drifted into a negotiating shorthand of their own rather than sticking to ordinary English. The second interesting thing about these networks is that they independently developed a strategy of negotiating in such a way as to give the impression that they valued a specific object highly, when in fact they placed a low value on it. They would later give up this “low-value” (to them) object in order to acquire the object they actually did value. This is a very sophisticated strategy to develop without being explicitly programmed[27].
Google
DeepMind (Alphabet) developed a program to play the board game Go (known as weiqi, 围棋, in Chinese), which is said to have been invented around 2300 B.C.E. Go is substantially more difficult to play than Western chess, which computers play by brute force. Western chess has a median of ~40 moves per game with a very restricted number of options per move. Go, or weiqi, is estimated to have approximately 400 moves per game (although some are much longer), with about 100 possible options per position. The theoretical bound[28] on the number of moves per game on a 19x19 board is 10^48. Such a game would last long past the death of the Sun.
In March of 2016, DeepMind’s program, AlphaGo, played Lee Sedol, then considered the #4 player in the world. AlphaGo beat Lee Sedol 4-1. Sometime after the match, a researcher at DeepMind presented a “fix” for the issue that caused AlphaGo to be confused in Game 4 of the competition (the game Lee Sedol won) & showed by simulation that the program would have beaten its opponent in almost all possible replays of the match. In May of 2017, AlphaGo played a three-game match against Ke Jie, then the #1-ranked player in the world, at the Future of Go Summit. AlphaGo won all three games & was then retired.
AlphaGo uses a combination of deep learning & tree-search algorithms, with multiple networks performing different functions. It was trained using a database of ~30M moves from historical human games & was then set to play itself during a period of reinforcement learning. AlphaGo ran on 48 distributed “tensor processing units” (Google proprietary). Several other versions of the system have been developed since 2016, most notably:
- AlphaGo Zero, October 2017[29] – AlphaGo Zero used no human game input for training but played itself using improved algorithms. It achieved superhuman play levels within three days & beat the version (AlphaGo) that had beaten Lee Sedol 100:0.
- AlphaZero, December 2017[30] – AlphaZero was a generalized version of the system. It used a single algorithm & achieved superhuman levels of play in Go, chess & shogi within 24 hours!
The extremely rapid development of these systems without human training data or intervention (currently dubbed hyperlearning)
has led some AI researchers to speculate about the real possibility of a
general artificial intelligence.
Preliminary Lessons (I have) Learned from Big Data
& Machine Learning
- Big Data is different than AI but can do some similar things – Big Data is at its base statistical pattern matching in ultra-large data sets in order to perform functions such as classification, clustering & regression. In a sense, however, this type of analysis is not “statistical” at all. If a “point-of-care” recommendation system has millions of patient records over multiple years, this could be in the range of 10s of petabytes of data (starting to get big). If the system is processing data while a provider is entering patient data at the point-of-care, & it comes back & indicates that it has located 4,271 cases that match the current input, this is not a statistical statement. There is no sampling involved, the system has processed the entire universe of data & has found a specific number of cases. This, of course, leaves aside the implications of the fact that even millions of patient records over multiple years is not the “entire universe” of patient data. If, additionally, the system indicates the outcomes in all cases & the treatment used & then ranks the outcomes/treatment plans from most to least effective, again this is not a statistical result in the strict sense. If the system then goes ahead & uses a statistical modeling technique to predict the number of identical cases expected over a future time period, that is a statistical result. There are both interpretive & epistemological implications of this. I’ll discuss both in my follow-up on Big Data analysis.
Big data operates either by applying models to characterize a
very large data set, or by “discovering” empirical patterns in the data. The
example above is just such a discovery operation. The doctor enters data
relevant to her patient & the system finds records with the (close to)
identical pattern. It is then possible to determine things such as most
effective treatment options from the set of records matching the current
patient. Please note that this is not predictive, as mentioned above.
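A hedged sketch of the distinction just described, using entirely synthetic records & invented fields: counting exact matches over the whole data set is enumeration of facts, not sampling, while projecting future case counts from those matches is a statistical step.

```python
# Toy illustration of "discovery" over an entire data set versus statistical
# prediction. Counting matches is a fact about the data; forecasting is not.
records = [
    {"age_band": "40-49", "dx": "type2_diabetes", "a1c_band": "high", "outcome": "improved"},
    {"age_band": "40-49", "dx": "type2_diabetes", "a1c_band": "high", "outcome": "unchanged"},
    {"age_band": "60-69", "dx": "hypertension",   "a1c_band": "normal", "outcome": "improved"},
    # ... in a real system, millions of rows
]

query = {"age_band": "40-49", "dx": "type2_diabetes", "a1c_band": "high"}

# Discovery: scan every record & count exact matches.
matches = [r for r in records if all(r[k] == v for k, v in query.items())]
print(f"{len(matches)} matching cases found")

# Only this next step is statistical: projecting a future rate from the counts.
years_of_data = 5
print(f"expected new matching cases next year: about {len(matches) / years_of_data:.1f}")
```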
- Big Data is not Machine Learning -
a.
Machine learning is hard – The idea of it is relatively
simple, but the design, development & deployment of ML systems based on
neural networks & other modules is complex & requires substantial
computing resources & a good deal of specialized knowledge about statistical
modeling & learning theory. This can be ameliorated by using any of the
cloud-based ML engines that are available, such as:
b.
(Google) TensorFlow – Currently available as open source software, TensorFlow is a dataflow-graph framework in which nodes are computations & edges are multidimensional data arrays (broadly, tensors) that are computed on & communicated among nodes (see the short sketch after this list).
c.
Microsoft
Azure – a cloud-based set of AI tools that includes data storage, ML tools, a “workbench” & integration with MS SQL Server
d.
AWS
(Amazon) – AWS offers a broad variety of ML & associated services, including both its own modeling & analytic tools & the ability to run external tools (like TensorFlow) in its cloud.
e.
SAS
& many others
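As a small illustration of the dataflow-graph idea behind TensorFlow, here is a minimal sketch using the TF 1.x graph-&-session style API that was current when this was written; details will differ across versions, & the tensors here are trivial constants rather than anything trained:

```python
# Minimal sketch of TensorFlow's dataflow-graph model (TF 1.x style API).
import tensorflow as tf

# Nodes are operations; edges carry tensors (multidimensional arrays).
a = tf.constant([[1.0, 2.0]])          # 1x2 tensor
w = tf.constant([[3.0], [4.0]])        # 2x1 tensor
y = tf.matmul(a, w)                    # computational node producing a 1x1 tensor

with tf.Session() as sess:             # the graph is executed by a session
    print(sess.run(y))                 # -> [[11.]]
```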
Even with these systems, you have to decide what type of model
(network structure & weighting strategy) to use & how to train the
network. There are 30+ different types of networks currently in use,
ranging from simple perceptrons to deep convolutional & adversarial
networks[31].
Each type of network represents a specific type of model tied to an execution
strategy. Selection & training of these models requires a good deal of
expertise.
- The details of ML are different, but many concepts & lessons learned are similar to earlier systems:
a. ML networks appear to be more effective when pre- &/or post-processing of data/results is enhanced by various types of search – AlphaZero uses tree-based search (a small sketch follows this list)
b. ML networks appear to be more effective when pre- &/or post-processing of data/results is combined with rules or productions
c. Representation matters – The way training data is structured can make training much more or less effective
d. Hyperlearning is a game changer – Hyperlearning, such as AlphaZero learning chess well enough in four hours, with no supervision or training set, to beat the strongest chess programs, will change the way we think about & use machine learning.
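A deliberately tiny, hypothetical sketch of point (a): a learned evaluation function wrapped in a simple one-ply search. Systems like AlphaZero couple their networks with far more sophisticated Monte Carlo tree search, but the division of labor, network evaluates, search chooses, is the same. The game, moves & value function below are all invented stand-ins:

```python
# Hypothetical sketch: a learned value estimate guiding a one-ply search.
import random

def value_estimate(state):
    # Stand-in for a trained network's value head; here just a toy heuristic.
    return sum(state) + random.uniform(-0.1, 0.1)

def legal_moves(state):
    # Hypothetical game: a move increments one position of the state vector.
    return [i for i in range(len(state)) if state[i] < 3]

def apply_move(state, move):
    new_state = list(state)
    new_state[move] += 1
    return new_state

def choose_move(state):
    # Search layer: evaluate every successor with the value function, pick the best.
    return max(legal_moves(state),
               key=lambda m: value_estimate(apply_move(state, m)))

print(choose_move([0, 1, 2]))
```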
Some Final (not really) Thoughts
a.
Will AI mean the end of humanity?
– It’s hard not to say something about this when such luminaries as (the late) Stephen Hawking, Elon Musk, Bill Gates & others, including some prominent AI
researchers, are very visibly of the opinion that AI in some form represents an
existential threat to the human race. Most people of this opinion do not
believe that a SkyNet[32]-like
entity will actively wage war against humanity in order to eliminate us. No, it
will be subtler than that… first will be the loss of jobs & the changes to
social & cultural institutions that accompany this & other changes.
Then the subtle (& some not so subtle) biases in our intelligent systems
will continue to cause the further evolution of sociocultural & economic systems.
Next will come the consequences of the social & cultural changes as
people’s motivation & ambition change… then the long decline…
I don’t believe this. I don’t believe that AI is inherently
biased toward any specific set of outcomes, positive or negative, other than
those that we initially program into it. AI is, after all, not some aggregated
& integrated SkyNet-like entity, at least not yet. It’s a set (still a
relatively small set) of programs & systems directed at various types of
analysis & problem-solving. It is not developed in some pristine &
culturally neutral background. Like all technology, it is developed in a social
& cultural context that is partly the context of technology &
technology development (male-dominated, quasi-egalitarian, etc.) & partly
the national & regional contexts of the location(s) where it is developed.
b.
Ethical Development – There is a lot to this, but I’ll
be as brief as possible (kinda)
i. Simplicity
& Understandability
– This really is (for me) the core of everything. “It can scarcely be denied that the supreme goal of all theory is to make the irreducible basic elements as simple and as few as possible without having to surrender the adequate representation of a single datum of experience.”[33]
Simplicity, except in the most formal sense such as in model theory[34],
is subjective. If something takes 47 closely-spaced pages to explain, you might
suspect that it could be simplified.
Very closely related
to simplicity is understandability. One motivation for emphasizing simplicity
is to improve the understandability of a model or analysis. This is one of the
biggest criticisms of AI & machine learning methods today – the fact that
they are for the most part black boxes & the reasoning (in AI systems)
&/or modeling & pattern recognition (in machine learning systems) is so
complex or random-appearing as to be not understandable to mere humans. This
raises the question, though: if mere humans can’t understand the modeling
or analytic process, how are they supposed to understand & believe the
results produced by that process? Good question…
ii. Bias – There
are two major dimensions to the problem of bias & the use of AI &
machine learning. The first is data bias, & the second is algorithm bias.
Both of these problems are related to the fact that the collection & use of
data as well as the development & application of algorithms are ultimately
human activities that are embedded in social, organizational & cultural
contexts.
Data bias is probably pervasive & can greatly affect the operation &
results, especially when machine learning is applied to real-world problems.
As detailed previously in
this essay, most machine learning systems still are “trained” with sets of test
or training data. The selection of these training sets determines how the
system initially responds to problem data that it is exposed to. Bias in training
data usually takes the form of the data set only partially representing the
universe of discourse of the problem. In healthcare, for instance, almost all
large clinical data sets greatly underrepresent minorities. This influences the
machine learning system such that the results it presents, diagnosis of
specific syndromes for instance, are inaccurate with respect to the
underrepresented group(s). The footnote provides links to two recent articles
on this topic[35].
Of course, this kind of bias is nothing new in healthcare & has been an
issue since long before machine learning became the next shiny object in
clinical care[36].
This is an extremely important issue that is only just beginning to be
addressed.
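A hedged, entirely synthetic illustration of the point: when one group dominates the training data, a decision rule fit to that data generalizes poorly to an underrepresented group whose “positive” cases look different. The lab value, cutoffs & group labels below are invented for the sketch.

```python
# Synthetic demonstration of training-data bias. A single threshold "trained"
# on data dominated by group A is then evaluated separately on groups A & B.
import random
random.seed(0)

def sample(group, n):
    # Hypothetical lab value; the condition presents at different levels per group.
    cutoff = 5.0 if group == "A" else 7.0
    data = []
    for _ in range(n):
        value = random.uniform(0.0, 10.0)
        data.append((value, value > cutoff))   # (measurement, has_condition)
    return data

train = sample("A", 950) + sample("B", 50)     # group B underrepresented

# "Training": pick the threshold that maximizes accuracy on the training set.
best_threshold = max((t / 10 for t in range(0, 101)),
                     key=lambda t: sum((v > t) == y for v, y in train))

def accuracy(data, threshold):
    return sum((v > threshold) == y for v, y in data) / len(data)

print("group A accuracy:", accuracy(sample("A", 1000), best_threshold))
print("group B accuracy:", accuracy(sample("B", 1000), best_threshold))
```

The learned threshold lands near group A’s cutoff, so accuracy for group B is noticeably worse, which is the shape of the problem described above, in miniature.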
The other side of the coin – algorithm bias, is just as
important. This is also inevitable unless very deep steps are taken to prevent
it. Algorithms are developed in social, cultural & even organizational
contexts which ensures that the biases inherent in these institutions are
represented in the machine learning system’s underlying logic[37].
This is quite difficult to detect &/or to correct. Knowing that it happens
is an essential first step, but over time development processes will have to be
adopted that help to ameliorate these biases. Independent review will need to
be a core part of these processes.
iii. Don’t be Creepy – I recently
attended a conference on the ethical use of “big data” in healthcare[38].
One of the keynote speakers was Farzad Mostashari, a former National
Coordinator for Health Information Technology at the Department of Health & Human Services. Farzad is a favorite of mine – he can always be counted on to
express important issues in his own style. He was speaking about his guidelines
for doing research with healthcare data. His primary admonition was “Don’t be
creepy”. The work you are doing should not make people’s skin crawl… The example he gave was that while he was at the ONC, a proposal was made to make people’s healthcare records available to them only if they passed a credit
check! This is truly creepy, unnecessary & contrary to the whole spirit of
providing care, especially in the safety-net where I primarily work. Use of AI
& machine learning in any segment should not creep people out (leaving
aside the situation where they are “creeped out” because they don’t agree with
the results). The sensibilities of the groups to whom the results refer must be
taken into account in the design of AI & machine learning studies & the
promulgation of their results.
Finally, the end…
I have been working on the
development of artificial intelligence, in one form or another, for about 40
years (seriously?). If you had asked me in 1988 or even in 1998 whether some of
the most interesting & important advances in computer science & real-world
problem solving would be coming from this area, I would have told you that the
time was past for AI to have that kind of general influence – that there were
areas where it would continue to be developed & deployed, but that it would
not become a major force in everything from marketing strategy to chip
development. I was wrong! I did not anticipate - & would not have
anticipated – the importance & influence that machine learning would have.
I have been working with a good number of ML start-ups & some
well-established companies developing ML in the past five years or so & I’m
struck by four things about this development.
- The depth & breadth of the development & the potential it has to improve our understanding of many, many fields of inquiry
- The similarity in the foundations & even in many of the methods of design & reasoning of current ML systems with earlier AI systems of various types (as detailed in this essay)
- The enthusiasm of the people working on ML. This very much reminds me of the attitudes of people in the late 1970s & early 1980s when we really thought that a general AI could be developed & applied to a wide range of problem solving
- The amount of resistance & pushback that accompanies technology developments that challenge the status quo both intellectually & culturally in established fields (e.g. healthcare…)
All of this seems quite normal to me. If anything, the pace of development seems to have slowed, although this may be a symptom of so much of this work being done in corporate contexts, so that we do not have a total view of the progress that is made. In any case, progress is being made, & quickly, so that 5-8 years from now much of the application of this technology will seem “postechnical”[39] – that is, not visible as a separate technology, but simply part of how we do “stuff”, whether that stuff is shopping or clinical research.
[2] Disclaimer:
This work is a “personal perspective”. The opinions are my own, but so are any
errors of fact, which are primarily the result of my fallible memory. DJH
[3]
See
among many others this interview with Ray Kurzweil: https://www.cfr.org/event/future-artificial-intelligence-and-its-impact-society
[4] Again, one of very many,
this from the World Economic Forum: https://www.weforum.org/agenda/2016/10/top-10-ethical-issues-in-artificial-intelligence/
[5] My
dissertation work in model theory was very relevant to the foundations of AI in
mathematics & epistemology. I started working specifically on AI & ML
as a Research Fellow at Stanford University in the 1970’s. I continued this
part of my work until the present day as a Visiting Scholar at Stanford (in
1987-88, while on leave as Chief Scientist for AI at the Digital Equipment
Corporation) & as a Lecturer/Research Scholar at MIT (1998-99 &
2004-present).
[6] Stone,
P. et al. 2016. “Artificial Intelligence and Life in 2030.” One Hundred Year Study on Artificial
Intelligence: Report of the 2015-2016 Study Panel, Stanford University,
Stanford, CA, September 2016. Doc: http://ai100.stanford.edu/2016-report.
Accessed: September 6, 2016.
[8] 1927-2011, Computer
Scientist, winner of the Turing Award, U.S. National Medal of Science & the
Kyoto Prize, developer of the Lisp programming language & influential in
the development of early AI systems. Taught at Dartmouth, MIT & Stanford.
[9] Shortliffe, E.H.; Buchanan, B.G. (1975).
"A model of inexact reasoning in medicine". Mathematical Biosciences.
23 (3–4): 351–379. MR 381762. doi:10.1016/0025-5564(75)90047-4.
[11] C. Goldbach wrote in a
letter to L. Euler in June of 1742 that “every number greater than 2 is the sum
of 3 primes”. This was problematic as Goldbach considered 1 a prime number (no
longer taken as correct). Euler re-expressed the conjecture as “all even
integers greater than 2 can be expressed as the sum of 2 primes”. This “conjecture” has
still not been proved.
[12] Lenat,
D. B., and Brown, J. S. (August 1984). "Why AM and EURISKO appear to work."
Artificial Intelligence 23(3):269—294.
[13] The
author (DJH) was Chief Scientist for Artificial Intelligence at DEC from
1986-1989 & was responsible for research in expert & knowledge based
systems.
[14] DEC
had three categories for development projects: 1) product development expected
100% of projects to result in commercial products, 2) advanced development
projects expected >50% of projects to result in commercial products & 3)
research expected <50% of projects to result in commercial products.
[15] Hartzband, D.J. & L. Holly. 1988. The provision of induction in data-model
systems: II. Symmetric comparison. IJAR. 2(1):5-25.
Hartzband, D.J. 1987.
The provision of inductive problem solving and (some) analogic learning in
model-based systems. Group for Artificial Intelligence and Learning (GRAIL), Knowledge
Systems Laboratory. Stanford University. Stanford, CA, USA. 6/87.
[16] D.B.
Lenat & E.A. Feigenbaum. 1987. On the Thresholds of Knowledge. MCC
Technical Report AI-126-87. http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.107.4196&rep=rep1&type=pdf
[17] Microelectronics
& Computer Technology Corporation. Founded in Austin, TX in 1982 &
funded by a number of American computer technologies companies, MCC did R&D
work on systems architecture, hardware design, environmentally friendly tech
& AI. It was disbanded in 2000.
[19] Hartzband, D.J. & L. Holly. 1988. The
provision of induction in data-model systems: II. Symmetric comparison. IJAR.
2(1):5-25.
Hartzband, D.J. 1987a. The
provision of inductive problem solving and (some) analogic learning in
model-based systems. Group for Artificial Intelligence and Learning (GRAIL),
Knowledge Systems Laboratory. Stanford University. Stanford, CA, USA. 6/87.
[20] Ghemawat, S., H. Gobioff & S.-T. Leung. 2003. The Google File System. ACM 1-58113-757-5/03/0010; & Dean, J. & S. Ghemawat. 2004. MapReduce: Simplified Data Processing on Large Clusters. 6th Symposium on Operating Systems Design & Implementation. 137-149. San Francisco, CA.
[22] http://www.computerworld.com/article/2856063/enterprise-software/hadoop-successor-sparks-a-data-analysis-evolution.html
[23] Healthcare
Analytics: Landscape & Directions. https://posttechnical.blogspot.com/2014/06/healthcare-analytics-landscape.html
Healthcare Analytics: Concepts & Assumptions. https://posttechnical.blogspot.com/2014/12/healthcare-analytics-concepts.html
Big Data Analytics: Predictions about the Present.
[24]
https://en.wikipedia.org/wiki/Machine_learning
[27] https://www.fastcodesign.com/90132632/ai-is-inventing-its-own-perfect-languages-should-we-let-it,
https://www.inverse.com/article/32978-facebook-ai-artificial-intelligence-negotiate-haggle-ruthless-chatbot-fb
, Lewis, M. et al. Deal or No Deal?
End-to-End Learning for Negotiation Dialogs. arXiv:1706.05125v1 [cs.AI]
[28]
https://senseis.xmp.net/?NumberOfPossibleGoGames
[29] Silver, David;
Huang, Aja;
Maddison, Chris J.; Guez, Arthur; Sifre, Laurent; Driessche, George van den;
Schrittwieser, Julian; Antonoglou, Ioannis; Panneershelvam, Veda; Lanctot,
Marc; Dieleman, Sander; Grewe, Dominik; Nham, John; Kalchbrenner, Nal; Sutskever, Ilya;
Lillicrap, Timothy; Leach, Madeleine; Kavukcuoglu, Koray; Graepel, Thore; Hassabis, Demis
(28 January 2016). "Mastering
the game of Go with deep neural networks and tree search".
Nature.
529 (7587): 484–489. Bibcode:2016Natur.529..484S.
doi:10.1038/nature16961.
ISSN 0028-0836.
PMID 26819042
[30] Silver, David; Hubert, Thomas;
Schrittwieser, Julian; Antonoglou, Ioannis; Lai, Matthew; Guez, Arthur;
Lanctot, Marc; Sifre, Laurent; Kumaran, Dharshan; Graepel, Thore; Lillicrap,
Timothy; Simonyan, Karen; Hassabis,
Demis
(5 December 2017). "Mastering Chess and Shogi by Self-Play with a General
Reinforcement Learning Algorithm". arXiv:1712.01815
[31] https://en.wikipedia.org/wiki/Types_of_artificial_neural_networks
[32] The Terminator – 1984 (!) movie directed by James Cameron, starring
Arnold Schwarzenegger in which a national defense AI becomes “aware”, decides
that humans are a threat to its existence & wages war to eliminate them
[33] A. Einstein, On the Method of Theoretical Physics, Herbert Spencer Lecture, Oxford University, June 10, 1933. Most probably the origin of the aphorism, also attributed to Einstein, “Everything should be as simple as possible, but no simpler.”
[34] cf. Hartzband, D.J. 1972. Eine Logik für das Ableiten der minimalen grundlegenden Annahmen für mehrfache Modelle [A logic for deriving the minimal basic assumptions for multiple models]. Dissertation. Universität Hamburg. DFR.
[38] Health
And… Data Science and Public Action. NYU Langone School of Public Health.
5/21/2018
[39]
It is not by chance that my consultancy is named PostTechnical Research – The
idea is that a postechnical context is one where you do not notice your use of
any specific technology… you just do what you want/need to do & technology
transparently supports you. I intend no value judgement about this (that’s the
topic for another white paper). It is just a fact that this will be the case
for most people in the next 5-8 years – at least IMNSHO. Welcome to the
“past-informed” future…