Annotated References

This is a list of basic references that I believe are among the best to initially dive into areas that I work in and love.  I have added some comments that should be taken with a few grains of salt. I did not include references to research papers or some of the deeper literature – I may be able to provide some recommendations if you contact me with specific questions.

I have not included any of the rather large number of casual introductions that provide icing but no cake.  I have found that all too many articles purporting to be "easy introductions" to an area of study neglect some truly fundamental issues, are biased (either commercially or with an academic ax to grind), or leave some very unfortunate mis-impressions. For example, how many times have people new to pattern classification become convinced that the algorithm they read about is the "best", totally unaware of the "no free lunch" theorems?  That said, my selection of references reflects my biases.  Caveat lector.

The Semantic Web community seems to me to under-emphasize the primacy of machine learning, inference, and practical aspects of knowledge management, although this is being challenged now with all the hype around "big data".  So, my ordering of subjects for discussion here is intentionally contrary.

 

1 General Background on Pattern Recognition and Machine Learning

The following books are excellent introductions to pattern recognition technology and algorithms.  This includes density estimation, clustering, classification, regression, and summarization methods. Some less rigorously motivated data mining techniques (such as association rule induction) are not covered in these references.

C. M. Bishop
Pattern Recognition and Machine Learning
(Springer, 2007)
http://www.amazon.com/Pattern-Recognition-Learning-Information-Statistics/dp/0387310738/ref=pd_rhf_f_i_cs_1

I have not used this book, but I loved Bishop’s 1995 Neural Networks for Pattern Recognition and I have seen reviews that are very favorable. If I had to choose just one book as a general introduction to pattern recognition, it probably would be this one. 

 

R. O. Duda, P. E. Hart and D. G. Stork
Pattern Classification (2nd Edition)
(John Wiley & Sons, 2001)
http://www.amazon.com/Pattern-Classification-2nd-Richard-Duda/dp/0471056693

The first edition is a classic! This second edition narrows its scope but deepens and strengthens its presentation. Many wonderful insights and algorithms are presented well. Early printings were riddled with typos, but these can be corrected using an errata list available on the Web. Also, it should be regarded as a survey, not as a definitive source. In this light it is excellent as a general introduction.

I taught a portion of a course that was using this book – my lecture focus was on the ways that models can be tested and validated.  I thought that the book did a good job of preparing the students (who were engineers and mathematicians at MITRE Corporation).

 

T. Hastie, R. Tibshirani and J. H. Friedman
The Elements of Statistical Learning – Data Mining, Inference, and Prediction
Springer Series in Statistics, Second Edition, 2009
http://www.amazon.com/Elements-Statistical-Learning-T-Hastie/dp/0387952845/ref=sr_1_1?ie=UTF8&s=books&qid=1209745099&sr=1-1

This is an excellent book on data mining based on a very statistical perspective. It is very possibly the best single introductory book on data analysis at a reasonably advanced level.  Remarkably, the authors have made the book available in PDF online for free, accessible from this page:
http://www-stat.stanford.edu/~tibs/ElemStatLearn/

 

2 Statistical Learning Theory and Kernel Methods

Kernel methods are based on the observation that many machine learning algorithms are, or can be, generalized using a “trick” (which has a rigorous basis, of course).  This generalization takes a linear algorithm (such as simple classification or regression) and recasts the underlying mathematics so that the calculations are expressed entirely in terms of dot (inner) products between data vectors. For example, the dot product arises naturally in the derivation of ridge regression using the dual representation.
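The ridge regression remark can be checked numerically.  The following is only a sketch under my own assumptions: the data, the penalty value lam, and the helper solve() are made up for illustration and are not taken from any of the references.  It fits the same model twice, once in the usual (primal) form and once in the dual form, where the training data enter only through pairwise dot products.

```python
def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [a - f * c for a, c in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

X = [[1.0, 2.0], [2.0, 0.0], [3.0, 1.0], [0.0, 1.0]]  # made-up training inputs
y = [3.0, 1.0, 2.5, 1.5]                              # made-up targets
lam = 0.1                                             # ridge penalty (assumed)
n, d = len(X), len(X[0])

# Primal form: w = (X^T X + lam I)^-1 X^T y, a d x d system.
XtX_reg = [[sum(X[i][a] * X[i][b] for i in range(n)) + (lam if a == b else 0.0)
            for b in range(d)] for a in range(d)]
Xty = [sum(X[i][a] * y[i] for i in range(n)) for a in range(d)]
w = solve(XtX_reg, Xty)

# Dual form: alpha = (G + lam I)^-1 y with G[i][j] = x_i . x_j, an n x n system.
G_reg = [[dot(X[i], X[j]) + (lam if i == j else 0.0) for j in range(n)]
         for i in range(n)]
alpha = solve(G_reg, y)

x_new = [1.5, 0.5]
primal_pred = dot(w, x_new)
# The dual prediction touches the data only through dot products.
dual_pred = sum(alpha[i] * dot(X[i], x_new) for i in range(n))
print(primal_pred, dual_pred)
```

The two predictions agree up to rounding.  Because the dual form never uses a data vector except inside dot(), every call to dot() is a place where a different similarity function could be substituted.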

The approach is straightforward.  Data items are mapped from an input space to a feature space using some (non-linear) function, say Φ(x). Linear relations such as a regression line or a classification boundary are sought in the feature space.  The algorithms used within the feature space are implemented using pairwise dot products only. 

So what is the problem?  Well, computing Φ(x) for the data might be HORRIBLY inefficient or hard.  There are many reasons why this may be so.  But for many functions we can find another function which is relatively easy to compute called a “kernel” –

                k(X, Y) = Φ(X) ● Φ(Y)

We look for those situations where we can compute k(X, Y) in some direct way that is equivalent to transforming each data point using Φ and then applying a dot product, but is much easier.  We substitute its use wherever we have the dot product in a linear algorithm and by magic we have a more general, non-linear algorithm. To make this formal, some facts have been established about the properties of kernels and how they may be composed.  To put the entire setting of the problem of learning patterns on firm ground, some facts have been established about the balancing of generalization of properties versus specialization to a particular set of data.
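To make the identity concrete, here is a minimal sketch using the quadratic kernel on two-dimensional inputs.  The function names phi and kernel and the sample points are my own illustrative choices.

```python
import math

def dot(u, v):
    """Ordinary dot product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))

def phi(x):
    """Explicit feature map for 2-D input: (x1^2, sqrt(2)*x1*x2, x2^2)."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2.0) * x1 * x2, x2 * x2)

def kernel(x, y):
    """Quadratic kernel computed directly in the input space: (x . y)^2."""
    return dot(x, y) ** 2

x = (1.0, 2.0)
y = (3.0, -1.0)

explicit = dot(phi(x), phi(y))   # map both points to feature space, then dot
implicit = kernel(x, y)          # same number, without ever computing phi
print(explicit, implicit)
```

Both routes give the same value up to floating-point rounding.  Here the saving is trivial, but for a polynomial kernel of higher degree on high-dimensional inputs the explicit feature vector grows combinatorially while the kernel still costs one dot product and one power.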

Many of the really interesting developments in machine learning in the last two decades or so have been based on this trick and on deeper insights into the fundamental trade-offs while learning.

Below are some references that address Statistical Learning Theory (the management of learning trade-offs) and Kernel Methods (the algorithms).  The much celebrated Support Vector Machine approach is the best known application of kernel methods. Related to these topics, but also of great interest are boosting methods.  Boosting is used to create prediction rules that can be highly accurate by a process of combining weaker and less accurate rules.
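As a rough illustration of the boosting idea, the sketch below implements a bare-bones AdaBoost over one-dimensional decision stumps.  The toy data, the function names, and the choice of three rounds are my own assumptions; treat it as a demonstration of the weighting scheme, not as a reference implementation.

```python
import math

def stump_predict(x, threshold, polarity):
    """A decision stump: `polarity` below the threshold, `-polarity` otherwise."""
    return polarity if x < threshold else -polarity

def best_stump(xs, ys, weights):
    """Return (weighted_error, threshold, polarity) of the best stump."""
    best = None
    points = sorted(set(xs))
    thresholds = [points[0] - 0.5] + [p + 0.5 for p in points]
    for t in thresholds:
        for s in (+1, -1):
            err = sum(w for x, y, w in zip(xs, ys, weights)
                      if stump_predict(x, t, s) != y)
            if best is None or err < best[0]:
                best = (err, t, s)
    return best

def adaboost(xs, ys, rounds):
    n = len(xs)
    weights = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        err, t, s = best_stump(xs, ys, weights)
        err = min(max(err, 1e-12), 1 - 1e-12)   # guard against log(0)
        alpha = 0.5 * math.log((1 - err) / err)  # vote weight of this stump
        ensemble.append((alpha, t, s))
        # Increase the weight of misclassified points, decrease the rest.
        weights = [w * math.exp(-alpha * y * stump_predict(x, t, s))
                   for x, y, w in zip(xs, ys, weights)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble

def predict(ensemble, x):
    score = sum(alpha * stump_predict(x, t, s) for alpha, t, s in ensemble)
    return 1 if score >= 0 else -1

xs = [1, 2, 3, 4, 5, 6]
ys = [1, 1, -1, -1, 1, 1]
model = adaboost(xs, ys, rounds=3)
print([predict(model, x) for x in xs])
```

On this toy set no single stump can label all six points correctly, since the negative examples sit in the middle, yet the weighted vote of three stumps does.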

In addition to the books listed below, there are a number of interesting papers at http://www.kernel-machines.org/.

John Shawe-Taylor & Nello Cristianini
Kernel Methods for Pattern Analysis
Cambridge University Press, 2004

http://www.kernel-methods.net/

This book is an excellent introduction with many details and references.  A great place to start.

 

N. Cristianini and J. Shawe-Taylor
An Introduction to Support Vector Machines (and other kernel-based learning methods)
Cambridge University Press, 2000
http://www.support-vector.net/

This is the predecessor to Kernel Methods for Pattern Analysis; it is more focused on SVMs but clear and useful in laying the foundation for understanding kernel methods more generally.

 

B. Schölkopf and A. J. Smola
Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond
MIT Press, 2002
http://www.amazon.com/Learning-Kernels-Regularization-Optimization-Computation/dp/0262194759

This book is a deeper introduction with many excellent discussions regarding kernel design. Highly recommended.

 

3 Additional General Background on Pattern Analysis

I do not recommend any of the books in this section unless you either wish to dive deep or you have masochistic tendencies.  But they are additional good general introductions with twists that make them of interest.

J. Hertz, A. Krogh, and R.G. Palmer
Introduction to the Theory of Neural Computation
Westview Press; New Ed edition (January 1, 1991)
http://www.amazon.com/Introduction-Computation-Institute-Sciences-Complexity/dp/0201515601/ref=sr_1_1?ie=UTF8&s=books&qid=1209745384&sr=1-1

This is a view of neural network methods based on statistical mechanics.  This book is interesting and very refreshing for those tired of poorly formulated biological analogies and wondering if another paradigm could motivate investigation.  It is particularly good for the physics majors in the crowd. 

 

B. D. Ripley
Pattern Recognition and Neural Networks
(Cambridge University Press, 1995)
http://www.amazon.com/Pattern-Recognition-Neural-Networks-Ripley/dp/0521460867/ref=sr_1_12?ie=UTF8&s=books&qid=1209759990&sr=1-12

This book is very complementary to Bishop’s book listed above.  It covers much the same material, but with a more rigorous statistical foundation and with very comprehensive references.  He also is not shy about his opinions on research directions.

 

V. S. Cherkassky and F. Mulier
Learning from Data: Concepts, Theory, and Methods
Wiley-IEEE Press; 2nd edition, 2007
http://www.amazon.com/Learning-Data-Concepts-Theory-Methods/dp/0471681822/ref=sr_1_1?ie=UTF8&s=books&qid=1209770398&sr=1-1

Another good introduction (although I have not seen this edition).  Occasionally, I refer to the first edition of this book to supplement other sources.

 

4 Deeper Pattern Recognition General References

These references dive much deeper than the references listed above.  They are only recommended if you want to see what the mathematical foundation of learning looks like.  I have not listed many other references which are valuable, especially the books and papers by Vladimir Vapnik, which are brilliant but a bit opaque.  The papers below are excellent doorways to the research literature.

T. Poggio, R. Rifkin, S. Mukherjee, and P. Niyogi.
General Conditions for Predictivity in Learning Theory.
Nature 428 (2004): 419-422.

This is an excellent paper that explains the concept of stability as applied to learning in a clear compact exposition.  Worth reading. 

 

C. Tomasi
Past performance and future results
Nature 428 (2004): 378
http://cbcl.mit.edu/projects/cbcl/news/files/news-views-march-04.pdf

A comment on the above paper that reinforces some important questions regarding small samples and stability. 

 

F. Cucker, and S. Smale.
On The Mathematical Foundations of Learning.
Bulletin of the American Mathematical Society 39, no. 1 (2002).

Learning theory for mathematicians.  Excellent.

 

T. Poggio and S. Smale.
The Mathematics of Learning: Dealing with Data
Notices of the American Mathematical Society (AMS), Vol. 50, No. 5, 537-544, 2003.

Deep and delightful. A bit more readable than Cucker and Smale.

 

S. Boucheron, O. Bousquet, and G. Lugosi.
Theory of Classification: A Survey of Recent Advances.
ESAIM: Probability and Statistics 9 (2005): 323-375.

S. Boucheron, O. Bousquet, and G. Lugosi.
Introduction to Statistical Learning Theory. In Advanced Lectures on Machine Learning. Lecture Notes in Artificial Intelligence 3176. Edited by O. Bousquet, U. von Luxburg, and G. Rätsch. Heidelberg, Germany: Springer, 2004, pp. 169-207.

Both of the above papers are excellent surveys of the state of the art of statistical learning theory.  The second is easier to read.

 

L. Devroye, L. Gyorfi, and G. Lugosi
A Probabilistic Theory of Pattern Recognition
(Springer, 1997)
http://www.amazon.com/Probabilistic-Recognition-Stochastic-Modelling-Probability/dp/0387946187/ref=sr_1_1?ie=UTF8&s=books&qid=1209770686&sr=1-1

A deep read into the 2-class pattern classification problem.  Worth looking at after you think you know what is going on. Highly theoretical.

Another source is the MIT graduate course in the Brain and Cognitive Sciences department on Statistical Learning Theory and Applications (9.520). http://www.mit.edu/~9.520/ The course is taught by seminal thinkers in the area and the handouts and syllabus are of the absolute highest quality.

 

5 Bayesian / Belief Networks

I have mixed feelings about Bayesian methods.
 (That’s a pun, by the way!) 

 

David J. C. MacKay
Information Theory, Inference, and Learning Algorithms
Cambridge University Press, 2003

http://www.amazon.com/Information-Theory-Inference-Learning-Algorithms/dp/0521642981

 

MacKay’s book is remarkable – a mix of probability theory, information theory, coding, some statistical physics, and some pattern recognition algorithms. This unusual combination of topics gives rise to some idiosyncratic presentations which often provoke new insights.  It has a Bayesian flavor and provides some introduction to Bayesian inference and belief networks.

 

M. I. Jordan (editor)
Learning in Graphical Models
(M.I.T. Press, 1999)
http://www.amazon.com/Learning-Graphical-Adaptive-Computation-Machine/dp/0262600323/ref=pd_bbs_sr_1?ie=UTF8&s=books&qid=1209945039&sr=1-1

This is a good place to start if you are serious about belief networks.  This is a collection of papers, four of which are tutorial, including Heckerman’s well known tutorial. 

 

Judea Pearl
Causality: Models, Reasoning and Inference
Cambridge University Press, 2nd edition, 2009

http://www.amazon.com/Causality-Reasoning-Inference-Judea-Pearl/dp/052189560X/ref=la_B001HCTYSO_1_1?ie=UTF8&qid=1367064908&sr=1-1

Delightful. Provocative. Convincing.

 

Andrew Gelman, John B. Carlin, Hal S. Stern and Donald B. Rubin
Bayesian Data Analysis
Chapman and Hall/CRC, 2nd Edition, 2003

http://www.amazon.com/Bayesian-Analysis-Edition-Chapman-Statistical/dp/158488388X/ref=pd_cp_b_0

A graduate level statistics text.  A third edition is due in late 2013.

 

 

6 Formal Linguistics

It is easy to get lost in the maze of literature and schools of thought in formal linguistics.  The mainstream of generative grammar stems from the brilliant contributions of Noam Chomsky at MIT.  There have been significant breaks from this mainstream.  The core controversies revolve around whether syntax should play the central role it does in formal linguistics and whether the transformational mechanisms developed by Chomsky and others are necessary or psychologically real.

Recently, the mainstream has been focused on a retrenchment called the “minimalist program.”  This is an attempt to reduce the transformational structure (that is, the correspondence between deep structure and surface structure) to a set of principles that govern core linguistic phenomena across all languages. 

Like many students of linguistics, I had originally learned something about transformational grammar from one of the books of Andrew Radford, in my case, his book Transformational Syntax, Cambridge University Press, 1981.  When I wanted to update some of my background I turned to Radford’s Minimalist Syntax, Cambridge University Press, 2004.  This is an excellent book, but I found myself increasingly at odds with its analysis.

I ended up rejecting the transformational approach entirely.  The best thinking about alternative approaches that I have found is from Ray Jackendoff, one of Chomsky’s students. A “must read” book on linguistics and cognitive science is:

Ray Jackendoff
Foundations of Language: Brain, Meaning, Grammar, Evolution
Oxford University Press, 2003
http://www.amazon.com/Foundations-Language-Meaning-Grammar-Evolution/dp/0199264376/ref=pd_bbs_sr_2?ie=UTF8&s=books&qid=1209833963&sr=8-2

In fact, if you were to read only one book this year, I would recommend it to be this one.  Certainly, one of the best linguistics / cognitive science books I have ever read. 

Other books from Jackendoff that I strongly recommend that pertain to cognitive structures (semantics) are:

Semantic Interpretation in Generative Grammar, MIT Press, 1972

Semantics and Cognition, MIT Press, 1983

Semantic Structures, MIT Press, 1990

Language, Consciousness, Culture: Essays on Mental Structure, MIT Press, 2007

And last, one of the most interesting books on syntax is:

Ray Jackendoff and Peter Culicover
Simpler Syntax
Oxford University Press, 2005
http://www.amazon.com/Simpler-Syntax-Peter-W-Culicover/dp/0199271097/ref=sr_1_5?ie=UTF8&s=books&qid=1209836424&sr=1-5

The Simpler Syntax hypothesis is that syntactic structure is only as complex as is required to form the connection between morphology (sounds, text forms) and cognitive structures (meaning).  This does not mean that syntax is simple – only that the central position of syntax and the highly elaborated transformations of mainstream generative grammar have somehow diverged from reality.  Simpler Syntax (like some other similar approaches) continues to be generative but rejects transformations and most of what would constitute a deep structure (except for a mechanism which handles the mapping of grammatical function).

The nature of Simpler Syntax “rules” is more akin to constraints: one set of constraints governs constituents, the other word order.  And, the division between grammar and lexicon is broken down: lexical material is in a continuum from words to idioms and other constructions to general rules, all of which are connections between morphology, syntax, and meaning.

As for the structure of English, there is a remarkable descriptive grammar, weighing in (literally) at 1860 pages and 5.6 pounds, that has supplanted earlier works (such as Quirk et al.):

Huddleston, RD and Pullum, GK
The Cambridge Grammar of the English Language
Cambridge University Press, 2003

http://www.amazon.com/Cambridge-Grammar-English-Language/dp/0521431468/ref=sr_1_1?ie=UTF8&s=books&qid=1209836700&sr=1-1

A summary of CGEL (as it is known) by the same authors is a text which can also be used as a reference:

Huddleston, RD and Pullum, GK
A Student’s Introduction to English Grammar
Cambridge University Press, 2005

This book follows much of the structure of CGEL and I usually consult it first before diving into the trenches with CGEL.

Sample chapters (1 and 2) of CGEL are available here: http://www.cambridge.org/uk/linguistics/cgel/sample.htm

I recommend reading Chapter 2 as an intro to their style of exposition and the structure of English.

A particularly interesting paper explores formal computational power considerations and how they are related to the assumptions and terminology of CGEL:

Pullum, GK and Rogers, J
Expressive power of the syntactic theory implicit in The Cambridge Grammar of the English Language
http://ling.ed.ac.uk/~gpullum/EssexLAGB.pdf

I also recommend Peter Culicover’s review of CGEL (Culicover is the co-author of Simpler Syntax).  The review is available here: http://www.cogsci.msu.edu/DSS/2005-2006/Culicover/CGEL%20Review.pdf

 

A complementary view to Jackendoff is in the work of Pustejovsky.  I recommend:

Pustejovsky, James
The Generative Lexicon
MIT Press, 1998
http://www.amazon.com/Generative-Lexicon-Language-Speech-Communication/dp/0262661403/ref=sr_1_1?ie=UTF8&s=books&qid=1209838181&sr=1-1

Although primarily focused on nouns, this book describes the combinatory nature of meaning. 

Finally, I have thoroughly enjoyed a number of other perspectives.  A favorite is by George Lakoff, who was on the opposite side from Jackendoff during an acrimonious period in the development of linguistics still known as the “linguistics wars”:

Lakoff, George
Women, Fire, and Dangerous Things: What Categories Reveal about the Mind
University of Chicago Press, 1990
http://www.amazon.com/Women-Dangerous-Things-George-Lakoff/dp/0226468046/ref=pd_bbs_sr_1?ie=UTF8&s=books&qid=1209837855&sr=1-1

Lakoff also wrote Metaphors We Live By (with Mark Johnson) and has contributed at the intersection of linguistics and cognitive science.  He is no stranger to controversy and I find him interesting if not always convincing.

 

I favor the frameworks of Head-Driven Phrase Structure Grammar (HPSG), Situation Semantics and, more recently, Sign-Based Construction Grammar.

Sag, Ivan, Wasow, Thomas, and Bender, Emily M.
Syntactic Theory: A Formal Introduction, 2nd Edition
Center for the Study of Language and Information, 2nd Edition, 2003

http://www.amazon.com/Syntactic-Theory-Introduction-Language-Information/dp/1575864002/ref=sr_1_1?s=books&ie=UTF8&qid=1367066922&sr=1-1

A somewhat simplified HPSG model of English, but a very fine introduction to the HPSG approach to syntactic theory. The model they present is revised and elaborated as the book progresses.

 

Ginzburg, Jonathan and Sag, Ivan
Interrogative Investigations: The Form, Meaning, and Use of English Interrogatives
Center for the Study of Language and Information - Lecture Notes, 2001

http://www.amazon.com/Interrogative-Investigations-Interrogatives-Language-Information/dp/1575862786/ref=la_B001K8EQLK_1_1?ie=UTF8&qid=1367065838&sr=1-1

Deep, but worth the effort of mining its gems. A bit like a sip from a firehose.

 

Goldberg, Adele
Constructions: A Construction Grammar Approach to Argument Structure
University Of Chicago Press, 1995

http://www.amazon.com/Constructions-Construction-Approach-Structure-Cognitive/dp/0226300862/ref=sr_1_3?s=books&ie=UTF8&qid=1367066691&sr=1-3

Goldberg, Adele
Constructions at Work: The Nature of Generalization in Language
Oxford University Press, USA, 2006

http://www.amazon.com/Constructions-Work-Nature-Generalization-Language/dp/0199268525/ref=pd_sim_b_1

Definitely worthwhile and great launchpads for other construction-oriented analyses.

 

Boas, Hans C. and Sag, Ivan A. (eds)
Sign-Based Construction Grammar
Center for the Study of Language and Information 2011

http://www.amazon.com/Sign-Based-Construction-Grammar-Language-Information/dp/1575866285/ref=sr_1_1?s=books&ie=UTF8&qid=1367067387&sr=1-1

A collection of papers that unify Berkeley Construction Grammar with an HPSG framework.  Inspiring.

 

I can very highly recommend two books on semantics in addition to the Ginzburg and Sag book above.

Heim, Irene and Kratzer, Angelika
Semantics in Generative Grammar
Wiley-Blackwell, 1998

http://www.amazon.com/Semantics-Generative-Blackwell-Textbooks-Linguistics/dp/0631197133/ref=sr_1_1?s=books&ie=UTF8&qid=1367067656&sr=1-1

A text from a very mainstream vantage. This book provides an introduction to the kinds of arguments made in semantics research.

 

Davis, Steven and Gillon, Brendan S. (eds)
Semantics: A Reader
Oxford University Press, USA, 2004

http://www.amazon.com/Semantics-Reader-Steven-Davis/dp/0195136985/ref=sr_1_1?s=books&ie=UTF8&qid=1367068096&sr=1-1

An indispensable collection of seminal papers and clear expositions spanning a very wide range of views of semantics.  This book is a bargain considering the difficulty and cost of individually gathering its contained papers.

 

7 Computational Linguistics

Formal and statistical approaches to machine understanding have wide and varied literatures.  Strong opinions about the efficacy of different approaches seem to be the norm; I believe an eclectic approach is best and the best way to start is to survey a large number of techniques and implemented systems.

Formal parsing approaches have a long history.  While it is logical to assume that formal linguistic training would be great preparation, an oft-repeated quip is that every time a computational linguistics project hires a linguist, its accuracy drops. Still, I think that to the extent that a linguistic phenomenon is understood, formal methods should be used.  Use statistical methods for less well understood phenomena.

The statistical approach to language understanding spans a huge array of attacks on NLP problems at different levels of analysis, ranging from morphology and tagging to probabilistic parsing and more.

A very good introduction to computational linguistics that is coming out in its second edition this month is:

D. Jurafsky and J. Martin
Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics and Speech Recognition

Prentice Hall, 2008, Second Edition

I had a chance to read some of the revisions from the first edition. While I do not always agree with their selection of methods to illustrate or the trade-offs that they advocate, I do think that the book is an excellent survey.

 

E. Charniak
Statistical Language Learning
(M.I.T. Press, 1996)
http://www.amazon.com/Statistical-Language-Learning-Speech-Communication/dp/0262531410/ref=pd_bbs_sr_1?ie=UTF8&s=books&qid=1209826363&sr=1-1

This is a very clear and concise (short) introduction to statistical linguistics and a good start for deeper investigations.  It discusses probabilistic chart parsing, hidden Markov models, and word clustering without getting caught up in the details.  I think it is one of the best books to start with in statistical linguistics because it motivates deeper thinking extremely well. Also worth visiting Charniak’s web page at Brown.edu.

 

C. Manning and H. Schuetze
Foundations of Statistical Natural Language Processing
MIT Press, 1999
http://www.amazon.com/Foundations-Statistical-Natural-Language-Processing/dp/0262133601/ref=pd_bxgy_b_img_b

An excellent introduction with a bit more elementary material and support.  Probably eclipsed now by Jurafsky and Martin.

Going beyond these introductions requires expeditions into the literature and various resources.  It is more difficult to provide specific guidance because the span of approaches described in the literature is so broad.  Some of the most important threads are for patterned relation extraction and for predicate-argument structure matching.

The most widely used lexical resource is WordNet (http://wordnet.princeton.edu/), a very large sense-enumerative lexicon with very broad coverage of English. WordNet-related publications are available at the site.  In addition, an (early) collection of papers was assembled and published:

 

C. Fellbaum (Editor) and G. Miller (Preface)
WordNet: An Electronic Lexical Database
(M.I.T. Press, 1998)
http://www.amazon.com/WordNet-Electronic-Database-Language-Communication/dp/026206197X/ref=pd_bbs_sr_1?ie=UTF8&s=books&qid=1209842862&sr=1-1

Other resources include VerbNet, FrameNet, PropBank, and NomBank.  All can be found easily on the Web with an abundance of associated papers. 

One important direction is the induction of ontologies and lexicons from corpora.  This is closely related to information extraction from text.

 

8 Description Logics, OWL, and the Semantic Web

There are different conceptions of what should constitute a better “relationship” between humans and machines.  One family of the articulations of these conceptions is the W3C Semantic Web. The Semantic Web is often attributed to Sir Tim Berners-Lee.

Tim Berners-Lee, James Hendler, and Ora Lassila. "The Semantic Web." Scientific American, May 17, 2001.

The W3C maintains a large number of resources that may be accessed here: http://www.w3.org/2001/sw/

Much of the effort of the Semantic Web development community is not without controversy.  This exists at a variety of levels:

  • What should be the goals of the Semantic Web?  Indeed, what is the Semantic Web?
  • What should the architecture of a Semantic Web look like?  Is it like the “layer cake” of services that is so often reproduced?
  • What is the relative role of representation vs. inference?  This is a big and deep question.  We can, for example:
      • Store many pieces of data (say, triples) and retrieve using queries that match patterns (written in, say, SPARQL);
      • Use a rule-based formalism to recursively compute answer sets; and
      • Use a Description Logic formalism for representing knowledge and computing implicit relationships (classification, transitive roles, and propagated roles).
  • In what form should Web pages and other Web-accessible resources encode self-descriptive metadata and data?
  • How should Semantic Web services be packaged and deployed?  What security (encryption and trust) services should be used?
  • How does the Semantic Web relate to other semantic technologies including NLP, other forms of knowledge representation and reasoning, and knowledge visualization?
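
To give a feel for the first of the representation options in the list above (storing triples and retrieving them by pattern), here is a toy sketch.  The data, the match() and query() helpers, and the "?variable" convention are all hypothetical; this mimics the spirit of a SPARQL basic graph pattern and is not SPARQL itself or any real triple store API.

```python
# A toy in-memory triple store; the facts are made up for illustration.
triples = {
    ("Fido", "type", "Dog"),
    ("Rex", "type", "Dog"),
    ("Tom", "type", "Cat"),
    ("Fido", "owner", "Alice"),
    ("Tom", "owner", "Alice"),
}

def match(pattern, store):
    """Yield variable bindings for one (subject, predicate, object) pattern.

    Terms starting with '?' are variables; anything else must match exactly.
    """
    for triple in store:
        bindings = {}
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                if bindings.setdefault(term, value) != value:
                    break
            elif term != value:
                break
        else:
            yield bindings

def query(patterns, store):
    """Join several patterns on their shared variables."""
    results = [{}]
    for pattern in patterns:
        joined = []
        for partial in results:
            # Substitute variables that are already bound before matching.
            bound = tuple(partial.get(term, term) for term in pattern)
            for b in match(bound, store):
                joined.append({**partial, **b})
        results = joined
    return results

# Roughly: SELECT ?pet WHERE { ?pet type Dog . ?pet owner Alice }
answers = query([("?pet", "type", "Dog"), ("?pet", "owner", "Alice")], triples)
print(answers)
```

Note that nothing here is inferred: the query returns only what was stored.  That is the contrast with the rule-based and Description Logic options in the same list.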

(I could go on.)

To listen to many Semantic Web advocates (including companies) the Semantic Web is almost equivalent to RDF and SPARQL.  In fact, RDF and SPARQL are interesting and useful, no doubt.  But they are at best a partial solution to some knowledge representation problems and no solution at all to the most interesting problems.

RDF is a fine data interchange format.  Beyond that, large sets of RDF triples are incredibly hard to manage efficiently and are disjoint from human understanding.

OWL is derived from decades of research in Description Logics.  Description Logic engines enable the construction of many types of intelligent software agents to act on behalf of people.  The most important agent behaviors are automatic classification of knowledge (particularly incrementally learned), translation of one organization of knowledge to another organization, and negotiation of protocols and services provided by different programs or data sources.  A core enabling operation of many agents is the ability to create semantic metadata of documents (and other media) and combine knowledge from that metadata in ways that are meaningful to automated processes.  For this to happen, ontologies will play a key role as a source of precisely defined and related terms (vocabulary) that can be shared across applications (and humans).  DL technology formalizes ontology construction and use by providing a decidable fragment of First Order Logic (FOL).  This formality and regularity enables machine understanding in support of agent-agent communication, semantic-based searches, and richer service descriptions that can be interpreted by intelligent agents.

Rule formalisms can complement both RDF querying (using SPARQL) and description logic reasoning.  But no single rule formalism can capture all of the kinds of reasoning that different common problems require.  Various flavors of Datalog, constraint systems, SAT solvers, and of course Prolog are but a small sample.

And this is to say nothing of pattern discovery and machine learning algorithms which invariably would require data to be represented in forms quite different from RDF.

Finally, the three species of OWL version 1 were not found to be very useful.  OWL Full is first order logic and requires powerful but slow theorem proving techniques and is, of course, not decidable.  The other two species were impoverished in their ability to reason about relationships between concepts, among other deficiencies. 

The proposal for OWL version 2 (recently renamed from 1.1) remedies most of these problems.  However, there is as yet no practical implementation of SROIQ, the particular variant of logic that underlies the full OWL V2.  As a consequence, several fragments (or “profiles”) of OWL V2 have been defined, the most important being EL++.  I have developed a full implementation of an EL++ reasoner which I have applied to a number of large problems.
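
For readers who want a feel for what classification in an EL-style logic involves, here is a deliberately tiny sketch.  It is not the reasoner mentioned above.  It loosely follows the style of the completion rules in the “Pushing the EL Envelope” paper listed below, covers only four axiom shapes, and ignores most of what makes EL++ interesting (bottom, nominals, role inclusions, concrete domains).  The tuple encoding of axioms and the medical-sounding concept names are my own assumptions.

```python
def classify(concepts, axioms):
    """Saturate subsumer sets S[X] using four EL-style completion rules.

    Axiom shapes (already normalized):
      ("sub", A, B)               A is subsumed by B
      ("conj", A1, A2, B)         (A1 and A2) is subsumed by B
      ("some_rhs", A, role, B)    A is subsumed by (exists role.B)
      ("some_lhs", role, A, B)    (exists role.A) is subsumed by B
    """
    S = {c: {c, "Thing"} for c in concepts}  # each concept subsumes itself
    R = set()                                # (X, role, Y): X below exists role.Y
    changed = True
    while changed:                           # repeat until nothing new is derived
        changed = False
        for X in concepts:
            for ax in axioms:
                kind = ax[0]
                new = None
                if kind == "sub" and ax[1] in S[X]:
                    new = ax[2]
                elif kind == "conj" and ax[1] in S[X] and ax[2] in S[X]:
                    new = ax[3]
                elif kind == "some_rhs" and ax[1] in S[X]:
                    if (X, ax[2], ax[3]) not in R:
                        R.add((X, ax[2], ax[3]))
                        changed = True
                elif kind == "some_lhs":
                    _, role, filler, B = ax
                    if any(x == X and r == role and filler in S.get(Y, ())
                           for (x, r, Y) in R):
                        new = B
                if new is not None and new not in S[X]:
                    S[X].add(new)
                    changed = True
    return S

concepts = ["Endocardium", "HeartPart", "Endocarditis",
            "Inflammation", "LocatedInHeart", "HeartInflammation"]
axioms = [
    ("sub", "Endocardium", "HeartPart"),
    ("sub", "Endocarditis", "Inflammation"),
    ("some_rhs", "Endocarditis", "hasLocation", "Endocardium"),
    ("some_lhs", "hasLocation", "HeartPart", "LocatedInHeart"),
    ("conj", "Inflammation", "LocatedInHeart", "HeartInflammation"),
]
S = classify(concepts, axioms)
print(sorted(S["Endocarditis"]))
```

Nobody asserted that Endocarditis is a HeartInflammation; the subsumption is computed from the location of the inflammation.  This naive loop rescans every axiom on every pass, so it shows the logic rather than how a practical reasoner is engineered.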

The Description Logic home page is located at: http://dl.kr.org/

Information on OWL 2 can be found at these links:

http://www.w3.org/2007/OWL/wiki/Primer

http://www.w3.org/TR/owl2-syntax/

http://www.w3.org/TR/owl2-semantics/

http://www.w3.org/TR/owl2-profiles/

Other papers on Description Logic technology that are good starting places are the following:

Franz Baader, Ian Horrocks, and Ulrike Sattler. Description Logics. In Frank van Harmelen, Vladimir Lifschitz, and Bruce Porter, editors, Handbook of Knowledge Representation. Elsevier, 2007.

Ian Horrocks, Oliver Kutz, and Ulrike Sattler. The Even More Irresistible SROIQ. In Proc. of the 10th Int. Conf. on Principles of Knowledge Representation and Reasoning (KR 2006), pages 57-67. AAAI Press, 2006.

Boris Motik, Rob Shearer, and Ian Horrocks. A Hypertableau Calculus for SHIQ. In Proc. of the 2007 Description Logic Workshop (DL 2007), volume 250 of CEUR (http://ceur-ws.org/), 2007.

Boris Motik, Rob Shearer, and Ian Horrocks. Optimized Reasoning in Description Logics using Hypertableaux. In Proc. of the 21st Int. Conf. on Automated Deduction (CADE-21), volume 4603 of Lecture Notes in Artificial Intelligence, pages 67-83. Springer, 2007.

Dmitry Tsarkov, Ian Horrocks, and Peter F. Patel-Schneider. Optimizing Terminological Reasoning for Expressive Description Logics. J. of Automated Reasoning, 2007. To appear.

Boris Motik, Rob Shearer, and Ian Horrocks. Optimizing the Nominal Introduction Rule in (Hyper)Tableau Calculi. In Proc. of the 2008 Description Logic Workshop (DL 2008), CEUR (http://ceur-ws.org/), 2008.

Franz Baader, Sebastian Brandt, and Carsten Lutz. Pushing the EL Envelope. In Proc. of the 19th Joint Int. Conf. on Artificial Intelligence (IJCAI 2005), 2005.

 

Franz Baader, Sebastian Brandt, and Carsten Lutz. Pushing the EL Envelope Further. In Proc. of the Washington DC workshop on OWL: Experiences and Directions (OWLED08DC), 2008.

 

Matthew Horridge, Nick Drummond, John Goodwin, Alan Rector, Robert Stevens, Hai H. Wang. The Manchester OWL Syntax. OWL Experiences and Directions Workshop, 2006.

Information on RDF and SPARQL is available at the W3C link that I provided above.

Datalog implementations are usually based on a query-rewriting technique called “magic sets.”  SQL3 (SQL:1999) incorporates recursive queries using Datalog-based ideas.
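
As a small illustration of what recursive rule evaluation means, the sketch below computes the classic "ancestor" relation by naive bottom-up evaluation.  This is the simple fixpoint approach and not magic sets.  As I understand it, magic sets rewrite the program so that bottom-up evaluation derives only facts relevant to a particular query.  The facts and function name are made up.

```python
# Facts: parent(X, Y).  The names are invented for illustration.
parent = {("ann", "bob"), ("bob", "cal"), ("cal", "dee")}

def ancestors(parent_facts):
    """Naive bottom-up evaluation of the Datalog program:

         ancestor(X, Y) :- parent(X, Y).
         ancestor(X, Z) :- parent(X, Y), ancestor(Y, Z).

    The recursive rule is re-applied until no new fact appears (a fixpoint).
    """
    ancestor = set(parent_facts)            # first rule
    while True:
        derived = {(x, z)
                   for (x, y) in parent_facts
                   for (y2, z) in ancestor
                   if y == y2}              # second rule, one round
        if derived <= ancestor:             # nothing new: fixpoint reached
            return ancestor
        ancestor |= derived

result = ancestors(parent)
print(sorted(result))
```

The loop derives every ancestor pair in the database even if the question was only about ann, which is exactly the waste that query-directed rewritings are meant to avoid.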

 

9 Ontology Development

Vocabulary + Hierarchical Structure = Taxonomy

Taxonomy + Lexical Relationships = Thesaurus

Taxonomy + Relationships, Constraints, and Rules = Ontology

The structure, richness, and diversity of relationships that are typically expressed in an ontology are formalized in several ways.  First, the language for the expression of those relationships is made rigorous.  Differences in types of relationships are made explicit – for example, “X is a part of Y” may mean that X is a component (as in the example above), an ingredient (as in flour in a cake), a member (as in a person in a club), or other partonomic type.

Second, different qualifications on what may be expressed are formalized.  For example, you may wish to say that “Joe has 3 daughters” without necessarily listing all (or any) of the daughters explicitly.

Third, we distinguish between those assertions that are both necessary and sufficient to fully define a relationship and those assertions which are only necessary.  This corresponds to the distinction between things that can be defined exactly and things that cannot, and so require some additional qualitative verification.
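
The second and third points can be sketched in a few lines.  Everything here is hypothetical: the class names, the dictionary layout, and the status() helper are mine.  The sketch also treats a missing property as zero, which is a closed-world simplification that a real Description Logic reasoner would not make.

```python
# Each class lists conditions as predicates over an individual's properties.
# defined=True  : the conditions are necessary AND sufficient.
# defined=False : the conditions are necessary only.
classes = {
    "Parent": {
        "defined": True,
        "conditions": [lambda ind: ind.get("children", 0) >= 1],
    },
    "GoodParent": {
        "defined": False,
        "conditions": [lambda ind: ind.get("children", 0) >= 1],
    },
}

def status(individual, class_name):
    """Say what the stated conditions allow us to conclude."""
    cls = classes[class_name]
    if not all(cond(individual) for cond in cls["conditions"]):
        return "not a member"       # a necessary condition fails
    if cls["defined"]:
        return "member"             # sufficient: membership is inferred
    return "possible member"        # still needs outside verification

# "Joe has 3 daughters" is recorded as a count; no daughter is listed.
joe = {"children": 3}
print(status(joe, "Parent"), "/", status(joe, "GoodParent"))
```

Joe is inferred to be a Parent because that class is fully defined.  For GoodParent the conditions can only rule individuals out, so the final judgment has to come from somewhere else.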

Ontologies are formalized and exist because they enable knowledge to be written down in ways that are more expressive or more natural for understanding than other formalisms such as relational datasets or logic-based rules.  Ontologies exist and are in development for many domains. In fact, though, the power of an ontology is typically unlocked by some automatic reasoning engine based on a Description Logic.

Two papers related to the development of ontologies that are useful references are the following:

Natalya F. Noy and Deborah L. McGuinness
Ontology Development 101: A Guide to Creating Your First Ontology
http://protege.stanford.edu/publications/ontology_development/ontology101.pdf

Paul Buitelaar, Philipp Cimiano, Marko Grobelnik, Michael Sintek
Ontology Learning from Text
http://www.aifb.uni-karlsruhe.de/WBS/pci/OL_Tutorial_ECML_PKDD_05/ECML-OntologyLearningTutorial-20050923-2.pdf