Text Language List

I am a big fan of the authors' 1999 book on Statistical Natural Language Processing, and I was thrilled when I found this new book online -- just search for "Information Retrieval" on Google.
In these two books, they describe the theory behind a vast toolbox which can be used to construct new tools/products for the Internet. Now I can go back to them when the need arises.
For starters, I appreciate the detailed theoretical explanations of topics that I could not find in other texts, and the references to related work are especially helpful. One of the other books I read was Information Retrieval by Grossman, which is an older book but has a more condensed style than this one. Grossman's discussion of clustering was more high level and referenced a few more papers that I found useful. That increased my interest in reading through the chapters here, which offer greater detail.
Before I felt like I could place each topic in its appropriate context, I had to spend six months reading both books, playing with code and finding software packages, searching the research literature, reading papers and other books, and then cycling back to the books. Here are some suggestions for things I'd like to see:
1. A set of recommended programming tools: in some books on Perl -- such as the chapter "Natural Language Tools" on pages 149-171 of "Advanced Perl Programming" by Simon Cozens (O'Reilly) -- you get a very "quick & dirty" introduction to maybe 20-30% of the concepts in these two books along with ways to implement and play around with them. Although Perl has many natural language processing tools, the Cozens book cuts to the chase, explains which are the best tools, and shows you how to use them. I think knowing such shortcuts aids in learning how to apply and improve on them. The more complex and sophisticated topics are more likely to make it out into the real world if they are easy to play with.
2. More data/examples on what does/doesn't work with end-users: Numbers, graphs, and charts are all good stuff. I always appreciate it when the authors reference quantitative comparisons, real-world products, and the history of the Internet. One of the reasons I had to consult the research literature was to broaden my understanding of quantitative comparisons between different techniques involving end-users, which were typically done in the context of studies of complete systems that users could try out.
Thanks,
-Sri

Click Here to see more reviews about: Introduction to Information Retrieval

Class-tested and coherent, this groundbreaking new textbook teaches web-era information retrieval, including web search and the related areas of text classification and text clustering from basic concepts. Written from a computer science perspective by three leading experts in the field, it gives an up-to-date treatment of all aspects of the design and implementation of systems for gathering, indexing, and searching documents; methods for evaluating systems; and an introduction to the use of machine learning methods on text collections. All the important ideas are explained using examples and figures, making it perfect for introductory courses in information retrieval for advanced undergraduates and graduate students in computer science. Based on feedback from extensive classroom experience, the book has been carefully structured in order to make teaching more natural and effective. Although originally designed as the primary text for a graduate or advanced undergraduate course in information retrieval, the book will also create a buzz for researchers and professionals alike.

Foundations of Statistical Natural Language Processing

This is the best book I've ever read on computational linguistics. It should be ideal for both linguists who want to learn about statistical language processing and those building language applications who want to learn about linguistics. This book isn't even published yet and it's already my most heavily used reference book, joining gems such as Cormen, Leiserson and Rivest's algorithm book, Quirk et al.'s English Grammar, and Andrew Gelman's Bayesian statistics book (three excellent companions to this book, by the way).
The book is written more like a computer science or math book in that it starts absolutely from scratch, but moves quickly and assumes a sophisticated reader. The first one hundred or so pages provide background in probability, information theory and linguistics.
This book covers (almost) every current trend in NLP from a statistical perspective: syntactic tagging, sense disambiguation, parsing, information retrieval, lexical subcategorization, Hidden Markov Models, and probabilistic context-free grammars. It also covers machine translation and information retrieval in later chapters.
It covers all the statistical techniques used in NLP, from Bayes' law through to maximum entropy modeling, clustering, nearest neighbors, decision trees, and much more.
What you won't find is information on applications to higher-level discourse and dialogue phenomena like pronoun resolution or speech act classification.

Click Here to see more reviews about: Foundations of Statistical Natural Language Processing

Statistical approaches to processing natural language text have become dominant in recent years. This foundational text is the first comprehensive introduction to statistical natural language processing (NLP) to appear. The book contains all the theory and algorithms needed for building NLP tools. It provides broad but rigorous coverage of mathematical and linguistic foundations, as well as detailed discussion of statistical methods, allowing students and researchers to construct their own implementations. The book covers collocation finding, word sense disambiguation, probabilistic parsing, information retrieval, and other applications.
