Credits
This work was supported in part by the Center for Intelligent Information Retrieval
and in part by NSF grant #IIS-0910884 (Mining a Million Scanned Books).
Principals
James Allan is a Professor in the Department of Computer Science at the University of Massachusetts Amherst (UMass). He co-directs the Center for Intelligent Information Retrieval (CIIR), a well-known research organization in Information Retrieval and related fields. Allan’s research focuses on interactive information retrieval and organization, including retrieval models, browsing and other human-computer interactions, topic detection and tracking, automatic information organization, and evaluation of information systems. Allan has served on the editorial board of two major journals in the field and was recently elected Chair of the ACM SIGIR organization. He is the author of more than 100 refereed conference and journal publications in the field of IR and has chaired the PhD committees of 14 graduated students in the area. From 2002 to 2006, Allan was involved in the NSF-funded National Science Digital Library (NSDL) initiative as the PI of the UMass work on constructing the NSDL’s first search engine.
R. Manmatha is a Research Associate Professor in the Department of Computer Science at the University of Massachusetts Amherst. He does research on document analysis and recognition for printed and handwritten documents and on image and video retrieval. He and his students created the first automatic demonstration retrieval system for historical handwritten manuscripts, specifically for a portion of George Washington’s manuscripts. He has published over 60 refereed papers and has been on program committees for many conferences in the areas of information retrieval and document analysis and recognition. He is an associate editor for IEEE Transactions on PAMI and for Pattern Recognition Letters and was previously an associate editor of ACM TOIS. He spent a summer as a visiting scientist at Google working on various aspects of their books project. He also co-founded a mobile image search company, SnapTell, which was acquired by A9/Amazon. SnapTell’s technology allows people to take pictures of book, CD, DVD, and video-game covers with a cell phone and have them automatically recognized. He is currently a consultant to A9/Amazon.
David Smith is a Research Assistant Professor in the Department of Computer Science at the University of Massachusetts Amherst. His research has centered on core natural language processing tasks such as morphological and syntactic analysis and on applications of NLP techniques in machine translation and information retrieval. His research in machine learning has focused on efficient inference algorithms and semi-supervised learning. In addition, he has published papers on information extraction and digital libraries, including one that received the Best Paper award at JCDL 2001. He has taught courses on natural language processing; on large-scale text processing and grid computing; and on research methods for computer science. He received a Ph.D. in Computer Science from Johns Hopkins University in 2010 and an A.B. in Classics (Greek) from Harvard University. In the interim, he was the head programmer for the Perseus Digital Library Project at Tufts University.
Getting Things Done
Jeff Dalton
Logan Giorda
Kriste Krstovski
Xiaoye “Tiger” Wu
Megabooks Alumni
Niranjan Balasubramanian
Minh Nguyen
Siyuan Peng
Xing Yi
Mao Zhao
Collaborators
Gregory Crane, Perseus Digital Library Project, Tufts University
Brewster Kahle, Internet Archive
We would also like to thank Bruce Croft, Sam Huston, and the Galago and Lemur Project teams.