conanxin

SUPARS: Intelligent Librarian

On a sunny autumn day in 1970, hundreds of students and teachers at Syracuse University, in New York State, took turns sitting in front of a printer terminal (similar to an electric typewriter) connected to an IBM 360 mainframe on campus. Almost none of them had ever used a computer, let alone a computer-based information retrieval system. Their hands trembled as they touched the keyboard; several later reported being afraid of breaking the entire system as they typed.

The participants were conducting their first online search, entering carefully selected words to find relevant psychology abstracts in a brand-new database. They entered only one keyword or instruction per line, such as "motivation" on the first line, "respect" on the second line, and "L1 and L2" on the third line, to search for papers containing both terms. After running the query, the terminal printed how many documents matched each search condition; users could then narrow or expand the search and generate a list of article citations. Many participants laughed when they saw the results returned by the remote computer.
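The line-by-line query style described above can be sketched in a few lines of Python. This is only an illustration, not the SUPARS implementation: the sample abstracts, the `run_session` function, and support for `or` are assumptions added here, and the source confirms only single terms and the "L1 and L2" form.

```python
import re

# Toy abstracts; the ids and texts are invented for illustration.
ABSTRACTS = {
    1: "Motivation and respect among graduate students",
    2: "Respect in classroom settings",
    3: "Motivation in early learning",
}

def tokenize(text):
    """Lowercase a text and return its set of words."""
    return set(re.findall(r"[a-z]+", text.lower()))

def run_session(lines, abstracts):
    """Evaluate one query per line; 'L1 and L2' combines earlier lines."""
    docs = {doc_id: tokenize(text) for doc_id, text in abstracts.items()}
    results = []  # results[i] holds the matching ids for line i + 1
    for line in lines:
        combo = re.fullmatch(r"L(\d+) (and|or) L(\d+)", line.strip(), flags=re.IGNORECASE)
        if combo:
            left = results[int(combo.group(1)) - 1]
            right = results[int(combo.group(3)) - 1]
            # 'and' intersects the two earlier result sets; 'or' (assumed) unions them.
            hits = left & right if combo.group(2).lower() == "and" else left | right
        else:
            term = line.strip().lower()
            hits = {doc_id for doc_id, words in docs.items() if term in words}
        results.append(hits)
    return results

session = run_session(["motivation", "respect", "L1 and L2"], ABSTRACTS)
for number, hits in enumerate(session, start=1):
    # Mirrors the terminal output: a count of matching documents per line.
    print(f"L{number}: {len(hits)} documents")
```

Keeping every line's result set around is what lets a later line refer back to earlier ones, which is how a searcher could narrow or widen a search step by step.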

As part of a follow-up telephone survey, participants were asked to provide two or three words to describe their experience. Out of a total of 78 words provided, 21 were the same adjective: "frustrating." Participants encountered difficulties logging into the system, experienced unpredictable failures and "irrelevant output," and most importantly, didn't know "what words to use in the search." However, they also found the system interesting and exciting ("fun," "I like computers"), with 94% saying they would use SUPARS (the Syracuse University Psychological Abstracts Retrieval Service) again if it became available. Several people proposed continuing the experiment and requested funding from their departments for the project.

The majority of these academic subjects were graduate students in education, psychology, and library science, and they were part of a radical online search experiment conducted by the School of Library Science at Syracuse University. SUPARS was one of many ambitious information retrieval research projects conducted on American university campuses from the late 1960s to the mid-1970s. Several factors contributed to the surge in this research. Advances in computer processing speed and storage capacity allowed academic databases and catalogs to be digitized and moved online. Computer terminals were new modular devices that could be distributed throughout campuses for decentralized access to mainframes. Additionally, funding for computer-based research from the military and industrial sectors was more abundant than ever before. Academic librarians seized the opportunity to experiment with this expensive new technology. In turn, universities provided a non-classified environment for collaboration with technology companies and military organizations; SUPARS was sponsored by the Rome Air Development Center, a United States Air Force laboratory.

It is easy to understand why librarians in the 1970s embarked on the search revolution. The scale of academic work was expanding, and soon there would not be enough human librarians to support all of it. Meanwhile, finding the desired information was a time-consuming, labor-intensive process that required a librarian's intervention. While academic researchers could browse new journals in their field, a comprehensive search for all previous work still meant consulting a reference librarian, who would search multi-volume handbooks for the correct Library of Congress subject headings. With a set of subject headings in hand, researchers could search the library catalog, citation indexes of journal articles (including subscription databases like Science Citation Index), and bibliographies compiled by hand by subject librarians at their university. Finally, they would physically locate the books and bound journals they deemed potentially relevant, if those materials happened to be on the library shelves.

It's no wonder that SUPARS participants found the system remarkable, despite its limitations. Given the familiarity of academic librarians with the challenges of searching, it made sense for them to design a system that bypassed subject headings and citation indexes. What is even more surprising is that of all the online search experiments happening during this period (including commercial search systems like Lockheed's Dialog, which later became a corporate product), SUPARS came closest to mimicking contemporary web search, foreshadowing several key features of the search protocols we rely on more than 50 years later.

SUPARS and several other nearly forgotten systems were precursors to the modern search engines we have today. While the popular history of the internet celebrates the programmers of Silicon Valley (and sometimes former US Vice President Al Gore), many of the initial concepts of search came from library scientists concerned with making documents accessible across time and space. Backed by military and industry funding, their advances are visible throughout today's online information landscape, from general methods for acquiring and indexing full-text documents to complex algorithms for searching free text and for drawing on previous searches conducted by others, the building blocks of contemporary query expansion and autocomplete. In fact, these methods and many others developed by campus pioneers are still used by billion-dollar web search engines and commercial library databases, from Google to WorldCat.

SUPARS was designed by a librarian named Pauline Atherton (now Pauline Atherton Cochrane). In 1960, at the age of 30, she began her library career as a cross-reference editor for the revised edition of the World Book Encyclopedia, ensuring comprehensive and accurate cross-links between different entries. By 1966, she was working at the Syracuse University Library and School of Library Science, and in 1968, she demonstrated the first use of online decimal classification files to aid searching. That same year, she established LEEP, the first computer-based instructional laboratory at the School of Library Science, integrating online searching into regular classroom instruction. (In the pre-internet world, "online" meant establishing networked real-time connections between large computers and other remote devices, such as terminals.)

The following year, in 1969, Atherton and her collaborator, another library science professor at Syracuse University named Jeffrey Katzer, designed SUPARS. The primary goal of the SUPARS project was to provide large-scale online searching in order to learn as much as possible about how users searched online, what their experiences were, and what they needed to search better. To do this, the team built a searchable corpus of academic content for use by the entire campus: over 35,000 articles from Psychological Abstracts by the American Psychological Association. This corpus was indexed and retrievable in the SUPARS system, making it the first large-scale database available online in a non-classified environment. The user base and searchable content were both significant at the time, although far smaller in scale and scope than today's web search.

Two decisions made by Atherton and her team made SUPARS truly innovative. First, they removed all subject headings from the Psychological Abstracts entries, allowing all words to be directly searchable, except for connecting words like "and" and articles like "a" or "the." This made SUPARS the first system to allow online searching and output of large amounts of free text. (Their final report was titled "Free Text Retrieval Evaluation.") Second, they saved each SUPARS search in a parallel database that could be queried alongside the abstract itself, making SUPARS the first experiment to allow users to access and use previous searches to find alternative terms or approaches.
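To make these two decisions concrete, here is a minimal Python sketch of a free-text index that skips connecting words and keeps every query in a second, searchable log. It is an illustration under assumptions, not a reconstruction of SUPARS: the `FreeTextStore` class, its method names, the sample abstracts, and the stop-word list beyond "a," "and," and "the" are all hypothetical.

```python
import re
from collections import defaultdict

# "a", "and", and "the" come from the text above; the rest are assumed.
STOP_WORDS = {"a", "an", "and", "the", "or", "of"}

def tokenize(text):
    """Split text into lowercase words, dropping stop words."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]

class FreeTextStore:
    def __init__(self):
        self.index = defaultdict(set)  # word -> ids of abstracts containing it
        self.search_log = []           # every query, kept as a second collection

    def add_abstract(self, doc_id, text):
        """Index every non-stop word of an abstract; no subject headings needed."""
        for word in tokenize(text):
            self.index[word].add(doc_id)

    def search(self, query):
        """Return ids of abstracts containing all query words, and log the query."""
        terms = tokenize(query)
        self.search_log.append(terms)
        if not terms:
            return set()
        hits = set(self.index.get(terms[0], set()))
        for term in terms[1:]:
            hits &= self.index.get(term, set())
        return hits

    def past_searches(self, word):
        """Return earlier queries that used `word`, as hints for alternative terms."""
        word = word.lower()
        return [terms for terms in self.search_log if word in terms]

store = FreeTextStore()
store.add_abstract(1, "The motivation of students and the role of respect")
store.add_abstract(2, "Anxiety in human learning")

store.search("human anxiety")                     # an earlier user's query
hits = store.search("the motivation and respect")
earlier = store.past_searches("anxiety")
print(hits, earlier)
```

The point of the second collection is that a searcher who is stuck on "anxiety" can see which other words earlier users paired with it, without any librarian-assigned subject headings.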

Each of these features alone was novel, but to understand how far ahead of its time the combination was, it is necessary to look at how web search services operate today. Search engines like Google and Bing index web pages using two main components: crawlers look for new pages and periodically revisit pages already found; parsers analyze the content of the pages and store the resulting information (including all free text) in an internal database. When a user enters a search query, Google attempts to match the words and phrases in the query with the pages in its database and provide the most relevant results to the user.

In addition to the words entered by the searcher themselves, modern web search algorithms also consider other words closely related to those in the search query, including synonyms (e.g., searching for "bike" returns results for "bicycle" and "cycle") and other directly related words.

Most search engines also include words from queries performed by others as part of their internal synonym library, adding search terms to the user's query. This process of including related words is called query expansion and can significantly improve the relevance of returned records. Similarly, Google and other search engines also suggest additional search terms to users through autocomplete, creating predictions based on previous searches to help users complete their queries quickly.
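As a rough illustration of both ideas, the Python sketch below expands a query with words that co-occurred with it in earlier queries and suggests completions from a query log. The log contents and the `expand_query` and `autocomplete` functions are hypothetical; production search engines use far more elaborate signals than these simple counts.

```python
from collections import Counter

# Hypothetical log of earlier queries, each stored as a list of words.
PAST_QUERIES = [
    ["bike", "repair"],
    ["bike", "bicycle"],
    ["bicycle", "bike", "lanes"],
    ["motivation", "respect"],
]

def expand_query(terms, past_queries, limit=1):
    """Add the words that most often appeared alongside the query terms."""
    counts = Counter()
    for past in past_queries:
        if any(term in past for term in terms):
            counts.update(word for word in past if word not in terms)
    related = [word for word, _ in counts.most_common(limit)]
    return list(terms) + related

def autocomplete(prefix, past_queries, limit=3):
    """Suggest earlier queries that start with the typed prefix."""
    counts = Counter(" ".join(query) for query in past_queries)
    matches = [(text, n) for text, n in counts.items() if text.startswith(prefix)]
    # Most frequent first; alphabetical order breaks ties deterministically.
    matches.sort(key=lambda item: (-item[1], item[0]))
    return [text for text, _ in matches[:limit]]

expanded = expand_query(["bike"], PAST_QUERIES)
suggestions = autocomplete("bi", PAST_QUERIES)
print(expanded)
print(suggestions)
```

Here "bicycle" is added to a search for "bike" only because earlier searchers used the two words together, which is the same principle as consulting other people's past searches for alternative terms.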

By giving users the ability to search free text directly within documents and allowing searchers to draw on previous search strategies, SUPARS therefore foreshadowed the arrival of web search. Additionally, SUPARS analyzed the utility of each individual search by examining its transaction logs. After the initial pilot project, two SUPARS tests were conducted: SUPARS I between October and December 1970, and SUPARS II in November and December 1971. Atherton's research team concluded that free-text searching was an effective way to improve the comprehensiveness of search results (in technical terms, "recall," the share of relevant documents that a search actually retrieves) and could be as effective as searches led by human librarians. Importantly, the system's vocabulary evolved continually in response to human input and behavior, an upgrade from a fixed, "one-size-fits-all" controlled vocabulary. The SUPARS team did not know that AI-driven web search algorithms would accomplish this precise task decades later, but they had a sense that it would be a new and effective way to continuously update search results.
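For readers unfamiliar with the term, recall is the fraction of all relevant documents that a search manages to retrieve, and it is usually reported next to precision, the fraction of retrieved documents that are relevant. The short Python sketch below computes both; the document ids are invented for illustration and are not SUPARS data.

```python
def recall(retrieved, relevant):
    """Fraction of the relevant documents that the search retrieved."""
    relevant = set(relevant)
    if not relevant:
        return 0.0
    return len(set(retrieved) & relevant) / len(relevant)

def precision(retrieved, relevant):
    """Fraction of the retrieved documents that are relevant."""
    retrieved = set(retrieved)
    if not retrieved:
        return 0.0
    return len(retrieved & set(relevant)) / len(retrieved)

# Invented example: four relevant abstracts exist; the search returns five
# abstracts, three of which are relevant.
relevant_ids = {1, 2, 3, 4}
retrieved_ids = {1, 2, 3, 8, 9}

search_recall = recall(retrieved_ids, relevant_ids)
search_precision = precision(retrieved_ids, relevant_ids)
print(search_recall, search_precision)
```

In this made-up case the search finds three of the four relevant abstracts (recall 0.75) while two of its five results are off-topic (precision 0.6).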

In a letter to the editor of the Journal of the American Society for Information Science in 1972, Katzer described the rationale behind the database of all previous search queries:

The purpose of this search database is to help users formulate queries in the document database (Psychological Abstracts). Since SUPARS currently uses an unrestricted vocabulary, the output of the search database can help users discover other approaches to attacking their subject in the document database: it will provide the keywords used by other subject experts and a representation of their thought processes... We believe this is the beginning of an area that has not been fully explored: the use of user intelligence to enhance machine intelligence.

It is easy to portray Atherton's team as utopian futurists, but the design of the SUPARS experiment was not guided by a prescriptive vision like the open web. It was specifically designed for a future where librarians would be increasingly unable to provide personal assistance to researchers. Drawing on the collective wisdom of other users was a practical solution rather than an idealistic one.

The SUPARS team observed that because the new computer terminals at Syracuse University were located "remote from any human expert in the subject area of interest to the user," additional sources of help would be needed, which could be found in the "human wisdom of all other users of the system." They wrote that the collective decisions of other researchers were merely a substitute for expert librarians:

Ideally, the user would be able to talk with someone knowledgeable in his area of interest and receive various vocabulary and other cues. Then the user could develop or formulate search queries to the system that would maximize specificity or exhaustiveness of retrieval as desired.

As they used modular terminals on campus, the SUPARS team saw the impending future and what a world based on distributed, networked computing would lose: more and more researchers would work independently outside the library, where librarians could not provide assistance. Atherton's team did not predict a world without professional librarians; they were preparing for a world where research would take place in many different locations, too far from the reference desk for librarians to help.

The SUPARS experiment also concluded that while leveraging others' search terms was a promising alternative to subject-based searching, it did have limitations. One of the final recommendations from SUPARS was to continue developing controlled vocabularies, stating that "there is still a need for some form of user vocabulary or synonym control for interactive free-text searching." The team reached this conclusion after seeing that participants often ran into vocabulary problems, such as searching for "people" instead of "human" and getting no results. Participants themselves also missed the comprehensiveness of subject headings. In fact, as part of the SUPARS survey, they were asked whether they preferred a free-text system or a system with more controlled vocabulary: 42% preferred the free-text system, 36% preferred controlled vocabulary, and 12% wanted a combination of both.

In this way, the significance of SUPARS lies in its being both a design far ahead of its time and a counterexample to the established techno-utopian history of the internet and the World Wide Web. That history celebrates visionaries like J. C. R. Licklider, whose idea of the Intergalactic Network directly inspired the invention of ARPANET, often called the "first internet." (Licklider was also deeply involved in similar campus online search experiments of the 1960s and 1970s, funding and advising several studies at the MIT Libraries that took place around the same time as SUPARS.)

In 1968, the year before the design of SUPARS, Licklider's paper "The Computer as a Communication Device" proclaimed, "In a few years, men will be able to communicate more effectively through a machine than face to face" and described a beneficial, happy society mediated by human-computer interaction. Licklider predicted that "the online personal life will be enriched," and "communication will be more effective and productive, and therefore more enjoyable." Licklider's article typifies the optimistic, predictive writing about the potential of information technology found in the futurist genre.

While cultural praise is given to visionaries like Licklider, the same praise should be given to Atherton and the SUPARS research team for seeing what the future might lose and designing for it. By expanding our cast of internet dreamers to include people like Atherton, we see a more nuanced picture of how different types of researchers envisioned the future world. Licklider saw what we would gain from being able to communicate with anyone in the world online, while Atherton's team saw what we would lose in expert intermediaries; they designed for this cost.

In 2022 and 2023, as the first wave of generative AI search engines (including academic search engines like Elicit and Consensus) reached a wider audience, they were met with both excitement and skepticism, and it is equally useful to analyze what researchers will lose by relying on these tools. For example, being able to simply enter a research question and receive an instant literature review is not purely a huge leap forward. This new technology will cause a loss of context and background, even as it enables incredible new discoveries; this loss differs from the one Atherton saw, but it is equally intangible and far-reaching. Being able to anticipate these consequences and actively consider how to help researchers overcome them, rather than mourning them like Luddites, is a lesson we can learn from the SUPARS team.
