[Image: NYU Bobst]
Before 2006, and still to this day, one would use interlibrary loan to locate a book in a library somewhere in the continental USA and have that library mail a copy. The book would be lent to the patron as though it had come from the local stacks. Libraries with chronically underfunded acquisition budgets and consequently poor collections could still support advanced research. At UNC, NC State and Duke, there has always been an active interchange of printed materials via courier, improving the range of research dramatically. In that case it was a matter of either driving 20 miles or waiting a day or two or three. This still goes on today.
Six years ago, when large-scale scanning was still a nutty idea, one might have heard: "Be happy, the book will be here in a week." Tell that to a researcher in the medical sciences. Today, with 12 million books online and the number growing, waiting a week seems on its way to becoming an absurd idea.
There have been harbingers. In the humanities, long ago, two decades maybe, libraries used to mail Xerox copies of articles, then they would fax the Xeroxed copies, then they put the runs of journals in storage and signed up with JSTOR or MUSE or other systems that deliver PDFs. Legal research is largely electronic. Research in the medical literature and related areas is largely electronic.
What sort of materials are at issue here? Leaving aside the disciplines that have built their electronic text bases and citation indexes for the last several decades - what are these 12 million books scanned from the shelves of libraries? What are their uses and why is it important to get at them quickly? This is a difficult question with many answers depending on one's field and specific research problem.
To give a general answer, I shall try to bend it toward the legal questions at issue here. First, we are talking about major research libraries: Stanford, Michigan, NYPL et al. We are not talking about local public libraries with collections of 10,000 books where Aunt Betty gets her novels, which she passes on to cousin Luther. One caveat: local libraries would be among the great beneficiaries of the "great e-library;" local librarians would be able to steer their patrons to texts in the scanned corpus. We are most emphatically not talking about popular literature, romance or detective novels, self-help books, or this and that for dummies. We are talking about depositories of research materials that have been built up on campuses over the last two hundred years.
A Historical Whirlwind Tour of the "Campus library"
It is instructive to trace the evolution of the libraries on campus. At UNC, for example, in the 19c. the central library was not as important as the libraries of campus societies and departments and professors. The first UNC library is now a small theater that has 10 windows and seats not more than 200 people. The UNC library had 9,000 vols. in 1882. The library grew to 40,000 in 1895 but only one seventh was accessible. Ironically, the Carnegie "Corporation" (Google: Carnegie, ogre for the historically correct) was instrumental in forming central libraries; this was the case at UNC.
After several more moves of the central library, an edifice of consolidation was built, Wilson Library. At its opening in 1930, Wilson held 22,000 volumes and 12,000 pamphlets. After the war, several stack additions were built. When I was the "stack supervisor" in Wilson in the late 60's, space was becoming a serious problem due to the great influx of the new "Library of Congress Classifications." Many books were taken to storage spaces all over campus, some to dirt-floor crawl spaces under the new law school building. That was pretty much the end for hundreds of volumes of French parliamentary debates that already had been oxidizing to a nice brown tint in the Wilson stacks.
In the 80's Wilson became the home of special collections and a new 10-story library tower was built some distance away. Every campus has its own version of how the current library evolved. In the 80's it made perfect sense to spend many millions to build towers to house physical books.
In the case of UNC, the dark, cramped stacks of Wilson were replaced by bright, slightly less cramped stacks. At Duke and NC State, the new stack towers were joined to the current libraries. Needless to say, the expansion to towers was only a temporary solution. Now the towers are bursting at the seams. This is especially troublesome at institutions such as JHU and Princeton, where building codes forbid building up. It is difficult to dig further levels down into the bedrock of Baltimore or into the marshes of Princeton.
The books in research collections such as Stanford and Michigan and many others, but not that many others, are pretty much everything that has been published since the beginning of the 20th century, with holdings going well into the 19c. and further back. Much of the really old stuff was and is still carried on film, although the migration to the digital is complete in many areas and available through institutional subscription.
We have to differentiate between original art of whatever genre on one hand and commentary, read: scholarship, on the other. These two categories are generally shelved together: the primary works first, followed by the secondary works, i.e. the commentary. In the first category, for example, there will be all the novels of Dickens in 20 different editions; there will be 20 copies of Oliver Twist and every other thing by Dickens ever published. Of course there are other authors in other languages. There are less well-known authors, authors of only local fame, authors that had been of interest to local researchers who built up the collection. The primary works of Dickens are not a problem; they are solidly in the public domain. The same shelving system exists for the copyrighted works, for Steinbeck, Hemingway, and J.D. Salinger.
These collections of primary works are some wheat and much chaff. Nestled among the many editions of Dickens there may be one that is currently considered the one suited for scholarly work. The considerable rest are merely documents reflecting the tenor of their time. Many of the 12 million scans are of that nature.
We also have to differentiate reading "Oliver Twist" for the first time vs. "working" on "Oliver Twist." To read much of the canon of literature in English, but also in other European languages, it is not really necessary to go to a library. This has been true for about two decades. Project Gutenberg as well as web sites with specific interests have the primary text of about everything one could want to read or wish one had read. No figure of stature is missing, although there might be room to quibble. Most of the canon - the texts of most interest to scholars - is available in Google Book Search.
Of course, to appreciate the dimension, one has to glance at the subject divisions of the Dewey and Library of Congress systems of classification. Or one can walk through the stack tower of a great library. At first one is overwhelmed, but after working with the collection for a relatively short while one realizes that it is a collection of subsets and these subsets do not really mix in any meaningful way, only peripherally and somewhat arbitrarily, as when some literature professor happens on a text in economics, for example. One does not really talk about that. Each field has its own primary texts and its tradition of commentary. Thus competence in a field may be limited to acquaintance with a few hundred volumes, or a few thousand at the absolute outer limits of the possible.
In areas such as literature, philosophy and history, text traditions reach far back into antiquity. Since the Renaissance, from 1500 on, that tradition has been edited and commented upon. By 1923, the magic date for the public domain, just about everything one could think of up to that date had been discussed several times.
The discussion has continued and new primary texts are added as well - new novelists, new poets, new dramatists, new philosophers - the ongoing intellectual work of the last 90 years. The discussion has become problematical on the commentary side. One talks of an "explosion of knowledge," as the quantity of printed material has grown exponentially. Only recently has the growth of electronic materials overtaken the printed. Monographs and essays on Shakespeare, to pick one of hundreds of examples, continue to be published every year. The result is that today a collection of 5 million books is considered large. A collection under one million is considered anemic. A collection of 12 million books is stellar beyond imagining. Yet there are already 12 million books online, and one could imagine that by the time Judge Chin and his peers are done with this issue, the total will be 20 million, 40 million, worst case.
Is Scanning Stealing?
To delve into the legal, pseudo-legal in my case, I would like to differentiate between works that are worth paying for to read and own and works where the author should be happy to be read at all, ever, and ecstatic if actually quoted. Steinbeck you pay for, period. The heirs of Steinbeck can negotiate what the market will bear. The fact that Steinbeck texts do not show up in the indexes is sad; it reflects that the people who collect the money are not interested in modern scholarly methodologies. They are unconcerned that not being in the comprehensive index will hurt their sales, a myopic attitude.
However, some monograph on "Canned Goods on the West Coast as Reflected in the Novels of John Steinbeck" published by the Monterrey Press in 1985 is not worth paying for. The book was a reworked 1983 dissertation, American Literature, UC Ukikah. If you are working on a dissertation on Steinbeck you probably have to read this book. If your academic career touches Steinbeck in any way you may well have to read it as well or be ready to say why it could be ignored.
There are millions of books like that. They were published to meet an academic obligation and thus have become part of the discussion. The remuneration for writing such a book may have come from several sources: a research grant, paid time off to do research and write, a subsidy to edit, bind and publish and, not to forget, advancement and pay raises for having published, invitations to lecture or to write spin-offs. The books never actually "sold" except to libraries with a subscription from the publisher and to the occasional Steinbeck aficionado. These authors should write a thank-you note to every single library that has spent scarce money to acquire these professional obligations. To demand money from libraries to deliver electronic copies is a bit cheeky. To get outraged because your "Canned Goods Monograph" shows up in Google Books is a pathetic overestimation of self-worth.
There is a question that Judge Chin might have asked himself: "What is the economic value of the books of authors who have not opted out for obvious economic reasons?" No one can expect the heirs of Steinbeck to accept 60 dollars for "Cannery Row." But the author of the "Cannery" monograph has essentially forfeited additional economic rights for lack of success in the market of book sales. The libraries may have spent 25 dollars to buy the book. I would be happy to pay three dollars, two or less, for a pdf were I working on a book on Steinbeck. I might try to argue the price down to 5 dollars in a used book store. I think a dollar to the author/heirs would be extravagant. I would pay a quarter at a yard sale.
Larceny
Let us say I am working on Steinbeck. I have an obsessive personality and have set my sights on collecting Steinbeck commentary. I go to the library, take the book from the shelf and head to the xerox room - 230 pages, 5 cents per double page, 5.75 dollars to copy the book, time 20 minutes. Have I committed a crime? Let us say I don't use an institutional machine but my personal scanner? Let's say I mail the scans to a friend out of state; is that a federal case?
One of the antiquated conceptual structures that surrounds the proceeding is the concept of "thief." The proceeding is colored by the subtext that thievery, or larceny on a grand scale, is going on with the scanning of the collections of our great research libraries. "Thievery" is easy to grasp, and experience with the concept is widespread. Stealing must be punished, period, except when stealing bread to feed the starving family - and - except when stealing from small investors to send the kids to Lawrenceville and Amherst.
It took some time for intellectual property to attain the same status as "real" property. In the last 300 years we have come to accept that intellectual property can be stolen and laws to sanction that theft must be on the books. The theft of intellectual property is a murky problem, not just in the sense that the law is murky by definition. It is one thing to reprint a best-seller surreptitiously and sell it while it still sells best. It is another thing to steal an idea or a thought and to write something based on the "stolen thought" and to publish and sell it. Even if I copy paragraphs verbatim without attribution in my history of some unnamed presidents of the US - I will be found guilty of sloppy research, easy to do in a world of assistants and cut and paste. It is yet another thing entirely for a library to make electronic versions of its out of print holdings to facilitate delivery. The first infraction is obviously a matter of criminal prosecution. The second type of infraction of property rights is generally subject to sanctions in academic circles that frown on "plagiarism" as unprofessional and unworthy of active academics. The third is merely the efficient use of research materials, no harm, no foul.
A difference between the first two types of infraction is that the first generally deals with real things - things in the market, for sale, making money. The second infraction concerns intellectual property of great academic value - but of no commercial value. If I rewrite Plato's "Parable of the Cave" as though I had come up with the story myself, I will be judged an idiot and thrown out of the university, although no one sustained economic loss. The value that adheres to published ideas is not monetary. A similar argument can be made for out-of-print library books. It is clearly understood that books published before 1923 have no economic value, although a small percentage have an immense intellectual value. Reprints still sell well, and only the introduction to the reprint is protected. The text used for the reprint was simply taken, appropriated quite legally. However, copying sentences of Hegel without attribution is an intellectual infraction and is not punishable as "theft" without extremely creative interpretation of fraud statutes.
There is a mass of printed material located on the spectrum between, on one hand, "books on the market" that are in print and are producing revenue and, on the other, "books on library shelves" published since 1923 and still in copyright. In that second area one should be careful not to start yelling "thief" indiscriminately. There is a recognition that education supersedes property rights in many instances. That demonstrates that there are interests beyond the one of: "I wrote it, I own it, and my children will own it into the 3rd generation, now bugger off." The dearth of renewals pre-1964 makes the latter statistically unlikely.
Certainly libraries have a right to buy books, put them into their catalog and lend the books to the public. The legal barriers for challenging that practice are too high even for the most creative practitioners of the law. [Note: Not that creativity is not in play. Publishers of e-books are suggesting that libraries that lend e-books be required to buy a new copy after 27 lending transactions; that is the average number of loans it takes for a printed book to wear out and have to be replaced.]
I for one hope this proceeding does not drag out for much longer; I am becoming pretty impatient. At the very least I would like to see the current delivery of snippets expanded to larger snippets, encompassing runs of pages (as a minimum, adding the page before and the page after the page with the 3-line snippet and displaying all three - also "Table of Contents," bibliography and index). This is done to some degree in the active, revenue-producing market for in-print books; that seems reasonable although barely adequate when having to consult a book that cannot be bought. Delivering three lines is not reasonable, trending toward the absurd, and looking at only three lines on a page is not adequate; it is firmly in the realm of useless. I am sure this is a wrinkle in reasonable and adequate not envisioned by Rule 23(b).
