Wednesday, August 17, 2011

Spinning the Background - Text (continued...)

Alexandria
Rev. 3/12/2013
The previous question - largely rhetorical - was: Should we be thankful that the opinion could be read to be only marginally negative?

Personally, I am not thankful. I thumb through the short pages of the ruling, and my eye cannot fix on anything that might convince me that this project is not "ready to go." I read the Grimmelmann analysis and can understand how the professor of the law explains and in a sense justifies Judge Chin's thinking. Of course he has a completely respectful reading based on his recognition that what Judge Chin says, whether he may agree or not, goes. Even if his students have better tactics in dealing with the case, a ruling is a ruling, for now.

Such an attitude does not help the rest of us who generally do not have a respectful attitude towards texts. If humanists look at the text of a great artist of the past, it is fashionable to devalue the temporal horizon and belittle the text. In truth, the texts are often belittled without awareness of a temporal horizon from perspectives naively secure in the unreflected contemporary horizon. For contemporary texts, a little ideological analysis is usually enough to send the text to the round file - or to find birds of a feather.

With this text, being respectful is not easy. Of course, respect for the person of the judge and the office of judgement is an unquestioned given, but judges must be generalists, and a reading of the law may not always help a generalist make good calls given the contentious to and fro. As a generalist a judge must be ready for anything: one day it is the professional wrestlers who feel aggrieved by the lack of respect given them by parents' associations that do not appreciate the subtle beauty of the sport WWW; the next day, here comes Al Franken, pursued by all the people he has called big fat liars more than once already in the title of his book. Then the judge has to decide for how many decades past the maximum plausible life-expectancy he should sentence the 70-year-old Bernie Madoff; the correct answer is 14. Rulings on projects of vast social-historical implications, 12 million electronic library books, do not fit easily into the usual pattern of diversity of the cases arriving at USDC.

At least in this text - thin as it is - the temptation arises to read sentences and apply an ideological critique. Yet, a legal perspective is not an ideology in the usual sense of the word. It is an ideology, to be sure, but not one to be criticized when reading a legal opinion. It is a given that the judge is impartial and fair. Yet the judge must rule one way or the other. Thus it will be inevitable that a ruling, let's say, in favor of "one way" over "the other" must needs by definition slant towards "one way," with all fairness, reasonableness and adequacy. Inevitably, the spin of the winning side will find its way into the ruling.

My experience with legal texts is limited, but my small sampling effort, for lack of a better word, seems to point to a relatively simple vocabulary. Sentences written a hundred years ago and a vocabulary from our agrarian and industrial past still carry their meanings into completely new contexts.

In addition, the judge must approach entities such as "library" and "book" with the view of a generalist at home equally well in 1911 or 2011.

Yet, the fact is that the relation of printed text to electronic text has ballooned to 1 to 20. For every printed page produced, there are 20 electronic pages produced.

This imbalance also affects the consumption of texts. The ratio of printed to electronic consumption is arguably closer to 1 to 100. Lawyers and judges participate in this electronic revolution just like everyone else, but when it comes to the norms of their stock in trade, setting the rules through rulings, they are still stuck with the 19th c. meanings affixed to the words "library" and "book."

I am going to attempt to update Judge Chin's ruling, taking into account that the words he uses may no longer have the meanings he associates with them and uses to decide what is reasonable, adequate and fair. In any case, his reasoning is determined by the words he uses as the "arguments" or "terms" for his logical propositions; these words move him back in time. Yet the future approaches on a daily basis, and nowhere is this seen more clearly than in libraries and universities. The fact that a bunch of graduate students recently finished with their degrees could set up a billion dollar company in less time than it took them to get their PhDs shows that the lines between the university and business have blurred. The students took their research, built a company, continued with their research and started earning billions. Are there terms and arguments in the judge's vocabulary that deal with this state of affairs adequately? Is the comparison of Google with Standard Oil instructive or helpful? I am speaking hypothetically and not suggesting that either Judge Chin or the Justice Department have said or intimated such a comparison. Yet they are concerned with monopoly and can only go to their own history. Who can know what actually lurks behind "concern?"

It is easy to see what thoughts stand behind the demand that Google must get certification from the Justice Department that it is not a monopoly. Only Microsoft, which has MONOPOLY tattooed on every single one of its parts, could pursue such a strategy. They know from experience that the Justice Department is capable only of indistinct mumbling, either for or against. Any paper certifying that Google is NOT a monopoly will be sufficiently ambiguous to give the notion that Google IS a monopoly fresh wind. It is like proving that one has dated neither male nor female sailors, honest. In the context of spin, that is all that is required.

Is there enough money involved and are there enough people using out-of-print library books to make an economic case for restraint of trade? At present no money is involved in the library book trade. Obviously, here Microsoft and Google go different ways. In the case of Microsoft, every computer user on the Intel architectures, minor exceptions noted, has to buy Windows. Microsoft tries to make the web browser an integral part of Windows. No operating system in the short history of computing has come up with the idea of arguing that a software module working outside the computer is essential to the operating functions inside the computer; the Microsoft argument, absurd on its face (as has been demonstrated conclusively), is that the two functions cannot be separated without compromising the OS. By the time the Justice Department mumbled something, the browser market had been destroyed and innovative developers of browsers buried. Microsoft continues to sell us a bloated and vulnerable operating system to this day.

Incidentally, we are stuck with a bloated OS and a browser that any half-wit who learned to program in Junior HS in Vladivostok could break into and erase all files remotely. That is a monopoly. Google's market share was not established with dirty tricks. There will always be conflicts of will and litigation; monopoly is something quite different. Building Chrome is not an act of a monopolist; it is a public service. Scanning library books is also a public service.

Should the judge look at the electronic tools in the legal workplace, where the transformation is fairly complete, he would get a better model on how to regulate Google and the libraries than case law on monopolies or on intellectual property. Of course, the universe of legal texts is much smaller than the universe of all books. Yet it is important for judges to extrapolate from their own situation and see how change evolves in other analogous fields. The judge must accompany a "common hairless biped" on his way to check out the single existing copy of that bound set of pages within a radius of 50 miles. The judge must consider the notion: is there a better way? Would I want to work like this? Is there anything on my docket that is relevant to this?

Google is a relatively recent organization that has had a meteoric rise due to the need to deal with the vast quantities of electronic texts - today - and not in a decade. Standard vocabulary on monopolies simply does not hold when it comes to understanding the math and the statistics and the computer optimizations to produce a complete index of everything.

When the organizational chemistry has formed to produce innovation quickly, to expand into dozens of fields quickly, it is not a sign of greed or rapaciousness - but a sign of creativity unleashed - for the benefit of all - not just stockholders. Pronouncements by economists of the past on the duties of corporations to shareholders do not hold when intellectual creativity in math, statistics or digitization are the stock in trade. The object is no longer to corner the market for commodities and manipulate the price - actually, it is not easy to say what the object is today with texts and indexing - it is not monopoly.

Similarly, the rights of copyright holders do not conflict with the rights of readers to read library books. If the majority of texts produced are electronic, it makes no sense to limit library books to ink and paper. At that point it is no longer adequate to harp on copying rights for books that would never be copied had technology not moved beyond ink on bound pages. Scanning technologies were not invented to subvert copyright. Electronic text is a natural evolution - something the courts should embrace - something the courts have embraced in their own work. How about giving text researchers around the world a break?

Rather than projecting an antiquated view of libraries and their function, the judge should think along with developments and pave a secure path into the future rather than route all traffic onto an actual plank road designed for driving ox carts through the rain.

Spinning the Background - Part Two.

Pages 2 to 13 of the opinion deal with the background in three sections. In a text of 46 pages, not counting the list of attorneys appended, and weighing in at barely 23 USDC pages, the "background" represents 24%. The reason I bring this up is that in a ruling so short and of such magnitude to all of us who care about electronic research, each page is precious; it is a precious clue as to what on God's green earth Judge Chin might have been thinking. With the introduction and the background we will have covered 28% of the ruling. Were we talking about 540 pages, I would just say, OK, denied, let's forget about it, the lawyers will sort it out, I got some work waiting with public domain sources from the 19th c. Yet with 11, I mean 5 pages of "background," I feel I have a shot at garnering some kernels for my legally impaired brain.

Were this an English paper on a novel, this section would be the plot summary. It is always interesting to read a student's plot summary, assuming it has not been copied verbatim from somewhere. One can see what was remembered and given significance and what was forgotten or fell below the threshold of awareness. With students we have to look for what was missed; with lawyers cum judges, with all due respect, we have to look for spin.

The narrative of facts is the favorite playground for spin. Getting a specific formulation of a "fact" into the record will increase the likelihood of having that "fact" passed on, uncritically.
 In 2004, Google announced that it had entered into agreements with several major research libraries to digitally copy books and other writings in their collections. [Chin Ruling]
In the first sentence Judge Chin writes that Google entered into an agreement with libraries to "digitally copy books [...]" in library collections.

Were this a chess game, any unreflected first or second move can lead to a swift end playing against a master. It is possible that more reflection may not help either. But there is something called "the Book" in chess. "The Book" contains the sequence of moves taken from the history of chess that represent safe passages towards maintaining a balance or towards certain victory. Computers are so good at chess because they can stay with the book for 10, 15 moves. Once you have left the book you must depend on your tactical prowess. This is fatal if your opponent is still in the book and knows what led to certain victory in the past. But never mind that.

It may be fatal for the position of the digital library to admit in the first sentence that Google started "to digitally copy" books. Aside from the split infinitive, which points to the declining standard of written exposition at our smaller Ivy League institutions that fill coveted law clerk positions, the word copy opens a huge legal door. It may well be that the word "copy" has been in the Google paperwork with the libraries from the beginning, but now we have the post-game analysis. One should ask: are there other ways to describe what workers at the library, paid with Google money, did with cameras and books other than to copy books?

Let's try this:
In 2004, Google announced that it had entered into agreement with several major research libraries to explore and develop electronic strategies for delivering bound and cataloged texts and other writings in their collection to library patrons. [Batke rewrite]
Now that already sounds a lot better. Any apoplectic authors can now rail against Google and libraries finding electronic means to serve their patrons better. I am trying to open the door for the concept of the "server" and close the door for the "copier." We want to stay away from the copier because Judge Chin will lead us through his "Book" to "selling copies." We don't want to end up there, mate in 2 years.

This is a minefield situated on a slippery slope. In the next sentence we are started down the slope towards the mines:
Since then, Google has scanned more than 12 million books.
It has delivered digital copies to the participating libraries, created an electronic database of books, and made text available for online searching.
So Google has been painted with "delivering digital copies" and making a "database of books." The question revolves around 12 million books. You ask, these are facts, period. Where is the spin?

Well, listen and be amazed. We read: "Google has scanned..." and "[Google] has delivered digital copies." It certainly could be made to look like it, if it were not actually otherwise. Google gave money to libraries to scan books. Certainly Harvard seems to have taken a very independent course. One could argue that Google merely retains non-exclusive rights as do the libraries for their scans. It is fairly serious spin to say that Google "delivered books" to the libraries. Where was the physical locus of the scanning? Granted, the Austrians are sending their half a million books on a three-hour trip up the Autobahn to Munich. The question is: what is the sign on the building with the scanners? I suspect the books scanned at Stanford and Ann Arbor stayed on the premises to be scanned. Certainly I would try to argue that.

So we are confronted with a big fat number of 12 million. So we have to activate Jenny Craig. Not all 12 million are in contention. The spin here is that a huge number of books were "illegitimately copied to be sold." We have to subtract the out-of-copyright books, published between 1500 and 1923. That is several million. There are the books given to Google by the Google partners program, probably a sizable chunk. There are the books that reverted to the public domain for lack of renewal. Then there are the books by authors who died before 1941. There are the books that may have been scanned, but have not been delivered, a sizable chunk. Then there are the orphans that are truly orphans; they should not be mentioned as under contention. Then there are the opters out, probably the literary estates of the big stars, the Steinbecks and Hemingways that still produce substantial revenue for heirs. We are well below 12 million.

Here we have some more spin. Judge Chin, in his discussion of the "class" of authors, makes the point that 6800 authors opted out of the agreement. If I were selling many many copies of "Grapes of Wrath" in 40 languages world wide - opting out is the only economically viable choice. There are other considerations that an author might address differently than the managers of an estate. For example, should Grapes of Wrath be in the comprehensive index and should the snippets of query hits be delivered to the screen so researchers can find the passages in their personal copies? However, if my book has made no money since 1963 and only 20 copies are available in libraries and 10 copies in abe.com - opting out is an arbitrary and capricious decision that goes counter to the common good as well as to the personal good and should not be allowed. (Follows the hard line). By publishing the book and by garnering the rewards of academic advancement, the author has entered into an agreement with the scholars in his field that his or her contribution is part of the discussion. If that discussion is going on in the stacks, the book must be in the stacks; if that discussion is going on in electronic indexes world wide linked to pdf's, then that book must be part of the index.

In any case, it is possible to imagine that, of the thousands of authors of the 130 million extant unique titles, 6800 have a significant financial interest in opting out. One strategy would be to find a way to include the opters out in the index and have the downloads be bought through the individual authors' publishers and not through the libraries' text base. It is as easy to route money as it is to route books.

I am not close enough to the proceedings to come up with how many million we are really talking about, but I could imagine a sentence such as - "only half of the 12 million books that have been scanned by libraries and sent to Google for indexing are contested in litigation." That would not really be counter-spin, more an attempt to give the settlement a life of its own, even with parts of it lying on the scales of Justizia. It would make clear that a large percentage of books are being delivered currently and legally and successfully, that all who do not want their books delivered do not have them delivered and all that is being contended are the authors not heard from who would be part of the 6800 if they have been reprinted or are still making money. It would cement the position of the ASA as an important common good already in place, something conspicuously absent in Judge Chin's spin toward denial.

By the time we have unspun the spin we are left with perhaps (1) less than half of the 12 million books scanned by libraries, (2) which were scanned to fix the logistics problems of libraries, (3) which they delivered to Google for indexing - a thoroughly legal thing to do - and (4) for creating a database and an index for online searching - also legal, for which Google would be the obvious choice. The 6800 opters out that Judge Chin cites are doubtless the only ones who are making money with their books. There may be some who opt out just to be ornery; there may be some who did not opt out who may want to cede their work to the public domain. The USDC has no language to describe actions not motivated by maximizing profit.

Some estates are making serious money and will continue to do so and will begin to lobby should their time window close, as will the copyright holders of cartoon characters. The system of delivering 6 million books not in contention is working well and cannot be shot down as unworkable. Nor can third parties duplicate Google's effort in the near term with public money as has been suggested. The only real question is how much money must be given to the registered owners of essentially worthless copyrights. The mostly bogus question is how much money should be collected by well-paid collectors for the unknown authors of orphans who will never appear, to be distributed eventually somewhere in the author-publisher community - if they have their way - thank you very much, indeed. Many libraries have created their own systems of delivery, either in-house productions, or service organizations to handle data-flows that would stress the capabilities of library or IT centers to the breaking point.

Ladies and gentlemen, we are still in "the Book" ready for the next move. Yet danger lurks.

There is a troubling footnote to the sentence following the one discussed above, a sentence which is just a short elaboration and could really have been put in some construction of a dependent clause. The connection is broken by a bibliographical note embedded in the text from early in the litigation. The note is from a 2006 Berkeley Law Journal. I guess it was embedded rather than made a footnote because the generous spacing of lines would have created two-line pages and fifteen-line footnotes. It is a bit of extra razzle to obscure the thinness of the gruel.

The sentence with the footnote which follows:
Google users can search its "digital library" and view excerpts -- "snippets" -- from books in its digital collection. [1]
We obviously need to maintain the counter-gambit. Google has no "digital library." Google has no "digital collection." Google has received scans of texts that it has indexed because it is the only outfit that can index such vast quantities of text and deliver it around the world.

First, the use of the term "excerpts" is misleading. Excerpting is a scholarly technique to copy text verbatim on cards or pages with key-word labels for future use. Any excerpt will generally cover a complete thought. Any attempt to compare "snippets" with "excerpts" or to treat them as the same thing is disingenuous. Snippets are lines snipped from the text; excerpts are full quotes ready to be used during the writing phase of research. This is either very transparent spin or lack of awareness of fundamental procedures of research. Snippets have been around for a long time in "key-word in context" concordances. Current versions of such systems will always take the user to the full text so a real excerpt can be copied from the text. Nice try but caught with the hands on the spinning top.
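For readers who have never seen a "key-word in context" concordance, here is a minimal sketch in Python of how such snippets can be cut from a text. It is only an illustration: the function name `kwic`, the `width` parameter and the sample sentence are my own assumptions, not anything taken from Google's system or from the ruling.

```python
import re

def kwic(text, keyword, width=30):
    """Return (offset, snippet) pairs, one per occurrence of keyword.

    Each snippet is the hit plus at most `width` characters of context
    on either side. Hypothetical helper, for illustration only.
    """
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    snippets = []
    for match in pattern.finditer(text):
        start = max(0, match.start() - width)
        end = min(len(text), match.end() + width)
        snippets.append((match.start(), text[start:end]))
    return snippets

sample = "The library holds the book. A book is a bound set of pages."
hits = kwic(sample, "book", width=10)
for offset, snippet in hits:
    print(offset, repr(snippet))
```

Note that each snippet is cut by character count around the hit, with no regard for where a thought begins or ends. The offset is what lets a concordance take the reader back to the full text, where a real excerpt can be made.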

Second, the texts delivered to Google are clearly marked as to the library of origin. I could imagine a collection of servers at Stanford, Michigan and NYPL that hold the pdf's scanned in-house. The Google index program would then link a hit to the appropriate server in the appropriate brick and mortar library. That would explode the notion (and the associations that go with it) that Google is building a library and will sell access to books. Even if this is not the actual data-flow, it certainly could be, and if it could be, then the whole notion of "building a library" is simply not up to the task of describing what is done today. Data is in the "cloud." What does that mean? It means it is located on servers that may be administered by any number of organizations, including the organization that scanned the books to solve their own logistics problems.
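The data-flow imagined above can be sketched in a few lines of Python. Everything here is hypothetical: the library names are used only as labels, the document ids and texts are invented, and nothing is claimed about how Google's index is actually built. The point is merely that an index can record where a text lives without holding the text itself.

```python
from collections import defaultdict

# Hypothetical holdings: each library keeps its own scans on its own server.
HOLDINGS = {
    "stanford": {"doc-1": "a bound set of pages in the stacks"},
    "michigan": {"doc-2": "pages scanned in the library"},
}

def build_index(holdings):
    """Map each word to the set of (library, document id) pairs holding it."""
    index = defaultdict(set)
    for library, documents in holdings.items():
        for doc_id, text in documents.items():
            for word in text.lower().split():
                index[word].add((library, doc_id))
    return index

def lookup(index, word):
    """Return the locations of a word, sorted; the texts stay with the libraries."""
    return sorted(index.get(word.lower(), set()))

index = build_index(HOLDINGS)
print(lookup(index, "pages"))
print(lookup(index, "stacks"))
```

In this sketch the index only points at a library and a document; a hit would be resolved by the server of the library that did the scanning.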

So really, the gambit is to paint Google as having taken its non-exclusive rights from its indexing work with libraries to cut a deal with authors and publishers to open a book store. When that thought crossed Robert Darnton's grey matter, he went ballistic.

Yet it is not inevitable that this spin is substantiated. First, only "snippets" ever reach the screen - not even excerpts which would be more useful and also legal; thus only the hard-line copyrighters object. They object not because someone has been hurt, but rather because the prerogatives of Congress have been circumvented.

Microsoft is concerned that Congress might have been slighted. Give me a break. It is a separation of powers argument that is usually fatal if the dosage is large enough. The delivery of "snippets," although it renders the system of delivery practically useless, nevertheless does represent a recognition of the rights of authors in the effort of libraries to find more cost effective ways of delivering text to patrons than the expensive walk-in service they have been carrying for centuries as a favor to authors, a favor authors have come to look at as a right.

So what about this footnote? It is a multi-part footnote affixed to a small sentence:

The term "digital library" apparently first appeared in the 1980s, see Mary Murrell, Digital + Library: Mass Book Digitization as Collective Inquiry, 55 N.Y.L. Sch. L. Rev. 221, 230 (2010), although the notion of a "universal library -- the utopian dream of gathering [] all human knowledge and, especially, all the books ever written in one place" -- has been with us for many centuries, id. at 226; see also id. at 226-36 (detailing that history). It is estimated that there are 174 million unique books. (Clancy Decl. ¶ 11, ECF No. 946). The Republic of Germany reports that certain "European nations have taken affirmative steps to create a European Digital Library ('Europeana') that balances the needs of authors and publishers with those of users in a way that meets the interests of both." (Mem. in Opp'n to ASA of Republic of Germany 2, ECF No. 852 ("Germany Mem.")).
OK, it is clear that libraries have a library, a collection of books digital or otherwise. Although the theory of the "Cloud" had already been formulated in the 80's, not everybody understood what that meant. I personally only had a vague skeptical notion.

Only the Google partners form a Google library; it really is an entry in a list of servers to crawl.

Murrell's 2010 New York Law School Law Review article (above) examines the notion of libraries from a historical perspective. So what has Judge Chin chosen to quote:
"utopian dream ... all human knowledge ... all books ever written ... especially those ... gathered in one place."
At this point one could tell Judge Chin that the "Cloud" called and said that the notion that she was in one place was a piece of contumely by the unknowing; she is in very, very, very many places; the query window and the hit list only give the illusion of one place and one source. It is actually: first, millions of places representing users in front of their very own query screens and second, millions of sources representing all the servers Google has indexed. Thus the "Cloud" would like to inform any and all, if it please the court, and it should, that the notion that Google has gathered all books into Mountain View is simply poor thinking and should not be part of a factual basis for a ruling. It is the poverty of legal logic, forced into counter factuality through inadequate concepts, trying to come to terms with the realities of modern technology.

The estimate of 174 million unique books is interesting but hard to interpret without more words of explanation. If you hold two books with the same title, same dust jacket and same binding in your hand, you are holding two books, yet there is only one unique title in your hand. If a text has a total of 300,000 words it may have only 25,000 unique words. While it may not be so easy to find out just how many unique books there are in the world one might try to be really precise. One can assume the readers might not be sure what is meant by "total number of extant copies of books" or "total number of unique titles." With unique title one might have a clearer idea, or not. The idea of having 100 million books scanned is rather novel, but doable.
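The difference between a total count and a unique count is easy to show in a few lines of Python. This is just a sketch; the helper name `count_total_and_unique`, the shelf and the sample sentence are made up for the occasion and have nothing to do with how the 174 million figure was arrived at.

```python
from collections import Counter

def count_total_and_unique(items):
    """Return (total number of items, number of distinct items)."""
    counts = Counter(items)
    return sum(counts.values()), len(counts)

# Two copies of the same title in your hand: two books, one unique title.
shelf = ["Grapes of Wrath", "Grapes of Wrath"]
books_in_hand = count_total_and_unique(shelf)

# The same distinction for the words of a (very short) text.
words = "the book is the book of the library".split()
word_counts = count_total_and_unique(words)

print(books_in_hand, word_counts)
```

What counts as "the same" item is the whole question: here two strings must match exactly, while for real books one would first have to decide whether editions, reprints and translations count as one title or several.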

Then come the lines on the Europeans... I would like to take a break and move on to something somewhat different. After the French and Google have become best pals, I wonder if the section on the Europeans is still relevant. The Chin ruling can be googled and downloaded - read it yourself. I am not sure that dissecting more sentences would really advance my point. In the next installments, I plan to investigate intellectual property.