Google Print hacking

From ScottWiki

See also Google Print hacking history and Google Print development history.

Restricted pages

It appears that for any given book, a certain set of pages is 'restricted'. This seems to mean that although Google has scanned and OCR'ed the page, it will not show the image, regardless of how much of the book you have already viewed. The page still appears in search results where appropriate. The general rule seems to be that 10-15% of each book is restricted in this way, with the restricted pages appearing primarily (exclusively?) in the second half of the book. See the Restricted pages example.

The image URL of an unrestricted page appears to contain "img=1", while that of a restricted page contains "img=2". For example:

http://print.google.com/print?id=-aAwQO_-rXwC&pg=354&img=1&q=neural+pattern&sig=eqt38N2w5x6yqWat-G5r5_pzOpY

http://print.google.com/print?id=-aAwQO_-rXwC&pg=353&img=2&q=neural+pattern&sig=ucC0lCJQvQw9HHHpisLA4SPQ_fs
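A program could read the img field directly from an image URL. The following is a minimal Python sketch, assuming the observation above holds (img=2 marks a restricted page); the function name is_restricted is made up for illustration.

```python
from urllib.parse import urlparse, parse_qs


def is_restricted(image_url: str) -> bool:
    """Return True if the URL carries img=2.

    Assumption (from the observation above, not confirmed by Google):
    img=1 means the page image is shown, img=2 means it is restricted.
    """
    params = parse_qs(urlparse(image_url).query)
    values = params.get("img")
    if not values:
        raise ValueError("URL has no img field")
    return values[0] == "2"
```

Applied to the two example URLs, this returns False for the pg=354 URL and True for the pg=353 URL.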

Page numbering

Throughout this section I use the phrase 'page code' to refer to the value of the pg field in a Google Print URL. Page codes are often, but not always, the same as the printed page number.

Several unusual page numbering schemes have been noted, and are described below.

Front matter

The front matter of each book is typically numbered using roman numerals. (Link to javascript converters?) Alternatively, often when the pages are unnumbered in the original, they have page codes of the form 0_1, 0_2, etc.
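Converting a roman-numeral page code to an integer takes only a few lines. This is a sketch in Python rather than JavaScript, and it assumes the page code is a plain roman numeral string such as xiv; the name roman_to_int is illustrative only.

```python
ROMAN_VALUES = {"i": 1, "v": 5, "x": 10, "l": 50, "c": 100, "d": 500, "m": 1000}


def roman_to_int(code: str) -> int:
    """Convert a roman numeral (either case) to an integer.

    Raises KeyError if the string contains a non-roman character.
    No check is made that the numeral is well formed.
    """
    total = 0
    largest_seen = 0
    # Walk right to left: a symbol smaller than the largest one already
    # seen to its right is subtractive (the i in iv, the x in xl).
    for ch in reversed(code.lower()):
        value = ROMAN_VALUES[ch]
        if value < largest_seen:
            total -= value
        else:
            total += value
            largest_seen = value
    return total
```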

Underscores

Sometimes Google appears to be unsure of the page number of a page or sequence of pages. Consider an example in which this occurs for pages 63 and 64 of a book. The page codes used internally (for example, in the URLs) appear to be as follows: page 62 has pg=62, page 63 has pg=62_1, page 64 has pg=62_2, and page 65 again has pg=65. This occurs, for example, at [1].

Be aware, however, that page codes beginning with 0_ refer to pages in the front matter of the book.

Programs extracting a page number from a URL should take this into account. This is easy: if the page code contains an underscore but does not start with 0_, add the numbers on either side of the underscore (so 62_1 gives page 63).
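The underscore rule can be sketched in Python as follows. The function name page_number_from_code is made up for illustration, and returning None for front-matter codes is a choice made here, since those pages have no arabic page number to report.

```python
from typing import Optional


def page_number_from_code(page_code: str) -> Optional[int]:
    """Map a numeric page code to a page number using the underscore rule.

    Returns None for front-matter codes of the form 0_N.
    Raises ValueError for codes that are not numeric (for example
    roman numerals), which this sketch does not handle.
    """
    if "_" not in page_code:
        return int(page_code)
    base, offset = page_code.split("_", 1)
    if base == "0":
        # Codes starting with 0_ are front matter, not an offset from page 0.
        return None
    return int(base) + int(offset)
```

With the example above, the codes 62, 62_1, 62_2 and 65 map to pages 62, 63, 64 and 65.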

Asterisks

Occasionally (example?) books have entire sections with unusual page numbering. I've seen a book in which the entire second half was numbered this way, with page 173, for example, having pg=2*173.
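A program could recover the page number from such a code by splitting at the asterisk. This Python sketch assumes the part after the asterisk is always the page number, as in the single example above. What the part before it means is not known; the sketch returns it unchanged. The name split_asterisk_code is hypothetical.

```python
from typing import Optional, Tuple


def split_asterisk_code(page_code: str) -> Tuple[Optional[str], int]:
    """Split a code such as '2*173' into (prefix, page number).

    Assumption: the number after the asterisk is the page number.
    Codes without an asterisk yield a prefix of None.
    """
    prefix, sep, page = page_code.partition("*")
    if not sep:
        return None, int(page_code)
    return prefix, int(page)
```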

Enabling right clicks

A variety of Greasemonkey scripts re-enable the context menus on Google Print pages. Examples include CustomizeGoogle and the Google Butler. Here's a code snippet from the GPL'ed CustomizeGoogle.

On Firefox

On Firefox, an easier way is to open the Preferences window (Tools->Options on Windows, Edit->Preferences on Linux), choose the Web Features tab, click the Advanced button next to Enable JavaScript, and uncheck "Disable or replace context menus" ...

Removing the yellow highlighting

This is done most easily by simply removing the q= field from either the page URL or the image URL. An alternative would be to use ImageMagick, for example as follows:

convert <inputfile> -stroke white -fill white -draw 'rectangle 555,300 575,600'  -fx "y" -despeckle <outputfile>

This removes the 'Copyrighted Material' written on the right edge of each page, as well.
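For the simpler approach of dropping the q= field, a script can rewrite the URL. The following Python sketch uses only the standard library; the function name remove_query_highlight is made up for illustration, and the sketch assumes, as stated above, that the URL still works without the q= field.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def remove_query_highlight(url: str) -> str:
    """Return the URL with every q= field removed from its query string."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != "q"
    ]
    # Re-encode the remaining fields in their original order.
    return urlunsplit(parts._replace(query=urlencode(kept)))
```

All other fields, including sig, are passed through untouched.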
