{"id":24062,"date":"2013-09-20T00:12:14","date_gmt":"2013-09-20T04:12:14","guid":{"rendered":"http:\/\/blogs.ams.org\/mathgradblog\/?p=24062"},"modified":"2014-06-24T13:33:22","modified_gmt":"2014-06-24T18:33:22","slug":"repair-scanned-documents-gscan2pdf","status":"publish","type":"post","link":"https:\/\/blogs.ams.org\/mathgradblog\/2013\/09\/20\/repair-scanned-documents-gscan2pdf\/","title":{"rendered":"Repair Scanned Documents With gscan2pdf"},"content":{"rendered":"<p>I had the pleasure last week of tracking down an article locked behind a digital paywall. It arrived through inter-library loan in the form of a book, all issues of the journal that year bound together. I felt a little disappointed as it meant I&#8217;d be left with lower-quality scans. (You know what I&#8217;m talking about if you&#8217;ve ever placed a book on a copy machine.) I turned to the internet for a solution and discovered the tool <a href=\"http:\/\/gscan2pdf.sourceforge.net\/\">gscan2pdf<\/a>.<\/p>\n<p><!--more--><\/p>\n<p>There are two ways to begin using gscan2pdf: 1) scan your document using the application or 2) open an existing document, for instance a multi-page PDF produced by your department&#8217;s copy machine. The built-in tools allow you to reorder pages, crop, rotate and perform a few other adjustments. The &#8220;Clean Up&#8221; tool offers a GUI panel for <a href=\"http:\/\/unpaper.berlios.de\/\">unpaper<\/a>, a post-processor for fixing bad scans. Running unpaper after basic editing worked very well, correcting subtle alignment and border issues with the scans. 
I didn&#8217;t try them, but gscan2pdf can also use three optical character recognition (OCR) packages, if they are installed on your machine:<\/p>\n<ul>\n<li><a href=\"https:\/\/code.google.com\/p\/tesseract-ocr\/\">tesseract-ocr<\/a><\/li>\n<li><a href=\"https:\/\/code.google.com\/p\/ocropus\/\">OCRopus<\/a><\/li>\n<li><a href=\"https:\/\/launchpad.net\/cuneiform-linux\">Cuneiform<\/a><\/li>\n<\/ul>\n<p>Gscan2pdf is handy for repairing scanned documents. It is open source with Debian packages available (only an <em>apt-get<\/em> away if you run Ubuntu). Unfortunately, I couldn&#8217;t find any OS X or Windows builds. I did come across an application called <a href=\"http:\/\/scantailor.sourceforge.net\/\">Scan Tailor<\/a>, which works on Windows and GNU\/Linux and appears to offer similar functionality to gscan2pdf. I am not aware of any OS X apps; if you know of any, please offer suggestions in the comments below.<\/p>\n<p>After going through the hassle of requesting the articles, waiting for their retrieval and scanning them&#8230; you might as well spend another five minutes and fix up the results using one of these applications. Learning to operate gscan2pdf takes only a few minutes, and the result is improved readability.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>I had the pleasure last week of tracking down an article locked behind a digital paywall. It arrived through inter-library loan in the form of a book, all issues of the journal that year bound together. 
I felt a little &hellip; <a href=\"https:\/\/blogs.ams.org\/mathgradblog\/2013\/09\/20\/repair-scanned-documents-gscan2pdf\/\">Continue reading <span class=\"meta-nav\">&rarr;<\/span><\/a><\/p>\n","protected":false},"author":31,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"jetpack_post_was_ever_published":false,"_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":false,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_memberships_contains_paid_content":false,"footnotes":""},"categories":[2],"tags":[],"class_list":["post-24062","post","type-post","status-publish","format-standard","hentry","category-advice"],"jetpack_featured_media_url":"","jetpack_shortlink":"https:\/\/wp.me\/p3gbww-6g6","jetpack_sharing_enabled":true,"_links":{"self":[{"href":"https:\/\/blogs.ams.org\/mathgradblog\/wp-json\/wp\/v2\/posts\/24062","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/blogs.ams.org\/mathgradblog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blogs.ams.org\/mathgradblog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blogs.ams.org\/mathgradblog\/wp-json\/wp\/v2\/users\/31"}],"replies":[{"embeddable":true,"href":"https:\/\/blogs.ams.org\/mathgradblog\/wp-json\/wp\/v2\/comments?post=24062"}],"version-history":[{"count":9,"href":"https:\/\/blogs.ams.org\/mathgradblog\/wp-json\/wp\/v2\/posts\/24062\/revisions"}],"predecessor-version":[{"id":24071,"href":"https:\/\/blogs.ams.org\/mathgradblog\/wp-json\/wp\/v2\/posts\/24062\/revisions\/24071"}],"wp:attachment":[{"href":"https:\/\/blogs.ams.org\/mathgradblog\/wp-json\/wp\/v2\/media?parent=24062"}],"wp:term":[{"taxonomy":"category","embedda
ble":true,"href":"https:\/\/blogs.ams.org\/mathgradblog\/wp-json\/wp\/v2\/categories?post=24062"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/blogs.ams.org\/mathgradblog\/wp-json\/wp\/v2\/tags?post=24062"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}