Tuesday Tip: Batch convert PDFs with OCR

If you want to import a batch of scanned PDFs without converting them to searchable PDFs right away, e.g. because you want to run the time-consuming OCR process later, you can easily do so. Import the files into a group, e.g. ‘To OCR’. Later, make sure that Preferences > OCR > Original Document > Move to Trash is checked, select all the documents, and convert them in one batch using ‘Data > Convert > to searchable PDF’. This replaces the selected scanned PDFs with searchable copies and moves the originals to the Trash in a single pass.

4 Responses to “Tuesday Tip: Batch convert PDFs with OCR”

  1. Bradley Dichter says:

    When we import PDFs from a Mac server, where some are labeled, say, Red, Yellow, or Blue in the Finder, that label is not imported. Is that by design, a bug, or a misconfigured preference? Thanks in advance.

  2. Bradley Dichter says:

    What about a PDF with a sticky note or two? That extra data doesn’t seem to import into the DEVONthink view of the imported PDFs either. Is that normal?

  3. Eric says:

    The annotations may be destroyed when you run OCR on them. Are they still present in the imported PDF when you don’t run them through OCR?

  4. Didier says:

    When I try to batch convert a large number of documents (about 700), DEVONthink Pro Office 2.3.5 crashes without a log on OS X Leopard 10.5.8 (running on an Xserve Harpertown with 12 GB of memory). It is a pity that the program crashes here, as the number of documents should be limited only by disk space or time, not by their count. Also, the community is still waiting for DEVONthink to finally support multiple cores. It is a shame that, on a batch convert especially, the program can only use one core. Please implement it. It is still not implemented in the latest version (2.4.2).