OCR is back — and better than ever before

After two weeks without OCR, we have finished our workarounds for the PDF bug in the ABBYY engine and have just released public beta 3 of all editions of DEVONthink and of DEVONnote.

The third public beta of DEVONthink 2.0 and DEVONnote 2.0 finally brings optical character recognition (OCR) back to DEVONthink Pro Office, along with a number of bug fixes and detail enhancements. In addition, all editions of DEVONthink and DEVONnote feature the new ‘OPT’ Boolean operator, improve compatibility with del.icio.us and Adium, and fix a number of bugs. Read more about the update on our News page and check the release notes (DEVONthink, DEVONnote).

Next on the schedule for the upcoming public betas are completing the sidebar and the document properties panel. Once these are done, we’ll start implementing the tag view. Please understand that we don’t publish a concrete release schedule, as timing is always subject to change with the challenges we encounter in development.

18 Responses to “OCR is back — and better than ever before”

  1. Mathew says:

    Wow, thanks so much! That actually was a pretty quick turnaround—it’s much appreciated. Now to take the new beta for a test spin!

    Any good suggestions for defaults for the OCR prefs? I changed mine to 400 dpi and 75% image quality. But this results in a much larger file relative to FineReader for ScanSnap? (about 3 times larger and I can’t check against FineReader as it doesn’t seem to provide preference options)

  2. See the release notes of this version: because of our workaround for the problems we encountered, everything is converted to 300 dpi before conversion, so using 400 dpi for the output is not useful at the moment. This workaround also creates colour documents from BW scans, so they may end up larger than the original.

  3. Rodney says:

    Thanks a million for the new OCR module. I’m off to give it some extensive testing.

  4. Lee says:

    It is? I can’t find a trace of it at all. I’m missing something, obviously. Perhaps I should have my coffee first and then relook at it. There is no sign of OCR in DTpro or in its documentation. Odd.

  5. Eric says:

    @Lee: OCR is only available in DEVONthink Pro *Office*, not in the regular Pro 🙂

  6. JamesR says:

    Can I get an Amen?

  7. Ian says:

    If it is so much better is it a worthwhile exercise running all existing documents through the new OCR engine?

  8. jcw says:

    F a n t a s t i c.

    The size of the resulting docs opens up an entirely new set of options for me.

    Thank you for making all the right choices.

  9. Neil says:

    I would also like to know if there is a benefit to re-processing existing PDF+Text documents with the new OCR engine.

    What sort of file size improvements are expected?

  10. Brian Puccio says:

    Will this conversion to color 300 dpi PDFs regardless of the original PDF behavior ever be changed or is this a permanent “feature”?

  11. NATAN says:

    HI, COULD YOU TELL ME WHY MY MESSAGE WAS THROWN AWAY ? NOTHING WRONG IN IT. thank you to let an answer in my private mail. A.N.

  12. Lee says:

    I suppose my follow up Q is : is OCR destined to be in DT Pro eventually?

  13. Blink says:

    Can you compare Neatworks Neat Receipts software with Devonthink Office Pro? I like the sharpness and speed of the Neatworks scans, and I like the recognition of receipts, versus documents, versus contacts. I also like the initial attempt at filling in the standard data for each file into the corresponding template (contact, receipt, or document). It also has smart files which collect documents based on key words.

    However, the document templates don’t fill very accurately under real usage.

    On Devonthink, I like the use of AI to suggest documents, but I wish it had the ability to identify receipts, contacts, versus documents, and to fill in basic template information from contacts and from receipts. I also find the Devonthink scans to be noticeably less crisp than the Neatworks scans.

    I am using a S510M Scansnap with an iMac.

    I have not purchased Devonthink yet, since there is so much beta testing going on, but I would like to know how to transfer Files from the Neatworks libraries, which have their own extension, to Devonthink. I would also like to know whether the 20 OCR per day limit during the beta is 20 pages/day or 20 OCRs/day(without regard to the # of pages scanned?)

    Also, any chance of getting more pages per day until the Beta Stage is over (or farther along). In the meantime I need to scan hundreds of documents per day. I’ve been scanning them into Neatworks, but I’m concerned I won’t be able to transfer them to Devonthink… since they are in the Neatworks library. I’d rather just start scanning directly to Devonthink…

    Cheers to a nice application!


  14. Eric says:

    @JamesR: Amen for what exactly? 😉
    @Ian, @Neil: It depends on your documents. You can try it with one and check if it makes any difference for you.
    @Brian: It’s a “feature” as long as we’re waiting for the fix from ABBYY, at least.
    @Natan: We never throw any messages away. But we moderate comments to this blog to keep spam out of our doors. This is a manual process and we don’t work on weekends 🙂

  15. Eric says:

    @Lee: No, OCR will stay a Pro Office feature.
    @Blink: Neat is working on drivers that let you connect their NeatReceipts scanner to DEVONthink. I have no idea if you can drag files from their software to ours but if their software is a good Mac OS X citizen it should be possible. The testing limit is 20 OCRs a day, regardless of the number of pages. If you wish, please feel free to send us a copy of some Neatworks library files so that we can have a look.

  16. Ciarán says:

    I really like how DevonThink 2.0 is developing. One of the most important features for me are tags. I realise that this will be implemented in a future release. I use various tagging applications such as Fresh, Tags and TagIt which all use the OpenMeta tagging format, which seems to be on its way to becoming the new standard. However, there is no easy way to use any of these apps to tag information in DevonThink. Will the new tagging feature in DevonThink use OpenMeta?

  17. Eric says:

    @Ciarán: We are in talks with the OpenMeta developers and we are following other discussions around this library. One thing to keep in mind: our tagging model allows hierarchical tags. If we used OpenMeta and you, for example, renamed just one tag with sub-tags (a group with sub-groups), this could lead to literally thousands (!) of files having to be updated with the changes, if not hundreds of thousands.
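
The rename concern in the last reply can be illustrated with a rough sketch. It assumes, purely for illustration, that per-file tags are stored as flat strings and that a hierarchy is encoded as a slash-separated path such as `projects/ocr`. That encoding, the file names, and the helper functions `rename_flat` and `rename_node` are all hypothetical; they are neither OpenMeta's nor DEVONthink's actual format or API.

```python
def rename_flat(file_tags, old, new):
    """Rename a tag when every file stores its own flat tag strings.

    Hypothetical model: file_tags maps a file name to a list of tag
    strings, with hierarchy encoded as "parent/child". Returns the
    number of files whose stored tags had to be rewritten.
    """
    touched = 0
    for name, tags in file_tags.items():
        updated = [
            new + t[len(old):] if t == old or t.startswith(old + "/") else t
            for t in tags
        ]
        if updated != tags:
            file_tags[name] = updated
            touched += 1
    return touched


def rename_node(tag_names, tag_id, new):
    """Rename a tag when files only reference tag IDs kept in one place.

    Hypothetical model: tag_names maps a tag ID to its display name,
    so a rename is a single write no matter how many files use the tag.
    """
    tag_names[tag_id] = new
    return 1


# 1000 documents tagged under the "projects" group, plus one unrelated note.
files = {f"doc{i}.pdf": ["projects/ocr"] for i in range(1000)}
files["note.txt"] = ["personal"]

touched = rename_flat(files, "projects", "work")

tag_names = {1: "projects", 2: "ocr"}
node_writes = rename_node(tag_names, 1, "work")

print(touched, node_writes)
```

In this toy model, renaming the parent group rewrites every tagged file in the per-file case, while the central model needs a single write. With large databases, that difference is the scaling problem the reply describes.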