A word on OCR and why it’s missing

Last Friday we had to release DEVONthink Pro Office 2.0 public beta 2 without an embedded OCR engine – which is truly annoying for you and us.

Starting with version 2.0 of DEVONthink Pro Office we are moving from the IRIS engine to the ABBYY FineReader engine, which produces much smaller PDFs, is more accurate, and is much easier to embed into our application. Where the IRIS engine was an external program remotely controlled by DEVONthink, the engine provided by ABBYY is a true framework that we can directly embed and control. You will see this soon in a much improved ‘OCR Activity’ panel and no more OCR windows popping up for every page.

So why did we leave it out of this release? Version 2.0 is based on OCR provided by ABBYY, so our license with IRIS would not cover IRIS-based OCR in a DEVONthink Pro Office 2.0 beta unless we paid for every single license on top of the ABBYY license. But we still have technical difficulties with the ABBYY engine: it simply crashes when you feed it PDF files. We cannot deliver this, but we had to release a new public beta last Friday because public beta 1 expired this weekend. A catch-22.

We apologize for the inconvenient timing, and we are working hard together with the ABBYY technicians in Moscow to solve this issue quickly. We will deliver either a new public beta release or an updated OCR component as soon as possible so that you can run OCR on your scans again. I will keep you updated on all progress here in my blog.

In the meantime, please simply add your PDFs to your database as you have in the past. You can convert them to searchable documents as soon as the ABBYY OCR component is back using ‘Data > Convert > To searchable PDF’. You can easily find all un-OCR-ified PDFs in your databases using a smart group looking for kind ‘PDF/PS’ and a word count of zero.

26 Responses to “A word on OCR and why it’s missing”

  1. Shenandoah Don says:

    Are you saying you guys aren’t savvy enough to release \ re-release the initial public beta with a new expiration date?

    That would give you an extension to work out the PDF OCR challenge with the new engine.

    The decision to release an impaired version seems to have been based on a false premise: Go with the new OCR engine or none at all.

  2. Eric says:

    @Shenandoah: No, because that would be a breach of the license agreement.

  3. Chris says:

    I agree with nigel. Why not warn everyone instead of letting us find out for ourselves? Very immature and unprofessional. Brand Loyalty is Red 2 You!

  4. Mitch says:

    HUH?????? You’ve now turned my scanner into essentially a fax machine with this release! Your lack of foresight and poor judgement in releasing a beta without warning us that the core functionality is missing is, quite frankly, appalling. In 30 years in the software business I’ve never seen or heard of something like this happening. Can I back down to the previous version?? I have 11 clients relying on this as well as myself!

  5. Bonedo says:

    C’mon, guys! If you want OCR, it’s easy: just manually adjust your MacOS clock in such a way to display a date before January 31. If you do so, DevonThink Pro Office beta 1 will run again! Yes, an easy way to turn back time…

  6. Eric says:

    Gentlemen, we tried to solve this problem until the very last minute but to no avail. So, please take my apologies for not pulling the emergency brake earlier.

  7. rb says:

    @Mitch: You have 30 years in the software business AND you have 11 clients depending on a beta release of software?

    Really? Are you serious?

  8. Michael says:

    Eric – Apology accepted.

    Everyone else – Relax. If you’re still using the original software that came with your ScanSnap, you can continue to OCR stuff using the ABBYY FineReader for ScanSnap that came with your scanner. And if you can wait a few days, this will probably be sorted out. This is, after all, a beta.

  9. Mitch says:

    Are we talking days? weeks? months? Give us an option to back down versions please

  10. Ryan Briggs says:

    Um, guys, this is a BETA. If you are using beta software, don’t complain that something is wonky. If you are doing really important stuff with DTP then you made a stupid move when you started using DTP2 beta. Don’t blame Devon for your mistake.

  11. Neil says:

    Will there be any benefit to re-processing PDFs (that were OCR’d with IRIS) using the new ABBYY engine? Will that even be an option?

    I’d like to second the request for a guesstimate of OCR capability.

  12. Neil says:

    @Ryan Briggs

    While I agree that using beta software for production work is risky, that’s how beta software gets tested. I think it would be less troublesome if the non-beta version of Pro Office were easily available (special request does not count as easy on Saturday or Sunday).

  13. Eric says:

    @Neil: Agreed, but DEVONthink Pro Office 1.5.4 also works with a 2.0 license code. And making it available also to users who have not purchased it would be basically a breach of the license agreement.

    @Mitch: We and ABBYY are working hard at this issue this very moment.

  14. scm8 says:

    @Ryan Briggs

    Expect some bugs, sure. But when this is the lead item on the release notes for beta 2 — “New: New faster and more accurate OCR engine based on ABBYY FineReader” — it seems reasonable for beta-users to assume that OCR would be implemented in beta 2.

  15. Gene Carangal says:

    “In the meantime, please simply add your PDFs to your database as you had in the past, you can convert them to searchable documents as soon as the ABBYY OCR component is back using ‘Data > Convert > To searchable PDF’.”

    I tried that. I get the following message –

    “OCR functionality has been temporarily disabled in this release due to technical difficulties with the OCR library.”

  16. JamesR says:

    I am willing to accept the crippled software, but please just keep us updated daily with some progress on when we can expect OCR functionality to re-appear.



  17. Eric says:

    @JamesR: In my post I promised just this.

  18. jcw says:

    That’s a lot of whining here for an app which is in beta. I for one am glad you didn’t enable OCR if it crashes DT and am fine with scanning images now and doing the OCR later.
    Have reported some issues with 2.0b1 and am very pleased with the support and the progress made in 2.0b2. Please keep up the good work, you’re doing a terrific job. As for the above comments… they illustrate how desperately people want you guys to succeed.

  19. David says:

    The thing is that the 2.0 version is all over the web site and being sold, but missing this core functionality. This should be made obvious to anyone buying it right now. This way Devon could avoid all the fuss it is experiencing right now.

  20. JamesR says:

    Any updates? Another week, another month, another 2 months. Give us something here.


  21. JamesR says:

    One more thing. Please make it easy for us to do a batch “OCR in place” for all these documents we have been scanning without OCR, keeping creation/modification dates intact, etc.



  22. Chris says:

    I spoke with Eric yesterday and he assured me that they were spending all their time working on the bug and it should be fixed soon.

  23. Gavin says:

    I have a whole pile of docs awaiting scanning 🙂 Any updates? This is sorta what I use the program for.

  24. Allen says:

    Thanks for the update, I really appreciate it! I miss the OCR but I’m sure you’ll have me up and running again ASAP.

    I don’t (yet) depend on the software for mission critical work (it *IS* a BETA after all, @other people in thread who are angry) but I really like what I see so far.

  25. Miro says:

    It would be nice to make a note about the missing OCR on the download page. I would never have downloaded and installed the product if I had known that OCR is disabled. I am not sure if I ever want to come back after this experience.

  26. Steve says:

    Thanks for the tip about searching for non-OCR’ed documents to do later. I would have never figured that one out on my own. Great software! I use it daily. Nothing does what DTP Office does IMHO.