Tuesday Tip: Dealing with duplicates in DEVONthink

Duplicates in DEVONthink are great when you need them, not so great when you don’t. But how do you track them down and get rid of them?

In your database you will find a built-in smart group, a purple folder with a gear icon on it, called Duplicates. Selecting this folder will show you the files that are duplicated. (Depending on your settings in Preferences > General, duplicate files may be shown in blue.) So how do you manage them?

1. In the Scripts > Data menu (the Scripts menu looks like a stylized “S”) there is a script called Move Duplicates To Trash. If you select files in the Duplicates smart group (any or all) and run this script, all but the most recently added instance of each set of duplicates will be sent to the database’s trash. Quick and easy cleanup. But…

2. Maybe you’re using DEVONthink Personal or you want to control which file gets trashed and which one stays. Select a file and choose Data > See Also & Classify (or click the magic hat icon) to open the See Also drawer. You’ll notice the file is listed at the top multiple times, with the database location under each name. (The green bar next to the name is a quick visual hint that they are matches.) From here you can right-click to Reveal the file in the database or choose Move All Instances to Trash. Yes, this works on multiple selected files.

Note that none of this applies to replicants: with a replicant there is truly only one file, which simply appears in more than one location.
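
The rule that Move Duplicates To Trash applies in option 1 (keep the most recently added instance of each set, trash the rest) can be pictured with a short Python sketch. This is an illustration only, not the script's actual code: the `Record` type, its `content_key` field (standing in for whatever DEVONthink uses to decide that two files are duplicates), and the sample data are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Record:
    # Hypothetical stand-in for a database item; not a DEVONthink API.
    name: str
    location: str
    added: date
    content_key: str  # assumed: equal keys mean "duplicates of each other"


def split_duplicates(records):
    """Return (keep, trash); the newest record of each duplicate set is kept."""
    groups = defaultdict(list)
    for record in records:
        groups[record.content_key].append(record)

    keep, trash = [], []
    for members in groups.values():
        # Sort oldest to newest; the last element is the most recently added.
        ordered = sorted(members, key=lambda r: r.added)
        keep.append(ordered[-1])
        trash.extend(ordered[:-1])
    return keep, trash


records = [
    Record("Invoice.pdf", "/Inbox", date(2012, 1, 5), "a1"),
    Record("Invoice.pdf", "/Finance", date(2012, 3, 9), "a1"),
    Record("Notes.rtf", "/Inbox", date(2012, 2, 1), "b2"),
]
keep, trash = split_duplicates(records)
print([r.location for r in trash])  # prints ['/Inbox']
```

Only the older copy of the invoice lands in the trash list; the file with no duplicate is left alone. Option 2 amounts to making that keep-or-trash choice by hand instead of by date.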

One last note: A duplicate is not necessarily a byte-for-byte duplicate but can also be a “close match”. When you use the second option above, you could select Open from the context menu and edit the file, and you would see that the green bar on the edited file has turned gray and become shorter. DEVONthink now considers it a close match but no longer a duplicate, even if the name is still the same. Something as small as adding a comma won’t break the duplicate status, though. (Think about it – if the only difference in two things you’ve read is a single comma or sentence, you’d functionally consider them the same.)
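
Why a comma does not break duplicate status can be pictured with a small Python sketch. It assumes, as one of the comments below describes, that the comparison ignores case and nonalphabetical characters. The `normalize` and `looks_identical` helpers are hypothetical names used for illustration; they do not reproduce DEVONthink's actual algorithm.

```python
import re


def normalize(text):
    """Lowercase the text and keep only its alphabetic words."""
    # Assumption: anything that is not an ASCII letter acts as a separator.
    return re.findall(r"[a-z]+", text.lower())


def looks_identical(a, b):
    """True when two texts contain the same words in the same order."""
    return normalize(a) == normalize(b)


print(looks_identical("Eats shoots and leaves", "Eats, shoots and leaves"))  # True
print(looks_identical("Smith", "smith"))                                     # True
print(looks_identical("Eats shoots and leaves", "Eats shoots and stays"))    # False
```

Under these assumptions, only an edit that changes actual words, as in the last call, would be the kind of change that turns a duplicate into a close match.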

5 Responses to “Tuesday Tip: Dealing with duplicates in DEVONthink”

  1. Robb Allan says:

    “(Think about it – if the only difference in two things you’ve read is a single comma or sentence, you’d functionally consider them the same.)”

    Not if you are a lawyer.

  2. Ralph Elliott says:

    You needn’t even be a lawyer.

    There is this great little book with the title
    “Eats shoots and leaves”
    which among other things goes into the significance of a comma.

    Anytime text must clearly and unambiguously express a point, a comma may make all the difference in the world.

  3. Richard says:

    So, where is the “duplicates” smart group (or how can I create it)? Just read this article and don’t see this in version 2.9.7…

  4. Jim Neumann says:

    If you create a new database, one will be created.
    Also, you can create a new one via Data > New > Smart Group, with criteria of Instance is Duplicate and Kind is Any Document.

  5. Bill DeVille says:

    @Robb Allan & Ralph Elliott: DEVONthink isn’t a lawyer or grammarian, and it doesn’t do semantic analysis. In searches and in its Concordance (and in its AI features) it ignores nonalphabetical characters such as commas. That’s why, in parsing different documents each of whose content consists of a variation of the phrase “Eats shoots and leaves” with and without meaningful (to a human) placement of commas, it finds those documents to be identical, i.e., duplicates.

    DEVONthink also ignores case, so that it doesn’t distinguish between, e.g., Smith as a last name and smith as an occupation. It finds those two terms to be identical.

    Without those compromises, the Concordance (and its complexity) would bloom enormously, and consumer-level Macs couldn’t perform searches or make Classify and See Also suggestions instantly. We would need computers with the resources of IBM’s Watson. But with those compromises, DEVONthink’s AI algorithms can still handle a number of contextual relationships of terms in a document surprisingly well and very quickly. 🙂