As is often the case with DEVONthink, you have a number of options for capturing web content:
- Drag the URL to save only the bookmark
- ‘Print’ to PDF
- Save the page as a web archive
- Save only the page’s HTML code
- Copy text and images and save the clipping as rich text
All these options have their advantages and disadvantages:
Saving only the page’s HTML code is usually of interest only to web developers: it keeps the page but not the linked images, so as soon as the images become unavailable, the page layout looks ugly.
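To see why an HTML-only copy is fragile, it can help to list the images that the saved file still expects to load from the web. The following is a minimal sketch using only Python’s standard library; it is not part of DEVONthink, and the function name `external_images`, the sample HTML, and the `example.com` URLs are made up for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ImageLinkCollector(HTMLParser):
    """Collects the absolute URLs of all externally linked <img> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.image_urls = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        src = dict(attrs).get("src")
        # Images embedded as data: URIs live inside the file itself, so skip them.
        if src and not src.startswith("data:"):
            self.image_urls.append(urljoin(self.base_url, src))


def external_images(html, base_url):
    """Return the image URLs that an HTML-only copy still depends on."""
    collector = ImageLinkCollector(base_url)
    collector.feed(html)
    collector.close()
    return collector.image_urls


# Hypothetical saved page: the text is in the file, the images are not.
saved_html = """
<html><body>
  <h1>Article</h1>
  <img src="/images/chart.png" alt="Chart">
  <img src="https://cdn.example.com/photo.jpg" alt="Photo">
  <p>Some text.</p>
</body></html>
"""

for url in external_images(saved_html, "https://www.example.com/post"):
    print(url)
```

Every URL printed is a file that lives on someone else’s server; if it disappears, the saved page shows a gap where the image used to be.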
Web archives are better, as they also keep all the linked materials and thus the look and feel of the original page. However, web archives are saved by the Safari engine in a file format proprietary to Apple, so they may become unreadable in the future should Apple decide to drop support for it; in the last 20 years we have seen many file formats come and go. PDF keeps both the layout and the content, and it is relatively future-proof because it is also an ISO standard.
One thing all these options have in common is that they keep the original layout but also store unnecessary elements such as navigation, advertising, or other text not related to the core information you want to keep, even though DEVONthink tries to strip them where possible. This extra text does, of course, decrease the effectiveness of the AI engine.
If you are only interested in the actual information, the best option may be to select text and images and drag them from Safari or DEVONagent to your database. This saves the selection, including images, as a rich text document, which should be relatively future-proof (as it is a widely used standard) and contains only the data you are interested in, not n copies of the words ‘Home’ and ‘Back’ 🙂
So, depending on your needs, saving only the interesting parts of a web page can be more efficient than saving the whole page as a web archive or PDF. If you are interested in the original look of the page, PDF is a future-proof, standards-based option.