Skip to content

1. Introduction

The Amsterdam’s city archives (SAA) possesses physical handwritten inventories records where a record may be for example an inventory of goods (paintings, prints, sculpture, furniture, porcelain, etc.) owned by an Amsterdamer and mentioned in a last will. Interested in documenting the ownership of paintings from the 17th century, the Yale University Professor John Michael Montias compiled a database by transcribing 1280 physical handwritten inventories (scattered in the Netherlands) of goods. Now that a number of these physical inventories have been digitised using handwriting recognition, one of the goals of the Golden Agent project is to identify Montias’ transcriptions of painting selections within the digitised inventories. This problem can be generically reformulated as, given a source-segment database (e.g. Montias DB) and a target-segment database (e.g. SAA), find the best similar target segment for each source segment.

The problem introduced here relates to the approximation of the relevance of a document to a query. Such approximation can be done using lexical similarity (word level similarity), semantic similarity or hybrid similarities. In this work, given the problem at hand, the focus is rather on the lexical similarity.

Inventory