= Description =
The Matchbox tool finds duplicate pairs in a collection of digital documents based on SIFT features and the SSIM method. The tool takes a collection path with associated parameters as input. Currently three scenarios are implemented:
* Duplicate search in one turn (parameter 'all')
* Professional duplicate search (an experienced user can execute a particular step of the 'FindDuplicates' workflow)
* Quick check whether two documents are duplicates (based on a previous BoW dictionary)
Further parameters that influence and adjust the duplicate analysis are currently being investigated.

Image processing method:
The image processing algorithm can be described in 4 steps:
1. Document feature extraction
 * Interest point detection (applying Scale Invariant Feature Transform (SIFT) keypoint extraction)
 * Derivation of local feature descriptors (invariant to geometrical or radiometrical distortions)
2. Learning the visual dictionary
 * Clustering of all SIFT descriptors of all images using the k-means algorithm
 * Run over the collection and collect the local descriptors in a visual dictionary using the Bag-Of-Words (BoW) algorithm
3. Create a visual histogram for each image document
4. Detect similar images based on the visual histogram and local descriptors
 * Evaluate the similarity score by pair-wise comparison of the corresponding keyword frequency histograms for all documents
 * Conduct structural similarity analysis applying the Structural SIMilarity (SSIM) approach (1 means identical and 0 means very different):
 ** Rotate
 ** Scale
 ** Mask
 ** Overlaying

Usage:
The FindDuplicates script can be invoked from the command line. For standard usage two parameters are required: the path to the collection documents and 'all'.
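As an illustration of the standard usage, the following minimal sketch assembles the command line for a run. The helper name `build_find_duplicates_command`, the example path `/path/to/collection` and the default interpreter name are assumptions made for this example and are not part of Matchbox; the argument order (optional flags, then the collection directory, then the action) and the `--threads` option are taken from the help output of the script.

```python
# Hypothetical helper (not part of Matchbox): builds the argument list for
# a FindDuplicates.py run, following the order shown in the script's help.

# Actions listed in the help output of FindDuplicates.py.
ACTIONS = ("all", "extract", "compare", "train", "bowhist", "clean")


def build_find_duplicates_command(collection_dir, action="all", threads=None,
                                  interpreter="python2.7",
                                  script="FindDuplicates.py"):
    """Return the command as a list of strings, suitable for subprocess."""
    if action not in ACTIONS:
        raise ValueError("unknown action: %r" % (action,))
    command = [interpreter, script]
    if threads is not None:
        # --threads is one of the optional flags listed in the help output.
        command += ["--threads", str(threads)]
    # Positional arguments: collection directory first, then the action.
    command += [collection_dir, action]
    return command


# Standard usage: collection path plus 'all' (the path is a placeholder).
print(" ".join(build_find_duplicates_command("/path/to/collection")))
# prints: python2.7 FindDuplicates.py /path/to/collection all
```

The returned list could be passed to `subprocess.call` from the directory that contains the script; an experienced user would replace 'all' with a single workflow step such as 'extract' or 'compare'.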
 scape/pc-qa-matchbox/Python# python2.7 FindDuplicates.py -h
 usage: FindDuplicates.py [-h] [--threads THREADS] [--sdk SDK] [--precluster PRECLUSTER] [--clahe CLAHE] [--config CONFIG] [--featdir FEATDIR] [--bowsize BOWSIZE] [--csv] [-v] dir all,extract,compare,train,bowhist,clean

= User Experiences =
Currently installed at the Austrian National Library.

= Development Activity =