Difference between revisions of "Demystify"

From COPTR
(Added more detailed information about the tool's capabilities.)
Line 6: Line 6:
 
|homepage=https://github.com/ross-spencer/droid-sqlite-analysis
 
|homepage=https://github.com/ross-spencer/droid-sqlite-analysis
 
|license=Open source (see URL above)
 
|license=Open source (see URL above)
|platforms=sqlite + Python
+
|platforms=sqlite + Python + text/html
 
}}
 
}}
  
Line 12: Line 12:
 
[[Category:Metadata Extraction]]
 
[[Category:Metadata Extraction]]
 
[[Category:Content Profiling]]
 
[[Category:Content Profiling]]
 +
[[Category:De-Duplication]]
  
 
<!-- Add relevant categories to describe the content type that the tool addresses. Choose carefully, and view the list of existing categories first (see the Navigation sidebar on the left). If the tool works on any content type, do not add a category. The following are common category examples, remove those that don't apply -->
 
<!-- Add relevant categories to describe the content type that the tool addresses. Choose carefully, and view the list of existing categories first (see the Navigation sidebar on the left). If the tool works on any content type, do not add a category. The following are common category examples, remove those that don't apply -->
Line 17: Line 18:
 
== Description ==
 
== Description ==
 
<!-- Describe the what the tool does, focusing on it's digital preservation value. Keep it factual. -->
 
<!-- Describe the what the tool does, focusing on it's digital preservation value. Keep it factual. -->
Engine for analysis of DROID CSV export files. The tool has three purposes, break the DROID CSV export into its components and store them within a table in a SQLite database; create additional columns to augment the output where useful; and query the SQLite database, outputting results in a readable form useful for analysis by researchers and archivists within digital preservation departments in memory institutions.
+
Engine for analysis of [https://github.com/digital-preservation/droid DROID] CSV export files, [https://github.com/richardlehane/siegfried Siegfried] YAML export files, and Siegfried 'DROID compatible' output. The tool has three purposes, break the exports into their components and store them within a table in a SQLite database; create additional columns to augment the output where useful; and query the SQLite database, outputting results in a readable form useful for analysis by researchers and archivists within digital preservation departments in memory institutions.
 +
 
 +
The tool provides archivist definitions for each of the sections output; these definitions are customisable. The tool also supports output of statistics about files that may require further triage or may not be appropriate for long-term preservation based on institutional rules, in the form of a blacklist. The tool also analyses file names and directory names for non-ascii characters, and also characteristics that may present problems cross-file-system based on known Microsoft rules: http://msdn.microsoft.com/en-us/library/aa365247(VS.85).aspx
 +
 
 +
The engine can be used to generate a list of file paths for files that may present digital preservation risks (Rogues) or files which on the surface i.e. via identification alone, look okay (Heroes) and these listings can be used in conjunction with [http://manpages.ubuntu.com/manpages/trusty/man1/rsync.1.html rsync] to isolate these sets from one-another to be more flexible to work with.  
  
 
== User Experiences ==
 
== User Experiences ==
 
<!-- Add hotlinks to user experiences with the tool (eg. blog posts). These should illustrate the effectiveness (or otherwise) of the tool. -->
 
<!-- Add hotlinks to user experiences with the tool (eg. blog posts). These should illustrate the effectiveness (or otherwise) of the tool. -->
[http://www.openplanetsfoundation.org/blogs/2014-06-03-analysis-engine-droid-csv-export Blog post] from the tool author, Ross Spencer.
+
Blog entries from the tool author, Ross Spencer:
 +
*'''[2014-06-03]''' [http://www.openplanetsfoundation.org/blogs/2014-06-03-analysis-engine-droid-csv-export Describing the creation and purpose of the tool.]
 +
*'''[2015-08-25]''' [http://openpreservation.org/blog/2015/08/25/hero-or-villain-a-tool-to-create-a-digital-preservation-rogues-gallery/ Using the output of the tool to create a digital preservation rogues gallery.]
 +
*'''[2016-05-23]''' [http://openpreservation.org/blog/2016/05/23/whats-in-a-namespace-the-marriage-of-droid-and-siegfried-analysis/ The integration of Siegfried output for consistent and repeatable reporting.]
 +
*'''[2016-05-24]''' [http://openpreservation.org/blog/2016/05/24/while-were-on-the-subject-a-few-more-points-of-interest-about-the-siegfrieddroid-analysis-tool/ Creating a multi-lingual consistent, digital preservation dialect and exploring alternative methods of format identification using Siegfried's capabilities.]
  
 
== Development Activity ==
 
== Development Activity ==

Revision as of 03:14, 13 June 2016


Analysis and automatic generation of summary information from DROID output
Homepage: https://github.com/ross-spencer/droid-sqlite-analysis
License: Open source (see URL above)
Platforms: sqlite + Python + text/html


Description

Engine for analysis of DROID CSV export files, Siegfried YAML export files, and Siegfried 'DROID compatible' output. The tool has three purposes: to break the exports into their components and store them within a table in a SQLite database; to create additional columns that augment the output where useful; and to query the SQLite database, outputting results in a readable form useful for analysis by researchers and archivists within digital preservation departments in memory institutions.
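
The three purposes described above can be illustrated with a minimal sketch in Python. It is not the tool's own code or schema: the table name, the chosen subset of DROID-style columns, the derived dir_name column, and the sample rows are all assumptions made for illustration.

```python
import csv
import io
import posixpath
import sqlite3

# Hypothetical sample resembling a small subset of a DROID CSV export.
SAMPLE_CSV = """NAME,FILE_PATH,SIZE,PUID,FORMAT_NAME
report.pdf,/data/report.pdf,1024,fmt/18,Acrobat PDF 1.4
notes.txt,/data/notes.txt,200,x-fmt/111,Plain Text File
scan.pdf,/data/scans/scan.pdf,4096,fmt/18,Acrobat PDF 1.4
"""


def load_export(conn, csv_text):
    """Store each CSV row in a SQLite table, adding one derived column."""
    conn.execute(
        "CREATE TABLE files (name TEXT, file_path TEXT, size INTEGER, "
        "puid TEXT, format_name TEXT, dir_name TEXT)"
    )
    count = 0
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Augmenting column (assumed for illustration): the parent directory.
        dir_name = posixpath.dirname(row["FILE_PATH"])
        conn.execute(
            "INSERT INTO files VALUES (?, ?, ?, ?, ?, ?)",
            (row["NAME"], row["FILE_PATH"], int(row["SIZE"]),
             row["PUID"], row["FORMAT_NAME"], dir_name),
        )
        count += 1
    return count


def format_frequency(conn):
    """Query the table: number of files per identifier, most common first."""
    return conn.execute(
        "SELECT puid, COUNT(*) AS n FROM files "
        "GROUP BY puid ORDER BY n DESC, puid"
    ).fetchall()


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    load_export(connection, SAMPLE_CSV)
    for puid, n in format_frequency(connection):
        print(puid, n)
```

An in-memory database keeps the sketch self-contained; a file path passed to sqlite3.connect would persist the table between runs.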

The tool provides archivist definitions for each section of the output; these definitions are customisable. It also supports output of statistics, in the form of a blacklist, about files that may require further triage or may not be appropriate for long-term preservation based on institutional rules. In addition, it analyses file names and directory names for non-ASCII characters and for characteristics that may present problems across file systems, based on known Microsoft rules: http://msdn.microsoft.com/en-us/library/aa365247(VS.85).aspx
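
As a rough illustration of the kind of name checks described above, the following Python sketch flags non-ASCII characters and a few of the Windows naming restrictions (reserved characters, reserved device names, trailing space or period). The function name and issue labels are hypothetical, the rule set is deliberately partial, and the tool's own checks may differ.

```python
# Characters that Windows does not allow in file or directory names.
RESERVED_CHARS = set('<>:"/\\|?*')

# Device names that Windows reserves, with or without an extension.
RESERVED_NAMES = (
    {"CON", "PRN", "AUX", "NUL"}
    | {f"COM{i}" for i in range(1, 10)}
    | {f"LPT{i}" for i in range(1, 10)}
)


def name_issues(name):
    """Return a list of issue labels for one file or directory name.

    `name` is a single path component, not a full path.
    """
    issues = []
    if not name.isascii():
        issues.append("non-ascii")
    if any(ch in RESERVED_CHARS or ord(ch) < 32 for ch in name):
        issues.append("reserved-character")
    # The part before the first period is compared case-insensitively.
    if name.split(".")[0].upper() in RESERVED_NAMES:
        issues.append("reserved-name")
    if name.endswith((" ", ".")):
        issues.append("trailing-space-or-period")
    return issues


if __name__ == "__main__":
    for candidate in ["report.pdf", "café.txt", "aux.txt", "a:b."]:
        print(candidate, name_issues(candidate))
```

Checking one path component at a time means the same function can be applied to both file names and directory names.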

The engine can be used to generate a list of file paths for files that may present digital preservation risks (Rogues) or for files which on the surface, i.e. via identification alone, look okay (Heroes). These listings can be used in conjunction with rsync to isolate the two sets from one another, making them more flexible to work with.
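
One way such listings might be combined with rsync is sketched below in Python. The classification rule (a file is a Rogue if it has no identifier or its identifier is on an institutional blacklist) is an assumption for illustration, not the tool's actual logic, and the function names and sample PUIDs are hypothetical. The sketch only builds the rsync argument list, using rsync's --files-from option, and does not run it.

```python
def split_rogues_heroes(records, blacklist_puids):
    """Partition (path, puid) pairs into Rogue and Hero path lists.

    Assumed rule: unidentified or blacklisted files are Rogues.
    """
    rogues, heroes = [], []
    for path, puid in records:
        if not puid or puid in blacklist_puids:
            rogues.append(path)
        else:
            heroes.append(path)
    return rogues, heroes


def rsync_command(list_file, source_root, dest_root):
    """Build an rsync argument list that copies only the listed paths.

    Paths inside `list_file` are read relative to `source_root`.
    """
    return ["rsync", "-a", f"--files-from={list_file}", source_root, dest_root]


if __name__ == "__main__":
    sample = [("a.pdf", "fmt/18"), ("b.bin", ""), ("c.exe", "x-fmt/411")]
    # x-fmt/411 stands in for an entry on a hypothetical blacklist.
    rogues, heroes = split_rogues_heroes(sample, {"x-fmt/411"})
    print("Rogues:", rogues)
    print("Heroes:", heroes)
    print(rsync_command("rogues.txt", "/data/", "/triage/"))
```

Writing each list to its own text file, one path per line, and passing that file to rsync keeps the two sets in separate destination directories.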

User Experiences

Blog entries from the tool author, Ross Spencer:

* [2014-06-03] Describing the creation and purpose of the tool: http://www.openplanetsfoundation.org/blogs/2014-06-03-analysis-engine-droid-csv-export
* [2015-08-25] Using the output of the tool to create a digital preservation rogues gallery: http://openpreservation.org/blog/2015/08/25/hero-or-villain-a-tool-to-create-a-digital-preservation-rogues-gallery/
* [2016-05-23] The integration of Siegfried output for consistent and repeatable reporting: http://openpreservation.org/blog/2016/05/23/whats-in-a-namespace-the-marriage-of-droid-and-siegfried-analysis/
* [2016-05-24] Creating a multi-lingual, consistent digital preservation dialect and exploring alternative methods of format identification using Siegfried's capabilities: http://openpreservation.org/blog/2016/05/24/while-were-on-the-subject-a-few-more-points-of-interest-about-the-siegfrieddroid-analysis-tool/

Development Activity
