Difference between revisions of "Demystify"

From COPTR
m (Fixup date)
 
(12 intermediate revisions by 3 users not shown)
Line 1: Line 1:
<!-- Use the structure provided in this template, do not change it! -->
+
{{Infobox tool
 
+
|purpose=Format Identification Analysis and Reporting
{{Infobox_tool
+
|homepage=https://github.com/exponential-decay/demystify
|purpose=Analysis and automatic generation of summary information from DROID output
 
|image=
 
|homepage=https://github.com/ross-spencer/droid-sqlite-analysis
 
 
|license=Open source (see URL above)
 
|license=Open source (see URL above)
|platforms=sqlite + Python
+
|platforms=sqlite + Python + text/html
 +
|function=Metadata Extraction, Content Profiling, De-Duplication
 
}}
 
}}
 +
{{Infobox tool details}}
 +
== Description ==
 +
<!-- Describe what the tool does, focusing on its digital preservation value. Keep it factual. -->
 +
Now known as "Demystify" (formerly 'DROID Siegfried Sqlite Analysis Engine'), with thanks to Joshua Ng for suggesting the new name. Demystify is an engine for the analysis of [https://github.com/digital-preservation/droid DROID] CSV export files, [https://github.com/richardlehane/siegfried Siegfried] YAML export files, and Siegfried 'DROID compatible' output. The tool has three purposes: to break the exports into their components and store them in a table in a SQLite database; to create additional columns that augment the output where useful; and to query the SQLite database, outputting results in a readable form useful for analysis by researchers and archivists in the digital preservation departments of memory institutions.
 +
 +
The tool provides archivist definitions for each section of its output; these definitions are customizable. It can also report statistics about files that may require further triage, or that may not be appropriate for long-term preservation under institutional rules, which are supplied in the form of a blacklist. In addition, it analyses file names and directory names for non-ASCII characters and for characteristics that may cause problems across file systems, based on known Microsoft rules: http://msdn.microsoft.com/en-us/library/aa365247(VS.85).aspx
  
<!-- Add one or more categories to describe the function of the tool. Choose carefully, and view the list of existing categories first (see the Navigation sidebar on the left). The following are common category examples, remove those that don't apply -->
+
The engine can generate a list of file paths for files that may present digital preservation risks (Rogues), or for files that on the surface, i.e. on the basis of identification alone, look okay (Heroes). These listings can be used with [http://manpages.ubuntu.com/manpages/trusty/man1/rsync.1.html rsync] to separate the two sets from one another, making them easier to work with.
[[Category:Metadata Extraction]]
 
[[Category:Content Profiling]]
 
  
<!-- Add relevant categories to describe the content type that the tool addresses. Choose carefully, and view the list of existing categories first (see the Navigation sidebar on the left). If the tool works on any content type, do not add a category. The following are common category examples, remove those that don't apply -->
+
=== Demystify Lite ===
  
== Description ==
+
[https://ross-spencer.github.io/demystify-lite/ Demystify Lite] provides a PyScript/WASM implementation of Demystify's features. It runs entirely in the browser, for users who have DROID or Siegfried reports that they would like analyzed.
<!-- Describe what the tool does, focusing on its digital preservation value. Keep it factual. -->
 
Engine for analysis of DROID CSV export files. The tool has three purposes: to break the DROID CSV export into its components and store them in a table in a SQLite database; to create additional columns that augment the output where useful; and to query the SQLite database, outputting results in a readable form useful for analysis by researchers and archivists in the digital preservation departments of memory institutions.
 
  
 
== User Experiences ==
 
== User Experiences ==
 
<!-- Add hotlinks to user experiences with the tool (eg. blog posts). These should illustrate the effectiveness (or otherwise) of the tool. -->
 
<!-- Add hotlinks to user experiences with the tool (eg. blog posts). These should illustrate the effectiveness (or otherwise) of the tool. -->
[http://www.openplanetsfoundation.org/blogs/2014-06-03-analysis-engine-droid-csv-export Blog post] from the tool author, Ross Spencer.
+
*Blog entries from the tool author, Ross Spencer:
 +
**'''[2014-06-03]''' [http://www.openplanetsfoundation.org/blogs/2014-06-03-analysis-engine-droid-csv-export Describing the creation and purpose of the tool.]
 +
**'''[2015-08-25]''' [http://openpreservation.org/blog/2015/08/25/hero-or-villain-a-tool-to-create-a-digital-preservation-rogues-gallery/ Using the output of the tool to create a digital preservation rogues gallery.]
 +
**'''[2016-05-23]''' [http://openpreservation.org/blog/2016/05/23/whats-in-a-namespace-the-marriage-of-droid-and-siegfried-analysis/ The integration of Siegfried output for consistent and repeatable reporting.]
 +
**'''[2016-05-24]''' [http://openpreservation.org/blog/2016/05/24/while-were-on-the-subject-a-few-more-points-of-interest-about-the-siegfrieddroid-analysis-tool/ Creating a multi-lingual consistent, digital preservation dialect and exploring alternative methods of format identification using Siegfried's capabilities.]
 +
**'''[2022-05-09]''' [https://journal.code4lib.org/articles/16351 Fractal in detail: What information is in a file format identification report?]
  
== Development Activity ==
+
= Development Activity =
 
<!-- Provide *evidence* of development activity of the tool. For example, RSS feeds for code issues or commits. -->
 
<!-- Provide *evidence* of development activity of the tool. For example, RSS feeds for code issues or commits. -->
<rss max=7>https://github.com/ross-spencer/droid-sqlite-analysis/commits/master.atom</rss>
+
All development activity is visible on GitHub: http://github.com/ross-spencer/demystify/commits
 
+
 +
=== Release Feed ===
 +
The three most recent releases:
 +
<rss max=3>https://github.com/exponential-decay/demystify/releases.atom</rss>
 +
 +
=== Activity Feed ===
 +
The five most recent commits:
 +
<rss max=5>https://github.com/exponential-decay/demystify/commits/main.atom</rss>
 +
 
<!-- Add the Ohloh.com ID for the tool, if known. -->
 
<!-- Add the Ohloh.com ID for the tool, if known. -->
{{Infobox_tool_details
 
|ohloh_id=
 
}}
 

Latest revision as of 07:09, 27 March 2024



Purpose: Format Identification Analysis and Reporting
Homepage: https://github.com/exponential-decay/demystify
License: Open source (see URL above)
Platforms: sqlite + Python + text/html
Function: Metadata Extraction, Content Profiling, De-Duplication




Description

Now known as "Demystify" (formerly 'DROID Siegfried Sqlite Analysis Engine'), with thanks to Joshua Ng for suggesting the new name. Demystify is an engine for the analysis of DROID CSV export files, Siegfried YAML export files, and Siegfried 'DROID compatible' output. The tool has three purposes: to break the exports into their components and store them in a table in a SQLite database; to create additional columns that augment the output where useful; and to query the SQLite database, outputting results in a readable form useful for analysis by researchers and archivists in the digital preservation departments of memory institutions.
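
The three steps described above (store, augment, query) can be illustrated with a minimal Python sketch. This is not Demystify's own code or schema: the sample rows, the `files` table, the derived `ext` column, and the function names `load_report`, `format_counts`, and `unidentified` are all assumptions made for illustration, and a real DROID export carries more columns than the trimmed-down sample used here.

```python
import csv
import io
import sqlite3

# Hypothetical, trimmed-down DROID-style export (real exports have more columns).
SAMPLE_CSV = """ID,FILE_PATH,NAME,SIZE,PUID,FORMAT_NAME
1,/data/report.pdf,report.pdf,1024,fmt/18,Acrobat PDF 1.4
2,/data/photo.jpg,photo.jpg,2048,fmt/43,JPEG File Interchange Format
3,/data/notes.bin,notes.bin,10,,
4,/data/copy.pdf,copy.pdf,1024,fmt/18,Acrobat PDF 1.4
"""


def load_report(csv_text):
    """Store each CSV row in an in-memory SQLite table, adding a derived column."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE files (id INTEGER, file_path TEXT, name TEXT, "
        "size INTEGER, puid TEXT, format_name TEXT, ext TEXT)"
    )
    for row in csv.DictReader(io.StringIO(csv_text)):
        name = row["NAME"]
        # Augmentation step: derive a lower-cased extension from the file name.
        ext = name.rsplit(".", 1)[1].lower() if "." in name else ""
        conn.execute(
            "INSERT INTO files VALUES (?, ?, ?, ?, ?, ?, ?)",
            (
                int(row["ID"]),
                row["FILE_PATH"],
                name,
                int(row["SIZE"]),
                row["PUID"] or None,  # empty identification becomes NULL
                row["FORMAT_NAME"] or None,
                ext,
            ),
        )
    conn.commit()
    return conn


def format_counts(conn):
    """Count files per identifier, most frequent first."""
    return conn.execute(
        "SELECT puid, COUNT(*) FROM files WHERE puid IS NOT NULL "
        "GROUP BY puid ORDER BY COUNT(*) DESC, puid"
    ).fetchall()


def unidentified(conn):
    """List paths of files that received no identification."""
    rows = conn.execute(
        "SELECT file_path FROM files WHERE puid IS NULL ORDER BY id"
    )
    return [r[0] for r in rows]
```

Calling `load_report(SAMPLE_CSV)` and passing the connection to `format_counts` or `unidentified` gives the kind of summary figures a report of this sort might be built from.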

The tool provides archivist definitions for each section of its output; these definitions are customizable. It can also report statistics about files that may require further triage, or that may not be appropriate for long-term preservation under institutional rules, which are supplied in the form of a blacklist. In addition, it analyses file names and directory names for non-ASCII characters and for characteristics that may cause problems across file systems, based on known Microsoft rules: http://msdn.microsoft.com/en-us/library/aa365247(VS.85).aspx
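
A file name check of the kind described above might look like the following Python sketch. It is an illustration only and not Demystify's implementation: the function name `name_issues` and the issue labels are hypothetical, and the rules encoded here are a subset of the Microsoft naming conventions (reserved characters, reserved device names, and trailing spaces or periods).

```python
# Characters that Windows file systems do not allow in file names.
RESERVED_CHARS = set('<>:"/\\|?*')

# Reserved device names, which are also problematic when followed by an extension.
RESERVED_NAMES = (
    {"CON", "PRN", "AUX", "NUL"}
    | {"COM%d" % i for i in range(1, 10)}
    | {"LPT%d" % i for i in range(1, 10)}
)


def name_issues(name):
    """Return a list of labels for potential cross-file-system problems."""
    issues = []
    if not name.isascii():
        issues.append("non-ascii")
    if any(ch in RESERVED_CHARS for ch in name):
        issues.append("reserved-character")
    # Compare the part before the first period against the device names.
    if name.split(".", 1)[0].upper() in RESERVED_NAMES:
        issues.append("reserved-name")
    if name.endswith((" ", ".")):
        issues.append("trailing-space-or-period")
    return issues
```

For example, `name_issues("aux.txt")` flags a reserved name, while an ordinary name such as `report.pdf` returns an empty list.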

The engine can generate a list of file paths for files that may present digital preservation risks (Rogues), or for files that on the surface, i.e. on the basis of identification alone, look okay (Heroes). These listings can be used with rsync to separate the two sets from one another, making them easier to work with.
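
One way such a listing could be handed to rsync is through its `--files-from` option, which reads paths relative to the source directory. The Python sketch below assumes POSIX-style paths and only builds the command rather than running it; the helper names `write_file_list` and `rsync_command`, and the example paths, are hypothetical and do not come from Demystify.

```python
import posixpath


def write_file_list(paths, root, list_path):
    """Write paths relative to root, one per line, for rsync --files-from."""
    relative = [posixpath.relpath(p, root) for p in paths]
    with open(list_path, "w", encoding="utf-8") as handle:
        handle.write("\n".join(relative) + "\n")
    return relative


def rsync_command(root, list_path, dest):
    """Build (but do not run) an rsync call that copies only the listed files."""
    return ["rsync", "-av", "--files-from=" + list_path, root, dest]
```

A caller could write a Rogues listing to a file with `write_file_list` and then pass the result of `rsync_command` to `subprocess.run` once the command has been reviewed.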

Demystify Lite

Demystify Lite provides a PyScript/WASM implementation of Demystify's features. It runs entirely in the browser, for users who have DROID or Siegfried reports that they would like analyzed.

User Experiences

Development Activity

All development activity is visible on GitHub: http://github.com/ross-spencer/demystify/commits

Release Feed

The three most recent releases:

2024-05-05 15:18:04: v2.0.0, by ross-spencer
2024-04-14 13:53:10: v2.0.0rc7, by ross-spencer
2024-03-25 20:52:06: v2.0.0rc6, by ross-spencer

Activity Feed

The five most recent commits:

2024-05-05 15:16:45: Demystify 2.0.0 (commit 8433426905e657dd90df739878dbe12ee0761aeb), by ross-spencer (https://github.com/ross-spencer)
2024-04-14 13:51:43: Update National Archives URI (commit 9060b9a424f22d1d60ad15ee0f1f7a03aa511f2a), by ross-spencer (https://github.com/ross-spencer)
2024-03-25 20:51:14: Update black (commit edcadd0246fc17580880c6bedcda55ad065db410), by ross-spencer (https://github.com/ross-spencer)
2024-03-25 19:53:59: Remove denylist restrictions (commit 40c5a871d14e866d55a61649930a12c3c629a9bb), by ross-spencer (https://github.com/ross-spencer)
2024-03-24 19:08:08: Update workflows (commit fc9dcc15c2c06bccf3b25122597d6860aafa9e44), by ross-spencer (https://github.com/ross-spencer)