Difference between revisions of "Demystify"

From COPTR
m (Fixup date)
m (Modifies infobox)
 
(4 intermediate revisions by 2 users not shown)
Line 1: Line 1:
 
{{Infobox tool
 
{{Infobox tool
|purpose=Format Identification Analysis and Reporting
+
|purpose=Format Identification, Analysis and Reporting
 
|homepage=https://github.com/exponential-decay/demystify
 
|homepage=https://github.com/exponential-decay/demystify
 
|license=Open source (see URL above)
 
|license=Open source (see URL above)
|platforms=sqlite + Python + text/html
+
|platforms=WebAssembly + sqlite + Python + text/html
 
|function=Metadata Extraction, Content Profiling, De-Duplication
 
|function=Metadata Extraction, Content Profiling, De-Duplication
 
}}
 
}}
Line 11: Line 11:
 
Now known as "Demystify" (formerly 'DROID Siegfried Sqlite Analysis Engine') with thanks to Joshua Ng for the suggestion to rename it. Demystify is an engine for the analysis of [https://github.com/digital-preservation/droid DROID] CSV export files, [https://github.com/richardlehane/siegfried Siegfried] YAML export files, and Siegfried 'DROID compatible' output. The tool has three purposes, break the exports into their components and store them within a table in a SQLite database; create additional columns to augment the output where useful; and query the SQLite database, outputting results in a readable form useful for analysis by researchers and archivists within digital preservation departments in memory institutions.
 
Now known as "Demystify" (formerly 'DROID Siegfried Sqlite Analysis Engine') with thanks to Joshua Ng for the suggestion to rename it. Demystify is an engine for the analysis of [https://github.com/digital-preservation/droid DROID] CSV export files, [https://github.com/richardlehane/siegfried Siegfried] YAML export files, and Siegfried 'DROID compatible' output. The tool has three purposes, break the exports into their components and store them within a table in a SQLite database; create additional columns to augment the output where useful; and query the SQLite database, outputting results in a readable form useful for analysis by researchers and archivists within digital preservation departments in memory institutions.
  
The tool provides archivist definitions for each of the sections output; these definitions are customizable. The tool also supports output of statistics about files that may require further triage or may not be appropriate for long-term preservation based on institutional rules, in the form of a blacklist. The tool also analyses file names and directory names for non-ascii characters, and also characteristics that may present problems cross-file-system based on known Microsoft rules: http://msdn.microsoft.com/en-us/library/aa365247(VS.85).aspx
+
The tool provides archivist definitions for each of the sections output; these definitions are customizable. The tool also supports output of statistics about files that may require further triage or may not be appropriate for long-term preservation based on institutional rules, in the form of a denylist. The tool also analyses file names and directory names for non-ascii characters, and also characteristics that may present problems cross-file-system based on known Microsoft rules: http://msdn.microsoft.com/en-us/library/aa365247(VS.85).aspx
  
 
The engine can be used to generate a list of file paths for files that may present digital preservation risks (Rogues) or files which on the surface i.e. via identification alone, look okay (Heroes) and these listings can be used in conjunction with [http://manpages.ubuntu.com/manpages/trusty/man1/rsync.1.html rsync] to isolate these sets from one-another to be more flexible to work with.  
 
The engine can be used to generate a list of file paths for files that may present digital preservation risks (Rogues) or files which on the surface i.e. via identification alone, look okay (Heroes) and these listings can be used in conjunction with [http://manpages.ubuntu.com/manpages/trusty/man1/rsync.1.html rsync] to isolate these sets from one-another to be more flexible to work with.  
Line 18: Line 18:
  
 
[https://ross-spencer.github.io/demystify-lite/ Demystify Lite] provides a Pyscript/WASM implementation of Demystify's features and runs completely browser side for users with DROID or Siegfried reports that they would like to see analyzed.
 
[https://ross-spencer.github.io/demystify-lite/ Demystify Lite] provides a Pyscript/WASM implementation of Demystify's features and runs completely browser side for users with DROID or Siegfried reports that they would like to see analyzed.
 +
 +
=== Siegfried integration ===
 +
 +
Demystify lite integrates a WASM build of Siegfried enabling client-side identification of your digital records and analysis within one secure tool, for more information, see Ross Spencer's [https://exponentialdecay.co.uk/blog/client-side-identification-and-reporting-pipeline-with-siegfried-and-demystify-lite/ brief integration report].
 +
 +
=== Fractal in Detail ===
 +
 +
[https://journal.code4lib.org/articles/16351 Fractal in Detail] in the Code4Lib Journal examines some of the motivations behind Demystify, generalizing its context and considering its source data alongside similar tools like Brunnhilde, Freud, and FileDriller.
  
 
== User Experiences ==
 
== User Experiences ==

Latest revision as of 15:50, 28 November 2025




Purpose: Format Identification, Analysis and Reporting
Homepage: https://github.com/exponential-decay/demystify
License: Open source (see URL above)
Platforms: WebAssembly + sqlite + Python + text/html
Function: Metadata Extraction, Content Profiling, De-Duplication



Description

Now known as "Demystify" (formerly 'DROID Siegfried Sqlite Analysis Engine'), with thanks to Joshua Ng for the suggestion to rename it. Demystify is an engine for the analysis of DROID CSV export files, Siegfried YAML export files, and Siegfried 'DROID compatible' output. The tool has three purposes: to break the exports into their components and store them within a table in a SQLite database; to create additional columns that augment the output where useful; and to query the SQLite database, outputting results in a readable form useful for analysis by researchers and archivists within digital preservation departments in memory institutions.
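
The three steps described above can be sketched in a few lines of Python. This is a minimal illustration only, not Demystify's actual schema or code: the table name `files`, the `identified` column, the helper function names, and the sample rows are all assumptions made for the example, and only a subset of the columns found in a DROID CSV export is used.

```python
import csv
import io
import sqlite3

# Illustrative sample shaped like a DROID CSV export (subset of columns only).
SAMPLE_CSV = """NAME,FILE_PATH,TYPE,EXT,PUID,FORMAT_NAME
report.pdf,/data/report.pdf,File,pdf,fmt/18,Acrobat PDF 1.4
notes.txt,/data/notes.txt,File,txt,x-fmt/111,Plain Text File
mystery.bin,/data/mystery.bin,File,bin,,
copy.pdf,/data/copy.pdf,File,pdf,fmt/18,Acrobat PDF 1.4
"""


def load_export(conn, csv_text):
    """Steps 1 and 2: store each export row in a table, plus one extra column."""
    # Hypothetical schema, chosen for this sketch.
    conn.execute(
        "CREATE TABLE files (name TEXT, file_path TEXT, type TEXT, "
        "ext TEXT, puid TEXT, format_name TEXT, identified INTEGER)"
    )
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Augmented column: 1 if identification produced a PUID, else 0.
        identified = 1 if row["PUID"] else 0
        conn.execute(
            "INSERT INTO files VALUES (?, ?, ?, ?, ?, ?, ?)",
            (row["NAME"], row["FILE_PATH"], row["TYPE"], row["EXT"],
             row["PUID"], row["FORMAT_NAME"], identified),
        )


def format_frequency(conn):
    """Step 3: count identified files per PUID, most frequent first."""
    return conn.execute(
        "SELECT puid, COUNT(*) FROM files WHERE identified = 1 "
        "GROUP BY puid ORDER BY COUNT(*) DESC, puid"
    ).fetchall()


def unidentified(conn):
    """Step 3: list the paths of files with no identification."""
    rows = conn.execute(
        "SELECT file_path FROM files WHERE identified = 0 ORDER BY file_path"
    )
    return [r[0] for r in rows]


conn = sqlite3.connect(":memory:")
load_export(conn, SAMPLE_CSV)
print(format_frequency(conn))
print(unidentified(conn))
```

Keeping the data in SQLite means each report section can be expressed as one query, which is the design the description above suggests.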

The tool provides archivist definitions for each section of its output; these definitions are customizable. It also supports output of statistics about files that may require further triage, or that may not be appropriate for long-term preservation based on institutional rules, supplied in the form of a denylist. In addition, the tool analyses file names and directory names for non-ASCII characters, and for characteristics that may present problems across file systems, based on known Microsoft rules: http://msdn.microsoft.com/en-us/library/aa365247(VS.85).aspx
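
As a rough illustration of the kind of file name checks described above, the following sketch flags non-ASCII characters and a few of the Windows naming rules from the Microsoft page (reserved characters, reserved device names, and trailing spaces or periods). It is an independent sketch, not the code Demystify uses; the function name `check_name` and the wording of the messages are assumptions for this example.

```python
# Characters that Windows does not allow in file names.
RESERVED_CHARS = set('<>:"/\\|?*')

# Device names that Windows reserves, with or without an extension.
RESERVED_NAMES = (
    {"CON", "PRN", "AUX", "NUL"}
    | {f"COM{i}" for i in range(1, 10)}
    | {f"LPT{i}" for i in range(1, 10)}
)


def check_name(name):
    """Return a list of potential problems for one file or directory name."""
    issues = []
    if not name.isascii():
        issues.append("non-ASCII characters")
    if any(ch in RESERVED_CHARS for ch in name):
        issues.append("reserved character")
    if any(ord(ch) < 32 for ch in name):
        issues.append("control character")
    # The part before the first period decides whether a name is reserved.
    if name.split(".")[0].upper() in RESERVED_NAMES:
        issues.append("reserved device name")
    if name.endswith((" ", ".")):
        issues.append("trailing space or period")
    return issues


for example in ["plain.txt", "café.txt", "CON.txt", "report?.doc", "notes."]:
    print(example, check_name(example))
```

The check works on a single name rather than a full path, so a caller would apply it to each path component in turn.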

The engine can be used to generate a list of file paths for files that may present digital preservation risks (Rogues), or for files which on the surface, i.e. via identification alone, look okay (Heroes). These listings can be used in conjunction with rsync to isolate the two sets from one another, making them more flexible to work with.
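
One way such listings might be combined with rsync is sketched below. The rule used here to decide what counts as a Rogue (no PUID, or an extension mismatch) is a simplifying assumption for the example and is not Demystify's actual logic; the record layout, function names, and directory paths are likewise hypothetical. The only rsync feature relied on is its `--files-from` option, which reads the list of files to transfer, relative to the source directory, from a file.

```python
# Illustrative records; paths are relative to the collection root.
RECORDS = [
    {"path": "docs/report.pdf", "puid": "fmt/18", "extension_mismatch": False},
    {"path": "docs/mystery.bin", "puid": "", "extension_mismatch": False},
    {"path": "images/photo.doc", "puid": "fmt/43", "extension_mismatch": True},
]


def split_rogues_heroes(records):
    """Partition paths into (rogues, heroes) using a simplified rule."""
    rogues, heroes = [], []
    for rec in records:
        # Assumed rule: unidentified files and extension mismatches are Rogues.
        is_rogue = not rec["puid"] or rec["extension_mismatch"]
        (rogues if is_rogue else heroes).append(rec["path"])
    return sorted(rogues), sorted(heroes)


def rsync_command(list_file, source_root, destination):
    """Build (but do not run) an rsync call that copies only the listed files."""
    return ["rsync", "-av", f"--files-from={list_file}", source_root, destination]


rogues, heroes = split_rogues_heroes(RECORDS)
print("\n".join(rogues))  # contents one would save as rogues.txt
print(" ".join(rsync_command("rogues.txt", "/collection/", "/triage/rogues/")))
```

The command is only constructed and printed here, so the sketch can be run safely without rsync installed or any files being copied.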

Demystify Lite

Demystify Lite provides a PyScript/WASM implementation of Demystify's features and runs entirely in the browser, for users with DROID or Siegfried reports that they would like to see analyzed.

Siegfried integration

Demystify Lite integrates a WASM build of Siegfried, enabling client-side identification of your digital records, and their analysis, within one secure tool. For more information, see Ross Spencer's brief integration report.

Fractal in Detail

Fractal in Detail in the Code4Lib Journal examines some of the motivations behind Demystify, generalizing its context and considering its source data alongside similar tools like Brunnhilde, Freud, and FileDriller.

User Experiences

Development Activity

All development activity is visible on GitHub: http://github.com/ross-spencer/demystify/commits

Release Feed

The last 3 releases:

2025-12-17 20:32:22
3.0.0-rc.4
by ross-spencer
2025-12-17 19:43:13
3.0.0-rc.3
by ross-spencer
2025-12-17 19:39:38
2.1.0
by ross-spencer

Activity Feed

The last 5 commits:

2025-12-17 19:30:15
Correct rogues logic (commit 5808ca753a2b6cd504b54469b89583ba4dfaa9da)
by ross-spencer https://github.com/ross-spencer
2025-12-17 18:30:26
Remove lxml (commit fa99f8ef0f2e629d104e6568eb4944f3e9165b82)
by ross-spencer https://github.com/ross-spencer
2025-12-17 18:29:56
Revert "Update sqlitefid and pathlesstaken" (commit 39287330b07ba573d6d94ace7f052f4e0b825d5d)
by ross-spencer https://github.com/ross-spencer
2025-12-17 18:29:29
Revert sqlitefid and pathlesstaken (commit fb2f415b0109d5a4165592ce2eccd656357aae73)
by ross-spencer https://github.com/ross-spencer
2025-12-17 18:18:51
Update workflow (commit 34b2f119a2eb11337d0bab54581069f70095838e)
by ross-spencer https://github.com/ross-spencer