Difference between revisions of "JHOVE2"

From COPTR
(Trial import from script.)
 
(Added METS Category.)
(6 intermediate revisions by 4 users not shown)
Line 1: Line 1:
 
{{Infobox_tool
 
{{Infobox_tool
|purpose=JHOVE2 is open source software for format-aware characterization of digital objects.
+
|purpose=JHOVE2 allows data curators to characterise the digital objects in their repositories.
 
|image=
 
|image=
 
|homepage=http://jhove2.org/
 
|homepage=http://jhove2.org/
Line 11: Line 11:
 
[[Category:Metadata Extraction]]
 
[[Category:Metadata Extraction]]
 
[[Category:File Format Identification]]
 
[[Category:File Format Identification]]
 
+
[[Category:Encryption Detection]]
 +
[[Category:METS]]
  
 
= Description =
 
= Description =
Successor to JHOVE.  
+
[https://bitbucket.org/jhove2/main/wiki/Home JHOVE2] is a follow-on to the Harvard/JSTOR [[JHOVE (Harvard Object Validation Environment)| JHOVE]] project, with a similar purpose: allowing data curators to characterise the digital objects in their repositories. Characterisation comprises four elements: first, identifying the object’s format; second, validating that the object conforms to its format’s technical norms; third, extracting technical metadata from the object; and fourth, assessing whether the object should be accepted into a repository, based on policies set by the curator.
 
+
The software was designed to be able to integrate with other applications to enable easy incorporation into a repository’s Ingest workflow.
JHOVE2 is open source software for format-aware characterization of digital objects. JHOVE2 analyzes digital objects with these questions:
+
====Provider====
 
+
California Digital Library, Portico, and Stanford University, with funding from the National Digital Information Infrastructure and Preservation Program (NDIIPP). Version 2.1 also credits the Bibliothèque nationale de France and Netarkivet.
* What is it? (Identification)
+
====Licensing and cost====
 
+
Open Source [http://www.linfo.org/bsdlicense.html BSD license] – free.
* What about it? (Feature extraction)
+
====Development activity====
 
+
JHOVE2 version 2.1 was released in March 2013.
* What is it, really? (Validation)
+
Funding for the JHOVE2 project ended in 2011. The project partners committed to providing self-funded maintenance (but not further development effort) for three years. Their goal was to create an open-source community to guide and foster JHOVE2 technical development, and the involvement of Bibliothèque nationale de France and Netarkivet from version 2.1 signifies some success in this regard.
 
+
====Platform and interoperability====
* So what? (Assessment)
+
JHOVE2 is written in Java Standard Edition 6, and requires a Java 6 runtime environment.  If the user is hoping to use the SGML validation module, an OpenSP SGML parser is required.
 
+
Developers wishing to rebuild JHOVE2 from the provided source will need a full Java SE 6 development kit and the Apache Maven project tool.
e.g. links to AQuA/SCAPE/Hackathon issues that use the tool
+
====Functional notes====
 +
The JHOVE2 project came about as a response to perceived shortcomings in the [[JHOVE (Harvard Object Validation Environment)| JHOVE]] software. JHOVE2 separates identification from validation, allowing the software to identify objects even if they are not valid.  This also provides the opportunity to use the [http://www.dcc.ac.uk/resources/external/pronom PRONOM] registry in signature-based identification via integration with [[DROID_(Digital_Record_Object_Identification)|DROID]], creating the ability to identify many more format-types than those for which it has validation modules. Other improvements include the ability to characterize hierarchical digital objects such as directories, zip files and bit streams nested within files, and a design that allows easier integration with other applications.
 +
JHOVE2 has validation modules for the following format types: ICC color profile; SGML; Shapefile; TIFF (including TIFF/EP, TIFF-FX, TIFF/IT, Exif, GeoTIFF, DNG and RFC 1314); UTF-8 encoded text; WAVE (including Broadcast Wave); XML; ZIP; GZIP; ARC; WARC; and arbitrary bytestreams, filesets and directories. Modules for JPEG 2000 (JP2 and JPX profiles) and PDF (including PDF/X and PDF/A) were planned but have not been implemented yet.  
 +
For comparison, ICC, SGML, Shapefile, ZIP, GZIP, ARC and WARC are newly supported in JHOVE2; however, JHOVE supports AIFF, GIF, JPEG, JPEG 2000 and PDF while JHOVE2 does not. HTML is also not supported in JHOVE2, as it is in JHOVE, but since HTML can be expressed in terms of SGML or XML the functionality remains the same.
 +
====Documentation and user support====
 +
JHOVE2’s website includes an informative FAQ introduction, as well as standard documentation such as a [https://bytebucket.org/jhove2/main/wiki/documents/JHOVE2-Users-Guide_20110222.pdf user guide] and [https://bytebucket.org/jhove2/main/wiki/documents/JHOVE2Programmer2-0-0.pdf programmer guide].  
 +
Primary user support is through the jhove2-techtalk-l listserv, which remains active as of June 2013.  In addition, the website includes an issue tracker displaying reported bugs and feature enhancement requests. 
 +
====Usability====
 +
JHOVE2 does not include a GUI, which will be challenging for many users.
 +
The default output (e.g. in XML, text or JSON) is very verbose and can run to 3500 lines for a single TIFF file.
 +
====Expertise required====
 +
Installation requires solid knowledge of command line interfaces and experience with manually editing configuration files. Creation of the assessment policies requires detailed knowledge of digital preservation standards and technologies.  
 +
====Standards compliance====
 +
JHOVE2 uses the PRONOM registry for file identification.  The software includes a stylesheet that can transform JHOVE2 outputs into the METS metadata standard.
 +
====Influence and take-up====
 +
As of July 2013, the website reports approximately 1000 downloads of version 2 and 200 of version 2.1.
  
 
= User Experiences =
 
= User Experiences =
 +
Please note that JHOVE2 cannot cope with spaces in the command line. Therefore, JHOVE2 has to be stored in a folder whose path can be typed without any spaces.
  
 +
The output is extremely wordy and contains so much information that it is difficult to tell whether a given TIFF file is valid, so it might be helpful to configure the output options. This is possible in the SGML file, although the average non-SGML expert might find that file difficult to handle.
  
 
= Development Activity =
 
= Development Activity =
 
=== Activity Feed ===
 
=== Activity Feed ===
 
 
Link to any RSS feed that is updated when issue or code updates occur, if any, e.g:
 
Link to any RSS feed that is updated when issue or code updates occur, if any, e.g:
 
<rss max=7>http://bitbucket.org/jhove2/main/rss</rss>
 
<rss max=7>http://bitbucket.org/jhove2/main/rss</rss>
  
=== Release Feed ===
+
{{Infobox_tool_details
 +
|ohloh_id=JHOVE2
 +
}}

Revision as of 18:26, 2 July 2020

JHOVE2 allows data curators to characterise the digital objects in their repositories.
Homepage: http://jhove2.org/
License: JHOVE2 is made freely available under the terms of the BSD open source license for all project-developed code; some third-party libraries may be covered by other open source licences.

Description

JHOVE2 is a follow-on to the Harvard/JSTOR JHOVE project, with a similar purpose: allowing data curators to characterise the digital objects in their repositories. Characterisation comprises four elements: first, identifying the object’s format; second, validating that the object conforms to its format’s technical norms; third, extracting technical metadata from the object; and fourth, assessing whether the object should be accepted into a repository, based on policies set by the curator. The software was designed to integrate with other applications to enable easy incorporation into a repository’s ingest workflow.

Provider

California Digital Library, Portico, and Stanford University, with funding from the National Digital Information Infrastructure and Preservation Program (NDIIPP). Version 2.1 also credits the Bibliothèque nationale de France and Netarkivet.

Licensing and cost

Open Source BSD license – free.

Development activity

JHOVE2 version 2.1 was released in March 2013. Funding for the JHOVE2 project ended in 2011. The project partners committed to providing self-funded maintenance (but not further development effort) for three years. Their goal was to create an open-source community to guide and foster JHOVE2 technical development, and the involvement of Bibliothèque nationale de France and Netarkivet from version 2.1 signifies some success in this regard.

Platform and interoperability

JHOVE2 is written in Java Standard Edition 6, and requires a Java 6 runtime environment.  If the user is hoping to use the SGML validation module, an OpenSP SGML parser is required. Developers wishing to rebuild JHOVE2 from the provided source will need a full Java SE 6 development kit and the Apache Maven project tool.

Functional notes

The JHOVE2 project came about as a response to perceived shortcomings in the JHOVE software. JHOVE2 separates identification from validation, allowing the software to identify objects even if they are not valid. This also provides the opportunity to use the PRONOM registry in signature-based identification via integration with DROID, creating the ability to identify many more format-types than those for which it has validation modules. Other improvements include the ability to characterize hierarchical digital objects such as directories, zip files and bit streams nested within files, and a design that allows easier integration with other applications.

JHOVE2 has validation modules for the following format types: ICC color profile; SGML; Shapefile; TIFF (including TIFF/EP, TIFF-FX, TIFF/IT, Exif, GeoTIFF, DNG and RFC 1314); UTF-8 encoded text; WAVE (including Broadcast Wave); XML; ZIP; GZIP; ARC; WARC; and arbitrary bytestreams, filesets and directories. Modules for JPEG 2000 (JP2 and JPX profiles) and PDF (including PDF/X and PDF/A) were planned but have not been implemented yet.

For comparison, ICC, SGML, Shapefile, ZIP, GZIP, ARC and WARC are newly supported in JHOVE2; however, JHOVE supports AIFF, GIF, JPEG, JPEG 2000 and PDF while JHOVE2 does not. HTML is also not supported in JHOVE2, as it is in JHOVE, but since HTML can be expressed in terms of SGML or XML the functionality remains the same.
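
The idea of identifying an object by signature, independently of whether it is valid, can be illustrated with a minimal Python sketch. This is not JHOVE2's or DROID's code, and real PRONOM signatures are considerably richer (variable offsets, wildcards, priorities). The table and the function name `identify` are illustrative assumptions; only the magic numbers themselves are the well-known leading bytes of TIFF, ZIP and GZIP files.

```python
from typing import Optional

# (offset, magic bytes, format label): a deliberately tiny, illustrative table.
# It is NOT the PRONOM registry, only a few widely known magic numbers.
SIGNATURES = [
    (0, b"II*\x00", "TIFF (little-endian)"),
    (0, b"MM\x00*", "TIFF (big-endian)"),
    (0, b"PK\x03\x04", "ZIP"),
    (0, b"\x1f\x8b", "GZIP"),
]


def identify(data: bytes) -> Optional[str]:
    """Return a format label if the leading bytes match a known signature.

    No validation is attempted: a truncated or otherwise invalid file is
    still identified as long as its signature bytes are present.
    """
    for offset, magic, label in SIGNATURES:
        if data[offset:offset + len(magic)] == magic:
            return label
    return None


# A file that starts like a TIFF but is far too short to be valid
# is nevertheless identified, which is the point of the separation.
truncated_tiff = b"II*\x00" + b"\x00" * 4
label = identify(truncated_tiff)
```

Validation would then be a separate step that runs only for formats with a validation module, which is why identification can cover more formats than validation does.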

Documentation and user support

JHOVE2’s website includes an informative FAQ introduction, as well as standard documentation such as a user guide and programmer guide.   Primary user support is through the jhove2-techtalk-l listserv, which remains active as of June 2013.  In addition, the website includes an issue tracker displaying reported bugs and feature enhancement requests. 

Usability

JHOVE2 does not include a GUI, which will be challenging for many users. The default output (e.g. in XML, text or JSON) is very verbose and can run to 3500 lines for a single TIFF file.

Expertise required

Installation requires solid knowledge of command line interfaces and experience with manually editing configuration files. Creation of the assessment policies requires detailed knowledge of digital preservation standards and technologies.  

Standards compliance

JHOVE2 uses the PRONOM registry for file identification.  The software includes a stylesheet that can transform JHOVE2 outputs into the METS metadata standard.

Influence and take-up

As of July 2013, the website reports approximately 1000 downloads of version 2 and 200 of version 2.1.

User Experiences

Please note that JHOVE2 cannot cope with spaces in the command line. Therefore, JHOVE2 has to be stored in a folder whose path can be typed without any spaces.
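
A quick pre-flight check along these lines can catch the problem before JHOVE2 is run. The sketch below is plain Python and makes no use of JHOVE2 itself; the function name and the example paths are hypothetical.

```python
def path_has_whitespace(path: str) -> bool:
    """Return True if any character in the path is whitespace."""
    return any(ch.isspace() for ch in path)


# Hypothetical install locations, used only to illustrate the check.
candidates = ["C:/Program Files/jhove2", "C:/tools/jhove2"]
usable = [p for p in candidates if not path_has_whitespace(p)]
```

The same check can be applied to the paths of the files being characterised, since they also appear on the command line.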

The output is extremely wordy and contains so much information that it is difficult to tell whether a given TIFF file is valid, so it might be helpful to configure the output options. This is possible in the SGML file, although the average non-SGML expert might find that file difficult to handle.
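
An alternative that avoids editing the configuration is to post-filter the report. The following Python sketch assumes a plain-text report with one feature per line and assumes that validity-related lines contain a keyword such as `isValid`. Both the keyword and the sample lines are assumptions made for illustration, not taken from the JHOVE2 documentation, so the keyword should be adjusted to whatever the actual output contains.

```python
from typing import Iterable, List, Sequence


def filter_report(lines: Iterable[str],
                  keywords: Sequence[str] = ("isValid",)) -> List[str]:
    """Keep only the lines that mention one of the keywords (case-insensitive)."""
    wanted = [k.lower() for k in keywords]
    return [
        line.rstrip("\n")
        for line in lines
        if any(k in line.lower() for k in wanted)
    ]


# Hypothetical excerpt standing in for a report of several thousand lines.
sample_report = [
    "ImageWidth: 2048\n",
    "ImageLength: 1536\n",
    "isValid: true\n",
    "Compression: 1\n",
]
summary = filter_report(sample_report)
```

For a real report, the lines could be read with `open(path, encoding="utf-8")` and passed to `filter_report` directly, since a file object iterates over its lines.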

Development Activity

Activity Feed

RSS feed for issue and code updates: http://bitbucket.org/jhove2/main/rss (the feed failed to load: 404 Not Found)
