Difference between revisions of "EPADD"

	ePADD is a software package developed by Stanford University's Special Collections & University Archives that supports archival processes around the appraisal, ingest, processing, discovery, and delivery of email archives.
Homepage:	https://library.stanford.edu/projects/epadd
License:	Apache 2.0
Platforms:	Java
Wikidata ID:	Q59652265
Input Formats:	MBOX
Output Formats:	PREMIS (Preservation Metadata Implementation Strategies), BagIt
Function:	Access, Appraisal, Content Profiling, Metadata Extraction, Metadata Processing
Content type:	Email
Appears in COW:	Appraise email and other large, unstructured text collections
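The infobox lists BagIt as one of ePADD's output formats. A BagIt package, as defined in RFC 8493, is a directory with a `bagit.txt` declaration, a `data/` payload directory, and a checksum manifest. The following minimal Java sketch, using only the standard library, shows that structure. It is not ePADD's export code. The class name `MiniBag`, the flat payload list, and the choice of SHA-256 are assumptions made for illustration.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.List;

/** Minimal BagIt (RFC 8493) writer; an illustrative sketch, not ePADD's exporter. */
final class MiniBag {

    /** Returns the lowercase hex SHA-256 digest of a file's contents. */
    static String sha256Hex(Path file) throws IOException, NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance("SHA-256");
        byte[] digest = md.digest(Files.readAllBytes(file));
        StringBuilder hex = new StringBuilder();
        for (byte b : digest) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    /**
     * Copies each payload file into bagDir/data/ and writes bagit.txt plus
     * manifest-sha256.txt. Assumes a flat list of files with distinct names.
     */
    static void createBag(Path bagDir, List<Path> payload)
            throws IOException, NoSuchAlgorithmException {
        Path dataDir = bagDir.resolve("data");
        Files.createDirectories(dataDir);

        List<String> manifest = new ArrayList<>();
        for (Path src : payload) {
            String name = src.getFileName().toString();
            Path dest = dataDir.resolve(name);
            Files.copy(src, dest);
            // Each manifest line: "<checksum>  <path relative to the bag root>".
            manifest.add(sha256Hex(dest) + "  data/" + name);
        }

        // Bag declaration required by the specification.
        List<String> declaration = new ArrayList<>();
        declaration.add("BagIt-Version: 1.0");
        declaration.add("Tag-File-Character-Encoding: UTF-8");
        Files.write(bagDir.resolve("bagit.txt"), declaration, StandardCharsets.UTF_8);
        Files.write(bagDir.resolve("manifest-sha256.txt"), manifest, StandardCharsets.UTF_8);
    }
}
```

A real bag would usually also carry a `bag-info.txt` and tag manifests, which the specification treats as optional. They are left out here to keep the required core visible.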

Latest revision as of 15:26, 26 May 2026

Description

From the project's GitHub page: "ePADD is a software package developed by Stanford University's Special Collections & University Archives that supports archival processes around the appraisal, ingest, processing, discovery, and delivery of email archives.

The software is comprised of four modules:

Appraisal: Allows donors, dealers, and curators to easily gather and review email archives prior to transferring those files to an archival repository.

Processing: Provides archivists with the means to arrange and describe email archives.

Discovery: Provides the tools for repositories to remotely share a redacted view of their email archives with users through a web server discovery environment. (Note that this module is downloaded separately).

Delivery: Enables archival repositories to provide moderated full-text access to unrestricted email archives within a reading room environment."

From the Project page:

ePADD Technical Information

ePADD is written in Java and JavaScript and powered by Apache Tomcat (v7.0) using the Java EE Servlet API (v3.x) and JavaMail (v1.4.2). Text and metadata extraction, indexing and retrieval are performed by Apache Lucene (v4.7) and Apache Tika (v1.8). Charting and visualization are supported using the D3-based reusable chart library (v0.4.10). Oracle's Java Application Bundler and Launch4j are used for packaging on Mac and Windows platforms respectively. Other Java libraries from Apache (Lang, commons, CLI, IO, logging, etc.) are also used. JSON formatting is performed with the libraries org.json and Gson.
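The infobox names MBOX as the input format, and the paragraph above names JavaMail for message handling. Before a mail library can parse individual messages, an mbox file has to be split at its `From ` separator lines. Below is a minimal standard-library sketch. It assumes the mboxrd variant, where body lines beginning with `From ` are escaped as `>From `. The class `MboxSplitter` is hypothetical and not part of ePADD.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

/** Splits mbox text into raw messages; a sketch assuming the mboxrd variant. */
final class MboxSplitter {

    static List<String> split(String mboxText) throws IOException {
        List<String> messages = new ArrayList<>();
        StringBuilder current = null;
        BufferedReader reader = new BufferedReader(new StringReader(mboxText));
        String line;
        while ((line = reader.readLine()) != null) {
            if (line.startsWith("From ")) {
                // An unescaped "From " line starts a new message.
                if (current != null) {
                    messages.add(current.toString());
                }
                current = new StringBuilder();
                continue; // the separator line itself is not part of the message
            }
            if (current == null) {
                continue; // ignore any text before the first separator
            }
            // mboxrd unquoting: ">From ", ">>From ", ... each lose one leading '>'.
            if (line.matches(">+From .*")) {
                line = line.substring(1);
            }
            current.append(line).append("\r\n"); // RFC 5322 uses CRLF line endings
        }
        if (current != null) {
            messages.add(current.toString());
        }
        return messages;
    }
}
```

Each returned string could then be passed to a MIME parser, for example JavaMail's `MimeMessage(Session, InputStream)` constructor. Other mbox variants (mboxo, mboxcl) quote or delimit messages differently, so a production importer would need to detect which one it is reading.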

ePADD has implemented its own natural language processing (NLP) toolkit which is used for named entity extraction, disambiguation and other tasks. This toolkit supplants the Apache OpenNLP used in earlier beta versions of the ePADD software. We continue to use Muse as an internal library within ePADD. However, the Apache OpenNLP proved insufficient for our needs (at least for name recognition), and after various rounds of customization, we built our own named entity recognizer. This toolkit uses external datasets such as Wikipedia/DBpedia, Freebase, Geonames, OCLC FAST and LC Subject Headings/LC Name Authority File.
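The paragraph above says ePADD's recognizer draws on external datasets such as Wikipedia/DBpedia, Geonames and the LC Name Authority File. One simple way to use such datasets is as a gazetteer, meaning a lookup table of known names and their types. The toy Java sketch below does longest-match lookup against such a table. It illustrates the general technique only. It is not ePADD's recognizer and performs no disambiguation. `GazetteerTagger` and its type labels are invented for the example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Toy gazetteer-based name tagger; a hypothetical sketch, not ePADD's recognizer. */
final class GazetteerTagger {
    private final Map<String, String> types = new HashMap<>(); // normalized name -> type
    private int maxTokens = 1; // length of the longest known name, in tokens

    /** Splits text into letter/digit tokens, dropping punctuation and whitespace. */
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    private static String key(List<String> tokens) {
        return String.join(" ", tokens).toLowerCase(Locale.ROOT);
    }

    /** Adds a known name (e.g. drawn from an authority file) with its entity type. */
    void add(String name, String type) {
        List<String> tokens = tokenize(name);
        types.put(key(tokens), type);
        maxTokens = Math.max(maxTokens, tokens.size());
    }

    /** Returns "TYPE:surface form" for each match, preferring the longest name at each position. */
    List<String> tag(String text) {
        List<String> tokens = tokenize(text);
        List<String> entities = new ArrayList<>();
        int i = 0;
        while (i < tokens.size()) {
            int matched = 0;
            // Try the longest candidate span first, shrinking to a single token.
            for (int n = Math.min(maxTokens, tokens.size() - i); n >= 1; n--) {
                List<String> span = tokens.subList(i, i + n);
                String type = types.get(key(span));
                if (type != null) {
                    entities.add(type + ":" + String.join(" ", span));
                    matched = n;
                    break;
                }
            }
            i += Math.max(matched, 1); // skip past a match, or advance one token
        }
        return entities;
    }
}
```

Preferring the longest match means "Stanford University" is tagged as one entity rather than as "Stanford" followed by an untagged "University". A lookup like this cannot decide between two entries that share a name. That disambiguation step is the harder part the paragraph above refers to.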

The project is developed with IDEs like IntelliJ IDEA and Eclipse, built with Apache Maven, Ant, and custom shell scripts, and tracked using Git for source control and issue tracking. The ePADD software client is browser-based and compatible with Chrome and Firefox. It is optimized for Windows 7 SP1/10, macOS 10.12/10.13, and Ubuntu 16.04 machines, using Java 8.

User Experiences

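On migrating from different email formats before ingest to ePADD (https://groups.google.com/forum/#!topic/digital-curation/srt-oIVwAGU)
ePADD on Twitter (https://twitter.com/e_padd?lang=en)
ePADD 6.0 beta released! (http://library.stanford.edu/blogs/special-collections-unbound/2018/07/epadd-60-beta-released)
ePADD User Guide (https://docs.google.com/document/d/1CVIpWK5FNs5KWVHgvtWTa7u0tZjUrFrBHq6_6ZJVfEA)
ePADD Shared Discovery Module Website Collection Contributor Guide (https://docs.google.com/document/d/10U9Hxh9MS9C9bS8M7uYuXk5m7EBFpgd0yiwCcOSM6D)
Full list of presentations and publications (https://library.stanford.edu/projects/epadd/presentations-publications)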

Development Activity

All development activity is visible on GitHub: http://github.com/ePADD/epadd/commits

Release Feed

The last three releases:

2026-06-03 19:42:51: v11.1.3; by jfarwer
2026-05-15 13:32:00: v11.1.2; by jfarwer
2026-05-13 11:50:48: v11.1.2-alpha: Fixed issue where embedded emails and HTML email parts were missed; by jfarwer


@@ Line 1: / Line 1: @@
-<!-- Use the structure provided in this template, do not change it! -->
+{{Infobox tool
+|image=Epadd_logo_orig.png
-{{Infobox_tool
 |purpose=ePADD is a software package developed by Stanford University's Special Collections & University Archives that supports archival processes around the appraisal, ingest, processing, discovery, and delivery of email archives.
-|image={{PAGENAMEE}}.png
 |homepage=https://library.stanford.edu/projects/epadd
 |license=Apache 2.0
 |platforms=Java
+|Wikidata ID=Q59652265
+|formats_in=MBOX
+|formats_out=PREMIS (Preservation Metadata Implementation Strategies), BagIt
+|function=Access, Appraisal, Content Profiling, Metadata Extraction, Metadata Processing
+|content=Email
 }}
-<!-- Note that to use the image field, you should leave the value as {{PAGENAMEE}}.png (or similar) and upload a copy of the image. Hot-linking is not supported. If you don't want an image, just remove that line. -->
+<!-- Use the structure provided in this template, do not change it! -->
-<!-- Add one or more categories to describe the function of the tool, such as:
-[[Category:Metadata Extraction]] or [[Category:Preservation System]] or [[Category:Backup]]
-Choose carefully, and view the list of existing categories first (see the Navigation sidebar on the left) -->
-[[Category:Metadata Extraction]][[Category:Preservation System]][[Category:Metadata Processing]]
-<!-- Add relevant categories to describe the content type that the tool addresses, such as:
+<!-- Note that to use the image field, you should leave the value as {{PAGENAMEE}}.png (or similar) and upload a copy of the image. Hot-linking is not supported. If you don't want an image, just remove that line. -->
-[[Category:Audio]] or [[Category:Document]] or [[Category:Research Data]]
-Choose carefully, and view the list of existing categories first (see the Navigation sidebar on the left). If the tool works on any content type, do not add a category. -->
-[[Category:Email]]
 == Description ==
@@ Line 27: / Line 23: @@
 The software is comprised of four modules:
-Appraisal: Allows donors, dealers, and curators to easily gather and review email archives prior to transferring those files to an archival repository.
+* '''Appraisal:''' Allows donors, dealers, and curators to easily gather and review email archives prior to transferring those files to an archival repository.
-Processing: Provides archivists with the means to arrange and describe email archives.
+* '''Processing:''' Provides archivists with the means to arrange and describe email archives.
-Discovery: Provides the tools for repositories to remotely share a redacted view of their email archives with users through a web server discovery environment. (Note that this module is downloaded separately).
+* '''Discovery:''' Provides the tools for repositories to remotely share a redacted view of their email archives with users through a web server discovery environment. (Note that this module is downloaded separately).
-Delivery: Enables archival repositories to provide moderated full-text access to unrestricted email archives within a reading room environment."
+* '''Delivery:''' Enables archival repositories to provide moderated full-text access to unrestricted email archives within a reading room environment."
+From the [https://library.stanford.edu/projects/epadd/ Project page]:
+'''ePADD Technical Information'''
+ePADD is written in Java and Javascript and powered by Apache Tomcat (v7.0) using Java EE Servlet API (v3.x) and Java Mail (v1.4.2). Text and metadata extraction, indexing and retrieval is performed by Apache Lucene (v4.7) and Apache Tika (v1.8). Charting and visualization is supported using the D3-based reusable chart library (v0.4.10). Oracle's Java Application Bundler and Launch4J are used for packaging on Mac and Windows platforms respectively. Other Java libraries from Apache (Lang, commons, CLI, IO, logging, etc.) are also used. JSON formatting is performed with the libraries org.json and Gson.
+ePADD has implemented its own natural language processing (NLP) toolkit which is used for named entity extraction, disambiguation and other tasks. This toolkit supplants the Apache OpenNLP used in earlier beta versions of the ePADD software. We continue to use Muse as an internal library within ePADD. However, the Apache OpenNLP proved insufficient for our needs (at least for name recognition), and after various rounds of customization, we built our own named entity recognizer. This toolkit uses external datasets such as Wikipedia/DBpedia, Freebase, Geonames, OCLC FAST and LC Subject Headings/LC Name Authority File.
+The project is developed with IDEs like IntelliJ Idea and Eclipse, built with Apache Maven, Ant, and custom shell scripts, and tracked using Git for source control and issue tracking. The ePADD software client is browser-based and compatible with Chrome and Firefox. It is optimized for Windows 7 SP1/10,  OSX 10.12/10.13, and Ubuntu 16.04 machines, using Java 8.
 == User Experiences ==
 <!-- Add hotlinks to user experiences with the tool (eg. blog posts). These should illustrate the effectiveness (or otherwise) of the tool. Use a bullet list. -->
-On migrating from different email formats before ingest to ePADD https://groups.google.com/forum/#!topic/digital-curation/srt-oIVwAGU
+* On migrating from different email formats before ingest to ePADD https://groups.google.com/forum/#!topic/digital-curation/srt-oIVwAGU
+* [https://twitter.com/e_padd?lang=en ePADD on Twitter]
+* [http://library.stanford.edu/blogs/special-collections-unbound/2018/07/epadd-60-beta-released ePADD 6.0 beta released!]
+* [https://docs.google.com/document/d/1CVIpWK5FNs5KWVHgvtWTa7u0tZjUrFrBHq6_6ZJVfEA ePADD User Guide]
+* [https://docs.google.com/document/d/10U9Hxh9MS9C9bS8M7uYuXk5m7EBFpgd0yiwCcOSM6D ePADD Shared Discovery Module Website Collection Contributor Guide]
+* [https://library.stanford.edu/projects/epadd/presentations-publications Full list of presentations and publications]
 == Development Activity ==
 <!-- Provide *evidence* of development activity of the tool. For example, RSS feeds for code issues or commits. -->
-<!-- Add the OpenHub.com ID for the tool, if known. -->
+All development activity is visible on GitHub: http://github.com/ePADD/epadd/commits
-{{Infobox_tool_details
-|releases_rss=
-|issues_rss=
+=== Release Feed ===
-|mailing_lists=
+Below the last 3 release feeds:
-|ohloh_id=
+<rss max=3>https://github.com/ePADD/epadd/releases.atom</rss>
+=== Activity Feed ===
+Below the last 5 commits:
+<rss max=5>https://github.com/ePADD/epadd/commits/master.atom</rss>
+{{Infobox tool details
+|ohloh_id=epadd
 }}