Kepler is a scientific workflow modelling and management system that enables users, regardless of programming experience, to set up data analysis pipelines. The software assembles, executes, and documents the sequences of services and scripts that scientists working with large-scale data use to carry out their research. The software itself includes a number of services, and stores workflows in an archival format for sharing and re-use.
The Kepler project is supported by the NSF-funded Kepler/CORE team.
Licensing and cost
BSD License – free.
Kepler 2.4 was released in April 2013. The Kepler software is developed and maintained by the cross-project Kepler collaboration, led by a team from several of the key institutions that originated the project: UC Davis, UC Santa Barbara, and UC San Diego. The Kepler Project has built an active open-source community for ongoing development.
Platform and interoperability
Kepler is a Java-based application that is maintained for Windows, OS X, and Linux operating systems using Java 1.6 or greater. It requires 512 MB of RAM (1 GB or more recommended), at least 300 MB of disk space, and at least a 2 GHz CPU. To use Kepler's statistical functionality, the project recommends also installing the statistical computing language and environment R. Kepler is built upon the mature Ptolemy II framework. Kepler provides direct access to the EarthGrid.
A Kepler workflow is composed of 'actors,' the configurable software components that execute specific tasks, and 'directors,' which control workflow execution. Actors communicate with one another through 'ports' that are connected by 'relations.' The software can operate on data stored in a variety of formats, locally and over the internet, and can integrate disparate software components written in different programming languages. The software saves workflows and customised components in the Kepler archive format (KAR). Kepler ships with a searchable library containing over 350 ready-to-use processing components, including R and Matlab services, a WebService actor for accessing and executing WSDL-defined Web services, a ReadTable actor to access legacy data stored in Excel files, and an ExternalExecution actor to execute command line applications from within a workflow. Kurator, one of a number of add-on modules, provides specific support for data curation, helping users construct, schedule and manage data curation pipelines. The package integrates the public Google Cloud service, as well as a number of domain-specific services. A Reporting Suite provides the ability to create reports displaying workflow results, capture provenance of workflow execution, and manage workflow runs.
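The actor and director model described above can be illustrated with a small, self-contained Python sketch. This is not Kepler's or Ptolemy II's API: the class names Actor and Director, the received dictionary, the (source, target, port) relation tuples and the build_workflow helper are all hypothetical, chosen only to show how a director might fire each actor once its input ports have been filled.

```python
class Actor:
    """Hypothetical actor: a named task with zero or more input ports."""

    def __init__(self, name, func, input_ports=()):
        self.name = name
        self.func = func
        self.input_ports = tuple(input_ports)
        self.received = {}  # tokens delivered to input ports so far

    def ready(self):
        # An actor can fire once every input port holds a token.
        return all(port in self.received for port in self.input_ports)

    def fire(self):
        # Run the task, passing each port's token as a keyword argument.
        return self.func(**self.received)


class Director:
    """Hypothetical director: decides when each actor fires."""

    def __init__(self, actors, relations):
        self.actors = {actor.name: actor for actor in actors}
        # Each relation is (source actor, target actor, target input port).
        self.relations = list(relations)

    def run(self):
        results = {}
        pending = list(self.actors.values())
        while pending:
            runnable = [actor for actor in pending if actor.ready()]
            if not runnable:
                raise RuntimeError("workflow stalled: unconnected port or cycle")
            for actor in runnable:
                token = actor.fire()
                results[actor.name] = token
                pending.remove(actor)
                # Deliver the output token along every outgoing relation.
                for source, target, port in self.relations:
                    if source == actor.name:
                        self.actors[target].received[port] = token
        return results


def build_workflow():
    """Three-actor pipeline: produce readings, average them, format a report."""
    actors = [
        Actor("readings", lambda: [12.0, 15.5, 11.5]),
        Actor("mean", lambda values: sum(values) / len(values), ("values",)),
        Actor("report", lambda value: f"mean = {value:.2f}", ("value",)),
    ]
    relations = [
        ("readings", "mean", "values"),
        ("mean", "report", "value"),
    ]
    return Director(actors, relations)


if __name__ == "__main__":
    print(build_workflow().run()["report"])
```

In this sketch the actors know nothing about execution order; only the director does. That separation is what lets the same set of components run under different execution strategies.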
Documentation and user support
Kepler offers extensive documentation consisting of a Getting Started Guide, Actor Documentation and a User Manual. The site also includes a straightforward FAQ. The project provides an active mailing list and a support page.
Kepler workflows can be executed through either a GUI or the command line. The software includes installers for Mac, Windows and Linux operating systems. A module manager enables users to fetch updates or new modules online.
Users should be familiar with scientific workflow configuration.
Kepler has support for data described by Ecological Metadata Language (EML), and data accessible using the DiGIR and OPeNDAP protocols. The software can be easily extended to support other data standards or access protocols.
Influence and take-up
Kepler is used in a number of projects, in disciplines ranging from wildlife management to environmental sensor data to computational science. A workshop highlighting recent applications of Kepler was held at the ICCS2012 conference in Omaha, Nebraska, June 4-6 2012.