An Algebraic Approach for Data-Centric Scientific Workflows


OGASAWARA, E.; DIAS, J.; OLIVEIRA, D.; PORTO, F.; VALDURIEZ, P.; MATTOSO, M. An Algebraic Approach for Data-Centric Scientific Workflows. Proceedings of the VLDB Endowment, v. 4(11), p. 1328-1339, 2011.
Keywords: scientific workflows; provenance; parallel execution

Abstract

Large-scale scientific experiments based on computer simulations are typically modeled as scientific workflows, which eases the chaining of different programs. These scientific workflows are defined, executed, and monitored by scientific workflow management systems (SWfMS). Because these experiments manage large amounts of data, it is critical to execute them in high-performance computing environments such as clusters, grids, and clouds. However, few SWfMS support parallel execution, and those that do are usually labor-intensive for workflow developers and offer limited primitives for optimizing workflow execution. To address these issues, we developed a workflow algebra to specify the parallel execution of scientific workflows and to enable its optimization. In this paper, we show how the workflow algebra is efficiently implemented in Chiron, an algebra-based parallel scientific workflow engine. Chiron has a unique native distributed provenance mechanism that enables runtime queries in a relational database. We conducted two studies to evaluate the performance of our algebraic approach as implemented in Chiron: the first compares Chiron with different approaches, whereas the second evaluates Chiron's scalability. From the results, we conclude that Chiron executes scientific workflows efficiently, with the added benefits of declarative specification and runtime provenance support. Copyright © 2013 John Wiley & Sons, Ltd.
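The abstract does not spell out the algebra's operators, so the following is only a rough illustration of the general idea, not Chiron's implementation. It assumes that workflow data is represented as relations (lists of tuples), that each activity consumes and produces relations, and that each independent invocation of an activity (called an "activation" here) is the unit of parallelism. The operator names `map_op` and `reduce_op`, the `activation` table, and the sample parameter-sweep data are all hypothetical. An in-memory SQLite database stands in for the relational provenance store, so provenance can be queried with SQL while the workflow runs.

```python
import sqlite3
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Dict, List

# Hypothetical data model: a relation is a list of tuples (dicts of attributes).
Row = Dict[str, Any]
Relation = List[Row]

# Stand-in for a relational provenance database (assumption, not Chiron's schema).
prov = sqlite3.connect(":memory:")
prov.execute("CREATE TABLE activation (activity TEXT, input TEXT, output TEXT)")


def map_op(activity: str, fn: Callable[[Row], Row], rel: Relation,
           workers: int = 4) -> Relation:
    """Map-like operator: each input tuple yields exactly one output tuple.

    Each tuple is an independent activation, so activations run in parallel.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        out = list(pool.map(fn, rel))  # preserves input order
    # Provenance is recorded in the calling thread, one row per activation.
    prov.executemany(
        "INSERT INTO activation VALUES (?, ?, ?)",
        [(activity, repr(i), repr(o)) for i, o in zip(rel, out)],
    )
    return out


def reduce_op(activity: str, key: str,
              fn: Callable[[Any, Relation], Row], rel: Relation) -> Relation:
    """Reduce-like operator: group tuples by `key`, one output tuple per group."""
    groups: Dict[Any, Relation] = {}
    for t in rel:
        groups.setdefault(t[key], []).append(t)
    out = [fn(k, g) for k, g in groups.items()]
    prov.executemany(
        "INSERT INTO activation VALUES (?, ?, ?)",
        [(activity, repr(g), repr(o)) for g, o in zip(groups.values(), out)],
    )
    return out


# Hypothetical parameter sweep: each input tuple is one simulation configuration.
inputs = [{"case": c, "x": float(x)} for c in (1, 2) for x in range(3)]
simulated = map_op("simulate", lambda t: {**t, "y": t["x"] ** 2}, inputs)
summary = reduce_op(
    "summarize", "case",
    lambda k, g: {"case": k, "total": sum(t["y"] for t in g)},
    simulated,
)

# Runtime provenance query: how many activations has each activity produced?
counts = prov.execute(
    "SELECT activity, COUNT(*) FROM activation GROUP BY activity ORDER BY activity"
).fetchall()
```

Because every operator takes and returns a relation, operators compose freely, and an engine is free to reorder or parallelize activations without changing the workflow specification. That separation is the property a declarative, algebraic approach relies on.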