Data-flow analysis pipeline

Researcher:

Categories:

Information and Computer Science

The Technology

Traceability, reproducibility, transparency and efficiency are becoming increasingly important in the data analysis pipelines underlying today’s data-rich biomedical research. These pipelines are typically composed of multiple steps in which raw data is hierarchically transformed into simpler and progressively more insightful forms. Notably, such pipelines are rarely fully automated; they require multiple human decisions, interventions and informed parameter choices along each of their many steps (choosing thresholds, excluding data points, cleaning artifacts, etc.). To date, however, there is no simple way to incorporate, follow, document and expose such manual decisions. Furthermore, once a parameter is changed, it is difficult to know which downstream calculations are affected and which specific parts of the pipeline and data items must be recalculated.
The present technology is a novel tool that brings interactivity, traceability, transparency and efficiency to high-level programming. It is a fully generic programming tool that extends high-level programming scripts: the script is automatically translated into a network of data-flow computational objects, enabling forward and backward tracing between raw and processed data so that data sources and human interventions can be followed at every step, and so that only the affected parts of the pipeline need to be recomputed when a decision changes.
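To make the idea concrete, the sketch below shows in plain Python how a pipeline script might be represented as such a data-flow network. This is only an illustrative sketch, not the actual tool: the Node class, its set_param/compute/trace_back methods and the toy three-step pipeline are assumptions made for illustration. Each step records its inputs and human-chosen parameters, a parameter change invalidates only the downstream nodes that depend on it, and any result can be traced backward to its raw sources and the decisions behind it.

    # Illustrative sketch only (assumed API, not the actual technology):
    # each pipeline step becomes a node in a data-flow graph.
    class Node:
        def __init__(self, name, func, inputs=(), params=None):
            self.name = name
            self.func = func                   # the computation this step performs
            self.inputs = list(inputs)         # upstream Node objects
            self.params = dict(params or {})   # human-chosen parameters (thresholds, exclusions, ...)
            self.downstream = []               # nodes that consume this node's output
            self.value = None
            self.stale = True                  # needs (re)computation
            for node in self.inputs:
                node.downstream.append(self)

        def set_param(self, key, value):
            """Record a manual intervention and invalidate only what depends on it."""
            self.params[key] = value
            self._invalidate()

        def _invalidate(self):
            if not self.stale:
                self.stale = True
                for node in self.downstream:
                    node._invalidate()

        def compute(self):
            """Recompute lazily: only stale nodes (and their stale inputs) are re-run."""
            if self.stale:
                args = [node.compute() for node in self.inputs]
                self.value = self.func(*args, **self.params)
                self.stale = False
            return self.value

        def trace_back(self):
            """Backward tracing: the sources and parameter choices behind this result."""
            lineage = [(self.name, dict(self.params))]
            for node in self.inputs:
                lineage.extend(node.trace_back())
            return lineage

    # Toy pipeline: raw -> cleaned -> summary, with a documented threshold choice.
    raw = Node("raw", lambda: [1.0, 2.0, 50.0, 3.0])
    cleaned = Node("cleaned", lambda xs, threshold: [x for x in xs if x < threshold],
                   inputs=[raw], params={"threshold": 10.0})
    summary = Node("summary", lambda xs: sum(xs) / len(xs), inputs=[cleaned])

    print(summary.compute())               # 2.0
    cleaned.set_param("threshold", 100.0)  # a manual decision; only downstream nodes go stale
    print(summary.compute())               # 14.0, recomputed without re-reading the raw data
    print(summary.trace_back())            # full lineage of parameters behind the result

In this sketch, changing the cleaning threshold automatically marks only the cleaning and summary steps as stale, and the lineage returned by trace_back documents which manual choices produced the final number.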

Advantages

  • Generic
  • Automatic tracking and documentation of human interventions

Applications and Opportunities

  • Scientific research
  • Biomedicine
  • Financial analysis tools
  • Other
Business Development Contacts
Motti Koren
Director of Business Development, Life Sciences