Nextflow in the Cloud


NOTE: This project has been decommissioned and the git repository is no longer public. For more information, feel free to reach out to Gervaise Henry.

Managed a small agile team to create a proof-of-concept AWS architecture for running Nextflow pipelines in the cloud. The architecture uses low-cost, highly available queueing, compute, and storage resources.

Roles:

  • AWS Solutions Architect
    • Designed the entire architecture
    • Mentored a cloud-inexperienced engineering team on AWS resources and their interactions
  • SCRUM Team Product Owner
    • Prioritized the backlog
    • Led feature planning and was the primary contributor to the backlog
    • Managed iteration execution
  • Engineer
    • Participated in sprints
      • Nextflow pipeline development
      • AWS deployment

Phase 1 (complete):

  • Queue submissions received from an API
  • Start on-demand compute workers on a schedule to run Nextflow pipelines with the submitted parameters
  • Run pipeline processes on AWS Batch (Spot Instances) under Nextflow orchestration
  • Store status updates in a NoSQL database (deposited through an API and a Lambda function)
  • Store run output metadata in a NoSQL database
  • Store output data in an S3 bucket
  • Store temporary process files in an S3 bucket with a lifecycle policy configured
  • Build a serverless-hosted front-end website that queries and displays the status and metadata tables
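The worker step of Phase 1 can be sketched roughly as follows, assuming each queued message carries a run ID, a pipeline reference, and its parameters. The message fields, the `awsbatch` profile name, the bucket names, and the 7-day expiry are hypothetical; the project's repository is not public, so this is an illustration rather than its actual code. It turns one submission into a `nextflow run` command with an S3 work directory, and builds a lifecycle rule that expires the temporary work files.

```python
import json
import shlex


def build_nextflow_command(message_body: str, work_bucket: str, params_path: str) -> tuple[list[str], str]:
    """Turn one queued submission into (argv, params_json) for `nextflow run`.

    Assumed (hypothetical) message shape:
      {"run_id": "...", "pipeline": "<repo or path>", "profile": "...", "params": {...}}
    """
    submission = json.loads(message_body)
    run_id = submission["run_id"]
    # Pipeline parameters are serialized for -params-file, which keeps
    # nested or list-valued parameters intact.
    params_json = json.dumps(submission.get("params", {}), indent=2)
    argv = [
        "nextflow", "run", submission["pipeline"],
        # Hypothetical profile, assumed to select the awsbatch executor and
        # a Batch job queue backed by Spot compute.
        "-profile", submission.get("profile", "awsbatch"),
        # The AWS Batch executor expects an S3 work directory; putting every
        # run under work/ lets one lifecycle rule expire all temporary files.
        "-work-dir", f"s3://{work_bucket}/work/{run_id}",
        "-params-file", params_path,
    ]
    return argv, params_json


def temp_files_lifecycle(prefix: str = "work/", days: int = 7) -> dict:
    """S3 lifecycle configuration that expires temporary Nextflow work files.

    The shape matches the LifecycleConfiguration argument of boto3's
    put_bucket_lifecycle_configuration; the 7-day default is an assumption.
    """
    return {
        "Rules": [
            {
                "ID": "expire-nextflow-work",
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                "Expiration": {"Days": days},
            }
        ]
    }


if __name__ == "__main__":
    body = json.dumps({
        "run_id": "run-001",
        "pipeline": "example-org/example-pipeline",
        "params": {"input": "s3://example-input/samples.csv"},
    })
    argv, params_json = build_nextflow_command(body, "example-work-bucket", "/tmp/run-001.json")
    print(shlex.join(argv))
    print(params_json)
    print(json.dumps(temp_files_lifecycle(), indent=2))
```

In this sketch the worker would write `params_json` to `params_path` before launching `argv` with `subprocess.run(argv, check=True)`, and the lifecycle dictionary would be applied once with boto3's `put_bucket_lifecycle_configuration(Bucket=..., LifecycleConfiguration=...)`. Passing parameters as a file rather than as `--name value` flags avoids shell-quoting problems with arbitrary user input.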

Phase 2 (in progress: ON HOLD):

  • Build a front-end website that provides a user interface for submissions
  • Develop a JSON Schema for templating submissions
  • Customize the compute worker to handle multiple pipeline types
  • Handle file uploads for pipeline input parameters
  • Create a CloudFormation template for deployment
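Templating submissions with JSON Schema, combined with a worker that accepts several pipeline types, might look roughly like the sketch below. The `pipeline_type` field, the `example` pipeline, and its `input`/`outdir` parameters are hypothetical. The hand-written checker covers only the `type`, `required`, `properties`, and `const` keywords; a real deployment would more likely use a full validator such as the `jsonschema` package (`jsonschema.validate(instance, schema)`).

```python
import json

# Hypothetical per-pipeline submission templates, written as JSON Schema.
SUBMISSION_SCHEMAS = {
    "example": {
        "$schema": "https://json-schema.org/draft/2020-12/schema",
        "type": "object",
        "required": ["pipeline_type", "params"],
        "properties": {
            "pipeline_type": {"const": "example"},
            "params": {
                "type": "object",
                "required": ["input", "outdir"],
                "properties": {
                    "input": {"type": "string"},
                    "outdir": {"type": "string"},
                },
            },
        },
    },
}

# Python types for the JSON Schema "type" keyword values handled here.
_TYPES = {"object": dict, "array": list, "string": str,
          "boolean": bool, "number": (int, float), "integer": int}


def check(instance, schema: dict, path: str = "$") -> list[str]:
    """Validate a subset of JSON Schema (type, const, required, properties)."""
    expected = schema.get("type")
    if expected is not None:
        ok = isinstance(instance, _TYPES[expected])
        # bool is a subclass of int in Python but not a JSON number.
        if expected in ("integer", "number") and isinstance(instance, bool):
            ok = False
        if not ok:
            return [f"{path}: expected {expected}"]
    errors = []
    if "const" in schema and instance != schema["const"]:
        errors.append(f"{path}: expected constant {schema['const']!r}")
    if isinstance(instance, dict):
        for key in schema.get("required", []):
            if key not in instance:
                errors.append(f"{path}: missing required property {key!r}")
        for key, subschema in schema.get("properties", {}).items():
            if key in instance:
                errors.extend(check(instance[key], subschema, f"{path}.{key}"))
    return errors


def validate_submission(body: str) -> list[str]:
    """Pick the template for the submission's pipeline type and check it."""
    submission = json.loads(body)
    if not isinstance(submission, dict):
        return ["$: expected object"]
    schema = SUBMISSION_SCHEMAS.get(submission.get("pipeline_type"))
    if schema is None:
        return [f"$: unknown pipeline_type {submission.get('pipeline_type')!r}"]
    return check(submission, schema)
```

Keying the templates by pipeline type lets the same front end and worker accept new pipelines by adding a schema entry rather than changing code paths.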

Phase 3 (in planning):

  • Build Alexa skills for:
    • Querying statuses
    • Starting batch runs
    • Querying output metadata