NOTE: This project has been decommissioned and the git repository is no longer public. For more information, feel free to reach out to Gervaise Henry.
Managed a small agile team to create a proof-of-concept AWS architecture for running Nextflow pipelines in the cloud. It uses low-cost, high-availability queueing, compute, and storage resources.
Role:
- AWS Solutions Architect
  - Designed the entire architecture
  - Mentored an engineering team inexperienced with the cloud on AWS resources and their interactions
- SCRUM Team Product Owner
  - Prioritized the backlog
  - Planned features and was the primary contributor to the backlog
  - Iteration execution
- Engineer
  - Participated in sprints
  - Nextflow pipeline development
  - AWS deployment
Phase 1 (complete):
- Queue submissions received from an API
- On a schedule, start on-demand compute workers that run Nextflow pipelines with the submitted parameters
- Run pipeline processes on AWS Batch (Spot Instances) under Nextflow orchestration
- Store status updates in a NoSQL database (deposited through the API and a Lambda function)
- Store run output metadata in a NoSQL database
- Store output data in an S3 bucket
- Store process temporary files in an S3 bucket with a lifecycle policy configured
- Build a serverlessly hosted front-end website that can query and display the status and metadata tables
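The submission and status-tracking steps above can be sketched as a Lambda-style handler. This is a minimal illustration, not the project's actual code: the function names, the message layout, and the "QUEUED" status value are all hypothetical. The queue and database clients are passed in as plain callables so the sketch runs without AWS. In a real deployment they might wrap calls such as boto3's SQS `send_message` and DynamoDB `put_item`, but the specific services are an assumption here.

```python
import json
import time
import uuid
from typing import Any, Callable, Dict


def build_submission_message(params: Dict[str, Any]) -> Dict[str, str]:
    """Wrap pipeline parameters in a queue message tagged with a unique run ID."""
    run_id = str(uuid.uuid4())
    body = {"run_id": run_id, "submitted_at": int(time.time()), "params": params}
    return {"run_id": run_id, "body": json.dumps(body)}


def build_status_item(run_id: str, status: str) -> Dict[str, Any]:
    """Build one record for the status table (hypothetical layout)."""
    return {"run_id": run_id, "status": status, "updated_at": int(time.time())}


def handle_submission(
    event: Dict[str, Any],
    send_message: Callable[[str], Any],
    put_item: Callable[[Dict[str, Any]], Any],
) -> Dict[str, Any]:
    """Queue a submitted run, then record its initial status.

    `send_message` and `put_item` are injected stand-ins for the queue and
    database clients (assumed, not taken from the original project).
    """
    params = json.loads(event["body"])
    message = build_submission_message(params)
    send_message(message["body"])
    put_item(build_status_item(message["run_id"], "QUEUED"))
    return {"statusCode": 202, "body": json.dumps({"run_id": message["run_id"]})}
```

A scheduled compute worker would then read each queued message, launch Nextflow with the stored parameters, and write further status records as the run progresses.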
Phase 2 (in progress: ON HOLD):
- Build a front-end website that offers a user interface for submissions
- Develop a JSON Schema for templating submissions
- Customize the compute worker to handle multiple pipeline types
- Handle file uploads for pipeline input parameters
- Create a CloudFormation template for deployment
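As a rough illustration of the submission templating idea, the sketch below checks a submission against a small schema. The schema and its field names (`pipeline`, `revision`, `params`) are hypothetical, and the checker is a hand-rolled stand-in that handles only `required` and simple `type` rules. It is not a full JSON Schema validator; a real implementation would more likely rely on an established validation library.

```python
from typing import Any, Dict, List

# Hypothetical submission template; the field names are illustrative only.
SUBMISSION_SCHEMA: Dict[str, Any] = {
    "type": "object",
    "required": ["pipeline", "params"],
    "properties": {
        "pipeline": {"type": "string"},
        "revision": {"type": "string"},
        "params": {"type": "object"},
    },
}

# Only the few JSON types this sketch needs.
_TYPE_MAP = {"string": str, "object": dict, "array": list}


def validate_submission(
    doc: Any, schema: Dict[str, Any] = SUBMISSION_SCHEMA
) -> List[str]:
    """Return a list of human-readable errors; an empty list means valid."""
    if not isinstance(doc, dict):
        return ["submission must be a JSON object"]
    errors: List[str] = []
    # Check that every required field is present.
    for key in schema.get("required", []):
        if key not in doc:
            errors.append(f"missing required field: {key}")
    # Check the type of each declared field that is present.
    for key, rule in schema.get("properties", {}).items():
        if key in doc and not isinstance(doc[key], _TYPE_MAP[rule["type"]]):
            errors.append(f"field {key} must be of type {rule['type']}")
    return errors
```

Returning a list of errors rather than raising on the first one would let a submission form report every problem in a single pass.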
Phase 3 (in planning):
- Build Alexa skills for:
  - Querying statuses
  - Starting batch runs
  - Querying metadata outputs