Dataduct
Dataduct - DataPipeline for humans
Dataduct is a wrapper built on top of AWS Data Pipeline that makes it easy to create ETL jobs. Each job is specified as a series of steps in a YAML file, which Dataduct automatically translates into a Data Pipeline definition with the appropriate pipeline objects.
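As a rough illustration of the step-based YAML format described above, a small pipeline definition might look like the sketch below. The pipeline name, file paths, and table definition are hypothetical, and the field names and step types are assumptions based on typical Dataduct examples, so check them against the documentation for your installed version.

```yaml
# Illustrative sketch only: names, paths, and step types are assumptions
# and may differ between Dataduct versions.
name: example_load_redshift            # hypothetical pipeline name
frequency: one-time                    # how often the pipeline runs
load_time: "01:00"                     # Hour:Min in UTC

description: Load a local TSV file into a Redshift table

steps:
-   step_type: extract-local           # read data from a local file
    path: data/test_table1.tsv         # hypothetical input file

-   step_type: create-load-redshift    # create the table if needed, then load it
    table_definition: tables/dev.test_table.sql   # hypothetical table definition
```

Each entry under `steps` corresponds to one stage of the ETL job, and the output of one step is assumed here to feed into the next.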
Features include:
- Visualizing pipeline activities
- Extracting data from different sources such as RDS, S3, local files
- Transforming data using EC2 and EMR
- Loading data into Redshift
- Transforming data inside Redshift
- Running QA checks on data between the source system and the warehouse
It is easy to create custom steps that extend the DSL to fit your requirements, and to run a backfill from the command line interface.
Contents: