Hooks¶
Dataduct has some endpoints you can use to execute python scripts before and after certain events when using the CLI and library locally.
Available Hooks¶
activate_pipeline, which hooks onto theactivate_pipelinefunction indataduct.etl.etl_actions.connect_to_redshift, which hooks onto theredshift_connectionfunction indataduct.data_access.
Creating a hook¶
Dataduct tries to find available hooks by searching in the directory specified
by the HOOKS_BASE_PATH config variable in the etl section, matching them
by their filename. For example, a hook for the activate_pipeline
endpoint would saved as activate_pipeline.py in that directory.
Each hook has two endpoints: before_hook and after_hook. To implement
one of these endpoints, you declare them as functions inside the hook. You may
implement only one or both endpoints per hook.
before_hook is called before the hooked function is executed. The parameters
passed into the hooked function will also be passed to the before_hook.
The before_hook is designed to allow you to manipulate the arguments of
the hooked function before it is called. At the end of the before_hook,
return the args and kwargs of the hooked function as a tuple.
Example before_hook:
# hooked function signature:
# def example(arg_one, arg_two, arg_three='foo')
def before_hook(arg_one, arg_two, arg_three='foo'):
return [arg_one + 1, 'hello world'], {'arg_three': 'bar'}
after_hook is called after the hooked function is executed. The result of the
hooked function is passed into after_hook as a single parameter.
The after_hook is designed to allow you to access or manipulate the result of
the hooked function. At the end of the after_hook, return the (modified)
result of the hooked function.
Example after_hook:
# hooked function result: {'foo': 1, 'bar': 'two'}
def after_hook(result):
result['foo'] = 2
result['bar'] = result['bar'] + ' three'
return result