Hooks¶
Dataduct has some endpoints you can use to execute Python scripts before and after certain events when using the CLI and library locally.
Available Hooks¶
activate_pipeline, which hooks onto the activate_pipeline function in dataduct.etl.etl_actions.
connect_to_redshift, which hooks onto the redshift_connection function in dataduct.data_access.
Creating a hook¶
Dataduct tries to find available hooks by searching the directory specified
by the HOOKS_BASE_PATH config variable in the etl section, matching them
by their filename. For example, a hook for the activate_pipeline endpoint
would be saved as activate_pipeline.py in that directory.
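Assuming a YAML Dataduct configuration file (as in typical Dataduct setups), the relevant entry might look like the following sketch; the directory path is purely illustrative:

etl:
    HOOKS_BASE_PATH: /path/to/your/hooks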
Each hook has two endpoints: before_hook and after_hook. To implement an
endpoint, declare it as a function inside the hook file. You may implement
one or both endpoints per hook; a complete hook file combining both is
sketched at the end of this section.
before_hook is called before the hooked function is executed. The parameters
passed into the hooked function will also be passed to the before_hook.
The before_hook is designed to allow you to manipulate the arguments of
the hooked function before it is called. At the end of the before_hook,
return the args and kwargs of the hooked function as a tuple.
Example before_hook:
# hooked function signature:
# def example(arg_one, arg_two, arg_three='foo')
def before_hook(arg_one, arg_two, arg_three='foo'):
    # return (new positional args as a list, new keyword args as a dict)
    return [arg_one + 1, 'hello world'], {'arg_three': 'bar'}
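Assuming the hook machinery applies the returned values, the hooked function would then effectively be called as example(arg_one + 1, 'hello world', arg_three='bar').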
after_hook is called after the hooked function is executed. The result of the
hooked function is passed into after_hook as a single parameter.
The after_hook is designed to allow you to access or manipulate the result of
the hooked function. At the end of the after_hook, return the (modified)
result of the hooked function.
Example after_hook:
# hooked function result: {'foo': 1, 'bar': 'two'}
def after_hook(result):
    result['foo'] = 2
    result['bar'] = result['bar'] + ' three'
    return result
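Putting both endpoints together, a hook file is just an ordinary Python module. Below is a minimal sketch of an activate_pipeline.py hook that logs around the hooked call without changing its behaviour; the *args/**kwargs form avoids assuming the exact signature of the hooked activate_pipeline function.

# activate_pipeline.py -- saved in the directory pointed to by HOOKS_BASE_PATH
import logging

logger = logging.getLogger(__name__)

def before_hook(*args, **kwargs):
    # Log the incoming arguments, then hand them back unchanged as (args, kwargs)
    logger.info('activate_pipeline called with args=%s kwargs=%s', args, kwargs)
    return list(args), kwargs

def after_hook(result):
    # Log the result and return it unchanged
    logger.info('activate_pipeline returned %s', result)
    return result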