Create and use Connections in Airflow operator at runtime [duplicate]

Connections come from the ORM

Yes, you can create connections at runtime, even at DAG definition time, if you’re careful enough. Airflow is completely transparent about its internal models, so you can interact with the underlying SQLAlchemy session directly. As exemplified originally in this answer, it’s as easy as:

from airflow import settings
from airflow.models import Connection

def create_conn(username, password, host=None):
    new_conn = Connection(conn_id=f'{username}_connection',
                          login=username,
                          host=host)
    new_conn.set_password(password)

    session = settings.Session()
    session.add(new_conn)
    session.commit()
    session.close()

Here you can, of course, also set any other Connection properties you may require for the EMR connection.
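For instance, a minimal sketch of an EMR-style connection that also sets `conn_type` and a JSON `extra` payload (the `conn_id`, region key, and file-free helper below are illustrative assumptions, and the exact `extra` keys depend on the hook you use):

```python
import json

def build_emr_extra(region_name):
    # Assumed extra payload shape; check the hook you use for the real keys.
    return json.dumps({'region_name': region_name})

def create_emr_conn(conn_id, region_name):
    # Imports deferred so the helper above stays testable without Airflow.
    from airflow import settings
    from airflow.models import Connection

    new_conn = Connection(
        conn_id=conn_id,
        conn_type='aws',
        extra=build_emr_extra(region_name),
    )
    session = settings.Session()
    session.add(new_conn)
    session.commit()
    session.close()
```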

Environments are process-bound

This is not a limitation of Airflow or Python but (AFAIK for every major OS) of how processes work: an environment is bound to the lifetime of its process. When you export a variable in bash, for example, you’re simply stating that when you spawn child processes, that variable should be copied into each child’s environment. This means the parent process can’t change a child’s environment after the child’s creation, and the child can’t change the parent’s environment.
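This copy-at-spawn behaviour is easy to demonstrate from Python itself:

```python
import os
import subprocess
import sys

# The child receives a *copy* of the parent's environment at spawn time.
os.environ['GREETING'] = 'hello'
child = subprocess.run(
    [sys.executable, '-c', "import os; print(os.environ['GREETING'])"],
    capture_output=True, text=True,
)
print(child.stdout.strip())  # → hello

# Changing GREETING now would not reach the already-spawned child, and
# nothing the child does can alter this parent's os.environ.
```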

In short, only a process itself can change its environment after it’s created. And since worker processes are Airflow subprocesses, it’s hard to control the creation of their environments as well. What you can do is write the environment variables into a file and deliberately update the current environment with overrides from that file at the start of each task.
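A minimal sketch of that override pattern, assuming a hypothetical `KEY=VALUE` file written by an earlier step (the path and format are my assumptions, not an Airflow convention):

```python
import os

def load_env_overrides(path):
    # Read KEY=VALUE lines and patch this process's own environment.
    # Blank lines and '#' comments are skipped.
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith('#'):
                continue
            key, _, value = line.partition('=')
            os.environ[key] = value

# Call load_env_overrides(...) at the top of each task callable, so the
# task process updates its environment before doing any real work.
```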
