Ingress, customValues, mount etc.

Table of Contents:

Project Name: Airflow Helm Chart Config

Date: 2024-02-14 09:20

Tag: #airflowhelm


Contents:

  • [b] REF

    [[2024-02-13-Add-Helm-Repository-Airflow#추가적으로 values 값을 변경 해야 하는 것 LIST]]

  • [!] version

    apache-airflow 2.8.1
    apache-airflow-providers-apache-kafka 1.3.1
    confluent-kafka 2.3.0
    kafka-python 2.0.2
    beautifulsoup4 4.12.3
    pandas 2.2.0
    selenium 4.17.2
    lxml 5.1.0
    requests 2.31.0
    virtualenv 20.25.0

Create the Dockerfile needed for Airflow. The Dockerfile includes installation of the PyPI packages required to operate Airflow.

In addition, an Ingress was applied so that Airflow is integrated with the Container Platform Portal.
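The Ingress itself is configured through the chart's ingress values. A minimal sketch of the relevant part of cp-airflow.yaml, assuming an NGINX ingress class and a placeholder hostname (the actual host and annotations used by the Portal are not recorded in this note):

```yaml
# Hypothetical sketch: ingressClassName and hostname are assumptions
ingress:
  web:
    # Create an Ingress for the Airflow webserver
    enabled: true
    # Ingress class available in the cluster (assumed)
    ingressClassName: "nginx"
    # Hostname the Portal routes to (placeholder)
    hosts:
      - name: "airflow.example.com"
```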

Install PyPI packages

FROM apache/airflow:2.8.1

# Install pip packages for airflow example
RUN pip install --no-cache-dir \
    pandas~=2.2.0 \
    requests~=2.31.0 \
    selenium~=4.17.2 \
    beautifulsoup4~=4.12.3 \
    lxml~=5.1.0

# Install pip packages for kafka
RUN pip install --no-cache-dir \
    kafka-python~=2.0.2 \
    apache-airflow-providers-apache-kafka~=1.3.1 \
    confluent-kafka~=2.3.0

The same packages can also be installed in a single layer:

FROM apache/airflow:2.8.1

# Install pip packages for airflow example and kafka
RUN pip install --no-cache-dir \
    pandas~=2.2.0 \
    requests~=2.31.0 \
    selenium~=4.17.2 \
    beautifulsoup4~=4.12.3 \
    lxml~=5.1.0 \
    kafka-python~=2.0.2 \
    apache-airflow-providers-apache-kafka~=1.3.1 \
    confluent-kafka~=2.3.0

ImagePullBackOff error


Dockerfile:

FROM apache/airflow:2.8.1

# Define environment variables
ENV AIRFLOW_VERSION=2.8.1
ENV PYTHON_VERSION="$(python --version | cut -d " " -f 2 | cut -d "." -f 1-2)"
ENV CONSTRAINT_URL="https://raw.githubusercontent.com/apache/airflow/constraints-${AIRFLOW_VERSION}/constraints-${PYTHON_VERSION}.txt"

RUN apt-get update

# Install Apache Airflow and additional dependencies
RUN pip install --no-cache-dir \
    pandas~=2.2.0 \
    requests~=2.31.0 \
    selenium~=4.17.2 \
    beautifulsoup4~=4.12.3 \
    lxml~=5.1.0 \
    kafka-python~=2.0.2 \
    apache-airflow-providers-apache-kafka~=1.3.1 \
    confluent-kafka~=2.3.0 \
    "apache-airflow==${AIRFLOW_VERSION}" --constraint "${CONSTRAINT_URL}"


Build after creating a requirements file

  • [!] requirements.txt file

    https://raw.githubusercontent.com/apache/airflow/constraints-2.8.1/constraints-3.8.txt

FROM apache/airflow:2.8.1

COPY requirements.txt .

RUN pip install -r requirements.txt

Command 'krb5-config --libs gssapi' returned non-zero exit status

FROM apache/airflow:2.8.1-python3.10
ADD requirements.txt .
USER root
RUN apt-get update \
  && apt-get install -y --no-install-recommends \
         gcc \
         heimdal-dev \
  && apt-get autoremove -yqq --purge \
  && apt-get clean \
  && rm -rf /var/lib/apt/lists/*
USER airflow
RUN pip install -r requirements.txt
RUN pip uninstall -y argparse

sudo podman build --tag harbor.133.186.240.216.nip.io/cp-portal-repository/cp-airflow:v1 .

sudo podman push harbor.133.186.240.216.nip.io/cp-portal-repository/cp-airflow:v1

Remove pip packages:

pip uninstall -r requirements.txt -y

Install without upgrading dependencies:

pip install --upgrade --no-deps --force-reinstall -r requirements.txt

pkg-config not found (mysqlclient)

FROM apache/airflow:2.8.1-python3.10
ADD requirements.txt .
USER root
RUN apt-get update \
  && apt-get install -y --no-install-recommends \
         gcc \
         heimdal-dev \
         default-libmysqlclient-dev \
         pkg-config \
  && apt-get autoremove -yqq --purge \
  && apt-get clean \
  && rm -rf /var/lib/apt/lists/*
USER airflow
RUN pip install -r requirements.txt
RUN pip uninstall -y argparse

  • cp-airflow.yaml
    images:
      airflow:
        repository: localhost:5000/container-platform/$REPOSITORY_HOST/$REPOSITORY_PROJECT_NAME/airflow
        tag: "v1"
    
  • deploy-cp-portal.sh
    cd ../airflow
    sudo podman build --tag $REPOSITORY_HOST/$REPOSITORY_PROJECT_NAME/cp-airflow:v1 .
    sudo podman push $REPOSITORY_HOST/$REPOSITORY_PROJECT_NAME/cp-airflow:v1
    cd ../script
    

ㅠㅠ I can't figure this out;;
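If the pods still fail to pull the image, one thing worth checking is the chart's top-level defaultAirflowRepository and defaultAirflowTag values, which are used whenever images.airflow.repository / tag are left unset. A sketch only, reusing the $REPOSITORY_* variables from deploy-cp-portal.sh as an assumption:

```yaml
# Hypothetical sketch: repository path reuses the deployment variables above
defaultAirflowRepository: $REPOSITORY_HOST/$REPOSITORY_PROJECT_NAME/cp-airflow
defaultAirflowTag: "v1"
```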

  • requirements.txt
pandas~=2.2.0
requests~=2.31.0
selenium~=4.17.2
beautifulsoup4~=4.12.3
lxml~=5.1.0
kafka-python~=2.0.2
apache-airflow-providers-apache-kafka~=1.3.1
confluent-kafka~=2.3.0

helm cm-push --username admin --password Harbor12345 ../charts/cp-airflow.tgz cp-portal-repository
helm install -f ../values/cp-airflow.yaml cp-airflow cp-portal-repository/cp-airflow -n airflow

ubuntu@bami1:~/workspace/container-platform/cp-portal-deployment/values$ helm install -f ../values/cp-airflow.yaml cp-airflow cp-portal-repository/airflow -n airflow

![[Pasted image 20240214173951.png]]

Enable Logs

logs.persistence.enabled=True

  • cp-airflow.yaml
    logs:
      # Configuration for empty dir volume (if logs.persistence.enabled == false)
      # emptyDirConfig:
      #   sizeLimit: 1Gi
      #   medium: Memory
      persistence:
        # Enable persistent volume for storing logs
        enabled: true
        # Volume size for logs
        size: 100Gi
        # Annotations for the logs PVC
        annotations: {}
        # If using a custom storageClass, pass name here
        storageClassName: "{AIRFLOW_STORAGECLASS}"
        ## the name of an existing PVC to use
        existingClaim:
    

Load Example DAGs

  • cp-airflow.yaml (an alternative via the chart's config block is sketched below)

```yaml
extraEnv: |
  - name: AIRFLOW__CORE__LOAD_EXAMPLES
    value: 'True'
```
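An alternative is the chart's config block, which is rendered into airflow.cfg. A minimal sketch, assuming the chart's config.core mapping rather than the extraEnv approach above:

```yaml
# Hypothetical alternative: set [core] load_examples through the chart's config values
config:
  core:
    load_examples: 'True'
```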


## Mounting DAGs with a Persistent Volume

- [b] REF
> [Manage DAGs files](https://airflow.apache.org/docs/helm-chart/stable/manage-dags-files.html)

cp-airflow.yaml
```yaml
dags:
  # Where dags volume will be mounted. Works for both persistence and gitSync.
  # If not specified, dags mount path will be set to $AIRFLOW_HOME/dags
  mountPath: ~
  persistence:
    # Annotations for dags PVC
    annotations: {}
    # Enable persistent volume for storing dags
    enabled: True
    # Volume size for dags
    size: 1Gi
    # If using a custom storageClass, pass name here
    storageClassName: "{AIRFLOW_STORAGECLASS}"
    # access mode of the persistent volume
    accessMode: ReadWriteOnce
    ## the name of an existing PVC to use
    existingClaim:
    ## optional subpath for dag volume mount
    subPath: ~
  gitSync:
    enabled: false
```
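For reference, the same dags block can instead pull DAGs with git-sync rather than a persistent volume. A minimal sketch, where the repository URL, branch, and subPath are placeholders and not values from this note:

```yaml
dags:
  persistence:
    enabled: false
  gitSync:
    enabled: true
    # Placeholder repository and branch; replace with the real DAGs repo
    repo: https://github.com/example/airflow-dags.git
    branch: main
    # Subdirectory inside the repo that contains the DAG files
    subPath: "dags"
```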
