Refactor Airflow container architecture and environment configuration #8
Open
falric05 wants to merge 132 commits into
* Add setup script and update entrypoint for Airflow environment initialization
* Updates documentation for creating .env file and removes setup script
* Add an initialization DAG to run the initialization script every two hours and launch the main DAG
* Add Dockerfiles and entrypoint scripts for Airflow and web GUI setup
* Remove the 'heasarc' directory creation from the Dockerfiles and the web GUI entrypoint script to make the given folder be mounted as the 'heasarc' directory
* Refactor file handling in DataPipeline to use shutil.move for better directory management
* Add an interface to browse PDF files in DL0 directory
* Update .gitignore to exclude all files in the given folder except explorer.js and index.html
* Update .gitignore to exclude all files in the data directory except explorer.js and index.html
* Update the initialization DAG to start immediately and improve task management
* Remove unused timedelta import in cosipipe_cosipy.py
* Update cosipipe_cosipy.py Removed commented line code.
* Rename initialization DAG to 'cosipy_contactsimulator' for clarity
* Update README.md to improve DAG build and testing instructions
* Update UI text for clarity and consistency in explorer.js and index.html
* Updates the instructions in README.md for building and running Docker on Mac and Linux, improving clarity and consistency.
* Set the start date of the DAG 'cosipy_contactsimulator' to a specific time to avoid unexpected behavior

…n the airflow service, and updated the data path in the docker-compose
…itialize the pipeline data fetch
…nge user id of gamma
- Created DAG `cosiflow_alert_monitor` that periodically reads log file `data_pipeline.log`
- Implemented `alert_manager` module with error parsing, YAML rules, deduplication and notification sending
- Integration with Mailhog for local email sending testing
- Added Airflow plugin to access Mailhog Web UI via "Develop tools" menu
- Added support for SMTP environment variables via `.env`

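The deduplication step mentioned in this commit could look roughly like the sketch below. It is not the `alert_manager` implementation: the log line format, the `ERROR_RE` pattern, and the function names are assumptions made for illustration, since the format of `data_pipeline.log` is not shown in this PR.

```python
import hashlib
import re

# Hypothetical log format: "<date> <time> - ERROR - <message>".
# The real format of data_pipeline.log is not shown in this PR.
ERROR_RE = re.compile(r"^(?P<ts>\S+ \S+) - ERROR - (?P<msg>.*)$")


def parse_errors(lines):
    """Yield the message of every line matching the assumed ERROR format."""
    for line in lines:
        match = ERROR_RE.match(line.strip())
        if match:
            yield match.group("msg")


def new_alerts(lines, seen):
    """Return error messages not alerted on before.

    `seen` is a set of message digests; it is updated in place so that
    repeated calls on the same log do not produce duplicate alerts.
    """
    fresh = []
    for msg in parse_errors(lines):
        digest = hashlib.sha256(msg.encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            fresh.append(msg)
    return fresh
```

Hashing only the message (not the timestamp) is a design choice of this sketch: the same error logged twice yields one notification.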
… SMTP settings using MailHog
…tatements. Add new notify_email and removed central logging file
…structure
- Changed data mount path in docker-compose.yaml to align with new directory structure.
- Added Conda Terms of Service acceptance in Dockerfile for required channels.
- Enhanced entrypoint script to export COSI directory structure environment variables and create necessary directories if not present.
- Updated Python version comment in environment.yml for clarity.

…ad functionality
- Introduced `heasarc_explorer_plugin` for browsing data files in a specified directory.
- Implemented Flask routes for home, folder navigation, and file downloads.
- Added a basic HTML template for the data explorer interface.
- Created a view plugin for integration with Airflow's app builder.
- Removed the obsolete `dl3_explorer_view_plugin` to streamline the codebase.

…lowing all file types
- Updated `explorer_home` and `explorer_folder` functions to include `current_path` in the template context.
- Modified file listing in `explorer_folder` to show all file types instead of just PDFs.
- Added a visual element in `explorer.html` to display the current path for better user navigation.

… directory paths
- Updated directory path definitions in `DataPipeline` to utilize environment variables for better flexibility.
- Ensured the input directory is created if it doesn't exist and adjusted the inotify watch to monitor the base directory directly.

… COSI installation
- Added `unzip` to the list of installed packages for the Airflow Docker environment.
- Updated the COSI installation process to install `py7zr` and changed the git checkout to version `v0.3.x` for compatibility.

… and navigation improvements
- Added tags for better organization in the `fail_task` DAG.
- Enhanced `explorer.html` with additional CSS comments for clarity.
- Improved navigation button descriptions and added comments for folder and file link functionalities.

- Introduced `cosipipe_tsmap__extpythonenv.py` for multi-task TS map computation with external Python environment.
- Added `cosipipe_tsmap__singletask__extpythonenv.py` for optimized single-task execution of the TS map pipeline.
- Created `cosipipe_tsmap_mulres.py` for multi-resolution TS map computation.
- Implemented `cosipipe_tsmap.py` for standard TS map computation.
- Developed supporting scripts for data preparation, binning, aggregation, and TS map computation.
- Enhanced `tsmap_pipeline.py` to manage the entire TS map processing workflow, supporting both standard and multi-resolution modes.
- Added detailed logging and error handling for improved pipeline robustness.
- Included cleanup tasks for better resource management post-execution.

- Introduced `dag_parallel_test_1` and `dag_parallel_test_2` for parallel task execution.
- Each DAG includes two BashOperator tasks that simulate a 60-second sleep.
- Configured with a maximum of 2 active runs and a concurrency of 3 for testing parallelism.

…vices
- Added a new volume mapping for the pipeline directory to the Airflow service configuration, enhancing the environment setup for pipeline execution.

- Introduced `cosipipe_lightcurve.py` DAG to automate the process of generating GRB light curves from newly arrived compressed folders.
- Implemented a sequence of tasks including waiting for new archives, decompressing them, binning GRB sources and backgrounds, and plotting the light curve.
- Created `cosipipe_lc_ops.py` with utility functions for archive decompression, input validation, and data binning, ensuring modularity and reusability.
- Enhanced the pipeline's robustness with error handling and validation checks for required input files.

…vigation
- Added a new route for file previewing, allowing users to view file contents directly in the interface.
- Implemented security checks to ensure safe access to files and directories.
- Enhanced the HTML template with a two-column layout for file listings and previews, improving user experience.
- Included JavaScript functionality for single-click preview and double-click download actions on files.
- Added error handling and user feedback for file access issues and loading states.

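The commit does not say which security checks were implemented. A common one for a file-browsing route is rejecting any requested path that resolves outside the served directory. The sketch below illustrates that check only; `resolve_inside` is a hypothetical helper name, not a function from the plugin.

```python
from pathlib import Path


def resolve_inside(base_dir, requested):
    """Return the resolved path for `requested` under `base_dir`.

    Returns None when the resolved path escapes `base_dir`, for example
    through ".." components, an absolute path, or a symlink.
    """
    base = Path(base_dir).resolve()
    target = (base / requested).resolve()
    # After resolution the target must be the base itself or a descendant.
    if target == base or base in target.parents:
        return target
    return None
```

A route handler would typically answer with an HTTP 403 or 404 when this returns None, before touching the filesystem any further.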
- Created a new example environment file to define essential Airflow environment variables.
- Included settings for Airflow admin credentials, SMTP configuration, and COSI directory structure.
- This file serves as a template for users to set up their local environment for the Airflow application.

- Modified the Dockerfile to set default values for user and group IDs as empty, enabling users to specify their own values during the build process.
- This change enhances flexibility for user management within the Airflow environment.

- Introduced UID, GID, and DISPLAY variables to the .env.example file, allowing users to specify their own bootstrap ID settings for containerized environments.
- This update enhances the configurability of the Airflow environment setup.

…apping
- Added UID and GID environment variables for user-specific configurations in the Postgres, Airflow, and Mailhog services.
- Enabled volume mapping for Postgres data to persist across container restarts.
- Updated Airflow service to utilize an environment file for configuration, improving setup flexibility.

- Changed default values for MY_UID and MY_GID in the Dockerfile to specific integers (12050 and 10000, respectively).
- This update ensures a consistent user and group configuration for the Airflow environment during the build process.

- Changed the SQLAlchemy connection string to specify the Postgres host and port explicitly, enhancing clarity and ensuring proper connectivity for the Airflow environment.
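For illustration, a connection string with an explicit host and port has the shape built below. This is a sketch only: the driver prefix `postgresql+psycopg2`, the function name, and the example credentials are assumptions, and the actual values used in this PR are not shown here.

```python
from urllib.parse import quote


def postgres_conn_url(user, password, host, port, database):
    """Build a SQLAlchemy-style Postgres URL with explicit host and port.

    User and password are percent-encoded so characters such as "@" or ":"
    cannot be confused with the URL delimiters.
    """
    return (
        f"postgresql+psycopg2://{quote(user, safe='')}:{quote(password, safe='')}"
        f"@{host}:{int(port)}/{database}"
    )


# Hypothetical values; "postgres" stands for the compose service name.
url = postgres_conn_url("airflow", "p@ss", "postgres", 5432, "airflow")
```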
- Removed unnecessary entries from .gitignore for better clarity.
- Introduced a new script, hot_load_module.sh, to facilitate module management in Airflow, allowing for installation, removal, and updates of modules with Docker integration.

- Introduced a comprehensive guide on creating, installing, updating, and removing modules in Cosiflow.
- Detailed the standard directory structure for modules and provided examples, including a Dockerfile template.
- Included instructions for using the hot_load_module.sh script for module management, enhancing user experience and clarity in module operations.

…ebugging
- Enhanced ConditionalTriggerDagRunOperator to include a max_retrig_runs parameter, allowing users to limit the number of automatic retriggers.
- Updated the execute method to check the retrig_run_count against max_retrig_runs before proceeding with execution.
- Modified COSIDAG to accept and propagate max_retrig_runs, improving control over DAG execution behavior based on runtime configurations.

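The retrigger limit described above can be sketched as a pure function, independent of Airflow. The names `max_retrig_runs` and `retrig_run_count` come from the commit message; carrying the counter in the DAG run configuration and the helper name `may_retrigger` are assumptions of this sketch, not the operator's confirmed behavior.

```python
def may_retrigger(dag_run_conf, max_retrig_runs):
    """Decide whether another automatic retrigger is allowed.

    Returns (allowed, next_conf). When allowed, next_conf carries the
    incremented counter to pass to the next run. A max_retrig_runs of
    None means no limit.
    """
    conf = dict(dag_run_conf or {})
    count = int(conf.get("retrig_run_count", 0))
    if max_retrig_runs is not None and count >= max_retrig_runs:
        # Limit reached: leave the configuration untouched.
        return False, conf
    conf["retrig_run_count"] = count + 1
    return True, conf
```

With `max_retrig_runs=2`, the first two calls allow a retrigger and the third refuses, which bounds an otherwise self-perpetuating DAG.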
- Changed the URL prefix for the Mailhog blueprint from "/mailhog" to an empty string, allowing for direct access.
- Updated the default Mailhog server URL in the MailhogView class to remove unnecessary quotes, ensuring correct URL formatting.

- Removed the `.env` file requirement, transitioning to direct configuration in `env/docker-compose.yaml`.
- Updated README to reflect new setup instructions, emphasizing the use of `docker-compose.yaml` for environment variables.
- Enhanced clarity on required variables with `# TOEDIT` comments for user customization.
- Deleted the obsolete `.env.example` file and adjusted the `entrypoint-airflow.sh` script to accommodate the new configuration method.

- Added new command-line options to hot_load_module.sh for creating Python virtual environments and managing Docker images.
- Implemented functions to parse YAML configuration files, allowing for dynamic loading of module settings.
- Updated README.md to include detailed instructions on using configuration files for module installation, including examples and explanations of key fields.
- Enhanced clarity on the installation process and custom paths, improving user guidance for module management.

…guidance
- Added a Quick Start Overview section in README.md outlining the steps to set up and run the core Cosiflow environment and install pipeline modules.
- Updated env/README.md with prerequisites for module management, emphasizing the need for a configured Cosiflow environment and running services.
- Clarified instructions for using the hot_load_module.sh script to install modules, ensuring users understand the necessary context and directory structure.

…tion file and bootstrap.sh
- Renamed `airflow.cfg` file with configuration settings for Airflow.
- Removed the obsolete `airflow.cfg.SequentialExecutor` file to streamline configuration management.
- Deleted the `bootstrap.sh` script as part of the cleanup process, ensuring a more focused setup for the environment.

…customization
- Modified UID and GID defaults to be empty in `env/docker-compose.yaml`, allowing users to set their own values without predefined defaults.
- Updated AIRFLOW_ADMIN_PASSWORD default to be empty, enhancing security by requiring explicit password configuration.

- Introduced a new plugin that adds a "Refresh DAGs List" option under the "Develop tools" menu in the Airflow UI.
- Implemented functionality to execute the `airflow dags list` command and refresh the DAG bag, ensuring the UI reflects the latest DAGs.
- Created a README.md file detailing the plugin's features, installation instructions, and usage steps for users.

- Added build arguments for UID and GID to allow customization during Docker image build.
- Updated HOST_IP default value to a specific hostname for improved clarity.
- Adjusted comments for environment variables to guide user configuration more effectively.

- Added a command-line option (-c) for specifying a custom configuration file, allowing users to override auto-detection.
- Improved the load_yaml_config function to accept an optional config file path.
- Updated usage instructions to reflect the new -c option and clarified the handling of paths.
- Enhanced error handling for missing configuration files and added support for specifying Python versions and requirements without dependencies in environment creation.

- Introduced a new method to verify if files can be opened, ensuring they are not being written or locked by other processes.
- Implemented a two-pass approach to first identify required files and then wait for them to be fully written before proceeding.
- Enhanced logging to provide detailed feedback on file readiness status during the waiting period.
- Pushed file paths to XCom only after confirming all files are ready, improving reliability in input handling.

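The second pass described above (waiting until the required files are fully written) could be sketched as follows. This is not the code from the commit: the function names are hypothetical, and the size-stability test is an added assumption, because merely opening a file does not prove that another process has finished writing it on most Unix systems.

```python
import os
import time


def can_open(path):
    """Return True if the file can be opened for reading right now."""
    try:
        with open(path, "rb"):
            return True
    except OSError:
        return False


def wait_until_ready(paths, poll_seconds=1.0, timeout_seconds=60.0):
    """Block until every path opens and kept the same size across two polls.

    Returns True when all files are ready, False when the timeout expires.
    """
    deadline = time.monotonic() + timeout_seconds
    last_sizes = {}
    while True:
        pending = []
        for path in paths:
            try:
                size = os.path.getsize(path)
            except OSError:
                size = None  # missing or inaccessible: not ready yet
            stable = size is not None and last_sizes.get(path) == size
            last_sizes[path] = size
            if not (stable and can_open(path)):
                pending.append(path)
        if not pending:
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_seconds)
```

Because a file counts as stable only after two consecutive polls report the same size, the call always takes at least one poll interval for a non-empty list of paths.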
- Added installation of Python 3.11 and its development packages.
- Updated the default Python version to 3.11 in the Dockerfile.
- Cleaned up installation commands for clarity and organization.

…dencies
- Added HOST_WORKSPACE_PATH for better workspace path management.
- Streamlined SMTP settings for Airflow with clearer variable definitions.
- Introduced a volume for environment variables and improved service dependency checks for Mailhog.
- Adjusted comments for clarity and user guidance on configuration options.

- Deleted multiple DAG files including cosidag_a.py, cosidag_b.py, cosidag_example.py, cosidag_helloworld.py, and others to clean up the repository.
- Removed corresponding tutorial files and functions that were no longer needed, streamlining the codebase.
- This cleanup enhances maintainability and reduces clutter in the project structure.

…d documentation
- Introduced a new file-driven monitoring mode in COSIDAG, allowing the detection of new direct child files in addition to existing folder-driven functionality.
- Updated README to clarify the behavior of `check_new_file` in both monitoring modes and added examples for better user guidance.
- Enhanced variable tracking to include both folder and file paths, ensuring accurate processing history.
- Added new helper functions for file stability checks and direct file iteration, improving the robustness of the monitoring process.

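Detecting new direct child files, as in the file-driven mode described above, can be sketched with `os.scandir`. The function name and the idea of passing the processing history as a set of paths are assumptions for illustration; how COSIDAG actually stores that history is not shown in this commit message.

```python
import os


def new_direct_child_files(watch_dir, processed):
    """Return sorted paths of regular files directly inside `watch_dir`
    that are not yet in `processed` (a set of already handled paths).

    Subdirectories are not descended into, and symlinks are skipped.
    """
    found = []
    with os.scandir(watch_dir) as entries:
        for entry in entries:
            if entry.is_file(follow_symlinks=False) and entry.path not in processed:
                found.append(entry.path)
    return sorted(found)
```

A monitoring task would call this on each poll, process the returned paths, and add them to the history so they are not picked up again.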
- Introduced a new plugin that adds an Airflow UI page under **Develop tools -> Reset Cosidag** for managing the processed state of COSIDAG pipelines.
- Updated functionality to handle both folder and file paths, allowing users to reset and delete specific processed paths.
- Enhanced the user interface to include checkboxes for selecting individual paths and a delete button for removing selected entries.
- Improved documentation in README.md to provide clear usage instructions and feature descriptions.

…unctionality
- Revised README.md to clarify COSIDAG customization and module structure, including updates to the tutorial section and DAG references.
- Introduced the Data Explorer Plugin README.md, providing a UI for browsing files in the COSI data directory, with features for file navigation and metadata preview.
- Added the MailHog Link Plugin README.md, offering a direct link to the MailHog web interface from the Airflow UI for easier access to email logs.
- Updated module management documentation to reflect changes in module naming and structure, ensuring consistency across references.

PR Description
Overview
This PR finalizes the refactoring of the Docker-based architecture for Cosiflow, improving modularity, user configuration flexibility, and environment consistency.
The work consolidates the separation between Cosiflow orchestration and scientific runtimes, while enhancing the Airflow container setup and configuration management.
Main Changes
Container Architecture Improvements
Dockerfile.airflow for cleaner and more flexible user configuration.
docker-compose Enhancements
Refactored docker-compose.yaml to:
Disabled loading of example DAGs where not required.
Improved PostgreSQL and executor configuration consistency.
Airflow & Plugin Enhancements
Added Refresh DAGs List Plugin with UI integration.
Added and refined:
Improved module hot-loading management (hot_load_module.sh).
Cleaned and reorganized plugin structure.
Documentation Improvements
Enhanced README.md and env/README.md.
Added clearer instructions for:
Improved consistency across documentation.
Architectural Impact
This PR strengthens the container-based architecture introduced in:
Key goals achieved:
Clearer separation between:
Improved reproducibility
Cleaner environment configuration
Better support for modular pipelines
Why This Matters
This refactor: