Improving the Dockerfile #160

Merged
tmikula-dev merged 2 commits into master from feature/154-improve-dockerfile on May 26, 2026
Conversation

@tmikula-dev
Collaborator

@tmikula-dev tmikula-dev commented May 25, 2026

Overview

This pull request updates the Python dependency management and CI workflow to better separate development dependencies from production dependencies. The main changes involve introducing a new requirements-dev.txt for development tools and updating both documentation and CI to use it. Additionally, the Docker build process is improved to use this separation and update some underlying libraries.

Release Notes

  • Dependency management split for local and deployed purposes
  • Improved Dockerfile to take the project requirements instead of hardcoded ones

Related

Closes #154

Summary by CodeRabbit

  • Documentation

    • Updated development setup to require Docker for local integration testing and clarified separate production vs. development dependency files.
  • Chores

    • Split development and production dependency lists; pinned dev tool versions.
    • Improved container build and dependency installation flow.
    • CI/workflow dependency detection and environment setup consolidated to a reusable internal setup.

@tmikula-dev tmikula-dev self-assigned this May 25, 2026
@tmikula-dev tmikula-dev added the enhancement New feature or request label May 25, 2026
@coderabbitai

coderabbitai Bot commented May 25, 2026

Walkthrough

Splits runtime and development dependencies into requirements.txt and requirements-dev.txt; updates Dockerfile to install runtime deps from requirements.txt (upgrades librdkafka build); adds a composite CI action to install dev deps; rewires GitHub Actions to detect changes to any requirements*.txt and use the composite action for lint/test jobs; updates developer docs.

Changes

Dependency Separation and Alignment

| Layer | File(s) | Summary |
|---|---|---|
| Dependency file structure | requirements.txt, requirements-dev.txt | requirements.txt reduced to runtime-only packages; requirements-dev.txt now explicitly enumerates pinned development/test/lint dependencies. |
| Dockerfile production dependency installation | Dockerfile | Copies requirements.txt into the image and installs runtime deps via pip install -r ${LAMBDA_TASK_ROOT}/requirements.txt --no-binary confluent-kafka; updates OS packages and upgrades librdkafka build to v2.14.0. |
| CI workflow dependency installation and file change detection | .github/workflows/check_python.yml | PR detection now treats any requirements*.txt files as Python-related changes; Pylint, Black, Mypy, unit test, and integration test jobs now use ./.github/actions/setup-dev-python-env to set up Python and install dev dependencies. |
| Composite CI action: setup-dev-python-env | .github/actions/setup-dev-python-env/action.yml | New composite action pins actions/setup-python to Python 3.13 with pip cache and runs pip install -r requirements-dev.txt. |
| Developer setup documentation | DEVELOPER.md, .github/copilot-instructions.md | DEVELOPER.md adds Docker as a prerequisite for local integration testing and installs dev deps from requirements-dev.txt; copilot instructions note the two requirements files. |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Suggested reviewers

  • petr-pokorny-absa
  • oto-macenauer-absa

Poem

🐰 I hopped through files with glee and zest,
Split deps so production runs its best,
Dev tools tucked in their own neat nest,
Docker builds lean, CI checks addressed,
A happy rabbit signs off — rest! 🥕

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 inconclusive)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Title check | ❓ Inconclusive | The title 'Improving the Dockerfile' is vague and overly generic; it doesn't clearly convey the main scope of changes which involves refactoring dependency management across the entire project. | Consider a more specific title that highlights the key change, such as 'Split Python dependencies into production and development requirements' or 'Refactor dependency management: separate requirements files and update Dockerfile.' |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description check | ✅ Passed | The PR description follows the template with all required sections (Overview, Release Notes, Related) completed and provides clear context about the changes. |
| Linked Issues check | ✅ Passed | The PR fulfills all coding requirements from issue #154: split requirements into production/dev files [#154], Dockerfile now uses requirements.txt [#154], and librdkafka version updated [#154]. |
| Out of Scope Changes check | ✅ Passed | All changes are directly aligned with issue #154 objectives: dependency file splitting, Dockerfile refactoring, CI workflow updates, and documentation updates are all within scope. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


@tmikula-dev
Collaborator Author

Interesting fact: the librdkafka repository was transferred from Magnus Edenhill to Confluent Inc. in 2023. So the Dockerfile change that updates the librdkafka version also had to change the repository the source is fetched from.

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In @.github/workflows/check_python.yml:
- Around line 40-42: The CHANGED_FILES assignment uses gh api to list PR files
but only gets the first page; update the gh api call inside the CHANGED_FILES
assignment to enable pagination by adding the --paginate and --slurp flags so
the existing --jq '.[].filename | select(endswith(".py") or
(startswith("requirements") and endswith(".txt")))' runs over the combined JSON
result; ensure the flags are added to the gh api invocation that constructs
CHANGED_FILES so the jq expression still extracts filenames as before.
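
As an illustration of the pagination point above, here is a minimal sketch, not the workflow's actual code. REPO and PR_NUMBER are hypothetical variable names, and the grep filter is a stand-in that mirrors the quoted jq predicate. The sketch uses only --paginate; whether --slurp can be combined with --jq should be checked against the gh documentation for the version in use.

```shell
#!/usr/bin/env bash
# Sketch only: list the files changed in a PR across all result pages and
# keep the Python-related ones. REPO and PR_NUMBER are hypothetical names.

# Keep *.py files and files named requirements*.txt at the repository root;
# this mirrors the jq predicate quoted in the review comment.
filter_python_related() {
  grep -E '(\.py$|^requirements.*\.txt$)' || true
}

# Fetch every page of the PR file list. With --paginate, gh applies the
# --jq expression to each page, so the output is one filename per line.
list_changed_files() {
  gh api --paginate "repos/${REPO}/pulls/${PR_NUMBER}/files" --jq '.[].filename'
}

# Usage (needs an authenticated gh and network access):
#   CHANGED_FILES="$(list_changed_files | filter_python_related)"
```

Keeping the filter separate from the API call makes it possible to check the matching rules locally by piping a list of filenames into filter_python_related.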

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: b2e4c4a1-be3b-4dae-a994-9d2600760b9a

📥 Commits

Reviewing files that changed from the base of the PR and between c107a5d and 459e1e0.

📒 Files selected for processing (5)
  • .github/workflows/check_python.yml
  • DEVELOPER.md
  • Dockerfile
  • requirements-dev.txt
  • requirements.txt

Comment thread .github/workflows/check_python.yml
Comment thread Dockerfile
tar -xf v2.4.0.tar.gz && \
cd /tmp/env-install-workdir/confluent-kafka/confluent-kafka-python-2.4.0 && \
CPPFLAGS="-I/usr/local/include" LDFLAGS="-L/opt" python setup.py install && \
pip install -r ${LAMBDA_TASK_ROOT}/requirements.txt --no-binary confluent-kafka && \

@lsulak lsulak May 26, 2026

What is the --no-binary confluent-kafka part about?

confluent-kafka is also listed in the requirements.txt file, so why is it needed like this here?

Collaborator Author

If I understand it correctly, the confluent-kafka entry in requirements.txt pins the version, while --no-binary forces pip to compile the package from source against the system librdkafka that is built above. That build includes Kerberos support.

The prebuilt package on PyPI is compiled without GSSAPI, which would break SASL_SSL authentication.

So requirements.txt holds the version, and --no-binary tells pip how to build the confluent-kafka dependency. I added a comment to the Dockerfile to make that step clearer: 9fd16fc
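
The install step discussed in this thread can be sketched as follows. This is a hedged illustration, not the Dockerfile's exact content: the function names are hypothetical, and it assumes librdkafka with GSSAPI support has already been built and installed on the system.

```shell
#!/usr/bin/env bash
# Sketch only. Assumes librdkafka (with GSSAPI support) is already installed
# on the system, as in the earlier Dockerfile steps. Function names are
# hypothetical.

# Install the pinned requirements, forcing confluent-kafka to be built from
# source rather than installed from a prebuilt wheel.
# $1: path to the requirements file, e.g. "${LAMBDA_TASK_ROOT}/requirements.txt"
install_runtime_deps() {
  pip install -r "$1" --no-binary confluent-kafka
}

# Print the librdkafka version the installed Python client is linked against.
print_librdkafka_version() {
  python -c 'import confluent_kafka; print(confluent_kafka.libversion()[0])'
}
```

Running print_librdkafka_version after the install is one way to confirm that the client picked up the librdkafka version built earlier in the image.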

Ok, I didn't know about any of this. Thanks!

Comment thread .github/workflows/check_python.yml Outdated
@tmikula-dev tmikula-dev requested a review from lsulak May 26, 2026 12:46

@coderabbitai coderabbitai Bot left a comment

🧹 Nitpick comments (1)
Dockerfile (1)

63-68: ⚡ Quick win

Drop package-manager and pip caches in this layer.

This will reduce final image size and better match the PR’s deployment-size objective.

♻️ Suggested change
-    pip install -r ${LAMBDA_TASK_ROOT}/requirements.txt --no-binary confluent-kafka && \
+    pip install --no-cache-dir -r ${LAMBDA_TASK_ROOT}/requirements.txt --no-binary confluent-kafka && \
   echo "##############" && \
   echo "### cleanup ###" && \
   echo "##############" && \
     cd /root && \
-    rm -rf /tmp/env-install-workdir
+    dnf clean all && \
+    rm -rf /var/cache/dnf /tmp/env-install-workdir
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@Dockerfile` around lines 63 - 68, Add removal of package-manager and pip
caches in the same Dockerfile layer after the pip install and before removing
/tmp/env-install-workdir: delete apt/yum/apk caches and pip caches (for example
remove /var/lib/apt/lists/*, /var/cache/apt/* or /var/cache/yum/* or
/var/cache/apk/* as appropriate for the base image) and remove pip caches such
as /root/.cache/pip and /home/sbx_user/.cache/pip so the cache cleanup happens
in the same RUN layer as the pip install step shown in the Dockerfile snippet.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 1e778e82-6a75-41d2-8d73-507f3c824ad2

📥 Commits

Reviewing files that changed from the base of the PR and between 459e1e0 and 9fd16fc.

📒 Files selected for processing (4)
  • .github/actions/setup-dev-python-env/action.yml
  • .github/copilot-instructions.md
  • .github/workflows/check_python.yml
  • Dockerfile
✅ Files skipped from review due to trivial changes (2)
  • .github/actions/setup-dev-python-env/action.yml
  • .github/copilot-instructions.md

@tmikula-dev tmikula-dev merged commit 2e0b918 into master May 26, 2026
16 of 18 checks passed
@tmikula-dev tmikula-dev deleted the feature/154-improve-dockerfile branch May 26, 2026 19:33

Labels

enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Improve Dockerfile: dependencies are installed within with hard-coded versions

2 participants