
feat: formation energy NB - first implementation #333

Open
VsevolodX wants to merge 1 commit into main from feature/SOF-7918

@VsevolodX VsevolodX commented Jun 16, 2026

Member

Summary by CodeRabbit

  • Documentation
    • Updated introduction guide with a direct link to the Formation Energy workflow reference.
    • Added a new Formation Energy workflow Jupyter notebook providing step-by-step guidance for running formation energy calculations with materials.

@review-notebook-app


Check out this pull request on  ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


Powered by ReviewNB

@coderabbitai

coderabbitai Bot commented Jun 16, 2026


Review Change Stack

📝 Walkthrough

Adds formation_energy.ipynb, a new Jupyter notebook that orchestrates an end-to-end Mat3ra/DFT formation energy calculation: authentication, compound material loading, elemental reference validation, workflow configuration with optional k-grid editing, job submission and polling, and result visualization. Updates Introduction.ipynb to replace a placeholder with a link to the new notebook.

Changes

Formation Energy Workflow Notebook

| Layer | File(s) | Summary |
|---|---|---|
| Notebook docs, setup, and configurable parameters | `other/materials_designer/workflows/formation_energy.ipynb` | Top-level documentation markdown, package installation, and all user-configurable parameters including material input source, workflow search terms, compute/queue settings, polling interval, and job naming with timestamp. |
| Auth, material loading, and elemental reference validation | `other/materials_designer/workflows/formation_energy.ipynb` | Authenticates via `OIDC_ACCESS_TOKEN`, selects account and default project; loads compound material from local uploads, Standata, or platform with regex fallback; resolves Standata elemental reference materials; defines `get_refined_elemental_total_energy_holders` and raises `RuntimeError` if any elemental references or refined `total_energy` values are missing. |
| Save material, load and configure workflow | `other/materials_designer/workflows/formation_energy.ipynb` | Saves compound material to platform; loads formation-energy workflow from Standata; defines and applies `apply_workflow_kgrids` helper to optionally edit the `pw_scf` unit k-grid context; visualizes and saves the configured workflow to the user collection. |
| Compute config, job creation, submission, and results | `other/materials_designer/workflows/formation_energy.ipynb` | Selects compute cluster by optional hostname substring or first available; creates compute configuration; creates and submits the formation-energy job; asynchronously awaits completion; retrieves and visualizes `formation_energy`, `formation_energy_contributions`, and compound total energy properties; prints per-element refined elemental total energies. |
| Introduction.ipynb link update | `other/materials_designer/workflows/Introduction.ipynb` | Replaces the "(to be added)" placeholder for "6.6.1. Compound formation energy" with a concrete link to `formation_energy.ipynb`. |
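
For orientation, the quantity reported by such a workflow is conventionally the compound total energy minus the stoichiometry-weighted elemental reference energies, normalized per atom. The sketch below assumes that standard definition. The function name, argument layout, and numbers are illustrative only and are not taken from the notebook.

```python
from typing import Mapping


def formation_energy_per_atom(
    compound_total_energy: float,
    atom_counts: Mapping[str, int],
    elemental_energy_per_atom: Mapping[str, float],
) -> float:
    """E_f = (E_compound - sum_i n_i * E_i) / sum_i n_i, all energies in eV."""
    missing = sorted(set(atom_counts) - set(elemental_energy_per_atom))
    if missing:
        # Fail fast when a reference is absent, as the walkthrough describes.
        raise RuntimeError(f"Missing elemental reference energies for: {missing}")
    reference = sum(n * elemental_energy_per_atom[el] for el, n in atom_counts.items())
    return (compound_total_energy - reference) / sum(atom_counts.values())


# Made-up numbers for a hypothetical AB2 cell, purely to exercise the arithmetic.
example = formation_energy_per_atom(-20.0, {"A": 1, "B": 2}, {"A": -5.0, "B": -6.0})
print(example)
```

A negative value indicates that the compound is lower in energy than its elemental references under this definition.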

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant Notebook as formation_energy.ipynb
    participant APIClient as Mat3ra APIClient
    participant Standata

    rect rgba(70, 130, 180, 0.5)
        Note over User,Standata: Setup & Material Resolution
        User->>Notebook: configure parameters & run
        Notebook->>APIClient: authenticate(OIDC_ACCESS_TOKEN)
        APIClient-->>Notebook: account + project
        Notebook->>Standata: load compound material
        Standata-->>Notebook: material + element symbols
        Notebook->>Standata: resolve elemental reference materials
        Standata-->>Notebook: element_materials_by_symbol
        Notebook->>APIClient: get_refined_elemental_total_energy_holders()
        APIClient-->>Notebook: total_energy holders (or RuntimeError)
    end

    rect rgba(60, 179, 113, 0.5)
        Note over Notebook,Standata: Workflow Configuration
        Notebook->>APIClient: save compound material
        APIClient-->>Notebook: saved material ID
        Notebook->>Standata: load formation-energy workflow
        Standata-->>Notebook: workflow
        Notebook->>Notebook: apply_workflow_kgrids(workflow, scf_kgrid)
        Notebook->>APIClient: save workflow to collection
        APIClient-->>Notebook: saved workflow ID
    end

    rect rgba(210, 105, 30, 0.5)
        Note over Notebook,APIClient: Job Execution & Results
        Notebook->>APIClient: list clusters, create compute config
        Notebook->>APIClient: create job(material, workflow, compute)
        APIClient-->>Notebook: job ID
        Notebook->>APIClient: submit job
        APIClient-->>Notebook: await completion
        Notebook->>APIClient: retrieve formation_energy properties
        APIClient-->>Notebook: formation_energy, contributions, total_energy
    end
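
The "await completion" step in the diagram can be pictured as a polling loop with an interval and a timeout. This is a minimal sketch under stated assumptions: `wait_for_job`, the `get_status` callable, and the status names are hypothetical stand-ins, not the Mat3ra client API.

```python
import asyncio
from typing import Callable

TERMINAL_STATUSES = {"finished", "error"}  # assumed status names


async def wait_for_job(
    get_status: Callable[[], str],
    poll_interval: float = 30.0,
    timeout: float = 3600.0,
) -> str:
    """Poll get_status() until a terminal status is seen or the timeout expires."""
    loop = asyncio.get_running_loop()
    deadline = loop.time() + timeout
    while True:
        status = get_status()
        if status in TERMINAL_STATUSES:
            return status
        if loop.time() >= deadline:
            raise TimeoutError(f"Job still '{status}' after {timeout} s")
        await asyncio.sleep(poll_interval)


# Stand-in for a real status lookup: reports "active" twice, then "finished".
_fake_statuses = iter(["active", "active", "finished"])
final_status = asyncio.run(wait_for_job(lambda: next(_fake_statuses), poll_interval=0.01))
print(final_status)
```

Inside a Jupyter cell an event loop is already running, so the call would be `await wait_for_job(...)` rather than `asyncio.run(...)`.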

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

  • Exabyte-io/api-examples#274: Also modifies other/materials_designer/workflows/Introduction.ipynb to add notebook links in the same index structure, directly analogous to replacing the "to be added" placeholder in this PR.

Suggested reviewers

  • timurbazhirov

Poem

🐇 A notebook appears, fresh from the warren's den,
Formation energy computed with pen!
Elements resolved, the k-grid set just right,
Jobs queued and polled through the digital night.
The placeholder text has hopped out of sight,
A link now shines where "to be added" once hid —
Hip hop hooray for all that this notebook did! 🌟

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title 'feat: formation energy NB - first implementation' clearly and specifically describes the main change: adding a new formation energy Jupyter notebook implementation. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 Ruff (0.15.17)
other/materials_designer/workflows/Introduction.ipynb

Unexpected end of JSON input

other/materials_designer/workflows/formation_energy.ipynb

Unexpected end of JSON input
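
"Unexpected end of JSON input" may mean the tool received truncated or empty notebook content rather than that the files are broken. One way to rule out the latter locally is to confirm each notebook parses as JSON and has a top-level `cells` list. The helper name below is hypothetical, and the demo runs against a throwaway minimal notebook.

```python
import json
import tempfile
from pathlib import Path


def check_notebook_json(path: str) -> int:
    """Return the number of cells if the file parses as notebook JSON, else raise ValueError."""
    text = Path(path).read_text(encoding="utf-8")
    try:
        notebook = json.loads(text)
    except json.JSONDecodeError as error:
        raise ValueError(f"{path} is not valid JSON: {error}") from error
    if not isinstance(notebook, dict) or not isinstance(notebook.get("cells"), list):
        raise ValueError(f"{path} has no top-level 'cells' list")
    return len(notebook["cells"])


# Demo on a minimal temporary notebook so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "demo.ipynb"
    demo_path.write_text(
        json.dumps({"cells": [], "metadata": {}, "nbformat": 4, "nbformat_minor": 5}),
        encoding="utf-8",
    )
    demo_cell_count = check_notebook_json(str(demo_path))
print(demo_cell_count)
```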



@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 4

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@other/materials_designer/workflows/formation_energy.ipynb`:
- Around line 516-525: Add validation checks after the cluster selection logic
to handle two error scenarios: first, verify that the clusters list is not empty
before attempting to access clusters[0] in the else branch; second, after the
cluster is selected (either from the conditional or from clusters[0]), verify
that the cluster variable is not None and raise a descriptive error if no
matching cluster was found or if the clusters list was empty. These checks
should occur before the Compute object is instantiated to provide clear error
messages and prevent unclear downstream failures.
- Around line 190-193: Add a guard clause after the `projects` list is retrieved
that checks if the list is empty before attempting to access `projects[0]`. If
the list is empty, raise or print an informative error message explaining that
no default project was found for the given account ID. If the list is not empty,
proceed with the existing logic that extracts the project_id and prints the
confirmation message using `projects[0]`.
- Around line 225-230: The code attempts to access material_matches[0] without
verifying that the platform search returned any results. Add a guard clause
after the client.materials.list() call to check if material_matches is empty and
raise an informative exception before attempting to access the first element.
Only proceed with Material.create(material_matches[0]) if the list contains at
least one result.
- Around line 554-562: The create_job function call is passing the unsaved
formation_workflow variable instead of the saved_formation_workflow variable
that was persisted to the platform. Replace the workflow parameter in the
create_job function call to use saved_formation_workflow instead of
formation_workflow to maintain consistency with the saved_material parameter and
align with the workflow that was confirmed to be saved earlier in the notebook.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: dc192513-b780-4b30-99aa-74621044b3ed

📥 Commits

Reviewing files that changed from the base of the PR and between a3a4ec2 and b679a74.

📒 Files selected for processing (2)
  • other/materials_designer/workflows/Introduction.ipynb
  • other/materials_designer/workflows/formation_energy.ipynb

Comment on lines +190 to +193
"source": [
"projects = client.projects.list({\"isDefault\": True, \"owner._id\": ACCOUNT_ID})\n",
"project_id = projects[0][\"_id\"]\n",
"print(f\"✅ Using project: {projects[0]['name']} ({project_id})\")\n"


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Add guard for empty projects list to avoid cryptic IndexError.

If no default project exists for the account, projects will be an empty list and projects[0] will crash with an unhelpful IndexError. Provide a clear error message instead.

🛡️ Proposed fix
 projects = client.projects.list({\"isDefault\": True, \"owner._id\": ACCOUNT_ID})
+if not projects:
+    raise RuntimeError(f\"No default project found for account {ACCOUNT_ID}. Please create a default project first.\")
 project_id = projects[0][\"_id\"]
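
The same guard can be factored into a small reusable helper, which would also cover other "take the first result" lookups in the notebook. This is only a sketch: `first_or_raise` is a hypothetical name, and the project dict is stand-in data shaped like the one indexed in the cell above.

```python
from typing import Sequence, TypeVar

T = TypeVar("T")


def first_or_raise(items: Sequence[T], message: str) -> T:
    """Return items[0], or raise RuntimeError with a readable message if items is empty."""
    if not items:
        raise RuntimeError(message)
    return items[0]


# Stand-in data; a real run would use the list returned by the API client.
projects = [{"_id": "abc123", "name": "Default"}]
project = first_or_raise(projects, "No default project found for this account.")
print(f"Using project: {project['name']} ({project['_id']})")
```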

Comment on lines +225 to +230
"        material_matches = client.materials.list({\n",
"            \"name\": {\"$regex\": re.escape(MATERIAL_NAME), \"$options\": \"i\"},\n",
"        })\n",
"\n",
"        material = Material.create(material_matches[0])\n",
"        print(f\"♻️ Loaded material from platform: {material_matches[0]['_id']}\")\n",


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Add guard for empty platform search results.

If the material is not found locally or in Standata, and the platform search returns no matches, material_matches[0] will raise an IndexError. Provide a clear error message instead.

🛡️ Proposed fix
         material_matches = client.materials.list({
             \"name\": {\"$regex\": re.escape(MATERIAL_NAME), \"$options\": \"i\"},
         })
-
+        if not material_matches:
+            raise RuntimeError(
+                f\"Material '{MATERIAL_NAME}' not found in uploads folder, Standata, or platform.\"
+            )
         material = Material.create(material_matches[0])
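
More generally, the uploads, Standata, and platform lookup is an ordered fallback chain, and the error is most useful when it names every source that was tried. The sketch below illustrates that pattern with hypothetical loader callables. It does not reproduce the notebook's actual loading code.

```python
from typing import Callable, Iterable, Optional, Tuple

Loader = Callable[[str], Optional[dict]]


def load_first_available(name: str, loaders: Iterable[Tuple[str, Loader]]) -> Tuple[str, dict]:
    """Try each (label, loader) in order; return the first hit or raise naming the sources tried."""
    tried = []
    for label, loader in loaders:
        tried.append(label)
        found = loader(name)
        if found is not None:
            return label, found
    raise RuntimeError(f"Material '{name}' not found in: {', '.join(tried)}")


# Hypothetical stand-ins for the three sources; only the last one "finds" the material.
loaders = [
    ("uploads folder", lambda name: None),
    ("Standata", lambda name: None),
    ("platform", lambda name: {"_id": "m1", "name": name}),
]
source, material_config = load_first_available("Example", loaders)
print(source, material_config["_id"])
```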

Comment on lines +516 to +525
"source": [
"from mat3ra.ide.compute import Compute\n",
"\n",
"if CLUSTER_NAME:\n",
"    cluster = next((c for c in clusters if CLUSTER_NAME in c[\"hostname\"]), None)\n",
"else:\n",
"    cluster = clusters[0]\n",
"\n",
"compute = Compute(cluster=cluster, queue=QUEUE_NAME, ppn=PPN)\n",
"print(f\"Using cluster: {compute.cluster.hostname}, queue: {QUEUE_NAME}, ppn: {PPN}\")\n"


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Add validation for cluster selection to prevent downstream crashes.

Two crash scenarios exist:

  1. If clusters is empty (no clusters available), clusters[0] raises IndexError
  2. If CLUSTER_NAME is specified but matches no cluster, cluster will be None, causing Compute() to fail with an unclear error
🛡️ Proposed fix
+if not clusters:
+    raise RuntimeError(\"No compute clusters available. Please check your account permissions.\")
+
 if CLUSTER_NAME:
     cluster = next((c for c in clusters if CLUSTER_NAME in c[\"hostname\"]), None)
+    if cluster is None:
+        raise RuntimeError(
+            f\"No cluster matching '{CLUSTER_NAME}' found. \"
+            f\"Available: {[c['hostname'] for c in clusters]}\"
+        )
 else:
     cluster = clusters[0]
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
"source": [
"from mat3ra.ide.compute import Compute\n",
"\n",
"if not clusters:\n",
"    raise RuntimeError(\"No compute clusters available. Please check your account permissions.\")\n",
"\n",
"if CLUSTER_NAME:\n",
"    cluster = next((c for c in clusters if CLUSTER_NAME in c[\"hostname\"]), None)\n",
"    if cluster is None:\n",
"        raise RuntimeError(\n",
"            f\"No cluster matching '{CLUSTER_NAME}' found. \"\n",
"            f\"Available: {[c['hostname'] for c in clusters]}\"\n",
"        )\n",
"else:\n",
"    cluster = clusters[0]\n",
"\n",
"compute = Compute(cluster=cluster, queue=QUEUE_NAME, ppn=PPN)\n",
"print(f\"Using cluster: {compute.cluster.hostname}, queue: {QUEUE_NAME}, ppn: {PPN}\")\n"
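
For testing outside the notebook, the selection logic can be isolated in a plain function with no platform dependencies. This is a sketch: `select_cluster` is a hypothetical helper, and the hostnames are made up. It assumes, as in the cell above, that clusters are dicts with a `hostname` key.

```python
from typing import Optional, Sequence


def select_cluster(clusters: Sequence[dict], name_substring: Optional[str] = None) -> dict:
    """Pick the first cluster whose hostname contains name_substring, or the first cluster overall."""
    if not clusters:
        raise RuntimeError("No compute clusters available.")
    if not name_substring:
        return clusters[0]
    for cluster in clusters:
        if name_substring in cluster["hostname"]:
            return cluster
    available = [c["hostname"] for c in clusters]
    raise RuntimeError(f"No cluster matching '{name_substring}' found. Available: {available}")


# Made-up hostnames for illustration.
demo_clusters = [{"hostname": "cluster-001.example.org"}, {"hostname": "cluster-007.example.org"}]
print(select_cluster(demo_clusters, "007")["hostname"])
```

Keeping both failure modes in one function means the `Compute` object is only ever constructed with a valid cluster.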

Comment on lines +554 to +562
"formation_job_response = create_job(\n",
"    api_client=client,\n",
"    materials=[saved_material],\n",
"    workflow=saved_formation_workflow,\n",
"    project_id=project_id,\n",
"    owner_id=ACCOUNT_ID,\n",
"    prefix=formation_job_name,\n",
"    compute=compute.to_dict(),\n",
")\n",


⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Check how create_job handles the workflow parameter - does it expect saved or unsaved?
ast-grep --pattern $'def create_job($$$, workflow$$$):
    $$$
'

# Also check other notebooks for the pattern
rg -n "create_job" --type-add 'nb:*.ipynb' --type nb -A5 | head -60

Repository: Exabyte-io/api-examples

Length of output: 6118


🏁 Script executed:

#!/bin/bash
# Check the formation_energy.ipynb to see exact variable names
cd /tmp && python3 <<'PY'
import json
with open('/repo/other/materials_designer/workflows/formation_energy.ipynb', 'r') as f:
    nb = json.load(f)
    # Find cell with create_job call
    for i, cell in enumerate(nb['cells']):
        if 'create_job' in str(cell):
            cell_source = ''.join(cell['source'])
            if 'formation_job_response' in cell_source:
                print(f"Cell {i}:")
                print(cell_source)
                print("\n---\n")
                break
PY

Repository: Exabyte-io/api-examples

Length of output: 262


🏁 Script executed:

#!/bin/bash
# Search for create_job function definition in the codebase
rg -n "def create_job" --type py -B2 -A10

Repository: Exabyte-io/api-examples

Length of output: 1085


🏁 Script executed:

#!/bin/bash
# Check what variables are created before create_job in formation_energy.ipynb
cd /tmp && python3 <<'PY'
import json
with open('/repo/other/materials_designer/workflows/formation_energy.ipynb', 'r') as f:
    nb = json.load(f)
    # Find cells related to workflow creation and job creation
    for i, cell in enumerate(nb['cells']):
        cell_source = ''.join(cell.get('source', []))
        if any(x in cell_source for x in ['saved_formation_workflow', 'formation_workflow', 'create_job']):
            print(f"=== Cell {i} ===")
            # Show line-by-line with context
            for j, line in enumerate(cell_source.split('\n')):
                if any(x in line for x in ['saved_formation_workflow', 'formation_workflow', 'create_job', 'workflow=']):
                    print(f"Line {j}: {line}")
PY

Repository: Exabyte-io/api-examples

Length of output: 262


🏁 Script executed:

#!/bin/bash
# Find the formation_energy.ipynb file
find . -name "formation_energy.ipynb" -type f

Repository: Exabyte-io/api-examples

Length of output: 127


🏁 Script executed:

#!/bin/bash
# Get more details on the create_job function signature
rg -n "def create_job" --type py -A20 | head -40

Repository: Exabyte-io/api-examples

Length of output: 1916


🏁 Script executed:

#!/bin/bash
# Read the formation_energy.ipynb file - extract the relevant cells
python3 <<'PY'
import json
with open('./other/materials_designer/workflows/formation_energy.ipynb', 'r') as f:
    nb = json.load(f)
    # Find cells with workflow or create_job references
    for i, cell in enumerate(nb['cells']):
        cell_source = ''.join(cell.get('source', []))
        if any(x in cell_source for x in ['saved_formation_workflow', 'formation_workflow', 'create_job', 'workflow =']):
            print(f"=== CELL {i} ===")
            lines = cell_source.split('\n')
            for j, line in enumerate(lines):
                if any(x in line for x in ['saved_formation_workflow', 'formation_workflow', 'create_job', 'workflow =']):
                    start = max(0, j - 2)
                    end = min(len(lines), j + 3)
                    for k in range(start, end):
                        marker = ">>> " if k == j else "    "
                        print(f"{marker}{k}: {lines[k]}")
                    print()
PY

Repository: Exabyte-io/api-examples

Length of output: 2837


🏁 Script executed:

#!/bin/bash
# Check what variables the other notebooks use - do they have saved_workflow?
python3 <<'PY'
import json
import os

notebooks = [
    './other/materials_designer/workflows/band_structure.ipynb',
    './other/materials_designer/workflows/band_structure_hse.ipynb',
    './other/materials_designer/workflows/total_energy_post_processing.ipynb',
]

for nb_path in notebooks:
    if not os.path.exists(nb_path):
        continue
    print(f"\n=== {nb_path.split('/')[-1]} ===")
    with open(nb_path, 'r') as f:
        nb = json.load(f)
        for i, cell in enumerate(nb['cells']):
            cell_source = ''.join(cell.get('source', []))
            if 'saved_workflow' in cell_source or ('workflow' in cell_source and 'create_job' in cell_source):
                lines = cell_source.split('\n')
                for j, line in enumerate(lines):
                    if 'saved_workflow' in line or 'workflow=' in line:
                        print(f"Cell {i}, Line {j}: {line.strip()}")
PY

Repository: Exabyte-io/api-examples

Length of output: 1150


🏁 Script executed:

#!/bin/bash
# Also check if create_job actually converts to dict somewhere
rg -A30 "def create_job" src/py/mat3ra/notebooks_utils/core/entity/job/api.py | head -50

Repository: Exabyte-io/api-examples

Length of output: 1025


Use saved_formation_workflow instead of formation_workflow.

Line 557 passes formation_workflow (unsaved, from cell 27) rather than saved_formation_workflow (saved in cell 29). The code prints saved_formation_workflow.id earlier, confirming it was persisted to the platform, but then uses the unsaved version for create_job. This contradicts the pattern used for materials (saved_material) and differs from other notebooks which don't have this dual-variable scenario. Pass the saved workflow version for consistency:

workflow=saved_formation_workflow,
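
If the distinction matters in practice, a lightweight check before job creation can catch an unsaved entity early. The sketch below is illustrative only: `require_saved` is a hypothetical helper, and the objects are stand-ins that merely mimic an entity exposing an `id` attribute, as `saved_formation_workflow.id` does in the notebook.

```python
from types import SimpleNamespace


def require_saved(entity, label: str):
    """Return entity if it carries a non-empty `id`, otherwise raise a descriptive error."""
    if not getattr(entity, "id", None):
        raise ValueError(f"{label} has no platform ID; save it before creating a job.")
    return entity


# Stand-ins: an unsaved object without an ID and a saved one with a made-up ID.
formation_workflow = SimpleNamespace(name="Formation Energy", id=None)
saved_formation_workflow = SimpleNamespace(name="Formation Energy", id="wf-123")

workflow_for_job = require_saved(saved_formation_workflow, "Workflow")
print(workflow_for_job.id)
```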
