feat: formation energy NB - first implementation#333
Conversation
|
Check out this pull request on See visual diffs & provide feedback on Jupyter Notebooks. Powered by ReviewNB |
📝 WalkthroughWalkthroughAdds ChangesFormation Energy Workflow Notebook
Sequence Diagram(s)sequenceDiagram
participant User
participant Notebook as formation_energy.ipynb
participant APIClient as Mat3ra APIClient
participant Standata
rect rgba(70, 130, 180, 0.5)
Note over User,Standata: Setup & Material Resolution
User->>Notebook: configure parameters & run
Notebook->>APIClient: authenticate(OIDC_ACCESS_TOKEN)
APIClient-->>Notebook: account + project
Notebook->>Standata: load compound material
Standata-->>Notebook: material + element symbols
Notebook->>Standata: resolve elemental reference materials
Standata-->>Notebook: element_materials_by_symbol
Notebook->>APIClient: get_refined_elemental_total_energy_holders()
APIClient-->>Notebook: total_energy holders (or RuntimeError)
end
rect rgba(60, 179, 113, 0.5)
Note over Notebook,Standata: Workflow Configuration
Notebook->>APIClient: save compound material
APIClient-->>Notebook: saved material ID
Notebook->>Standata: load formation-energy workflow
Standata-->>Notebook: workflow
Notebook->>Notebook: apply_workflow_kgrids(workflow, scf_kgrid)
Notebook->>APIClient: save workflow to collection
APIClient-->>Notebook: saved workflow ID
end
rect rgba(210, 105, 30, 0.5)
Note over Notebook,APIClient: Job Execution & Results
Notebook->>APIClient: list clusters, create compute config
Notebook->>APIClient: create job(material, workflow, compute)
APIClient-->>Notebook: job ID
Notebook->>APIClient: submit job
APIClient-->>Notebook: await completion
Notebook->>APIClient: retrieve formation_energy properties
APIClient-->>Notebook: formation_energy, contributions, total_energy
end
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Warning There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure. 🔧 Ruff (0.15.17)other/materials_designer/workflows/Introduction.ipynbUnexpected end of JSON input other/materials_designer/workflows/formation_energy.ipynbUnexpected end of JSON input Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 4
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@other/materials_designer/workflows/formation_energy.ipynb`:
- Around line 516-525: Add validation checks after the cluster selection logic
to handle two error scenarios: first, verify that the clusters list is not empty
before attempting to access clusters[0] in the else branch; second, after the
cluster is selected (either from the conditional or from clusters[0]), verify
that the cluster variable is not None and raise a descriptive error if no
matching cluster was found or if the clusters list was empty. These checks
should occur before the Compute object is instantiated to provide clear error
messages and prevent unclear downstream failures.
- Around line 190-193: Add a guard clause after the `projects` list is retrieved
that checks if the list is empty before attempting to access `projects[0]`. If
the list is empty, raise or print an informative error message explaining that
no default project was found for the given account ID. If the list is not empty,
proceed with the existing logic that extracts the project_id and prints the
confirmation message using `projects[0]`.
- Around line 225-230: The code attempts to access material_matches[0] without
verifying that the platform search returned any results. Add a guard clause
after the client.materials.list() call to check if material_matches is empty and
raise an informative exception before attempting to access the first element.
Only proceed with Material.create(material_matches[0]) if the list contains at
least one result.
- Around line 554-562: The create_job function call is passing the unsaved
formation_workflow variable instead of the saved_formation_workflow variable
that was persisted to the platform. Replace the workflow parameter in the
create_job function call to use saved_formation_workflow instead of
formation_workflow to maintain consistency with the saved_material parameter and
align with the workflow that was confirmed to be saved earlier in the notebook.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: dc192513-b780-4b30-99aa-74621044b3ed
📒 Files selected for processing (2)
other/materials_designer/workflows/Introduction.ipynbother/materials_designer/workflows/formation_energy.ipynb
| "source": [ | ||
| "projects = client.projects.list({\"isDefault\": True, \"owner._id\": ACCOUNT_ID})\n", | ||
| "project_id = projects[0][\"_id\"]\n", | ||
| "print(f\"✅ Using project: {projects[0]['name']} ({project_id})\")\n" |
There was a problem hiding this comment.
Add guard for empty projects list to avoid cryptic IndexError.
If no default project exists for the account, projects will be an empty list and projects[0] will crash with an unhelpful IndexError. Provide a clear error message instead.
🛡️ Proposed fix
projects = client.projects.list({\"isDefault\": True, \"owner._id\": ACCOUNT_ID})
+if not projects:
+ raise RuntimeError(f\"No default project found for account {ACCOUNT_ID}. Please create a default project first.\")
project_id = projects[0][\"_id\"]🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@other/materials_designer/workflows/formation_energy.ipynb` around lines 190 -
193, Add a guard clause after the `projects` list is retrieved that checks if
the list is empty before attempting to access `projects[0]`. If the list is
empty, raise or print an informative error message explaining that no default
project was found for the given account ID. If the list is not empty, proceed
with the existing logic that extracts the project_id and prints the confirmation
message using `projects[0]`.
| " material_matches = client.materials.list({\n", | ||
| " \"name\": {\"$regex\": re.escape(MATERIAL_NAME), \"$options\": \"i\"},\n", | ||
| " })\n", | ||
| "\n", | ||
| " material = Material.create(material_matches[0])\n", | ||
| " print(f\"♻️ Loaded material from platform: {material_matches[0]['_id']}\")\n", |
There was a problem hiding this comment.
Add guard for empty platform search results.
If the material is not found locally or in Standata, and the platform search returns no matches, material_matches[0] will raise an IndexError. Provide a clear error message instead.
🛡️ Proposed fix
material_matches = client.materials.list({
\"name\": {\"$regex\": re.escape(MATERIAL_NAME), \"$options\": \"i\"},
})
-
+ if not material_matches:
+ raise RuntimeError(
+ f\"Material '{MATERIAL_NAME}' not found in uploads folder, Standata, or platform.\"
+ )
material = Material.create(material_matches[0])🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@other/materials_designer/workflows/formation_energy.ipynb` around lines 225 -
230, The code attempts to access material_matches[0] without verifying that the
platform search returned any results. Add a guard clause after the
client.materials.list() call to check if material_matches is empty and raise an
informative exception before attempting to access the first element. Only
proceed with Material.create(material_matches[0]) if the list contains at least
one result.
| "source": [ | ||
| "from mat3ra.ide.compute import Compute\n", | ||
| "\n", | ||
| "if CLUSTER_NAME:\n", | ||
| " cluster = next((c for c in clusters if CLUSTER_NAME in c[\"hostname\"]), None)\n", | ||
| "else:\n", | ||
| " cluster = clusters[0]\n", | ||
| "\n", | ||
| "compute = Compute(cluster=cluster, queue=QUEUE_NAME, ppn=PPN)\n", | ||
| "print(f\"Using cluster: {compute.cluster.hostname}, queue: {QUEUE_NAME}, ppn: {PPN}\")\n" |
There was a problem hiding this comment.
Add validation for cluster selection to prevent downstream crashes.
Two crash scenarios exist:
- If
clustersis empty (no clusters available),clusters[0]raisesIndexError - If
CLUSTER_NAMEis specified but matches no cluster,clusterwill beNone, causingCompute()to fail with an unclear error
🛡️ Proposed fix
+if not clusters:
+ raise RuntimeError(\"No compute clusters available. Please check your account permissions.\")
+
if CLUSTER_NAME:
cluster = next((c for c in clusters if CLUSTER_NAME in c[\"hostname\"]), None)
+ if cluster is None:
+ raise RuntimeError(
+ f\"No cluster matching '{CLUSTER_NAME}' found. \"
+ f\"Available: {[c['hostname'] for c in clusters]}\"
+ )
else:
cluster = clusters[0]📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| "source": [ | |
| "from mat3ra.ide.compute import Compute\n", | |
| "\n", | |
| "if CLUSTER_NAME:\n", | |
| " cluster = next((c for c in clusters if CLUSTER_NAME in c[\"hostname\"]), None)\n", | |
| "else:\n", | |
| " cluster = clusters[0]\n", | |
| "\n", | |
| "compute = Compute(cluster=cluster, queue=QUEUE_NAME, ppn=PPN)\n", | |
| "print(f\"Using cluster: {compute.cluster.hostname}, queue: {QUEUE_NAME}, ppn: {PPN}\")\n" | |
| "source": [ | |
| "from mat3ra.ide.compute import Compute\n", | |
| "\n", | |
| "if not clusters:\n", | |
| " raise RuntimeError(\"No compute clusters available. Please check your account permissions.\")\n", | |
| "\n", | |
| "if CLUSTER_NAME:\n", | |
| " cluster = next((c for c in clusters if CLUSTER_NAME in c[\"hostname\"]), None)\n", | |
| " if cluster is None:\n", | |
| " raise RuntimeError(\n", | |
| " f\"No cluster matching '{CLUSTER_NAME}' found. \"\n", | |
| " f\"Available: {[c['hostname'] for c in clusters]}\"\n", | |
| " )\n", | |
| "else:\n", | |
| " cluster = clusters[0]\n", | |
| "\n", | |
| "compute = Compute(cluster=cluster, queue=QUEUE_NAME, ppn=PPN)\n", | |
| "print(f\"Using cluster: {compute.cluster.hostname}, queue: {QUEUE_NAME}, ppn: {PPN}\")\n" |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@other/materials_designer/workflows/formation_energy.ipynb` around lines 516 -
525, Add validation checks after the cluster selection logic to handle two error
scenarios: first, verify that the clusters list is not empty before attempting
to access clusters[0] in the else branch; second, after the cluster is selected
(either from the conditional or from clusters[0]), verify that the cluster
variable is not None and raise a descriptive error if no matching cluster was
found or if the clusters list was empty. These checks should occur before the
Compute object is instantiated to provide clear error messages and prevent
unclear downstream failures.
| "formation_job_response = create_job(\n", | ||
| " api_client=client,\n", | ||
| " materials=[saved_material],\n", | ||
| " workflow=formation_workflow,\n", | ||
| " project_id=project_id,\n", | ||
| " owner_id=ACCOUNT_ID,\n", | ||
| " prefix=formation_job_name,\n", | ||
| " compute=compute.to_dict(),\n", | ||
| ")\n", |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Check how create_job handles the workflow parameter - does it expect saved or unsaved?
ast-grep --pattern $'def create_job($$$, workflow$$$):
$$$
'
# Also check other notebooks for the pattern
rg -n "create_job" --type-add 'nb:*.ipynb' --type nb -A5 | head -60Repository: Exabyte-io/api-examples
Length of output: 6118
🏁 Script executed:
#!/bin/bash
# Check the formation_energy.ipynb to see exact variable names
cd /tmp && python3 <<'PY'
import json
with open('/repo/other/materials_designer/workflows/formation_energy.ipynb', 'r') as f:
nb = json.load(f)
# Find cell with create_job call
for i, cell in enumerate(nb['cells']):
if 'create_job' in str(cell):
cell_source = ''.join(cell['source'])
if 'formation_job_response' in cell_source:
print(f"Cell {i}:")
print(cell_source)
print("\n---\n")
break
PYRepository: Exabyte-io/api-examples
Length of output: 262
🏁 Script executed:
#!/bin/bash
# Search for create_job function definition in the codebase
rg -n "def create_job" --type py -B2 -A10Repository: Exabyte-io/api-examples
Length of output: 1085
🏁 Script executed:
#!/bin/bash
# Check what variables are created before create_job in formation_energy.ipynb
cd /tmp && python3 <<'PY'
import json
with open('/repo/other/materials_designer/workflows/formation_energy.ipynb', 'r') as f:
nb = json.load(f)
# Find cells related to workflow creation and job creation
for i, cell in enumerate(nb['cells']):
cell_source = ''.join(cell.get('source', []))
if any(x in cell_source for x in ['saved_formation_workflow', 'formation_workflow', 'create_job']):
print(f"=== Cell {i} ===")
# Show line-by-line with context
for j, line in enumerate(cell_source.split('\n')):
if any(x in line for x in ['saved_formation_workflow', 'formation_workflow', 'create_job', 'workflow=']):
print(f"Line {j}: {line}")
PYRepository: Exabyte-io/api-examples
Length of output: 262
🏁 Script executed:
#!/bin/bash
# Find the formation_energy.ipynb file
find . -name "formation_energy.ipynb" -type fRepository: Exabyte-io/api-examples
Length of output: 127
🏁 Script executed:
#!/bin/bash
# Get more details on the create_job function signature
rg -n "def create_job" --type py -A20 | head -40Repository: Exabyte-io/api-examples
Length of output: 1916
🏁 Script executed:
#!/bin/bash
# Read the formation_energy.ipynb file - extract the relevant cells
python3 <<'PY'
import json
with open('./other/materials_designer/workflows/formation_energy.ipynb', 'r') as f:
nb = json.load(f)
# Find cells with workflow or create_job references
for i, cell in enumerate(nb['cells']):
cell_source = ''.join(cell.get('source', []))
if any(x in cell_source for x in ['saved_formation_workflow', 'formation_workflow', 'create_job', 'workflow =']):
print(f"=== CELL {i} ===")
lines = cell_source.split('\n')
for j, line in enumerate(lines):
if any(x in line for x in ['saved_formation_workflow', 'formation_workflow', 'create_job', 'workflow =']):
start = max(0, j - 2)
end = min(len(lines), j + 3)
for k in range(start, end):
marker = ">>> " if k == j else " "
print(f"{marker}{k}: {lines[k]}")
print()
PYRepository: Exabyte-io/api-examples
Length of output: 2837
🏁 Script executed:
#!/bin/bash
# Check what variables the other notebooks use - do they have saved_workflow?
python3 <<'PY'
import json
import os
notebooks = [
'./other/materials_designer/workflows/band_structure.ipynb',
'./other/materials_designer/workflows/band_structure_hse.ipynb',
'./other/materials_designer/workflows/total_energy_post_processing.ipynb',
]
for nb_path in notebooks:
if not os.path.exists(nb_path):
continue
print(f"\n=== {nb_path.split('/')[-1]} ===")
with open(nb_path, 'r') as f:
nb = json.load(f)
for i, cell in enumerate(nb['cells']):
cell_source = ''.join(cell.get('source', []))
if 'saved_workflow' in cell_source or ('workflow' in cell_source and 'create_job' in cell_source):
lines = cell_source.split('\n')
for j, line in enumerate(lines):
if 'saved_workflow' in line or 'workflow=' in line:
print(f"Cell {i}, Line {j}: {line.strip()}")
PYRepository: Exabyte-io/api-examples
Length of output: 1150
🏁 Script executed:
#!/bin/bash
# Also check if create_job actually converts to dict somewhere
rg -A30 "def create_job" src/py/mat3ra/notebooks_utils/core/entity/job/api.py | head -50Repository: Exabyte-io/api-examples
Length of output: 1025
Use saved_formation_workflow instead of formation_workflow.
Line 557 passes formation_workflow (unsaved, from cell 27) rather than saved_formation_workflow (saved in cell 29). The code prints saved_formation_workflow.id earlier, confirming it was persisted to the platform, but then uses the unsaved version for create_job. This contradicts the pattern used for materials (saved_material) and differs from other notebooks which don't have this dual-variable scenario. Pass the saved workflow version for consistency:
workflow=saved_formation_workflow,
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@other/materials_designer/workflows/formation_energy.ipynb` around lines 554 -
562, The create_job function call is passing the unsaved formation_workflow
variable instead of the saved_formation_workflow variable that was persisted to
the platform. Replace the workflow parameter in the create_job function call to
use saved_formation_workflow instead of formation_workflow to maintain
consistency with the saved_material parameter and align with the workflow that
was confirmed to be saved earlier in the notebook.
Summary by CodeRabbit