feat(runs): add analysis preset harness #245
Changes from all commits
82fe167
144c978
c62684a
e1f4a92
fc9e822
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,71 @@ | ||
| description = "All grid-search evaluations, including optional external-tool checks." | ||
|
|
||
| [defaults] | ||
| GRID_SEARCH_RESULTS_DIR = "/data/results/grid_search_results" | ||
| GRID_SEARCH_INPUTS_DIR = "/data/inputs" | ||
| PROTEIN_CONFIGS_CSV = "${GRID_SEARCH_INPUTS_DIR}/protein_analysis_config.csv" | ||
| TARGET_FILENAME = "refined-patched.cif" | ||
| N_JOBS = "16" | ||
| PATCH_CIF_PATTERN = "refined.cif" | ||
| GRID_SEARCH_DEPTH = "4" | ||
| PATCH_DEPTH = "${GRID_SEARCH_DEPTH}" | ||
| PATCH_INPUT_PDB_PATTERN = "processed/{pdb_id}/{pdb_id}_single_001_density_input.cif" | ||
| PATCH_RCSB_PATTERN = "${GRID_SEARCH_RESULTS_DIR}/([A-Za-z0-9]{4})" | ||
|
|
||
| [shared_args] | ||
| grid-search-results-path = "${GRID_SEARCH_RESULTS_DIR}" | ||
| grid-search-inputs-path = "${GRID_SEARCH_INPUTS_DIR}" | ||
| protein-configs-csv = "${PROTEIN_CONFIGS_CSV}" | ||
| target-filename = "${TARGET_FILENAME}" | ||
| occupancies = [0.0, 0.25, 0.5, 0.75, 1.0] | ||
| n-jobs = "${N_JOBS}" | ||
| depth = "${GRID_SEARCH_DEPTH}" | ||
|
|
||
| [[pre_jobs]] | ||
| name = "patch_outputs" | ||
| env = "analysis" | ||
| gpus = "none" | ||
| script = "scripts/patch_output_cif_files.py" | ||
| output_arg = "" | ||
| output_subdir = "analysis/patch_outputs" | ||
| args = { input-dir = "${GRID_SEARCH_RESULTS_DIR}", cif-pattern = "${PATCH_CIF_PATTERN}", rcsb-pattern = "${PATCH_RCSB_PATTERN}", depth = "${PATCH_DEPTH}", grid-search-input-dir = "${GRID_SEARCH_INPUTS_DIR}", input-pdb-pattern = "${PATCH_INPUT_PDB_PATTERN}" } | ||
|
|
||
| [[jobs]] | ||
| name = "rscc" | ||
| env = "analysis" | ||
| gpu_count = 1 | ||
| script = "scripts/eval/rscc_grid_search_script.py" | ||
| output_arg = "" | ||
| output_subdir = "analysis/rscc" | ||
|
|
||
| [[jobs]] | ||
| name = "lddt" | ||
| env = "analysis" | ||
| gpus = "none" | ||
| script = "scripts/eval/lddt_evaluation_script.py" | ||
| output_arg = "" | ||
| output_subdir = "analysis/lddt" | ||
|
|
||
| [[jobs]] | ||
| name = "bond_geometry" | ||
| env = "analysis" | ||
| gpus = "none" | ||
| script = "scripts/eval/bond_geometry_eval.py" | ||
| output_arg = "" | ||
| output_subdir = "analysis/bond_geometry" | ||
|
|
||
| [[jobs]] | ||
| name = "tortoize" | ||
| env = "analysis" | ||
| gpus = "none" | ||
| script = "scripts/eval/run_and_process_tortoize.py" | ||
| output_arg = "" | ||
| output_subdir = "analysis/tortoize" | ||
|
|
||
| [[jobs]] | ||
| name = "phenix_clashscore" | ||
| env = "analysis" | ||
| gpus = "none" | ||
| script = "scripts/eval/run_and_process_phenix_clashscore.py" | ||
| output_arg = "" | ||
| output_subdir = "analysis/phenix_clashscore" |
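The `[defaults]` tables in these presets reference each other with `${NAME}`, while values such as `{pdb_id}` and the regex quantifier `{4}` use bare braces that must survive untouched. The runner's actual interpolation code is not shown in this diff, so the following is only a minimal sketch of how such expansion could work; `resolve_defaults` is a hypothetical helper name, not something from the PR.

```python
import re

# Matches ${NAME} only. Bare braces such as {pdb_id} or the regex
# quantifier {4} are not preceded by "$" and are left untouched.
_VAR = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def resolve_defaults(defaults: dict) -> dict:
    """Expand ${NAME} references between entries of a [defaults] table.

    Hypothetical helper: the real runner may resolve variables differently
    (for example by letting environment variables override defaults).
    """
    resolved = {}

    def expand(name, stack):
        if name in resolved:
            return resolved[name]
        if name in stack:
            raise ValueError("circular reference: " + " -> ".join(stack + (name,)))
        if name not in defaults:
            raise KeyError(f"undefined variable: {name}")
        # Recursively expand every ${...} found in this entry's value.
        value = _VAR.sub(lambda m: expand(m.group(1), stack + (name,)), defaults[name])
        resolved[name] = value
        return value

    for key in defaults:
        expand(key, ())
    return resolved


# A subset of the [defaults] table from the preset above.
defaults = {
    "GRID_SEARCH_RESULTS_DIR": "/data/results/grid_search_results",
    "GRID_SEARCH_DEPTH": "4",
    "PATCH_DEPTH": "${GRID_SEARCH_DEPTH}",
    "PATCH_INPUT_PDB_PATTERN": "processed/{pdb_id}/{pdb_id}_single_001_density_input.cif",
    "PATCH_RCSB_PATTERN": "${GRID_SEARCH_RESULTS_DIR}/([A-Za-z0-9]{4})",
}
resolved = resolve_defaults(defaults)
```

Restricting the pattern to `${...}` is the design point here: it keeps the `{pdb_id}` format placeholders and the `{4}` quantifier available to the downstream patch script.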
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,25 @@ | ||
| description = "Classify altloc selections into side-chain, loop, and domain-shift categories." | ||
|
|
||
| [defaults] | ||
| ALTLOC_ANALYSIS_DIR = "/data/results/altloc_analysis" | ||
| ALTLOC_INPUTS_DIR = "/data/inputs" | ||
| ALTLOC_SELECTIONS_CSV = "${ALTLOC_ANALYSIS_DIR}/altloc_selections.csv" | ||
| ALTLOC_CLASSIFICATIONS_CSV = "${ALTLOC_ANALYSIS_DIR}/altloc_region_classifications.csv" | ||
| CIF_ROOT = "${ALTLOC_INPUTS_DIR}" | ||
| DOMAIN_SHIFT_MIN_SPAN = "50" | ||
| LOOP_LDDT_THRESHOLD = "0.75" | ||
|
|
||
| [shared_args] | ||
|
Collaborator
I find this name, "shared_args", a little confusing. These are just the arguments to the script, right?
Contributor
Yes, in this schema |
||
| input-csv = "${ALTLOC_SELECTIONS_CSV}" | ||
| cif-root = "${CIF_ROOT}" | ||
| output-file = "${ALTLOC_CLASSIFICATIONS_CSV}" | ||
| domain-shift-min-span = "${DOMAIN_SHIFT_MIN_SPAN}" | ||
| loop-lddt-threshold = "${LOOP_LDDT_THRESHOLD}" | ||
|
|
||
| [[jobs]] | ||
| name = "classify_altloc_regions" | ||
| env = "analysis" | ||
| gpus = "none" | ||
| script = "scripts/eval/classify_altloc_regions.py" | ||
| output_arg = "" | ||
| output_subdir = "analysis/altloc_classify" | ||
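On the `shared_args` question raised in the review: one plausible reading is that the table is turned into command-line flags for every job's script, with a job-level `args` table layered on top. The reply in the thread is cut off and the runner code is not part of this view, so the sketch below is an assumption about the schema, not the PR's implementation; `build_argv` and the "job keys win" merge rule are hypothetical.

```python
from typing import Any, List, Mapping, Optional


def build_argv(
    script: str,
    shared_args: Mapping[str, Any],
    job_args: Optional[Mapping[str, Any]] = None,
) -> List[str]:
    """Turn preset tables into a command line (hypothetical helper).

    Assumption: keys map one-to-one onto --flags, list values become
    multiple positional values after the flag, and job-level keys
    override shared ones.
    """
    merged = {**shared_args, **(job_args or {})}
    argv = ["python", script]
    for key, value in merged.items():
        argv.append(f"--{key}")
        if isinstance(value, (list, tuple)):
            argv.extend(str(v) for v in value)
        else:
            argv.append(str(value))
    return argv


# Values as they would look after ${...} expansion of the preset above.
shared_args = {
    "input-csv": "/data/results/altloc_analysis/altloc_selections.csv",
    "cif-root": "/data/inputs",
    "domain-shift-min-span": "50",
}
argv = build_argv("scripts/eval/classify_altloc_regions.py", shared_args)

# List-valued entries, such as occupancies in the grid-search presets.
occ_argv = build_argv("script.py", {"occupancies": [0.0, 0.25]}, {"depth": "4"})
```

Under this reading, "shared" only means "shared by all `[[jobs]]` entries in the file", which for a single-job preset is indeed just the script's arguments.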
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,23 @@ | ||
| description = "Build an analysis protein-config CSV by finding altloc selections in input CIFs." | ||
|
|
||
| [defaults] | ||
| ALTLOC_ANALYSIS_DIR = "/data/results/altloc_analysis" | ||
| ALTLOC_INPUTS_DIR = "/data/inputs" | ||
| PROTEINS_CSV = "${ALTLOC_INPUTS_DIR}/proteins.csv" | ||
| ALTLOC_SELECTIONS_CSV = "${ALTLOC_ANALYSIS_DIR}/altloc_selections.csv" | ||
| ALTLOC_MIN_SPAN = "5" | ||
| ALTLOC_LABEL = "label_alt_id" | ||
|
|
||
| [shared_args] | ||
| input-csv = "${PROTEINS_CSV}" | ||
| output-file = "${ALTLOC_SELECTIONS_CSV}" | ||
| min-span = "${ALTLOC_MIN_SPAN}" | ||
| altloc-label = "${ALTLOC_LABEL}" | ||
|
|
||
| [[jobs]] | ||
| name = "find_altloc_selections" | ||
| env = "analysis" | ||
| gpus = "none" | ||
| script = "scripts/eval/find_altloc_selections.py" | ||
| output_arg = "" | ||
| output_subdir = "analysis/altloc_find" |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,55 @@ | ||
| description = "Analyze grid-search outputs with RSCC, LDDT clustering, and bond geometry." | ||
|
|
||
| [defaults] | ||
| GRID_SEARCH_RESULTS_DIR = "/data/results/grid_search_results" | ||
| GRID_SEARCH_INPUTS_DIR = "/data/inputs" | ||
| PROTEIN_CONFIGS_CSV = "${GRID_SEARCH_INPUTS_DIR}/protein_analysis_config.csv" | ||
| TARGET_FILENAME = "refined-patched.cif" | ||
| N_JOBS = "16" | ||
| PATCH_CIF_PATTERN = "refined.cif" | ||
| GRID_SEARCH_DEPTH = "4" | ||
| PATCH_DEPTH = "${GRID_SEARCH_DEPTH}" | ||
| PATCH_INPUT_PDB_PATTERN = "processed/{pdb_id}/{pdb_id}_single_001_density_input.cif" | ||
| PATCH_RCSB_PATTERN = "${GRID_SEARCH_RESULTS_DIR}/([A-Za-z0-9]{4})" | ||
|
|
||
| [shared_args] | ||
| grid-search-results-path = "${GRID_SEARCH_RESULTS_DIR}" | ||
| grid-search-inputs-path = "${GRID_SEARCH_INPUTS_DIR}" | ||
| protein-configs-csv = "${PROTEIN_CONFIGS_CSV}" | ||
| target-filename = "${TARGET_FILENAME}" | ||
| occupancies = [0.0, 0.25, 0.5, 0.75, 1.0] | ||
| n-jobs = "${N_JOBS}" | ||
| depth = "${GRID_SEARCH_DEPTH}" | ||
|
|
||
| [[pre_jobs]] | ||
|
Collaborator
Ditto here: best if we can cache the results or optionally skip this step.
Contributor
Agreed, same fix here: |
||
| name = "patch_outputs" | ||
| env = "analysis" | ||
| gpus = "none" | ||
| script = "scripts/patch_output_cif_files.py" | ||
| output_arg = "" | ||
| output_subdir = "analysis/patch_outputs" | ||
| args = { input-dir = "${GRID_SEARCH_RESULTS_DIR}", cif-pattern = "${PATCH_CIF_PATTERN}", rcsb-pattern = "${PATCH_RCSB_PATTERN}", depth = "${PATCH_DEPTH}", grid-search-input-dir = "${GRID_SEARCH_INPUTS_DIR}", input-pdb-pattern = "${PATCH_INPUT_PDB_PATTERN}" } | ||
|
|
||
| [[jobs]] | ||
| name = "rscc" | ||
| env = "analysis" | ||
| gpu_count = 1 | ||
| script = "scripts/eval/rscc_grid_search_script.py" | ||
| output_arg = "" | ||
| output_subdir = "analysis/rscc" | ||
|
|
||
| [[jobs]] | ||
| name = "lddt" | ||
| env = "analysis" | ||
| gpus = "none" | ||
| script = "scripts/eval/lddt_evaluation_script.py" | ||
| output_arg = "" | ||
| output_subdir = "analysis/lddt" | ||
|
|
||
| [[jobs]] | ||
| name = "bond_geometry" | ||
| env = "analysis" | ||
| gpus = "none" | ||
| script = "scripts/eval/bond_geometry_eval.py" | ||
| output_arg = "" | ||
| output_subdir = "analysis/bond_geometry" | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,47 @@ | ||
| description = "Evaluation jobs that require external executables: tortoize and phenix.clashscore." | ||
|
|
||
| [defaults] | ||
| GRID_SEARCH_RESULTS_DIR = "/data/results/grid_search_results" | ||
| GRID_SEARCH_INPUTS_DIR = "/data/inputs" | ||
| PROTEIN_CONFIGS_CSV = "${GRID_SEARCH_INPUTS_DIR}/protein_analysis_config.csv" | ||
| TARGET_FILENAME = "refined-patched.cif" | ||
| N_JOBS = "16" | ||
| PATCH_CIF_PATTERN = "refined.cif" | ||
| GRID_SEARCH_DEPTH = "4" | ||
| PATCH_DEPTH = "${GRID_SEARCH_DEPTH}" | ||
| PATCH_INPUT_PDB_PATTERN = "processed/{pdb_id}/{pdb_id}_single_001_density_input.cif" | ||
| PATCH_RCSB_PATTERN = "${GRID_SEARCH_RESULTS_DIR}/([A-Za-z0-9]{4})" | ||
|
|
||
| [shared_args] | ||
| grid-search-results-path = "${GRID_SEARCH_RESULTS_DIR}" | ||
| grid-search-inputs-path = "${GRID_SEARCH_INPUTS_DIR}" | ||
| protein-configs-csv = "${PROTEIN_CONFIGS_CSV}" | ||
| target-filename = "${TARGET_FILENAME}" | ||
| occupancies = [0.0, 0.25, 0.5, 0.75, 1.0] | ||
| n-jobs = "${N_JOBS}" | ||
| depth = "${GRID_SEARCH_DEPTH}" | ||
|
|
||
| [[pre_jobs]] | ||
|
Collaborator
I mention this below in runner.py, but it may be better to run this script separately or to use cached results. We might want to modify the analysis scripts and rerun them without re-running the patching step, which is unfortunately a little slow (it can be trivially parallelized, but still takes some time).
Contributor
Agreed. |
||
| name = "patch_outputs" | ||
| env = "analysis" | ||
| gpus = "none" | ||
| script = "scripts/patch_output_cif_files.py" | ||
| output_arg = "" | ||
| output_subdir = "analysis/patch_outputs" | ||
| args = { input-dir = "${GRID_SEARCH_RESULTS_DIR}", cif-pattern = "${PATCH_CIF_PATTERN}", rcsb-pattern = "${PATCH_RCSB_PATTERN}", depth = "${PATCH_DEPTH}", grid-search-input-dir = "${GRID_SEARCH_INPUTS_DIR}", input-pdb-pattern = "${PATCH_INPUT_PDB_PATTERN}" } | ||
|
|
||
| [[jobs]] | ||
| name = "tortoize" | ||
| env = "analysis" | ||
| gpus = "none" | ||
| script = "scripts/eval/run_and_process_tortoize.py" | ||
| output_arg = "" | ||
| output_subdir = "analysis/tortoize" | ||
|
|
||
| [[jobs]] | ||
| name = "phenix_clashscore" | ||
| env = "analysis" | ||
| gpus = "none" | ||
| script = "scripts/eval/run_and_process_phenix_clashscore.py" | ||
| output_arg = "" | ||
| output_subdir = "analysis/phenix_clashscore" | ||
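The review comments on `[[pre_jobs]]` ask for the slow patching step to be cached or skippable. The fix the author settled on is truncated in this view, so the following is only one possible shape for it: a marker file in the pre-job's output directory that stores a fingerprint of the script path and arguments. The names `pre_job_fingerprint`, `should_run_pre_job`, `mark_pre_job_done`, the `.pre_job_done` marker, and the `force` flag are all hypothetical.

```python
import hashlib
import json
import tempfile
from pathlib import Path

MARKER_NAME = ".pre_job_done"  # hypothetical marker file name


def pre_job_fingerprint(job: dict) -> str:
    """Hash the parts of a pre-job entry that determine its output."""
    payload = json.dumps(
        {"script": job["script"], "args": job.get("args", {})}, sort_keys=True
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def should_run_pre_job(job: dict, output_dir: Path, force: bool = False) -> bool:
    """Run when forced, when no marker exists, or when the args changed."""
    marker = output_dir / MARKER_NAME
    if force or not marker.exists():
        return True
    return marker.read_text().strip() != pre_job_fingerprint(job)


def mark_pre_job_done(job: dict, output_dir: Path) -> None:
    """Record a successful run so later invocations can skip it."""
    output_dir.mkdir(parents=True, exist_ok=True)
    (output_dir / MARKER_NAME).write_text(pre_job_fingerprint(job))


job = {
    "name": "patch_outputs",
    "script": "scripts/patch_output_cif_files.py",
    "args": {"cif-pattern": "refined.cif", "depth": "4"},
}

with tempfile.TemporaryDirectory() as tmp:
    out = Path(tmp) / "analysis" / "patch_outputs"
    first = should_run_pre_job(job, out)       # no marker yet
    mark_pre_job_done(job, out)
    second = should_run_pre_job(job, out)      # same script and args
    changed = dict(job, args={**job["args"], "depth": "5"})
    third = should_run_pre_job(changed, out)   # args differ from the marker
    forced = should_run_pre_job(job, out, force=True)
```

This fingerprint covers only the preset entry, not the contents of the input files, so new grid-search outputs written under the same arguments would still need `force=True` (or a separate, explicit patching run, as the reviewer suggests).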
Do we need to rebuild the current images to be able to update code on the fly in actl?
For code-only changes, no rebuild should be needed: ACTL syncs the checkout to /home/dev/workspace, and the runner resolves scripts from that live checkout first. Rebuilds are still needed for dependency or pixi-lock changes, and for wrapper or image changes baked into /usr/local/bin. RUNTIME_PIXI=1 is the escape hatch when intentionally testing a dependency change inside an existing pod.
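The "live checkout first" behavior described in this reply can be illustrated with an ordered search over candidate roots. This is a sketch under assumptions: only /home/dev/workspace is named in the thread, so the fallback root (a copy baked into the image) and the helper name `resolve_script` are hypothetical, and temporary directories stand in for both locations.

```python
import tempfile
from pathlib import Path
from typing import Sequence


def resolve_script(relative_path: str, roots: Sequence[Path]) -> Path:
    """Return the first existing copy of a script, searching roots in order."""
    for root in roots:
        candidate = root / relative_path
        if candidate.is_file():
            return candidate
    searched = ", ".join(str(r) for r in roots)
    raise FileNotFoundError(f"{relative_path} not found under: {searched}")


with tempfile.TemporaryDirectory() as tmp:
    live = Path(tmp) / "workspace"  # stands in for /home/dev/workspace
    baked = Path(tmp) / "image"     # hypothetical copy baked into the image
    for root in (live, baked):
        (root / "scripts" / "eval").mkdir(parents=True)

    # The same script exists in both places; the synced checkout is newer.
    (live / "scripts/eval/lddt_evaluation_script.py").write_text("# edited\n")
    (baked / "scripts/eval/lddt_evaluation_script.py").write_text("# stale\n")
    # This one exists only in the image copy.
    (baked / "scripts/eval/bond_geometry_eval.py").write_text("# baked only\n")

    picked_live = resolve_script(
        "scripts/eval/lddt_evaluation_script.py", [live, baked]
    ).read_text()
    picked_fallback = resolve_script(
        "scripts/eval/bond_geometry_eval.py", [live, baked]
    ).read_text()
```

Because the live root is listed first, an edited script shadows any older copy without an image rebuild, which matches the code-only case described above; dependency changes are outside what a path lookup like this can cover.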