Fixes and Tweaks for Slurm and the Web Component by dmf24 · Pull Request #2 · jpatsenker/cra

dmf24 · 2019-12-03T02:27:13Z

These are changes made to achieve the following:

Make the code sufficiently host-agnostic to run on either https://corecop.hms.harvard.edu or https://kirschner.med.harvard.edu.
Enable slurm job submission and ensure it works properly for the whole pipeline.
Ensure that each step in the pipeline has access to the proper cluster and system resources (libraries, etc.) to be able to run successfully.

Here's a summary of most of the changes in more detail:

Added an sbatch method to the slurm job runner class, and used sbatch instead of srun for the default job command.
Changed the wait logic. Every Python subprocess call blocks until its command returns. For the initial control job submission, sbatch is NOT configured to wait, so the subprocess call blocks only until sbatch prints its Submitted batch job ...... line. For each subsequent step in the pipeline, sbatch is configured to wait (--wait), so the subprocess call blocks until the job finishes.
Removed kjbuckets.so from version control, which was compiled for python 2.6, and configured git to ignore the .so file. I compiled a fresh 2.7 version, and added it to the corecop virtualenv, making kjbuckets available no matter what directory you import from. This virtualenv is configured to be activated automatically by the webserver and inherited by the php scripts that initially invoke python. (Of course, this virtualenv could also be activated by code).
Parameterized file paths in file_paths.py, to support allowing the site to run from a docroot other than /www/kirschner.med.harvard.edu/docroot. Also removed the hard-coded paths to python and perl and configured them to run from the environment, allowing use of the perl/5.24.0 module and a custom python virtualenv rather than system binaries.
Purged all *.pyc files from git and configured git to ignore them. These files are compiled bytecode from imported modules and should not be kept in source control. They will be created dynamically as needed.
Replaced a number of instances of path assembly using string addition, with python's more robust os.path.join function. I don't think I caught them all. The goal is to support setting directories as either /path/to/directory or /path/to/directory/. Currently, the scratch directory must be specified with a trailing slash, or else the application will try to write files to /path/to/directoryfilename.log.
Added a line in results.php: date_default_timezone_set("America/New_York"); to clear some error spam on the results page.
Minor tweaks of the run_cra_interface file used for debugging.
Changed the scratch location to /n/groups/kirschner_www/corecop instead of /n/scratch2/cra. This is only because scratch2 is not currently mounted on the web servers. I also made this configurable with an environment variable if or when we are able to make scratch2 available on the webservers.
Created a symlink from <corecop>/analyzers/fasta_checker_for_crap.pl to /www/kirschner.med.harvard.edu/docroot/genomes/code/fasta_checker_for_crap.pl. This is probably not necessary; the FASTACHECKER_PATH can be changed back to a hard-coded path. I made this change while troubleshooting and did not revert it because it is still working.
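
The wait logic described above can be sketched roughly as follows. This is only an illustration, not the code in this pull request: the function names build_sbatch_command, parse_job_id, and submit are hypothetical, and the real job runner class may be organized differently. The sketch relies only on sbatch's --wait flag and its standard "Submitted batch job N" output line.

```python
import re
import subprocess


def build_sbatch_command(script, wait=False, extra_args=()):
    """Return the argv list for submitting `script` with sbatch (hypothetical helper)."""
    cmd = ["sbatch"]
    if wait:
        # With --wait, sbatch itself does not exit until the job terminates,
        # so a blocking subprocess call also blocks until the job is done.
        cmd.append("--wait")
    cmd.extend(extra_args)
    cmd.append(script)
    return cmd


def parse_job_id(sbatch_output):
    """Extract the job id from sbatch's 'Submitted batch job N' line, or None."""
    match = re.search(r"Submitted batch job (\d+)", sbatch_output)
    return int(match.group(1)) if match else None


def submit(script, wait=False):
    """Run sbatch and return the job id; blocks until sbatch exits."""
    output = subprocess.check_output(build_sbatch_command(script, wait=wait))
    return parse_job_id(output.decode())
```

Under these assumptions, the control job would be submitted with submit(script) and return as soon as the job id is printed, while each later pipeline step would use submit(script, wait=True).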

These are outstanding issues unrelated to any issues with slurm submission or web hosting:

There's no robust job tracking and coordination from the web server side. It appears that at some point in the past, INCREMENTFILE.num was used to create a unique working directory for each job, but this feature seems to have been removed. Currently each webserver job simply creates files in the scratch directory prefixed with the name of the uploaded file, overwriting any files that might happen to be there already. This means that if two different people submit a file with the same name, the second job will overwrite the results of the first job (or lead to race conditions if the jobs run simultaneously).
A number of output and error files are overwritten by subsequent steps in the pipeline. In fact, during troubleshooting I had to create numerous separate error log files for each step. Consolidating some of this will probably be convenient.
Some of the email formatting seems to be broken.
Scratch directory still must be specified with a trailing slash (/). Minor issue but I think it's close to fixed.
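
One possible way to address the overwriting and trailing-slash issues above is sketched below. It is not part of this pull request. The helper names scratch_root, make_job_dir, and job_file are hypothetical, and the sketch assumes the scratch environment variable is the CRA_SCRATCH mentioned in the commit list, with the /n/groups default described earlier.

```python
import os
import tempfile

# Default taken from the description above; the variable name is an assumption.
DEFAULT_SCRATCH = "/n/groups/kirschner_www/corecop"


def scratch_root(environ=None):
    """Return the scratch directory, preferring the CRA_SCRATCH variable."""
    environ = os.environ if environ is None else environ
    return environ.get("CRA_SCRATCH", DEFAULT_SCRATCH)


def make_job_dir(scratch, upload_name):
    """Create and return a unique working directory for one submission."""
    stem = os.path.splitext(os.path.basename(upload_name))[0]
    # mkdtemp creates a directory with an unused name, so two uploads that
    # share a file name never write into the same place.
    return tempfile.mkdtemp(prefix=stem + "-", dir=scratch)


def job_file(job_dir, filename):
    """Build a path inside the job directory, with or without a trailing slash."""
    return os.path.join(job_dir, filename)
```

Note that tempfile.mkdtemp creates the directory readable and writable only by the creating user, which may or may not suit how the web server and the Slurm jobs share files.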


dmf24 added 30 commits November 25, 2019 15:32

remove hardcoded chdir

0592193

Re-enable disabled job submission code

3768f81

Ignore pyc files and emacs temp files

25c346d

Enable job run

aad256b

Switch srun to sbatch-wrap and disable waiting just to get the job to run and see what happens

bea6c36

Add sbatch to aux/jobs and use it for fasta_checker

f8fa54c

Git Remove pyc files

5650716

Removing more pyc files from git

f9e8bd5

Remove job submission code from run_cra_interface.py and import from aux.jobs instead

afdbb37

Remove more dead/commented code, refactor sbatch process calling, add proctools.py with process wrappers

a84e218

ignore output files

c149580

fix typo

098dba9

import sys for output

a7fe079

Scratch env var

9c52e91

Change scratch location

235d5cc

Add unique error files for each run

9eaff61

and ignore them

987a7bb

Add more logging

57d0dfd

Actually perform the appropriate boolean logic for adding flags

7e9f286

Use os.path.join instead of string concatenation so alternate temporary directory can be specified with standard unix semantics

3827def

Remove modified error and output filenames in sbatch

43bbec8

typo

0a5b2cb

string fixes

67a21ba

More string cleanup

0966b37

stringcleanup

c10efa7

Load modules.sh in run_cd-hit.sh and just set the trailing / on the scratch directory for now until I can clean up the rest of the string-concatenated paths

647cb9e

path join fix

dd6ec8a

Allow pathnames to be controlled with environment variables and change defaults to use corecop.hms.harvard.edu

5449960

fix typo

e5c15a0

symlink to kirschner.med for now for fasta_checker

c0e564e

dmf24 and others added 17 commits November 29, 2019 17:31

timezone line

1d76400

Remove kjbuckets.so from github (installed into virtualenv)

6d79044

it's date_default_timezone_set, not date_timezone_set

c2cb413

Clean up CRA_SCRATCH setting, and set default to use /n/groups for now

9cbabf0

Put interface logs in the log directory

76aa347

Merge pull request #1 from dmf24/dmf24o2webfix

430d81b

O2 and o2web compatibility changes

Suppress php errors for production

9f5b071

Allow SMTP server to be configured, and fix email formatting

420756d

Set corecop_base directory

ec2a5b4

Set corecop_base directory

b29cb67

More email format fixing

370fe00

endswithfunc

b194f88

Finish email tweak

ea1e67a

Write SMTP exception to the log

339db8b

Merge pull request #2 from dmf24/dmf24-more-email-fix

65e3583

Write SMTP exception to the log

Replace sender noreply@kirschner.med.harvard.edu with pesha@hms.harvard.edu

eff97b6

Merge pull request #3 from dmf24/dmf24_update_sender

bd7e2be

Replace sender noreply@kirschner.med.harvard.edu with pesha@hms.harva…

dmf24 wants to merge 47 commits into jpatsenker:master from dmf24:master
