Fixes and Tweaks for Slurm and the Web Component #2

Open
dmf24 wants to merge 47 commits into jpatsenker:master from dmf24:master

dmf24 commented Dec 3, 2019

These are changes made to achieve the following:

  1. Make the code sufficiently agnostic to be able to run on either https://corecop.hms.harvard.edu or https://kirschner.med.harvard.edu.
  2. Enable slurm job submission and ensure it works properly for the whole pipeline.
  3. Ensure that each step in the pipeline has access to the proper cluster and system resources (libraries, etc.) to be able to run successfully.

Here's a summary of most of the changes in more detail:

  1. Added an sbatch method to the slurm job runner class, and used sbatch instead of srun for the default job command.
  2. Changed the wait logic. All of the python subprocess calls block and wait for the result of the command. For the initial control job submission, sbatch is NOT configured to wait: the python subprocess call blocks, but only until it gets the Submitted batch job ...... result. For each subsequent step in the pipeline, sbatch is configured to wait (--wait), so the subprocess call blocks until the job finishes.
  3. Removed kjbuckets.so, which was compiled for python 2.6, from version control and configured git to ignore the .so file. I compiled a fresh 2.7 version and added it to the corecop virtualenv, making kjbuckets available no matter what directory you import from. This virtualenv is configured to be activated automatically by the webserver and inherited by the php scripts that initially invoke python. (Of course, this virtualenv could also be activated by code.)
  4. Parameterized file paths in file_paths.py so that the site can run from a docroot other than /www/kirschner.med.harvard.edu/docroot. Also removed the hard-coded paths to python and perl and configured them to run from the environment, allowing use of the perl/5.24.0 module and a custom python virtualenv rather than system binaries.
  5. Purged all *.pyc files from git and configured git to ignore them. These files are compiled bytecode from imported modules and should not be kept in source control. They will be created dynamically as needed.
  6. Replaced a number of instances of path assembly by string addition with python's more robust os.path.join function. I don't think I caught them all. The goal is to support setting directories as either /path/to/directory or /path/to/directory/. Currently, the scratch directory must be specified with a trailing slash, or else the application will try to write files to /path/to/directoryfilename.log.
  7. Added a line in results.php: date_default_timezone_set("America/New_York"); to clear some error spam on the results page.
  8. Minor tweaks of the run_cra_interface file used for debugging.
  9. Changed the scratch location to /n/groups/kirschner_www/corecop instead of /n/scratch2/cra. This is only because scratch2 is not currently mounted on the web servers. I also made this configurable with an environment variable if or when we are able to make scratch2 available on the webservers.
  10. Created a symlink from <corecop>/analyzers/fasta_checker_for_crap.pl to /www/kirschner.med.harvard.edu/docroot/genomes/code/fasta_checker_for_crap.pl. This is probably not necessary; the FASTACHECKER_PATH can be changed back to a hardcoded path. This was a change I made while troubleshooting and did not revert, as it is still working.
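
For reviewers, the wait logic from item 2 can be sketched roughly as below. This is a minimal illustration, not the code in this PR: the function names build_sbatch_command, parse_job_id and submit are hypothetical. It assumes only sbatch's --wait flag and the "Submitted batch job <id>" line that sbatch prints on submission.

```python
import re
import subprocess


def build_sbatch_command(script, wait=False, extra_args=()):
    """Return the argv list for submitting `script` with sbatch (hypothetical helper)."""
    cmd = ["sbatch"]
    if wait:
        # With --wait, sbatch itself does not exit until the job terminates,
        # so a blocking subprocess call blocks for the whole job.
        cmd.append("--wait")
    cmd.extend(extra_args)
    cmd.append(script)
    return cmd


def parse_job_id(sbatch_output):
    """Extract the job id from sbatch's 'Submitted batch job <id>' line, or None."""
    match = re.search(r"Submitted batch job (\d+)", sbatch_output)
    return int(match.group(1)) if match else None


def submit(script, wait=False):
    """Run sbatch and return the job id; blocks until sbatch itself returns."""
    output = subprocess.check_output(build_sbatch_command(script, wait=wait))
    return parse_job_id(output.decode())
```

Under these assumptions, the control job would be submitted with wait=False so the call returns as soon as the job id is printed, while each pipeline step would use wait=True.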

These are outstanding issues unrelated to any issues with slurm submission or web hosting:

  1. There's no robust job tracking and coordination from the web server side. It appears that at some point in the past, INCREMENTFILE.num was used to create a unique working directory for each job, but this feature seems to have been removed. Currently each webserver job simply creates files in the scratch directory prefixed with the name of the uploaded file, overwriting any files that might happen to be there already. This means that if two different people submit a file with the same name, the second job will overwrite the results of the first job (or lead to race conditions if the jobs run simultaneously).
  2. A number of output and error files are overwritten by subsequent steps in the pipeline. In fact, during troubleshooting I had to create numerous separate error log files for each step. Consolidating some of this will probably be convenient.
  3. Some of the email formatting seems to be broken.
  4. Scratch directory still must be specified with a trailing slash (/). Minor issue but I think it's close to fixed.
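
One possible direction for issues 1 and 4, sketched under assumptions: the helper names below are hypothetical, and CRA_SCRATCH_DIR is a placeholder for the environment variable (this description does not name the real one). The sketch uses os.path.join so that the scratch directory works with or without a trailing slash, and tempfile.mkdtemp to give each job its own working directory.

```python
import os
import tempfile

# Default taken from the scratch location described in this PR.
DEFAULT_SCRATCH = "/n/groups/kirschner_www/corecop"


def get_scratch_dir():
    """Read the scratch location from the environment.

    CRA_SCRATCH_DIR is a hypothetical variable name used for illustration.
    """
    return os.environ.get("CRA_SCRATCH_DIR", DEFAULT_SCRATCH)


def scratch_path(scratch_dir, filename):
    """Join so '/path/to/directory' and '/path/to/directory/' give the same result."""
    return os.path.join(scratch_dir, filename)


def make_job_dir(scratch_dir, upload_name):
    """Create a unique per-job directory so same-named uploads cannot collide."""
    prefix = os.path.basename(upload_name) + "."
    # mkdtemp creates the directory and guarantees a name not already in use.
    return tempfile.mkdtemp(prefix=prefix, dir=scratch_dir)
```

With something like this, two submissions of a file with the same name would each get a separate directory under scratch, and log files would be written with scratch_path(job_dir, "filename.log") rather than string addition.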

dmf24 added 30 commits November 25, 2019 15:32
…ry directory can be specified with standard unix semantics
…cratch directory for now until I can clean up the rest of the string-concatenated paths