Skip to content

Allow for flexible columns in manifest for vector alignment genotyping pipeline#71

Open
tnngh wants to merge 1 commit into
masterfrom
70_flexible_manifest_col
Open

Allow for flexible columns in manifest for vector alignment genotyping pipeline#71
tnngh wants to merge 1 commit into
masterfrom
70_flexible_manifest_col

Conversation

@tnngh

@tnngh tnngh commented Dec 9, 2020

Copy link
Copy Markdown
Contributor

Addresses #70

NB: this will fail miniwdl validation. But it passes womtool validation and runs on cromwell server v49. I suspect it's because miniwdl doesn't support read_objects() WDL function from WDL v1.0, but not 100% sure.

Array[Object] lanelet_infos = read_objects(per_sample_manifest_file)

The preferred approach is to get miniwdl to recognize the WDL as valid, whether that means to modify the WDL or miniwdl settings. Another approach is to switch github actions to use womtool instead of miniwdl for validation. However, womtool syntax validation is not nearly as thorough as miniwdl.

… as they contain at least sample_id, irods_path
@tnngh

tnngh commented Dec 10, 2020

Copy link
Copy Markdown
Contributor Author

Surfacing communications with @gbggrant. TL;DR: Try using a Struct instead of Object.

From: George Grant ggrant@broadinstitute.org
Date: Thursday, 10 December 2020 at 13:39
To: Thuy Nguyen tn6@sanger.ac.uk
Subject: Re: Advice on miniwdl [EXT]

Hi Thuy,

Here's what I got back from Mike Lin (dna@mlin.net):

Hi George, yea we've so far punted on this since the miniwdl project started shortly after the OpenWDL group moved towards deprecating Object. The object{} literal syntax is there only for building structs. It could be finished if it really unlocked some high-value use case, but otherwise we were hoping to get away with this approach. Is there anything insufficient with the newer structs that prevent using them here? (Of course I understand it's not always possible to invest time in fixing what ain't broke)

Not 100% sure I understand what he's suggesting - but it sounds like you should be able to replace Object with a Struct. I'll forward on the email chain to you if you want to follow up directly with him.

George

@tnngh

tnngh commented Dec 14, 2020

Copy link
Copy Markdown
Contributor Author

I tried replacing Array[Object] with Array[Map[String, String]] and got the following error:

$ miniwdl check ~/gitrepo/pipelines/releases/BatchImportShortReadAlignmentAndGenotyping/BatchImportShortReadAlignmentAndGenotyping_0.1.0.wdl
(/Users/tn6/gitrepo/pipelines/releases/BatchImportShortReadAlignmentAndGenotyping/BatchImportShortReadAlignmentAndGenotyping_0.1.0.wdl Ln 16 Col 1) Failed to import ImportShortReadAlignment.wdl
(ImportShortReadAlignment.wdl Ln 21 Col 1) Failed to import ImportShortReadLaneletAlignment.wdl
(ImportShortReadLaneletAlignment.wdl Ln 36 Col 46) No such function: read_objects
      Array[Map[String, String]] lanelet_infos = read_objects(per_sample_manifest_file)

I also tried using Array[struct]. I have to admit I have little experience with wdl structs, so I’m not sure if I used the correct syntax. However, I still get the same read_objects error.

$  miniwdl check ~/gitrepo/pipelines/releases/BatchImportShortReadAlignmentAndGenotyping/BatchImportShortReadAlignmentAndGenotyping_0.1.0.wdl
(/Users/tn6/gitrepo/pipelines/releases/BatchImportShortReadAlignmentAndGenotyping/BatchImportShortReadAlignmentAndGenotyping_0.1.0.wdl Ln 16 Col 1) Failed to import ImportShortReadAlignment.wdl
(ImportShortReadAlignment.wdl Ln 21 Col 1) Failed to import ImportShortReadLaneletAlignment.wdl
(ImportShortReadLaneletAlignment.wdl Ln 37 Col 35) No such function: read_objects
      Array[Manifest] lanelet_infos = read_objects(per_sample_manifest_file)
                                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(ImportShortReadLaneletAlignment.wdl Ln 40 Col 25) Not an array
        String irods_path = lanelet_infos[idx][LANELET_INFO_COLNAME_IRODS_PATH]
                            ^^^^^^^^^^^^^^^^^^

Where Manifest struct is defined in Manifest.wdl:

version 1.0
 
struct Manifest {
  String sample_id
  String irods_path
}

And imported by ImportShortReadLaneletAlignment.wdl:

import "Manifest.wdl"

@gbggrant gbggrant left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. Not sure what to do about miniwdl validation error.


command <<<

python <<CODE

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should be inset (both for the the CODE block, and then for the embedded python)

@tnngh

tnngh commented Jan 4, 2021

Copy link
Copy Markdown
Contributor Author

More communications with Mike Lin, developer of Miniwdl. Tl;dr: try miniwdl v0.10.0:
https://github.com/chanzuckerberg/miniwdl/releases/tag/v0.10.0

From: Mike Lin dna@mlin.net
Date: Monday, 4 January 2021 at 06:45
To: Thuy Nguyen tn6@sanger.ac.uk
Cc: George Grant ggrant@broadinstitute.org
Subject: Re: Validation error from miniwdl [EXT]

Dear Thuy, the new miniwdl v0.10.0 [github.com] implements read_object() and read_objects() but only to load Map[String,String] (which can in turn be coerced to a struct type) -- it still lacks general Object support. I hope this helps you, any feedback welcome! Mike

On Tue, Dec 15, 2020 at 8:34 PM Thuy Nguyen tn6@sanger.ac.uk wrote:
Hi

We quite like miniwdl for wdl validation. It’s easy to use and is very good at catching syntax issues. Adding read_objects() support is most preferable for us. What is your rough estimate for when such support could be available? I realize we have been discussing all of this quite informally. I’m happy to create an official feature request in the github if it’s easier.

Many thanks!
Thuy

From: Mike Lin dna@mlin.net
Date: Monday, 14 December 2020 at 23:11
To: Thuy Nguyen tn6@sanger.ac.uk
Cc: George Grant ggrant@broadinstitute.org
Subject: Re: Validation error from miniwdl [EXT]

Dear Thuy, sorry for these problems -- miniwdl's support for Object is incomplete for the reasons I mentioned with George below, and read_object() and read_objects() in particular are absent. We can work with you to converge on a solution though.
• If you're open to moving towards structs, read_json() should currently work (but would require the manifest to be formatted as JSON instead of TSV)
• We could add read_object() and read_objects() only for the purpose of initializing structs from TSV
• Fully completing Object support is the least-preferred option for us for the reasons mentioned below but it's possible if it really unblocks something critical
Let me know what you think, any other feedback welcome too!

Hi George, yea we've so far punted on this since the miniwdl project started shortly after the OpenWDL group moved towards deprecating Object. The object{} literal syntax is there only for building structs. It could be finished if it really unlocked some high-value use case, but otherwise we were hoping to get away with this approach. Is there anything insufficient with the newer structs that prevent using them here? (Of course I understand it's not always possible to invest time in fixing what ain't broke)

On Mon, Dec 14, 2020 at 11:50 AM Thuy Nguyen tn6@sanger.ac.uk wrote:
Hi Mike

I’m the developer that George was referring to earlier. I hope you and George don’t mind if I contact you directly. I’m wondering if read_objects() from WDL v1.0 is supported by miniwdl?

Our use case is to read in a TSV with a header such that the columns can be in any order. I wasn’t sure of any other way in WDL v1.0 to do this other than Array[Objects] arr_of_row_objects = read_objects(tsv_with_header). However, I’m definitely up for alternatives.

I tried replacing Array[Object] with Array[Map[String, String]] and got the following error:

$ miniwdl check ~/gitrepo/pipelines/releases/BatchImportShortReadAlignmentAndGenotyping/BatchImportShortReadAlignmentAndGenotyping_0.1.0.wdl
(/Users/tn6/gitrepo/pipelines/releases/BatchImportShortReadAlignmentAndGenotyping/BatchImportShortReadAlignmentAndGenotyping_0.1.0.wdl Ln 16 Col 1) Failed to import ImportShortReadAlignment.wdl
(ImportShortReadAlignment.wdl Ln 21 Col 1) Failed to import ImportShortReadLaneletAlignment.wdl
(ImportShortReadLaneletAlignment.wdl Ln 36 Col 46) No such function: read_objects
Array[Map[String, String]] lanelet_infos = read_objects(per_sample_manifest_file)

I also tried using Array[struct]. I have to admit I have little experience with wdl structs, so I’m not sure if I used the correct syntax. However, I still get the same read_objects error.

$ miniwdl check ~/gitrepo/pipelines/releases/BatchImportShortReadAlignmentAndGenotyping/BatchImportShortReadAlignmentAndGenotyping_0.1.0.wdl
(/Users/tn6/gitrepo/pipelines/releases/BatchImportShortReadAlignmentAndGenotyping/BatchImportShortReadAlignmentAndGenotyping_0.1.0.wdl Ln 16 Col 1) Failed to import ImportShortReadAlignment.wdl
(ImportShortReadAlignment.wdl Ln 21 Col 1) Failed to import ImportShortReadLaneletAlignment.wdl
(ImportShortReadLaneletAlignment.wdl Ln 37 Col 35) No such function: read_objects
Array[Manifest] lanelet_infos = read_objects(per_sample_manifest_file)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(ImportShortReadLaneletAlignment.wdl Ln 40 Col 25) Not an array
String irods_path = lanelet_infos[idx][LANELET_INFO_COLNAME_IRODS_PATH]
^^^^^^^^^^^^^^^^^^

Where Manifest struct is defined in Manifest.wdl:

version 1.0

struct Manifest {
String sample_id
String irods_path
}

And imported by ImportShortReadLaneletAlignment.wdl:

import "Manifest.wdl"

Any suggestions on how to get past the “No such function: read_objects” error?

Many thanks!
Thuy Nguyen
Senior Bioinformatician
Malaria Programme
Wellcome Sanger Institute

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants