One-off start coordinates confusion #130

Hi Cameron,

I've been playing around with cblaster sessions lately so that [`cfoldseeker`](https://github.com/LucoDevro/cfoldseeker) can make good use of them as cblaster-compatible data containers. However, I've run into a potential inconsistency in how cblaster treats the start coordinates of hits.

While developing cfoldseeker, I wasn't aware of any need for zero-based indexing. Everything worked fine when cross-referencing the coordinates I got from KEGG or UniProt against NCBI Nucleotide, and vice versa. So I was quite surprised to find out that cblaster uses zero-based indexing for the start coordinate, but not for the end coordinate! For example, in [this remote-mode cblaster session](https://github.com/user-attachments/files/28150923/filtered_session.json), entry WP_046206895.1 confusingly starts at base 15543, whereas at [NCBI](https://www.ncbi.nlm.nih.gov/nuccore/NZ_JAESML010000031.1) it starts at 15544 (which is also the A of the start codon). cblaster's start coordinates are also off by one in the summary file.
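To make the two conventions concrete, here is a minimal sketch, assuming what I observe in the session above: cblaster stores a 0-based start and a 1-based end (together, a half-open interval like a Python slice), while NCBI uses 1-based, inclusive coordinates. The helper names are hypothetical and are not part of cblaster's or cfoldseeker's API; the sequence is a toy example.

```python
def ncbi_to_cblaster(start: int, end: int) -> tuple[int, int]:
    """Hypothetical helper: 1-based inclusive (NCBI) -> 0-based start, 1-based end."""
    return start - 1, end


def cblaster_to_ncbi(start: int, end: int) -> tuple[int, int]:
    """Hypothetical helper: inverse of ncbi_to_cblaster."""
    return start + 1, end


def extract(seq: str, start: int, end: int, zero_based_start: bool) -> str:
    """Slice a feature out of seq under either start convention."""
    # A 0-based start with a 1-based end is exactly a Python half-open slice.
    if zero_based_start:
        return seq[start:end]
    # With a 1-based inclusive start, shift the start back by one first.
    return seq[start - 1:end]


# Toy sequence: the CDS "ATGAAATAG" sits at positions 3..11 in NCBI coordinates.
seq = "CCATGAAATAGCC"
print(ncbi_to_cblaster(3, 11))                       # (2, 11)
print(extract(seq, 2, 11, zero_based_start=True))    # ATGAAATAG
print(extract(seq, 3, 11, zero_based_start=False))   # ATGAAATAG
```

Under this mapping, the NCBI start of WP_046206895.1 (15544) becomes 15543 in the session, while the end coordinate is the same in both.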

This off-by-one discrepancy hampers extracting gene cluster GenBank files for a [cblaster session holding remote-mode cfoldseeker information](https://github.com/user-attachments/files/28150909/filtered_session.json) with *unconverted coordinates*. The resulting GenBank files lack all CDS features: because of the extra nucleotide, the length check at l. 390 of extract_clusters.py trips for every CDS, and each one is then dropped as a TranslationError at l. 396.

https://github.com/gamcil/cblaster/blob/5b330bc826ab3f699387111302c6833ac8e42b63/cblaster/extract_clusters.py#L390:L398

In the more intuitive situation without zero-based indexing, l. 390 would read:

`if (len(cds_feature.qualifiers.get("translation", "")) + 1) * 3 > subject.end - subject.start + 1:`
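A minimal sketch of the length arithmetic, to show why a single nucleotide matters. `cds_too_long` is a hypothetical helper, not cblaster code, and I'm assuming the current condition at l. 390 compares against `subject.end - subject.start` (i.e. without the `+ 1`), which is what the fix above implies. It reuses a toy CDS `ATGAAATAG` (translation `MK` plus a stop codon) at NCBI positions 3..11.

```python
def cds_too_long(translation: str, start: int, end: int, zero_based_start: bool) -> bool:
    """Hypothetical helper: True if the protein plus one stop codon needs more
    nucleotides than the feature span provides."""
    # With a 0-based start, end - start is already the span length; with a
    # 1-based inclusive start, one nucleotide must be added back.
    span = end - start if zero_based_start else end - start + 1
    return (len(translation) + 1) * 3 > span


# Toy CDS "ATGAAATAG": translation "MK" + stop = 3 codons = 9 nt.
# 0-based start with the 0-based check: fits.
print(cds_too_long("MK", 2, 11, zero_based_start=True))   # False
# Unconverted 1-based start with the 0-based check: span is only 8 nt,
# so the CDS is wrongly rejected.
print(cds_too_long("MK", 3, 11, zero_based_start=True))   # True
# Unconverted 1-based start with the "+ 1" check: fits again.
print(cds_too_long("MK", 3, 11, zero_based_start=False))  # False
```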

I'm somewhat hesitant to adopt this zero-based indexing, because I don't see why it is needed. What is the reasoning behind it? Users may also be unaware of this counterintuitive convention, so why let it propagate into the outputs?

Thanks in advance!
