Add cross_cov_matrix to transition data #13416
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## main #13416 +/- ##
=======================================
Coverage 89.54% 89.54%
=======================================
Files 464 464
Lines 32776 32845 +69
=======================================
+ Hits 29349 29411 +62
- Misses 3427 3434 +7
903391e to 71010a6
7fcb860 to 77e71a6
Locally the test passes.
db79d7f to 291f99c
c050e3d to 41e1e07
This adds a new update event, AnalysisMatrixEvent, which sends the correlation matrix in the callback as part of the event. It is saved to the posterior / transition storage section together with the serialized event.

Add AnalysisStorageEvent and sparse flag
Rename posterior_id to ensemble_id
Add artifacts endpoint (this returns all the AnalysisStorageEvents as a list)
Add update endpoint
Save matrix after the threshold has been applied
Replace transition with blob
Rebase with main
Fixup for corr matrix to bytes conversion
Make progress_callback a partial function to provide ensemble automatically
Fixups for posterior ensemble
Fixup test
427c302 to db39c86
buf = io.BytesIO()
np.save(buf, corr_XY_matrix)
Do we have to do this manually, or can we use numpy.ndarray.tobytes() directly?
The difference is that tobytes stores only the raw data, while np.save also writes the header (dtype, shape, and memory order), which is probably something we want.
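A minimal sketch of the difference discussed in this thread, assuming a small float32 array as a stand-in for the correlation matrix. The name `matrix` and the shape are illustrative and not taken from the PR.

```python
import io

import numpy as np

# Hypothetical stand-in for the correlation matrix under review.
matrix = np.arange(6, dtype=np.float32).reshape(2, 3)

# np.save writes a header (dtype, shape, memory order) followed by the data.
buf = io.BytesIO()
np.save(buf, matrix)
npy_bytes = buf.getvalue()

# tobytes() returns only the raw element data, with no metadata.
raw_bytes = matrix.tobytes()

# The .npy payload can be restored without any extra information.
restored_npy = np.load(io.BytesIO(npy_bytes))

# The raw payload needs dtype and shape supplied by the caller.
restored_raw = np.frombuffer(raw_bytes, dtype=np.float32).reshape(2, 3)
```

With the raw bytes, dtype and shape would have to be stored somewhere else (for example in the storage metadata) for the matrix to be recoverable.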
    sp.sparse.save_npz(blob_path, sparse_blob)
else:
    blob_path = blob_dir / f"{stem}.npy"
    np.save(blob_path, blob)
this should be bytes.
    userdata: Mapping[str, Any] = {}


class BlobOut(BaseModel):
Replace with the actual BlobStorageData | BlobStorageMatrix and init with validate_python
    uri: str
    file_size: int
    ensemble_id: str
update_algorithm: ...
class MatrixStorageData(BlobStorageData):
    sparse: bool = False
@@ -15,3 +16,8 @@ class BlobStorageData(BaseModel):
    uri: str
{uuid}.blob <- bytes
@@ -15,3 +16,8 @@ class BlobStorageData(BaseModel):
    uri: str
    file_size: int
    ensemble_id: str
file_type: "parquet", "numpy"
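One possible reading of the two suggestions above (`{uuid}.blob` holding opaque bytes, plus a `file_type` field), sketched with the standard library only. `BlobRecord` and `store_blob` are hypothetical names for illustration and are not part of the PR. The record uses a plain dataclass rather than the project's pydantic models.

```python
import tempfile
import uuid
from dataclasses import dataclass
from pathlib import Path


@dataclass
class BlobRecord:
    """Hypothetical metadata record mirroring the fields discussed above."""

    uri: str
    file_size: int
    ensemble_id: str
    file_type: str  # e.g. "parquet" or "numpy"


def store_blob(
    blob_dir: Path, payload: bytes, ensemble_id: str, file_type: str
) -> BlobRecord:
    """Write opaque bytes to {uuid}.blob and return the matching metadata."""
    blob_dir.mkdir(parents=True, exist_ok=True)
    blob_path = blob_dir / f"{uuid.uuid4()}.blob"
    blob_path.write_bytes(payload)
    return BlobRecord(
        uri=str(blob_path),
        file_size=len(payload),
        ensemble_id=ensemble_id,
        file_type=file_type,
    )


# Usage: the storage layer never inspects the payload; file_type tells the
# reader how to decode it later.
with tempfile.TemporaryDirectory() as tmp:
    record = store_blob(Path(tmp) / "blobs", b"example payload", "ens-1", "numpy")
    stored = Path(record.uri).read_bytes()
```

Under this assumption the writer serializes to bytes first, and the on-disk name no longer depends on the format.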
data_type = str(matrix.dtype)

sparsity = 1.0 - (np.count_nonzero(matrix) / matrix.size)
sparse = bool(sparsity > 0.5)
Is there a good reason for 0.5?
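One way to reason about the threshold is to compare raw storage sizes. The sketch below is a rough estimate only: it assumes a 2-D matrix, a CSR layout with 32-bit indices, and no compression (whereas `save_npz` compresses by default). The helper names are hypothetical and not from the PR.

```python
import numpy as np


def sparsity(matrix: np.ndarray) -> float:
    """Fraction of entries that are exactly zero."""
    return 1.0 - (np.count_nonzero(matrix) / matrix.size)


def dense_nbytes(matrix: np.ndarray) -> int:
    """Raw size of the dense array data, excluding any file header."""
    return matrix.size * matrix.itemsize


def csr_nbytes_estimate(matrix: np.ndarray, index_bytes: int = 4) -> int:
    """Rough uncompressed CSR size: values + column indices + row pointers.

    Assumes a 2-D matrix and fixed-width integer indices.
    """
    nnz = int(np.count_nonzero(matrix))
    rows = matrix.shape[0]
    return nnz * (matrix.itemsize + index_bytes) + (rows + 1) * index_bytes


# Hypothetical thresholded matrix: 100 x 100 float64 with 40% nonzero entries.
matrix = np.zeros((100, 100), dtype=np.float64)
matrix[:40, :] = 1.0

dense_size = dense_nbytes(matrix)
csr_size = csr_nbytes_estimate(matrix)
```

Under these assumptions the break-even point depends on the value dtype and index width, so a fixed 0.5 is a heuristic rather than an exact crossover.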
Issue
Resolves #13296
Relates to #13378
Approach
This introduces:
AnalysisMatrixEvent - for sending the matrix to update_run_model.
AnalysisStorageEvent - for storing the event.
Update: I might need to rethink this a bit, because it is not yet clear how the endpoint should look when loading the data back.