Provide mechanism to discard data #695

Open
devreal wants to merge 9 commits into ICLDisco:master from devreal:parsec-data-discard

Conversation

@devreal

@devreal devreal commented Nov 8, 2024

Contributor

Add a function `parsec_data_discard` that releases the data such that the host copy remains intact but does not prevent destruction of the data once all device copies have been released. This keeps the host copy available for device copies to inspect and avoids potential race conditions in the release process. During an eviction, copies of data with a discarded host copy are not transferred but put directly into the LRU.

Replaces some duplicated code with a call to parsec_device_release_gpu_copy.
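
To make the intended semantics concrete, here is a minimal, self-contained sketch of the idea. It is not the PaRSEC implementation: every `toy_*` name and `TOY_FLAG_DISCARDED` are hypothetical stand-ins, and the "transfer" is modeled as a plain assignment. The point is only that a discarded host copy stays readable, while eviction of its device copy skips the write-back.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration; these are not the PaRSEC types. */
#define TOY_FLAG_DISCARDED 0x1u

typedef struct {
    unsigned flags;       /* flags carried by the host copy */
    int      host_value;  /* payload of the host copy */
} toy_host_copy_t;

typedef struct {
    toy_host_copy_t *host;         /* host copy this device copy belongs to */
    int              device_value; /* payload of the device copy */
} toy_device_copy_t;

/* Mark the host copy as discarded. The flag is OR-ed in so that any
 * other flag already set on the copy is preserved. */
static void toy_data_discard(toy_host_copy_t *host)
{
    assert(host != NULL);
    host->flags |= TOY_FLAG_DISCARDED;
}

/* Evict one device copy. Returns true if a device-to-host write-back
 * happened, false if it was skipped because the host copy is discarded. */
static bool toy_evict(toy_device_copy_t *copy)
{
    assert(copy != NULL && copy->host != NULL);
    if (copy->host->flags & TOY_FLAG_DISCARDED) {
        return false;  /* nobody will read the result: skip the transfer */
    }
    copy->host->host_value = copy->device_value;  /* modeled write-back */
    return true;
}
```

In this sketch the host copy is still a valid object after `toy_data_discard`, which mirrors the description above: device copies can keep inspecting it until they are released.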

@devreal devreal requested a review from a team as a code owner November 8, 2024 19:27
@devreal devreal force-pushed the parsec-data-discard branch 2 times, most recently from bbd1448 to 1dd2d54 Compare November 8, 2024 19:59
Comment thread parsec/mca/device/transfer_gpu.c
Comment thread parsec/data.c
Comment thread parsec/data.h Outdated
Comment thread parsec/data.c Outdated
Comment thread parsec/mca/device/device_gpu.c
item = (parsec_list_item_t*)item->list_next; /* conversion needed for volatile */
if( 0 == gpu_copy->readers ) {
if (cpu_copy->flags & PARSEC_DATA_FLAG_DISCARDED) {
parsec_list_item_ring_chop((parsec_list_item_t*)gpu_copy);

Contributor


And here.

Contributor Author


Same as above. The flag isn't changed, and the device copy maintains a reference on the data_t, so if we miss the update of the flags here we will evict into the host copy and then release everything.

@devreal devreal force-pushed the parsec-data-discard branch 2 times, most recently from dde7129 to 14934a5 Compare November 20, 2024 23:14
Add a function `parsec_data_discard` that releases the data
such that the host copy remains intact but does not prevent
destruction of the data once all device copies have been released.
This keeps the host copy available for device copies to inspect
and avoids potential race conditions in the release process.
During an eviction, copies of data with a discarded host copy
are not transferred but put directly into the LRU.

Signed-off-by: Joseph Schuchart <joseph.schuchart@stonybrook.edu>
Otherwise we cannot destroy empty or discarded data.

Signed-off-by: Joseph Schuchart <joseph.schuchart@stonybrook.edu>
Signed-off-by: Joseph Schuchart <joseph.schuchart@stonybrook.edu>
Signed-off-by: Joseph Schuchart <joseph.schuchart@stonybrook.edu>
Also OR the flag instead of assigning it.

Signed-off-by: Joseph Schuchart <joseph.schuchart@stonybrook.edu>
Discarded data may never be pushed back so don't warn about it
still being owned by the device.

Signed-off-by: Joseph Schuchart <joseph.schuchart@stonybrook.edu>
Discarded data sit toward the end of the LRU while the data
to be evicted is at the front. We walk both forward and backward
to collect the discarded data from the back, until we either meet the
pivot or have found enough data to evict. If we found discarded data,
we don't evict.

Signed-off-by: Joseph Schuchart <joseph.schuchart@stonybrook.edu>
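
One possible reading of the two-ended walk described in the commit message above, written as a self-contained sketch over a plain array. This is an assumption about the traversal, not the code in this PR: the real LRU is a linked list, and `toy_entry_t`, `toy_reclaim_t`, and `toy_reclaim` are hypothetical names. Index 0 plays the role of the LRU front (eviction candidates) and the last index the back (where discarded entries are assumed to sit).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical LRU entry; not a PaRSEC type. */
typedef struct {
    size_t bytes;      /* size of the device copy */
    bool   discarded;  /* host copy was discarded */
} toy_entry_t;

typedef struct {
    size_t released;     /* discarded entries released without transfer */
    size_t evicted;      /* entries that would need a write-back */
    size_t bytes_freed;  /* bytes reclaimed by the chosen strategy */
} toy_reclaim_t;

/* Walk the LRU from both ends until the cursors meet or enough bytes
 * are found. Discarded entries are taken from the back first; eviction
 * candidates are only counted from the front. If anything discarded
 * was found, release that and evict nothing. */
static toy_reclaim_t toy_reclaim(const toy_entry_t *lru, size_t n, size_t needed)
{
    assert(lru != NULL || n == 0);
    size_t front = 0, back = n;  /* [front, back) is still unvisited */
    size_t disc_bytes = 0, disc_count = 0;
    size_t evict_bytes = 0, evict_count = 0;

    while (front < back && disc_bytes < needed && evict_bytes < needed) {
        if (lru[back - 1].discarded) {      /* cheap: no transfer needed */
            disc_bytes += lru[back - 1].bytes;
            disc_count++;
            back--;
            continue;
        }
        const toy_entry_t *f = &lru[front++];
        if (f->discarded) {
            disc_bytes += f->bytes;
            disc_count++;
        } else {
            evict_bytes += f->bytes;
            evict_count++;
        }
    }

    toy_reclaim_t res = { 0, 0, 0 };
    if (disc_count > 0) {
        res.released = disc_count;
        res.bytes_freed = disc_bytes;
    } else {
        res.evicted = evict_count;
        res.bytes_freed = evict_bytes;
    }
    return res;
}
```

With two discarded entries at the back, a request for 150 bytes is served entirely by releasing them. With no discarded entries, the same request falls back to evicting from the front.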
therault
therault previously approved these changes Jan 30, 2025

@therault therault left a comment

Contributor


Discussed during the call: the issues raised are addressed, and this is performance- and feature-critical for MRA, so we should merge.

@abouteiller

Contributor

Will verify that it doesn't break DPLASMA, then merge.

bosilca
bosilca previously approved these changes Jan 30, 2025
@devreal devreal force-pushed the parsec-data-discard branch from be66039 to 4f6b4c8 Compare February 14, 2025 21:23
@devreal

devreal commented Feb 14, 2025

Contributor Author

I modified this PR so that `parsec_data_discard` notifies the device(s) about discarded data. If the device finds that it has discarded data, it will try to release that data. Otherwise we don't pay the cost of iterating through the LRU. This simplified the w2r task creation back to something sane(r) I had earlier.
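
A rough sketch of the notification idea, assuming a per-device atomic counter. The counter, `toy_device_t`, `toy_notify_discard`, and `toy_release_discarded` are hypothetical and only illustrate the fast path; the actual mechanism in this PR may differ.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical device state; not the PaRSEC device module. */
typedef struct {
    atomic_int pending_discards;  /* bumped on discard, drained by the device */
    int        lru_scans;         /* number of times the LRU was walked */
} toy_device_t;

/* Called from the discard path: tell the device it holds discarded data. */
static void toy_notify_discard(toy_device_t *dev)
{
    assert(dev != NULL);
    atomic_fetch_add(&dev->pending_discards, 1);
}

/* Called from the device's progress loop. Returns how many notifications
 * were drained. The LRU walk (modeled here by a counter) only happens
 * when at least one discard was announced. */
static int toy_release_discarded(toy_device_t *dev)
{
    assert(dev != NULL);
    int pending = atomic_exchange(&dev->pending_discards, 0);
    if (pending == 0) {
        return 0;  /* fast path: nobody discarded anything, skip the scan */
    }
    dev->lru_scans++;  /* stand-in for walking the LRU and releasing copies */
    return pending;
}
```

An application that never discards data never increments the counter, so in this model it never triggers a scan.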

We only try to find discarded data if we know that there is discarded data.
If no one discarded data (e.g., DPLASMA) we don't go look for it.
This is also needed to properly clean up discarded data before releasing
the zone allocator.

Signed-off-by: Joseph Schuchart <joseph.schuchart@stonybrook.edu>
@devreal devreal force-pushed the parsec-data-discard branch from 4f6b4c8 to 9c7b42b Compare February 14, 2025 22:53
#endif
};

static inline void release_discarded_data(parsec_device_gpu_module_t *gpu_device, parsec_gpu_data_copy_t* gpu_copy)

Contributor


compiler says this function is not used.

@devreal

devreal commented Feb 25, 2025

Contributor Author

This may not be needed anymore if we get parsec_data_release_self_contained_data from #671: https://github.com/ICLDisco/parsec/pull/671/files#diff-b76b62ea20f19d97740a2221cabf57210f9f97991c0ef635ca0cafcc4c3c40d1R596

@devreal devreal dismissed stale reviews from bosilca and therault via 411aff3 May 14, 2026 21:12

4 participants