Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
25 commits
Select commit Hold shift + click to select a range
4148a9a
Fix H5Fflush failure in hdf5_v1 printer with MultiNest
anderkve May 4, 2026
343026c
Added a CLAUDE.md and MINIMAL_BUILD.md
anderkve May 4, 2026
4d6eebc
Added file with list of hdf5 printer issues
anderkve May 4, 2026
cef688a
[hdf5_v2] Guard finalise() RA-flush block with myRank==0 to fix unini…
claude May 4, 2026
896c3b2
Document optional MPI install in MINIMAL_BUILD.md
claude May 4, 2026
dd0109b
Ignore downloaded_tarball_paths.txt build artifact
claude May 4, 2026
7bd5c1b
[hdf5_v1] Fix errorsOff/errorsOn nesting that permanently silenced HD…
claude May 4, 2026
592ad6d
Added a CLAUDE.md and MINIMAL_BUILD.md
anderkve May 4, 2026
8d905d6
Fix H5Fflush failure in hdf5_v1 printer with MultiNest
anderkve May 4, 2026
d292d24
[hdf5_v1] Default-initialise hid_t members to -1 to harden aux printe…
claude May 4, 2026
ff1955a
[hdf5_v2] Close HDF5 type handle in HDF5DataSet::write_buffer (handle…
claude May 4, 2026
d25db2a
[hdf5_v1] Clarify why buffer / dataset destructors are intentionally …
claude May 4, 2026
9aa6166
[hdf5_v1] Replace bare exit(1) in printer ctor with printer_error().r…
claude May 4, 2026
bde00ba
[hdf5_v2] Delete unused MPI_flush_to_rank / MPI_recv_all_buffers / MP…
claude May 4, 2026
91de1b8
[hdf5_v1] Close H5Dget_type ids leaked in combine_tools
claude May 4, 2026
533d150
[hdf5_v1] Avoid undefined iterator decrement past begin() in combine …
claude May 4, 2026
7323306
[hdf5_v2] HDF5Printer2::flush() now also flushes registered aux_buffers
claude May 4, 2026
a4847e6
[hdf5_v1] Guard sync_pos==0 underflow in synchronise_buffers and get_…
claude May 5, 2026
6952d63
Deleted temp files used for working on the hdf5 printers
anderkve May 5, 2026
0721210
Clarified a code snippet and comment in hdf5printer.hpp
anderkve May 5, 2026
f131987
Fixed typos in hdf5printer.hpp
anderkve May 5, 2026
0ba474e
Tweaked comment in hdf5printer_v2.hpp
anderkve May 5, 2026
d1edc9c
Tweaks to comment in hdf5tools.cpp
anderkve May 5, 2026
02d482a
Revised a bunch of overly verbose code comments.
anderkve May 7, 2026
7649d04
Merge branch 'master' into hdf5_work
anderkve May 7, 2026
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions .gitignore
Original file line number Diff line number Diff line change
Expand Up @@ -85,6 +85,7 @@ Backends/installed/**
ScannerBit/downloaded/**
ScannerBit/installed/**
ScannerBit/lib/plugin_libraries.list
downloaded_tarball_paths.txt

# GUM files
*.mug
Expand Down
30 changes: 18 additions & 12 deletions Printers/include/gambit/Printers/printers/hdf5printer.hpp
Original file line number Diff line number Diff line change
Expand Up @@ -332,17 +332,20 @@ namespace Gambit
std::string group; // HDF5 group location to store datasets
std::string metadata_group; // HDF5 group location to store metadata

// Handles for HDF5 files and groups containing the datasets
hid_t file_id;
hid_t group_id;
hid_t RA_group_id;
hid_t metadata_id;
// Handles for HDF5 files and groups containing the datasets.
// Only the primary printer's constructor assigns the file/group
// handles (file_id, group_id, RA_group_id, metadata_id). Aux instances
// inherit the *_location_id values from the primary.
hid_t file_id = -1;
hid_t group_id = -1;
hid_t RA_group_id = -1;
hid_t metadata_id = -1;

// Handle to a location in a HDF5 to which the datasets will be written
// i.e. a file or a group.
hid_t location_id;
hid_t RA_location_id;
hid_t metadata_location_id;
hid_t location_id = -1;
hid_t RA_location_id = -1;
hid_t metadata_location_id = -1;

/// Pointer to the primary printer object
// (if this is an auxilliary printer, else it is "this" //NULL)
Expand Down Expand Up @@ -543,10 +546,13 @@ namespace Gambit

// Force increment the buffer to "catch it up" to the current sync
// position, in case it has been created "late".
// We subtract one because another increment will happen after
// the print statement (that triggered the creation of the new
// buffer) completes.
if(synchronised) it->second.fast_forward(printer->get_sync_pos()-1);
if (printer->get_sync_pos() > 0) // if 0, nothing to catch up to
{
// We subtract one because another increment will happen after
// the print statement (that triggered the creation of the new
// buffer) completes.
if(synchronised) it->second.fast_forward(printer->get_sync_pos()-1);
}
}

if( it == local_buffers.end() )
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -184,23 +184,20 @@ namespace Gambit {
for(std::size_t i=0; i<DSETRANK; i++) { record_dims[i] = rdims[i]; }
}

/// Do cleanup (close dataset)
/// Destructor.
/// Deliberately does NOT close the underlying HDF5 dataset.
///
/// DataSetInterfaceBase (and its derived classes) are stored by value
/// and assigned-from-temporary in several places. In such cases a
/// temporary is constructed, then copy-/move-assigned into a member or
/// container slot, then destroyed. Closing here in the destructor would
/// invalidate the handle held by the surviving copy. The actual close
/// therefore lives in the explicit closeDataSet() method below, used by
/// VertexBufferNumeric1D_HDF5::finalise() during the printer's
/// finalise() pass.
template<class T, std::size_t RR, std::size_t CL>
DataSetInterfaceBase<T,RR,CL>::~DataSetInterfaceBase()
{
// TODO: Having problems with copied objects sharing dataset identifiers, and closing datasets prematurely on each other.
// To fix, will probably need to have a fancy copy constructor or something. Or wrap datasets in an
// object which itself has a fancy copy constructor. For now, just leave dataset resources lying around,
// probably won't cause any issues.
// Or could explicity tell interface to close datasets before the objects are destroyed.
//if(this->dset_id>=0)
//{
// herr_t status = H5Dclose(this->dset_id);
// if(status<0)
// {
// logger() << LogTags::printers << LogTags::err <<LogTags::repeat_to_cerr<< LOCAL_INFO << ": Error destructing DataSetInterfaceBase! Failed to close wrapped dataset! (H5Dclose failed). No exception thrown because this will behave badly when throw from a destructor. (dataset name: "<<myname<<")"<<EOM;
// }
//}
}

/// Release resources associated with the underlying dataset
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -290,16 +290,24 @@ namespace Gambit {
}
}

/// Destructor
/// Destructor.
/// Deliberately does NOT flush the buffer to disk.
///
/// Buffer objects of this type are stored by value (in the
/// std::map<VBIDpair, BuffType> inside H5P_LocalBufferManager) and
/// are populated via the assign-from-temporary pattern at
/// H5P_LocalBufferManager::get_buffer, so each buffer object's
/// destructor runs at least once on a temporary that shares state
/// (HDF5 dataset handles, sync counters) with the surviving
/// container slot. Triggering write_to_disk() / RA_write_to_disk()
/// here would therefore either double-write or write through stale
/// state, depending on which copy is being destroyed. End-of-scan
/// flushing is handled deterministically by HDF5Printer::flush()
/// and HDF5Printer::finalise() instead, which iterate over the
/// surviving buffers in the printer's all_buffers map.
template<class T, std::size_t L>
VertexBufferNumeric1D_HDF5<T,L>::~VertexBufferNumeric1D_HDF5()
VertexBufferNumeric1D_HDF5<T,L>::~VertexBufferNumeric1D_HDF5()
{
//TODO: Do this in some more controlled way
//if(this->is_synchronised()) { write_to_disk(); }
//else
//{
// RA_write_to_disk();
//}
}

// Print out report on buffer sync status
Expand Down
161 changes: 4 additions & 157 deletions Printers/include/gambit/Printers/printers/hdf5printer_v2.hpp
Original file line number Diff line number Diff line change
Expand Up @@ -68,10 +68,6 @@ namespace Gambit
typedef long long longlong;
typedef unsigned long long ulonglong;

// DEBUG h5v2_BLOCK message counters
//static int recv_counter;
//static int send_counter;

/// Length of chunks in chunked HDF5 dataset. Affects write/retrieval performance for blocks of data of various sizes.
/// It is set to an "intermediate" sort of size since that seems to work well enough.
static const std::size_t HDF5_CHUNKLENGTH = 100;
Expand All @@ -82,25 +78,9 @@ namespace Gambit
/// Largest allowed size of buffers. Size can be dynamically set from 1 to this number.
static const std::size_t MAX_BUFFER_SIZE = 100000;

/// MPI tags for HDF5 printer v2
const int h5v2_bufname(10);
const int h5v2_bufdata_points(11);
const int h5v2_bufdata_ranks(12);
const int h5v2_bufdata_valid(13);
const int h5v2_bufdata_type(14);
const int h5v2_bufdata_values(15);
// Message block end tag:
// 1 means "there is another block of messages to receive"
// 0 means "there are no more blocks of messages to receive"
const int h5v2_BLOCK(30);
// "Begin sending data" tag
const int h5v2_BEGIN(31);

// The 'h5v2_bufdata_type' messages send an integer encoding
// the datatype for the h5v2_bufdata_values messages
// The integer-encoded type ID for buffer-data messages.
// Need a unique integer for each type. We can encode these
// with a template function:

template<class T>
std::set<T> set_diff(const std::set<T>& set1, const std::set<T>& set2)
{
Expand Down Expand Up @@ -308,6 +288,9 @@ namespace Gambit
errmsg << "Error! Tried to write to dataset (name="<<myname()<<") with type id "<<dtype<<" but expected it to have type id "<<expected_dtype<<". This is a bug, please report it.";
printer_error().raise(LOCAL_INFO, errmsg.str());
}
// dtype is no longer needed, close it here to avoid leaking one
// HDF5 type handle per flush per dataset.
H5Tclose(dtype);

std::size_t required_size = target_pos+length;
// Check that target position is allowed
Expand Down Expand Up @@ -638,12 +621,6 @@ namespace Gambit
// int version
virtual std::pair<std::vector<long>,std::vector<int>> flush_to_vector_int(const std::vector<PPIDpair>& order) = 0;

#ifdef WITH_MPI
/// Send buffer contents to another process
virtual void MPI_flush_to_rank(const unsigned int r) = 0;

#endif

/// Make sure datasets exist on disk with the correct name and size
virtual void ensure_dataset_exists(const hid_t loc_id, const std::size_t length) = 0;

Expand Down Expand Up @@ -905,102 +882,6 @@ namespace Gambit
return buffer.size();
}

#ifdef WITH_MPI
// Send buffer contents to a different process
void MPI_flush_to_rank(const unsigned int r)
{
if(buffer.size()>0)
{
// Get name of the dataset this buffer is associated with
std::string namebuf = dset_name();
// Copy point data and values into MPI send buffer
std::vector<unsigned long> pointIDs;
std::vector<unsigned int> ranks; // Will assume all PPIDpairs are valid. I think this is fine to do...
std::vector<int> valid; // We have to send the invalid points too, to maintain buffer synchronicity
std::vector<T> values;
int type(h5v2_type<T>()); // Get integer identifying the type of the data values
int more_buffers = 1; // Flag indicating that there is a block of data to receive
for(auto it=buffer.begin(); it!=buffer.end(); ++it)
{
pointIDs.push_back(it->first.pointID);
ranks .push_back(it->first.rank);
valid .push_back(buffer_valid.at(it->first));
values .push_back(it->second);
}

// Debug info
#ifdef HDF5PRINTER2_DEBUG
logger()<<LogTags::printers<<LogTags::debug<<"Sending points for buffer "<<dset_name()<<std::endl
<<" (more_buffers: "<<more_buffers<<")"<<std::endl;
for(std::size_t i=0; i<buffer.size(); ++i)
{
logger()<<" Sending point ("<<ranks.at(i)<<", "<<pointIDs.at(i)<<")="<<values.at(i)<<" (valid="<<valid.at(i)<<")"<<std::endl;
}
logger()<<EOM;
#endif

// Send the buffers
std::size_t Npoints = values.size();
myComm.Send(&more_buffers, 1, r, h5v2_BLOCK);
//std::cerr<<myComm.Get_rank()<<": sent "<<more_buffers<<std::endl;
//send_counter+=1;
myComm.Send(&namebuf[0] , namebuf.size(), MPI_CHAR, r, h5v2_bufname);
myComm.Send(&type , 1 , r, h5v2_bufdata_type);
myComm.Send(&values[0] , Npoints, r, h5v2_bufdata_values);
myComm.Send(&pointIDs[0], Npoints, r, h5v2_bufdata_points);
myComm.Send(&ranks[0] , Npoints, r, h5v2_bufdata_ranks);
myComm.Send(&valid[0] , Npoints, r, h5v2_bufdata_valid);

// Clear buffer variables
buffer .clear();
buffer_valid.clear();
buffer_set .clear();
}
}

// Receive buffer contents from a different process
// (MasterBuffer should have received the name, type, and size of the
// buffer data, and used this to construct/retrieve this buffer.
// We then collect the buffer data messages)
void MPI_recv_from_rank(unsigned int r, std::size_t Npoints)
{
/// MPI buffers
std::vector<unsigned long> pointIDs(Npoints);
std::vector<unsigned int> ranks(Npoints);
std::vector<int> valid(Npoints);
std::vector<T> values(Npoints);

// Receive buffer data
myComm.Recv(&values[0] , Npoints, r, h5v2_bufdata_values);
myComm.Recv(&pointIDs[0], Npoints, r, h5v2_bufdata_points);
myComm.Recv(&ranks[0] , Npoints, r, h5v2_bufdata_ranks);
myComm.Recv(&valid[0] , Npoints, r, h5v2_bufdata_valid);

// Pack it into this buffer
#ifdef HDF5PRINTER2_DEBUG
logger()<<LogTags::printers<<LogTags::debug<<"Adding points to buffer "<<dset_name()<<std::endl;
for(std::size_t i=0; i<Npoints; ++i)
{
// Extra Debug
logger()<<" Adding received point ("<<ranks.at(i)<<", "<<pointIDs.at(i)<<")="<<values.at(i)<<" (valid="<<valid.at(i)<<")"<<std::endl;
PPIDpair ppid(pointIDs.at(i), ranks.at(i));
if(valid.at(i))
{
append(values.at(i), ppid);
}
else
{
update(ppid);
}
}
logger()<<EOM;
#endif

// Debug info:
//std::cout<<"(rank "<<myComm.Get_rank()<<") Final buffer size: "<<N_items_in_buffer()<<" (Npoints was: "<<Npoints<<"), dset="<<dset_name()<<std::endl;
}
#endif

void add_float_block(const HDF5bufferchunk& chunk, const std::size_t buf)
{
// Pack it into this buffer
Expand Down Expand Up @@ -1274,40 +1155,6 @@ namespace Gambit
void print_metadata(std::map<std::string,std::string>, bool);

#ifdef WITH_MPI
/// Gather all buffer data on a certain rank process
/// (only gathers data from buffers known to that process)
void MPI_flush_to_rank(const unsigned int rank);

/// Give process 'rank' permission to begin sending its buffer data
void MPI_request_buffer_data(const unsigned int rank);

/// Receive buffer data from a specified process until a STOP message is received
void MPI_recv_all_buffers(const unsigned int rank);

/// Receive buffer data of a known type for a known dataset
/// Requires status message resulting from a probe for the message to be received
template<class T>
int MPI_recv_buffer(const unsigned int r, const std::string& dset_name)
{
// Get number of points to be received
MPI_Status status;
myComm.Probe(r, h5v2_bufdata_values, &status);
int Npoints;
int err = MPI_Get_count(&status, GMPI::get_mpi_data_type<T>::type(), &Npoints);
if(err<0)
{
std::stringstream msg;
msg<<"Error from MPI_Get_count while attempting to receive buffer data from rank "<<r<<" for dataset "<<dset_name<<"!";
printer_error().raise(LOCAL_INFO,msg.str());
}
HDF5Buffer<T>& buffer = get_buffer<T>(dset_name, buffered_points);
//std::cout<<"(rank "<<myComm.Get_rank()<<") Npoints: "<<Npoints<<std::endl;
buffer.MPI_recv_from_rank(r, Npoints);
logger()<< LogTags::printers << LogTags::debug << "Received "<<Npoints<<" points from rank "<<r<<"'s buffers (for dataset: "<<dset_name<<")"<<EOM;
//std::cout<<"(rank "<<myComm.Get_rank()<<") Received "<<Npoints<<" from rank "<<r<<". New buffer size is "<<buffer.N_items_in_buffer()<<" (name="<<buffer.dset_name()<<")"<<std::endl;
return Npoints;
}

/// Copy an MPI-transmitted block of buffer data into our buffer
template<class T>
void MPI_add_int_block_to_buffer(const HDF5bufferchunk& chunk, const std::string& dset_name, const std::size_t dset_index)
Expand Down
25 changes: 11 additions & 14 deletions Printers/src/printers/hdf5printer/hdf5_combine_tools.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -536,13 +536,9 @@ namespace Gambit
size_tot_l = size_tot + size; // Last?
//}

for (auto it = valids.end()-1; size > 0; --it)
{
if (*it)
break;
else
--size;
}
// Trim any trailing invalid points (size <= valids.size() ensured above)
while (size > 0 && !valids[size-1])
--size;

HDF5::closeSpace(dataspace);
HDF5::closeSpace(dataspace2);
Expand Down Expand Up @@ -804,13 +800,9 @@ namespace Gambit
ranks.push_back(valid_rank);
ptids.push_back(valid_ptid);

for (auto it = valids.end()-1; size > 0; --it)
{
if (*it)
break;
else
--size;
}
// Trim any trailing invalid points (size <= valids.size() ensured above)
while (size > 0 && !valids[size-1])
--size;
aux_sizes.push_back(size);

HDF5::closeSpace(dataspace);
Expand Down Expand Up @@ -992,6 +984,8 @@ namespace Gambit
// Close resources
HDF5::closeDataset(dataset_out);
HDF5::closeDataset(dataset2_out);
H5Tclose(type);
H5Tclose(type2);

// Move offset so that next batch is written to correct place in output file
offset += batch_size_tot + pc_offset;
Expand Down Expand Up @@ -1152,6 +1146,8 @@ namespace Gambit
}
// Create new dataset
setup_hdf5_points(new_group, type, type2, size_tot, *it);
if(type>=0) H5Tclose(type);
if(type2>=0) H5Tclose(type2);
}
// Reopen output datasets for copying
hid_t dataset_out = HDF5::openDataset(new_group, *it);
Expand Down Expand Up @@ -1307,6 +1303,7 @@ namespace Gambit

// Close resources
HDF5::closeDataset(dataset_out);
H5Tclose(type);

// Move offset so that next batch is written to correct place in output file
offset += batch_size_tot + pc_offset;
Expand Down
Loading
Loading