
fix_segfault_when_gc_finalizing_refined_models#101

Merged
amartinhuertas merged 2 commits into main from fix_segfault_when_gc_finalizing_refined_models on Jun 23, 2026

@amartinhuertas

Member

OctreeDistributedDiscreteModels.jl

  • New PXestConnectivityRef mutable struct — explicit reference-counted wrapper for ptr_pXest_connectivity, with retain! / release! methods that free the connectivity only when the last holder is done.

  • Replaced the owns_ptr_pXest_connectivity::Bool and gc_ref::Any fields on OctreeDistributedDiscreteModel with a single connectivity_ref::Any field.

  • Updated inner and outer constructors to accept connectivity_ref instead of the old owns_ptr_pXest_connectivity, gc_ref pair.

  • All call sites updated: the former true/false and model/nothing argument pairs are replaced by either a new PXestConnectivityRef instance or an existing model.connectivity_ref.

  • octree_distributed_discrete_model_free! — removed manual pXest_connectivity_destroy calls guarded by owns_ptr_pXest_connectivity; connectivity lifetime is now fully delegated to PXestConnectivityRef.
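A minimal Python sketch of the reference-counting scheme described above, for illustration only. The actual PXestConnectivityRef is a Julia mutable struct whose fields are not shown in this PR description. Assumptions: `destroy` is a hypothetical callback standing in for pXest_connectivity_destroy, `retain`/`release` play the roles of retain!/release!, and `Model` is a hypothetical stand-in for OctreeDistributedDiscreteModel whose constructor takes a reference and whose `free` drops it.

```python
class PXestConnectivityRef:
    """Shares one connectivity handle among several models (sketch).

    `destroy` is a hypothetical stand-in for pXest_connectivity_destroy;
    it runs exactly once, when the last holder calls release().
    """

    def __init__(self, ptr, destroy):
        self.ptr = ptr          # stand-in for ptr_pXest_connectivity
        self.count = 0          # assumption: holders call retain() themselves
        self._destroy = destroy

    def retain(self):
        if self.ptr is None:
            raise RuntimeError("connectivity already destroyed")
        self.count += 1
        return self

    def release(self):
        if self.count == 0:
            raise RuntimeError("release() without a matching retain()")
        self.count -= 1
        if self.count == 0:
            self._destroy(self.ptr)
            self.ptr = None


class Model:
    """Hypothetical stand-in for OctreeDistributedDiscreteModel."""

    def __init__(self, connectivity_ref):
        # Every model that uses the connectivity takes a reference.
        self.connectivity_ref = connectivity_ref.retain()

    def refine(self):
        # A refined model shares its parent's connectivity_ref.
        return Model(self.connectivity_ref)

    def free(self):
        # Freeing a model only drops its reference (idempotently);
        # it never destroys the connectivity directly.
        if self.connectivity_ref is None:
            return
        self.connectivity_ref.release()
        self.connectivity_ref = None


destroyed = []
coarse = Model(PXestConnectivityRef("conn", destroyed.append))
fine = coarse.refine()
coarse.free()        # parent freed first: connectivity survives
print(destroyed)     # []
fine.free()          # last holder gone: connectivity destroyed
print(destroyed)     # ['conn']
```

Keeping the counter in one shared object means no model needs an ownership flag: whichever model happens to be freed last triggers the destruction.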

AnisotropicallyAdapted3DDistributedDiscreteModels.jl

Four call sites updated in the same way: connectivity_ref now threaded through AnisotropicallyAdapted3DDistributedDiscreteModel, vertically_adapt, vertically_uniformly_refine, and horizontally_adapt.

@amartinhuertas

Member Author

For reference, this PR attempts to solve a long-standing issue in GridapP4est related to the automatic deallocation of octree models by Julia's garbage collector. In particular, from time to time, calling the Finalize method of an OctreeDistributedDiscreteModel caused a segmentation fault; see the stack trace below.

[22731] signal (11.1): Segmentation fault
in expression starting at /home/u1134396/git-repos/GridapP4est.jl/test/mpi/OctreeDistributedDiscreteModelsTests.jl:9
free at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)
sc_array_reset at /tmp/p4est-2.3.6/sc/src/sc_containers.c:167
p4est_destroy at /tmp/p4est-2.3.6/src/p4est.c:522
p4est_destroy at /home/u1134396/.julia/packages/P4est_wrapper/J31xc/src/bindings/p4est_api.jl:22 [inlined]
pXest_destroy at /home/u1134396/git-repos/GridapP4est.jl/src/PXestTypeMethods.jl:14 [inlined]
octree_distributed_discrete_model_free! at /home/u1134396/git-repos/GridapP4est.jl/src/OctreeDistributedDiscreteModels.jl:494
Finalize at /home/u1134396/git-repos/GridapP4est.jl/src/OctreeDistributedDiscreteModels.jl:507

As far as I understand, the segmentation fault arises when p4est_destroy is called on a forest that was generated by refining/coarsening another forest. The generated forest shares its p4est_connectivity_t object with the forest it was generated from. This caused a segmentation fault, sometimes but not always, whenever the parent forest (together with the connectivity it owned) was deallocated before the derived forest.
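The failure mode described above can be reproduced in a small, deterministic Python simulation of the old ownership scheme. All names are hypothetical: `Connectivity` stands in for p4est_connectivity_t, `forest_destroy` for p4est_destroy (assumed here to read the tree count from the shared connectivity, as the sc_array_reset frame in the stack trace suggests), and `OldModel.owns` for the removed owns_ptr_pXest_connectivity flag. In C, the second destroy is undefined behaviour that only sometimes crashes; the sketch turns it into an explicit error.

```python
class Connectivity:
    """Stand-in for p4est_connectivity_t."""

    def __init__(self, num_trees):
        self.num_trees = num_trees
        self.alive = True


class Forest:
    """Stand-in for a p4est forest; it only borrows the connectivity."""

    def __init__(self, conn):
        self.conn = conn


def forest_destroy(forest):
    # Stand-in for p4est_destroy, assumed to loop over the trees using the
    # shared connectivity. Reading freed memory is undefined in C; here the
    # use after free is reported explicitly instead.
    if not forest.conn.alive:
        raise RuntimeError("use after free: connectivity already destroyed")
    return forest.conn.num_trees


class OldModel:
    """Old scheme: a Bool flag decides who destroys the connectivity."""

    def __init__(self, forest, owns):
        self.forest = forest
        self.owns = owns  # plays the role of owns_ptr_pXest_connectivity

    def free(self):
        forest_destroy(self.forest)
        if self.owns:
            self.forest.conn.alive = False  # pXest_connectivity_destroy


conn = Connectivity(num_trees=4)
coarse = OldModel(Forest(conn), owns=True)   # created the connectivity
fine = OldModel(Forest(conn), owns=False)    # refined from `coarse`

coarse.free()  # the GC happens to finalize the parent first
try:
    fine.free()
except RuntimeError as err:
    print(err)  # use after free: connectivity already destroyed
```

Freeing `fine` before `coarse` works, which is why the crash only appeared for some deallocation orders.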

The rationale behind the previous solution, based on the gc_ref field, was precisely to avoid this situation by forcing a deallocation order inverse to the allocation order. However, I have checked with an MWE that, whenever all forests go out of scope at the same time, the garbage collector does not actually honor a deallocation order inverse to the allocation order, even if the forests are related via a reference chain.

The solution in this PR implements manual reference counting and deallocates the shared p4est_connectivity_t object when the reference count reaches zero, regardless of the order in which the forests are deallocated.
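To see why the order no longer matters, the following Python sketch frees a chain of three models (a coarse model and two successive refinements) in every possible order. It checks that each model's forest is torn down while the connectivity is still alive, and that the connectivity is destroyed exactly once, after the last model. `ConnectivityRef` and `Model` are simplified, hypothetical stand-ins, not the Julia code of this PR.

```python
from itertools import permutations


class ConnectivityRef:
    """Minimal reference-counted stand-in for PXestConnectivityRef."""

    def __init__(self):
        self.count = 0
        self.alive = True
        self.destroy_calls = 0

    def retain(self):
        self.count += 1
        return self

    def release(self):
        self.count -= 1
        if self.count == 0:
            self.alive = False       # stand-in for pXest_connectivity_destroy
            self.destroy_calls += 1


class Model:
    def __init__(self, connectivity_ref):
        self.connectivity_ref = connectivity_ref.retain()

    def free(self):
        # This model's forest is destroyed first, so the connectivity must
        # still be alive here (otherwise this would be a use after free).
        if not self.connectivity_ref.alive:
            raise RuntimeError("use after free")
        self.connectivity_ref.release()


def free_in_order(order):
    ref = ConnectivityRef()
    models = [Model(ref)]
    for _ in range(2):                   # two successive refinements
        models.append(Model(models[-1].connectivity_ref))
    for i in order:                      # arbitrary finalization order
        models[i].free()
    return ref


results = [free_in_order(order) for order in permutations(range(3))]
print(all(r.destroy_calls == 1 and r.count == 0 for r in results))  # True
```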

@amartinhuertas amartinhuertas merged commit 2a15018 into main Jun 23, 2026
2 checks passed