Skip to content

Proposed Fix: Unpicklable BidimensionalMeasure in 3.x #116

@maxfenv

Description

@maxfenv

Hi,

The implementation of BidimensionalMeasure in the 3.x set of releases is currently not (un) pickleable. Attempting to do so results in an infinite recursion. What follows is a minimal reproducible example:

import pickle

from measurement.base import MeasureBase, BidimensionalMeasure

class Mass(MeasureBase):
    STANDARD_UNIT = "g"
    UNITS = {"g": 1}
    SI_UNITS = ["g"]

class Area(MeasureBase):
    STANDARD_UNIT = "sq_m"
    UNITS = {"sq_m": 1}

class Yield(BidimensionalMeasure):
    PRIMARY_DIMENSION = Mass
    REFERENCE_DIMENSION = Area

y = Yield(g__sq_m=1)
pickled = pickle.dumps(y)
pickle.loads(pickled)  # Infinite recursion

The fundamental reason this happens, is because on unpickling, pickle will first look for a __setstate__ magic method. It doesn't find this, because BidimensionalMeasure doesn't implement it, so it tries __getattr__ instead (I'm not being very precise, I think this is all just what python does when you call object.attribute).

The BidimensionalMeasure.__getattr__ looks like:

    def __getattr__(self, measure_string):
        primary_units = self.PRIMARY_DIMENSION.get_units()
        reference_units = self.REFERENCE_DIMENSION.get_units()

        p1, r1 = self.primary.unit, self.reference.unit
        p2, r2 = self._get_unit_parts(measure_string)

        primary_chg = primary_units[p2]/primary_units[p1]
        reference_chg = reference_units[r2]/reference_units[r1]

        return self.primary.value / primary_chg * reference_chg

however at this stage the object is completely un-initialised, and self.pimary is not yet set! As a consequence when execution here hits p1, r1 = self.primary.unit, self.reference.unit, it ends up calling __getattr__ again, leading to infinite recursion and a RecursionError.

The fix here, is to raise an AttributeError as early as possible if measure_string is known to not be one of the valid measures. This can be done by simply moving the call to self._get_unit_parts to the top, since that raises an AttributeError on nonsense units.

A full reproducible example, for a picklable version of BidimensionalMeasure is as follows:

import pickle

from measurement.base import MeasureBase, BidimensionalMeasure


class Mass(MeasureBase):
    STANDARD_UNIT = "g"
    UNITS = {"g": 1}
    SI_UNITS = ["g"]


class Area(MeasureBase):
    STANDARD_UNIT = "sq_m"
    UNITS = {"sq_m": 1}


class PicklableBidimensionalMeasure(BidimensionalMeasure):
    """A pickleable BidimensionalMeasure"""
    def __getattr__(self, measure_string):
        """Pickle friendly implementation of BidimensionalMesaure.__getattr__

        Checks whether or not the passed argument is a measure, and raises
        an AttributeError immediately.  Otherwise when referencing `self.primary`
        during pickle unserialisation, because the object is not initialised yet,
        `primary` isn't yet set and it ends up recursing forever.
        """
        p2, r2 = self._get_unit_parts(measure_string)  # Raises AttributeError
        primary_units = self.PRIMARY_DIMENSION.get_units()
        reference_units = self.REFERENCE_DIMENSION.get_units()
        p1, r1 = self.primary.unit, self.reference.unit  # Would recurse on unpickle
        primary_chg = primary_units[p2] / primary_units[p1]
        reference_chg = reference_units[r2] / reference_units[r1]
        return self.primary.value / primary_chg * reference_chg


class Yield(PicklableBidimensionalMeasure):
    PRIMARY_DIMENSION = Mass
    REFERENCE_DIMENSION = Area

y = Yield(g__sq_m=1)
pickled = pickle.dumps(y)
pickle.loads(pickled)  # Works fine

Sorry for the lack of vertical whitespace, but this way it's copy pastable into a vanilla python shell.

If this all seems contrived.... we ended up here because we're using the django_redis cache backend to Django, the default serialiser for which uses pickle to serialise all cached objects.

Let me know if you'd like me to raise a PR for this. I haven't because, it looks like you are working on a 4.0 release which gets rid of the BidimensionalMeasure anyway. At minimum, I hope that others with the same issue can implement this easily enough in their projects.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions