Feat: Native Python FpML CodeList Validation and Resource Distribution (Resolves DSL I-150) #209
This commit provides a complete, end-to-end solution for natively validating FpML FpMLCodingSchemeDomain objects in Python by bundling required JSON resources and injecting custom runtime extensions.
* Build & Distribution: Updated `get_cdm.sh` to fetch JSON codelists via sparse checkout. Updated `build_cdm.sh` to package these JSON files and Python extensions into the `finos-cdm` wheel artifact using dynamic MANIFEST.in injection.
* Runtime Extensions: Established `src/main/python/runtime-extensions/` to cleanly separate custom Python logic from the Java generator source.
* Data Loader: Created `load_codelist.py` leveraging `importlib.resources` and `@lru_cache` for dynamic, disk-efficient JSON deserialization.
* Bootstrap Injection: Introduced `cdm_runtime.py` to act as the Python Dependency Injection container, automatically binding the custom loader to the generated `LoadCodeList` facade.
* Testing: Added `test_codelist_validation.py` to verify the pipeline end-to-end, and updated `.gitignore` for the new resource paths.
Resolves: finos#150
# Inject LoadCodeList
if hasattr(cdm_load_codelist_facade, "LoadCodeList"):
    cdm_load_codelist_facade.LoadCodeList.__assign__(impl_load_codelist)  # type: ignore
Just use
cdm_load_codelist_facade.LoadCodeList = impl_load_codelist
And also replace the if with try/except.
Thanks for the suggestion to use try/except instead of hasattr.
Regarding using the standard assignment operator (=), I actually tried that first, but it causes the validation tests to fail with a NotImplementedError (I just re-verified it locally - you will find the error log attached at the end of this answer).
From my debugging, this happens because LoadCodeList is generated as a standalone file, and the Java generator currently only appends the create_module_attr_guardian to _bundle.py. Because the standalone file lacks the module guardian, the = operator just overwrites the attribute in the module's local namespace instead of triggering __assign__ on the underlying FuncProxy.
Since ValidateFpMLCodingSchemeDomain imports it early via from ... import LoadCodeList, it ends up holding onto the original, empty proxy object. Calling __assign__ directly bypasses this by forcing the existing proxy to mutate its internal state, which makes the early-bound references work perfectly.
I can update my code to use the try/except block with __assign__ for now. Let me know if you'd prefer to handle this differently, or if there's a plan to add the module guardian to standalone generated files!
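The early-binding behaviour described above can be reproduced in isolation. The following is only a minimal sketch: `FuncProxySketch` is a hypothetical, simplified stand-in for the runtime's FuncProxy, and a `SimpleNamespace` plays the role of the generated facade module; neither reflects the actual runtime implementation.

```python
from types import SimpleNamespace
from typing import Callable, Optional


class FuncProxySketch:
    """Hypothetical, simplified stand-in for the runtime's FuncProxy."""

    def __init__(self, name: str) -> None:
        self._name = name
        self._func: Optional[Callable[[str], str]] = None

    def __assign__(self, func: Callable[[str], str]) -> None:
        # Mutate the existing proxy so every holder of it sees the new target.
        self._func = func

    def __call__(self, domain: str) -> str:
        if self._func is None:
            raise NotImplementedError(
                f"Function {self._name} doesn't have an implementation!"
            )
        return self._func(domain)


# Stand-in for the generated facade module (an assumption for illustration).
facade = SimpleNamespace(LoadCodeList=FuncProxySketch("LoadCodeList"))

# Equivalent of an early `from facade import LoadCodeList` in another module.
early_ref = facade.LoadCodeList


def impl_load_codelist(domain: str) -> str:
    return f"codelist:{domain}"


# Plain rebinding replaces the attribute but leaves the early reference empty.
facade.LoadCodeList = impl_load_codelist
try:
    early_ref("business-center")
    rebinding_reached_early_ref = True
except NotImplementedError:
    rebinding_reached_early_ref = False

# Mutating the proxy in place is visible through the early reference.
early_ref.__assign__(impl_load_codelist)
result = early_ref("business-center")
print(rebinding_reached_early_ref, result)
```

Without a module guardian intercepting the assignment, rebinding only changes what the name points to, so any reference captured earlier keeps pointing at the unmodified proxy.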
___________________________________________ test_valid_business_center_code ____________________________________________
    def test_valid_business_center_code():
>       is_valid = cdm_api.ValidateFpMLCodingSchemeDomain("GBLO", "business-center")
# ... [internal framework proxy/validation trace] ...
    def rune_execute_native(function_name: str, *args, **kwargs) -> Any:
        if function := _NATIVE_REGISTRY.get(function_name):
            return function(*args, **kwargs)
        available = ", ".join(sorted(_NATIVE_REGISTRY))
>       raise NotImplementedError(
            f"Function {function_name} doesn't have an implementation! "
            f"Available: {available or '<none>'}"
        )
E   NotImplementedError: Function cdm.base.staticdata.codelist.functions.LoadCodeList doesn't have an implementation! Available: <none>

# Add the LRU Cache.
# maxsize=15 means it will remember the last 15 requested domains in memory, evicting the least recently used (LRU) ones when the limit is exceeded.
@lru_cache(maxsize=15)
def load_codelist(domain: str) -> CodeList:
This function has to be in the runtime, otherwise codelists can't be used in any other model but cdm.
I completely agree with this architectural goal: this utility should definitely be generic so other models can use it. To make it truly generic, I'll also need to parameterize the deserialization step (since it currently hardcodes finos.cdm.base...CodeList).
Question on logistics: Because this requires modifying both repositories, how would you like to coordinate this? Should I open a separate PR in rune-python-runtime first to add the generic utility, wait for that to be merged/released, and then update this PR to consume it?
logger.setLevel(logging.INFO)

# Dynamically locate the 'finos' CDM package in the environment. Then navigate to the nested resources folder
codelist_dir = resource_loader.files("finos").joinpath("resources", "codelist", "json")
Make this a parameter of load_codelist and create a partial during the runtime initialisation.
Great call. Hardcoding "finos" inside the loader limits its reusability. I will refactor load_codelist to accept codelist_dir as an argument, and update cdm_runtime.py to use functools.partial to inject the specific path during bootstrap. Thanks for your feedback.
Hi again, @plamen-neykov,
I really liked the idea of using functools.partial, but I ran into a strictness issue with FuncProxy when I implemented it.
Because functools.partial does not remove keyword-bound arguments from the inspect.signature object (it turns them into keyword-only parameters with a default value), the signature of the partial object evaluates to (domain, *, codelist_dir=...).
When __assign__ runs, it hits func_proxy.py line 81:
curr_params = inspect.signature(self._func).parameters
new_params = inspect.signature(func).parameters
if curr_params.keys() != new_params.keys():  # This line
    raise ValueError(
        'Replacement function parameter list do not match the current '
        f'parameter list of {str(self._func)}'
    )
Because ['domain'] != ['domain', 'codelist_dir'], the framework rejects the assignment with a ValueError.
To ensure FuncProxy gets the exact ['domain'] signature it expects while we wait to move the deserializer into the runtime, I used a standard closure (wrapper function) in cdm_runtime.py to bind the directory instead:
# Dynamically load resources inside CDM package in the environment
codelist_dir = resource_loader.files("finos").joinpath("resources", "codelist", "json")

# Partial approach
# from functools import partial
# bound_load_codelist = partial(impl_load_codelist, codelist_dir=codelist_dir)

# Use a standard wrapper instead of 'partial' so the signature is perfectly clean for FuncProxy
def bound_load_codelist(domain: str):
    return impl_load_codelist(domain, codelist_dir)

# Inject custom implementations into the CDM Facades. This is the Python equivalent of Guice bindings in Java.
try:
    cdm_load_codelist_facade.LoadCodeList.__assign__(bound_load_codelist)  # type: ignore
    logger.debug("Successfully bound LoadCodeList.")
except AttributeError:
    logger.error("Bootstrap Failed: Could not find LoadCodeList in CDM Facade.")

Let me know if this works for you, or if you'd prefer I open a PR in rune-python-runtime to make FuncProxy compatible with partial objects!
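The signature difference between the two binding styles can be checked with the standard library alone. This is a small sketch: the body of `impl_load_codelist` and the directory string are placeholders invented for the demonstration, not the loader from this PR.

```python
import inspect
from functools import partial


def impl_load_codelist(domain: str, codelist_dir: str) -> str:
    # Placeholder body; the real loader reads and deserializes JSON.
    return f"{codelist_dir}/{domain}.json"


# Keyword-bound partial: the bound parameter stays in the reported signature.
bound_partial = partial(impl_load_codelist, codelist_dir="/tmp/codelists")


# Closure: only the parameters declared on the wrapper are reported.
def bound_closure(domain: str) -> str:
    return impl_load_codelist(domain, "/tmp/codelists")


partial_params = list(inspect.signature(bound_partial).parameters)
closure_params = list(inspect.signature(bound_closure).parameters)

print(partial_params)  # ['domain', 'codelist_dir']
print(closure_params)  # ['domain']
```

A check that compares parameter names, like the one quoted from func_proxy.py, therefore accepts the closure and rejects the keyword-bound partial.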
)

logger = logging.getLogger(__name__)
logger.setLevel(logging.INFO)
Do not do that! The user should be selecting the logging level.
You are absolutely right, library code shouldn't hijack the application's logging configuration! I will remove logging.basicConfig() and setLevel() so the end-user retains full control over their log output. I'll include this fix in the next commit.
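For reference, the usual convention for library code is to create a module logger, attach at most a `NullHandler`, and leave levels and output handlers to the application. The sketch below assumes a hypothetical logger name and a hypothetical `bootstrap` function; it is not the code from this PR.

```python
import logging

# Hypothetical logger name, chosen only for this illustration.
logger = logging.getLogger("finos.runtime_extensions.example")

# A NullHandler keeps the library silent unless the application configures logging.
logger.addHandler(logging.NullHandler())


def bootstrap() -> None:
    # The library emits records but never sets a level or calls basicConfig().
    logger.debug("Successfully bound LoadCodeList.")


bootstrap()

# The logger level is left unset, so the application's configuration decides.
level_left_unset = logger.level == logging.NOTSET
print(level_left_unset)
```

An application that wants to see these messages can then call `logging.basicConfig(level=logging.DEBUG)` or configure the `finos` logger hierarchy itself.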
# Copy CDM resources to the target 'common-domain-model/resources' folder
mkdir -p "${RESOURCES_DIR}/common-domain-model/codelist/json"
cp -r rosetta-source/src/main/resources/codelist/json/* "${RESOURCES_DIR}/common-domain-model/codelist/json/"
These won't be packaged in the wheel, so a cdm package installed with pip install won't support codelists.
Actually, they do get packaged into the wheel!
While get_cdm.sh just downloads them locally to the workspace, the actual packaging happens over in build_cdm.sh. If you look at lines 108-114 in build_cdm.sh in this PR, I added a step that dynamically writes a MANIFEST.in file containing recursive-include src/finos/resources * right before the wheel is built.
This successfully forces pip wheel to bundle the JSON files natively inside the .whl artifact, which is why importlib.resources is able to find them at runtime. Let me know if you'd prefer a different packaging mechanism, but this approach has been working perfectly in our sandbox!
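One way to confirm the packaging claim independently is to list the wheel's contents, since a wheel is a zip archive. The sketch below is illustrative only: it builds an in-memory archive as a stand-in for the real `.whl`, the JSON file name is hypothetical, and the `finos/resources/` prefix is assumed from the loader's `files("finos").joinpath("resources", "codelist", "json")` lookup.

```python
import io
import zipfile
from typing import BinaryIO, List, Union


def list_packaged_codelists(
    wheel_file: Union[str, BinaryIO], prefix: str = "finos/resources/"
) -> List[str]:
    """Return the JSON entries found under `prefix` inside a wheel archive."""
    with zipfile.ZipFile(wheel_file) as wheel:
        return sorted(
            name
            for name in wheel.namelist()
            if name.startswith(prefix) and name.endswith(".json")
        )


# Stand-in archive; a real check would pass the path of the built .whl instead.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as fake_wheel:
    fake_wheel.writestr("finos/__init__.py", "")
    # Hypothetical file name, used only for this demonstration.
    fake_wheel.writestr("finos/resources/codelist/json/business-center-9-9.json", "{}")
buffer.seek(0)

packaged = list_packaged_codelists(buffer)
print(packaged)
```

An empty result against the real artifact would indicate that the JSON files did not make it into the wheel.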
@jserrano-spec A few suggestions that would help us more effectively review the PR.
Start with a design document. Although there's an outline, the details of your approach are still unclear. Please clarify the data flows in a design document that we can review and comment on before further code changes. That design should align to the three main pieces involved in creating a Python version of CDM:
- the Runtime, which provides core Python functionality used by generated code
- the Generator, which interprets a Rune model and emits Python that depends on the Runtime
- the CDM build and deployment process, which uses the Generator to produce Python from CDM's data and functional model
Completeness. Your proposal should call out the impact on all three components and describe how the changes together realize the goals set out in the issue. The proposal should also address the central requirement that code lists can be independently updated without requiring the creation of a new version of the CDM Python artifact.
Generic Technology
- Changes to the Runtime and Generator need to reflect that both are generic technology that can be applied to any Rune codebase.
- Include generic tests: Point #4 of the contribution process requires that any contribution include tests confirming the change works as expected. These become part of the test suite that runs on every PR, which protects the rest of the codebase from regressions. For reference, the CDM tests live in the repo to support development but are not part of this suite. The proposed change must therefore include generic (non-CDM) tests that demonstrate it works as expected.
Cover both success and failure paths. As a rule, the design should address both the success path ("on flow") and failure paths ("off flow"). For example, what happens if the JSON code lists aren't found when the Python library loads?
Use a feature branch for work in progress. PRs against main are expected to fully implement the proposed change, including the tests mentioned above. For works in progress, please use a feature branch and feel free to solicit input from the maintainers along the way. That keeps the eventual review against main focused on a complete, tested change.
Hi @dschwartznyc, @plamen-neykov,
Following up on your request in CDM I-3904 to consolidate our approach, our team decided to contribute our internal sandbox implementation directly to the test suite to establish a scalable pattern for FpML CodeList validation in Python.
1. What This PR Solves
This PR provides a complete, end-to-end solution for Issue DSL I-150 by enabling FpML code list validation natively in Python for in-memory objects. It achieves this by:
2. How to Validate This Implementation
While a standard mvn clean install will build the Java generator, it does not execute the Python-specific test suite. To test this implementation locally and execute test_codelist_validation.py, you must use the Python test runner scripts.
Step 1: Clean Stale Artifacts (Important)
To ensure the script uses the latest compiled Java generator rather than a cached version, delete any existing snapshot JAR first:
Step 2: Initial Build & Test
Run the CDM test script and use the -k flag to keep the virtual environment alive after the tests finish. (The script will automatically rebuild the Java JAR via Maven, generate the Python code, bundle the JSONs and runtime extensions, and install the wheel into a fresh .pyenv.)
Step 3: Fast Iteration (Direct Pytest Execution)
Once the initial build from Step 2 completes, you do not need to use the bash scripts again. For fast test execution, simply activate the virtual environment and run pytest directly:
3. Architecture & Data Flow Reference
Below is the consolidated data flow requested in CDM I-3904 to provide a comprehensive view of how this implementation works without modifying the core generator logic:
CDM Python Build Process & Distribution: We modified get_cdm.sh to pull rosetta-source/src/main/resources/codelist/json/* via sparse checkout. We then modified build_cdm.sh to copy these JSON files into src/finos/resources and copy the Python extensions into src/finos/runtime_extensions. Finally, we dynamically inject a MANIFEST.in configuration. This instructs the pip wheel builder to package these assets natively inside the .whl artifact.
Data Loading & Runtime Integration: Our custom loader (load_codelist.py) utilizes Python's importlib.resources to locate the JSON files inside the installed package. It uses a regex match on the requested domain parameter to find the correct file, reads the payload, and deserializes it using CodeList.rune_deserialize. To prevent redundant disk I/O, it is wrapped in an @lru_cache.
Generator & Validation Flow: We did not need to make any changes to the rune-python-generator. Because the logic for ValidateFpMLCodingSchemeDomain is now natively generated, our bootstrap script (finos.runtime_extensions.cdm_runtime) simply binds our loader to the LoadCodeList proxy API. The generated validation function then calls the bound loader at runtime without further changes.
CDM's Serialization Format: Our current contribution is strictly focused on enabling runtime data validation for in-memory objects. Consequently, we have not explored modifications to the CDM serialization format to append external list version references when saving documents. We consider this outside the scope of this initial runtime-loading implementation.
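The data loading step described above (locate a file by matching the domain, read it, cache the result) can be sketched as follows. This is a simplified illustration under stated assumptions: it uses a temporary directory and `json.loads` in place of `importlib.resources` and `CodeList.rune_deserialize`, and both the file naming scheme and the `make_loader` helper are hypothetical.

```python
import json
import re
import tempfile
from functools import lru_cache
from pathlib import Path
from typing import Callable


def make_loader(codelist_dir: Path, maxsize: int = 15) -> Callable[[str], dict]:
    """Build a cached loader bound to one directory (hypothetical helper)."""

    @lru_cache(maxsize=maxsize)
    def load_codelist(domain: str) -> dict:
        # Assumed naming scheme: files start with the domain name.
        pattern = re.compile(rf"^{re.escape(domain)}.*\.json$")
        for entry in sorted(codelist_dir.iterdir(), key=lambda p: p.name):
            if pattern.match(entry.name):
                return json.loads(entry.read_text(encoding="utf-8"))
        # Failure path: no matching file for the requested domain.
        raise FileNotFoundError(
            f"No code list file found for domain {domain!r} in {codelist_dir}"
        )

    return load_codelist


with tempfile.TemporaryDirectory() as tmp:
    codelist_dir = Path(tmp)
    # Hypothetical file name and payload, for demonstration only.
    (codelist_dir / "business-center-9-9.json").write_text(
        json.dumps({"domain": "business-center", "codes": ["GBLO", "USNY"]}),
        encoding="utf-8",
    )

    load_codelist = make_loader(codelist_dir)
    first = load_codelist("business-center")   # reads from disk
    second = load_codelist("business-center")  # served from the cache
    cache_hits = load_codelist.cache_info().hits

    try:
        load_codelist("no-such-domain")
        missing_raised = False
    except FileNotFoundError:
        missing_raised = True

print(cache_hits, missing_raised)
```

Because `lru_cache` does not store raised exceptions, a missing code list is looked up again on the next call, which may or may not be the desired behaviour for the failure path.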