-
Notifications
You must be signed in to change notification settings - Fork 103
#864 development constant tail bug #868
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Changes from all commits
affade3
3d24102
f51a464
7169d85
5ecd5a5
70f1dab
2f4e5e3
904c72f
3c24edf
52020e4
cffdd7b
2aea757
a49528a
3c4961d
0ec5dad
fc15005
4176856
814fd16
e70c59d
69252a0
6651cd2
8e06715
483752a
7a9c571
b0f47f4
7e9945f
6f28c21
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -3,10 +3,12 @@ | |
| # file, You can obtain one at https://mozilla.org/MPL/2.0/. | ||
| from chainladder.development.base import DevelopmentBase | ||
| import pandas as pd | ||
| import numpy as np | ||
| import warnings | ||
|
|
||
|
|
||
| class DevelopmentConstant(DevelopmentBase): | ||
| """ A Estimator that allows for including of external patterns into a | ||
| """A Estimator that allows for including of external patterns into a | ||
| Development style model. When this estimator is fit against a triangle, | ||
| only the grain of the existing triangle is retained. | ||
|
|
||
|
|
@@ -39,56 +41,153 @@ def __init__(self, patterns=None, style="ldf", callable_axis=0, groupby=None): | |
| self.callable_axis = callable_axis | ||
| self.groupby = groupby | ||
|
|
||
| def _prepare_cdf_patterns(self, patterns, n_dev_periods): | ||
|
|
||
| patterns = dict(patterns) | ||
|
|
||
| sorted_keys = sorted(patterns.keys()) | ||
| pattern_values = np.array([float(patterns[k]) for k in sorted_keys]) | ||
|
|
||
| # convert ldfs to cdfs; cdf patterns are used as-is | ||
| if self.style == "ldf": | ||
| cdf_values = np.cumprod(pattern_values[::-1])[::-1] | ||
| else: | ||
| cdf_values = pattern_values | ||
|
|
||
| cdf_patterns = {int(k): float(v) for k, v in zip(sorted_keys, cdf_values)} | ||
|
|
||
| # patterns that fit within the triangle have no tail | ||
| if len(cdf_patterns) <= n_dev_periods: | ||
| return cdf_patterns, 1.0 | ||
|
|
||
| # separate the tail factor and rebase the remaining cdfs onto it | ||
| tail_cdf = cdf_patterns[int(sorted_keys[n_dev_periods])] | ||
| for k in sorted_keys[:n_dev_periods]: | ||
| cdf_patterns[int(k)] /= tail_cdf | ||
|
|
||
| return cdf_patterns, tail_cdf | ||
|
|
||
| def fit(self, X, y=None, sample_weight=None): | ||
| """Fit the model with X. | ||
| Parameters | ||
| ---------- | ||
| X : Triangle-like | ||
| Set of LDFs to which the munich adjustment will be applied. | ||
| Set of LDFs to which the munich adjustment will be applied. | ||
| y : Ignored | ||
| sample_weight : Ignored | ||
| Returns | ||
| ------- | ||
| self : object | ||
| Returns the instance itself. | ||
| Returns the instance itself. | ||
| """ | ||
| from chainladder import options | ||
| if X.is_cumulative == False: | ||
|
|
||
| # convert to cumulative triangle | ||
| if not X.is_cumulative: | ||
| obj = self._set_fit_groups(X).incr_to_cum().val_to_dev().copy() | ||
| else: | ||
| obj = self._set_fit_groups(X).val_to_dev().copy() | ||
|
Comment on lines
+86
to
89
Collaborator
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This is mostly the same. |
||
|
|
||
| xp = obj.get_array_module() | ||
| obj = obj.iloc[..., :1, :-1]*0+1 | ||
|
Collaborator
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. i think this is the only line you have to change. if self.style == "cdf":
obj = obj.iloc[..., :1, :]*0+1
else:
obj = obj.iloc[..., :1, :-1]*0+1
Collaborator
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Nah, that's the lazy way. You aren't going to catch all the edge cases. Just bring my tests in and try your code. You'll fail a bunch.
Collaborator
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
And if it's LDF style, then you don't? This is wrong.
Collaborator
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Try to bring my tests in, and you can see if you can catch all the edge cases more cleanly. I'm sure there's a way. I think for this PR, reviewing the tests is actually more important than the code itself.
Collaborator
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
it definitely is the lazy way. it's also done in this PR, literally just a few lines down
good idea. i can do that
Collaborator
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Oh and if you want to try to use the tests and let AI solve it, you can give that a try if you have access to a good AI agent. Cursor couldn't do it and just kept iterating itself until I killed it.
Collaborator
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Collaborator
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Ha, it's very possible I made mistakes on the tests. So let's make sure the tests are right. Your implementation is much cleaner, let's just make sure it can catch all the edge cases. I think we are super close.
Collaborator
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. i think each test needs to test both the ldf_ and the cdf_. we'd just have to manually calculate the cdf/ldf from the supplied vector. |
||
| tri_dev_periods = len(obj.ddims) | ||
|
|
||
| if callable(self.patterns): | ||
| # on index | ||
| if self.callable_axis == 0: | ||
| ldf = obj.index.apply(self.patterns, axis=1) | ||
| ldf = ( | ||
| pd.concat(ldf.apply(pd.DataFrame, index=[0]).values, axis=0) | ||
| .fillna(1)[obj.ddims].values) | ||
| ldf = xp.array(ldf[:, None, None, :]) | ||
| rows = obj.index | ||
| # on columns | ||
| elif self.callable_axis == 1: | ||
| ldf = obj.columns.to_frame(index=False).apply(self.patterns, axis=1) | ||
| ldf = ( | ||
| pd.concat(ldf.apply(pd.DataFrame, index=[0]).values, axis=0) | ||
| .fillna(1)[obj.ddims].values) | ||
| ldf = xp.array(ldf[None, :, None, :]) | ||
| rows = obj.columns.to_frame(index=False) | ||
| else: | ||
| raise ValueError("callable axis needs to be 0 or 1") | ||
|
|
||
| patterns = self.patterns(rows.iloc[0]) | ||
|
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Callable path determines shape from first row onlyLow Severity When Additional Locations (1)Reviewed by Cursor Bugbot for commit 6f28c21. Configure here. |
||
| else: | ||
| # force the patterns to a dictionary | ||
| patterns = dict(self.patterns) | ||
|
Comment on lines
94
to
+107
Collaborator
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. On the right side, most cases is in the else case, where we are not working on the axis, it's just making sure that the patterns is in dict form. In the callable case, I left most of it alone except I renamed to |
||
|
|
||
| # separate the cdf patterns from the tail; _prepare_cdf_patterns already | ||
| # returns tail_cdf=1 when the patterns do not extend past the triangle. | ||
| cdf_patterns, tail_cdf = self._prepare_cdf_patterns(patterns, tri_dev_periods) | ||
|
Collaborator
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This is the most important part of the function, is to make sure I get the pattern is CDF form, and separate out the tail. See the function above. |
||
| pattern_dev_periods = len(cdf_patterns) | ||
|
|
||
| # determine whether to include the last development period in the patterns | ||
| if pattern_dev_periods < tri_dev_periods: | ||
| warnings.warn( | ||
| "Supplied patterns are shorter than the triangle development " | ||
| "periods. Missing ages will be filled with a factor of 1.0.", | ||
| UserWarning, | ||
| stacklevel=2, | ||
| ) | ||
| include_last = False | ||
| elif pattern_dev_periods == tri_dev_periods: | ||
| include_last = True | ||
| else: | ||
| include_last = tail_cdf != 1 | ||
|
|
||
| dev_slice = slice(None) if include_last else slice(None, -1) | ||
|
Collaborator
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This is to slice that development, depends on if we drop the last factor (e.g. 120-ult). |
||
|
|
||
| # this is the object to fill out the patterns, skeleton frame | ||
| obj = obj.iloc[..., :1, dev_slice] * 0 + 1 | ||
|
Collaborator
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Now, we are back to the old method, obj is just the blank age to age (NOT age to ult) to fill out. |
||
|
|
||
| if callable(self.patterns): | ||
|
|
||
| def _callable_row(row): | ||
| raw_patterns = self.patterns(row) | ||
| cdf_row, row_tail_cdf = self._prepare_cdf_patterns( | ||
| raw_patterns, tri_dev_periods | ||
| ) | ||
| fit_row = raw_patterns if self.style == "ldf" else cdf_row | ||
| return dict(fit_row), row_tail_cdf | ||
|
Comment on lines
+135
to
+141
Collaborator
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The |
||
|
|
||
| prepared = rows.apply(_callable_row, axis=1) | ||
| ldf = ( | ||
| pd.concat( | ||
| [pd.DataFrame(item[0], index=[0]) for item in prepared], | ||
| axis=0, | ||
| ) | ||
| .fillna(1)[obj.ddims] | ||
| .values | ||
| ) | ||
| tail_cdfs = xp.array([item[1] for item in prepared]) | ||
|
|
||
| if self.callable_axis == 0: | ||
| ldf = xp.array(ldf[:, None, None, :]) | ||
| tail_cdfs = tail_cdfs[:, None, None] | ||
| else: | ||
| raise ValueError('callable axis needs to be 0 or 1') | ||
| ldf = xp.array(ldf[None, :, None, :]) | ||
| tail_cdfs = tail_cdfs[None, :, None] | ||
|
Comment on lines
+144
to
+159
Collaborator
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This is similar to the old stuff, I didn't touch. |
||
|
|
||
| else: | ||
| ldf = xp.array([float(self.patterns[item]) for item in obj.ddims]) | ||
| fit_patterns = patterns if self.style == "ldf" else cdf_patterns | ||
|
|
||
| # fill any triangle ages missing from the patterns with a factor of 1.0 | ||
| for ddim in obj.ddims: | ||
| if not any(ddim == k or int(ddim) == int(k) for k in fit_patterns): | ||
| fit_patterns[int(ddim)] = 1.0 | ||
|
|
||
| ldf = xp.array([float(fit_patterns[int(item)]) for item in obj.ddims]) | ||
| ldf = ldf[None, None, None, :] | ||
|
Comment on lines
+169
to
170
Collaborator
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Same as before. |
||
| tail_cdfs = tail_cdf | ||
|
|
||
| if self.style == "cdf": | ||
| ldf = xp.concatenate((ldf[..., :-1] / ldf[..., 1:], ldf[..., -1:]), -1) | ||
|
|
||
| # apply tail_cdf to the last ldfs of the triangle | ||
| ldf[..., -1] = ldf[..., -1] * tail_cdfs | ||
|
Collaborator
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This is after following through everything, and just need to multiply the last element with the tail. |
||
|
|
||
| obj = obj * ldf | ||
| obj._set_slicers() | ||
|
|
||
| self.ldf_ = obj | ||
| self.ldf_.is_pattern = True | ||
| self.ldf_.is_cumulative = False | ||
| self.ldf_.valuation_date = pd.to_datetime(options.ULT_VAL) | ||
|
|
||
| return self | ||
|
|
||
| def transform(self, X): | ||
| """ If X and self are of different shapes, align self to X, else | ||
| """If X and self are of different shapes, align self to X, else | ||
| return self. | ||
|
|
||
| Parameters | ||
|
|
||


Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Bugbot suggested this, but basically to sort the keys and make sure that the pattern supplied is reordered. It's a good suggestion, but I don't think a user would intentionally supplied an unordered pattern.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i like this. by default, cdfs in a series is actually stored from oldest to newest.