More lenient float multipleOf validation#878
More lenient float multipleOf validation#878tdamsma wants to merge 1 commit intopython-jsonschema:mainfrom
Conversation
|
Hi. Thanks for this but I'm going to politely decline. As I mentioned in the other issues, if someone wants non-float behavior, there is already a way to get it via Decimals. Otherwise, this change just moves which floats are incorrect multiples from some to others. If you indeed are interested in this, I'd point you to this PR which claims there is a better algorithm entirely for us to use. I don't really think that looking at the number of tickets filed is a good metric for surprisability (and I'm not sure I believe surprise is the right metric at all in this case, this library implements a spec, so for better or worse it's compliance with the spec that trumps). But just looking at issues filed is going to be biased against the non-current behavior -- people who expect this behavior aren't filing tickets saying things worked as expected :) But do appreciate the PR nonetheless. |
|
@Julian, I understand and sympathize with your points, but am going to try to convince you otherwise nonetheless. I see now that you have been involved in discussions on the jsonschema spec itself, without reaching a meaningful course of action. I just see many valid points raised, linked issues, calls for clarification of the spec, other languages sturggling with this etc. My take is that it is up to the language to interpret/implement this as they want (correct me if I'm wrong here). The spec offers no real guidance here (and whilest I think that should be improved, that is another discussion), for the python implementation I think python's best practices should be guiding. And that is where it becomes a bit unclear I guess. On first sight, python works like this: >>> 0.1*round(10.1/0.1)
10.100000000000001
>>> 0.1*(10.1/0.1)
10.1And it seems fair to reason that 10.1 is indeed not an integer multiple of 0.1. But digging a bit deeper, both python itself and the native json library already do some implicit rounding: >>> import json
>>> v = json.loads("0.100000000000000001")
>>> print(v)
0.1
>>> print(f"{v:.20g}")
0.10000000000000000555
>>> print(json.dumps(v))
0.1There are three occurences of implicit rounding here:
So the underlying philosophy seems (to a certain degree) to be to treat float(0.1) as exactly 1/10. So even though the behaviour below is strict/correct/explainable: This is also the case: If I really cared, I should have used a >>> import simplejson
>>> import jsonschema
>>> jsonschema.validate(instance=simplejson.loads("10.00000000000000005",use_decimal=True), schema={"multipleOf": 1})
Failed validating 'multipleOf' in schema:
{'multipleOf': 1}
On instance:
Decimal('10.00000000000000005')So whilest you have argued that is you want more correct behaviour of multipleOf one should use Decimal and not float, I woul like to argue that I think it is more pythonic to silently add some leniency to multipleOf validation (equal to the float representation error) just like python displays
Not really, it just checks (or should check) if the number is equal to the nearest representable float. Now there are many different ways to implement this, and perhaps one based on Decimal would feel less arbitrary: If the users intention actually is to use arbitrary precision decimal representaion they can do so by using the Decimal package explicitly, something they should be doing anyways as this holds: >>> Decimal(1) / Decimal(0.1)
Decimal('9.999999999999999444888487687')
>>> 1 / 0.1
10.0Which in the current implementation actually leads to the arbitrary behaviour where some numbers are multiples of some irrepresentable numbers and others are not. |
|
Apologies for not having more time to respond in detail, but there's no implicit rounding in your examples -- what Python does in recent versions is a very targeted, very specific display change -- https://bugs.python.org/issue1580 It reprs floats by using the shortest input that produces the given float. But no rounding is happening, 0.1 isn't a representable float, so you always have gotten the same float, before or after the repr change. And absolutely no tolerance is added -- it's strictly a display change. There's as far as I know no precedence anywhere in Python for introducing some implicit tolerance when doing float operations unless the user explicitly asks for it, and what |
c7257e92 refOfUnknownKeyword.json rm unused $id (#890) bf54c9bd support negative multiples (#888) e819f329 Fix broken test in v1/ref.json from PR #873 29965c01 Merge pull request #873 from jdesrosiers/fix-v1-meta-schema-ref 3b08f79b test: add some missing edge cases to json-pointer format tests (#877) a1de34ae test(format): allow consecutive hyphens in hostname (RFC1123) (#874) aac8a1bd chore: remove duplicate idn-email test and fix ambiguous test descriptions (#881) 78048ead format: add RFC 3339 unknown local offset (-00:00) test for time (#878) ee3fc3d8 test(draft2020-12): add boundary coverage for maxContains = 0 (#857) 54ed4d1f Add additional URI edge cases and include them in v1 (#850) 87f09d65 Add missing email format edge cases across drafts (#849) f2720d32 more tests for the duration format: missing units (#875) e391238a Draft-06/7 don't allow empty enums (#876) c7fc5095 Add tests for empty enum validation (#870) 20f4897c docs: clarify intent of format-assertion vocabulary test cases (#854) eb018552 Add annotation tests for $dynamicRef and $dynamicAnchor (#862) 0a1dc5e9 fix: remove duplicate allOf and maxItems entries in draft2020-12 KNOWN set (#865) 06e3102d docs: fix typos and wording in suite documentation (#867) 031e9e9a test(duration): expand RFC 3339 grammar coverage for duration format (#868) 3181cefa Fix issues with v1 tests 06481b14 test: add unicode pattern and patternProperties tests for draft2020-12 (#837) 601aa708 Add exhaustive ABNF-driven tests with RFC 2673 compliance for ipv4 format (#840) 583d7c62 Merge pull request #851 from Anshikakalpana/fix/maximum-test-description-mismatch 61bfe27a fix: apply description fix to v1 maximum test 516d71f7 fix: align test description with actual valid field in maximum.json across drafts 4253477c Merge pull request #836 from VIDIT45AGARWAL/date-time 9e787e38 Add tests for out-of-range time components in date-time format bce6a47c Merge pull request #793 from 414owen/os/fix-unevaluatedproperties-test-case a17eb1f6 Merge pull request #833 from JulesGosnell/add-m3-validator 648811b2 Add M3 JSON Schema validator to Who Uses section 32fa6e28 Merge pull request #827 from Shristibot/required-null 086bf9a1 Revert unintended .gitignore change 909b58d7 Remove unrelated files from PR c957e914 Add boolean and null cases to draft-04 , draft-06 and v1 0650027c Add boolean case to required non-object validation d8d15620 Reset README to upstream to avoid unrelated changes f6523449 Add required + null test case for issue #821 75995a1c Merge pull request #818 from fperrad/lua-schema 7e6b3a15 Add lua-schema in README.md 8b826d6b Merge pull request #816 from kpavlov/kotlinx-schema-link b3ae21c2 Update README.md 646f11da Merge pull request #812 from Shristibot/clarification-on-test-suite ba3f9505 Update README.md 3fc54179 Merge pull request #808 from json-schema-org/ether/more-fix-remotes 3550ca73 clarify optional tests and document integer example fac4f6f6 Clarify that the test-suite is not astyle guide ece5bd10 Merge pull request #804 from Shristibot/add-if-then-else-tests 525adc90 move aside remotes that are not valid in earlier drafts bbe22f88 remove redundant remotes that also exist in draft-specific directories ea0baf8d these files are no longer used, and contain an unresolveable $ref on earlier drafts bef1e2ed Remove redundant then/else false test cases 910af732 Merge pull request #803 from json-schema-org/gregsdennis/proposals 23ff2750 Fix incorrect test condition, simplify test, and copy tests to all drafts as requested ccfb91af Fix description style and add false-handling tests for issue #767 bc1763ed Fix description formatting in false-handling tests bd13fcf7 Add if/then/else test cases for issue #767 dc659514 added documentation about folder in root readme and folder readme c4a99e00 move propertyDependencies tests to proposals folder d69537ac Merge pull request #802 from json-schema-org/ether/draft4-boolean-schema 79ee846d Merge pull request #801 from json-schema-org/ether/missing-not-tests 2da8f9a6 remove unused remotes files 15cc161f add missing not tests from #714 47ff61c2 Merge pull request #799 from frawa/frawa-patch-1 4a640206 Merge pull request #800 from frawa/frawa-patch-2 fdabd3f9 Merge pull request #795 from anmolbhadoriya5849/fix/issue-768-additionalItems-case cd02dece Add Unison section to README 4da52d58 Remove 'typed-json' link from README 232f0795 Revert removal of additionalProperties applicator tests in v1 b4032b96 Revert removal of additionalProperties applicator tests in draft2020-12 df833be4 Remove test case for additionalProperties behavior 0d54c828 Remove test for additionalProperties behavior 756eebd6 Remove valid case for additionalItems validation a3feff79 Remove valid case for additionalItems validation 8ba96359 Remove test for additionalItems applicators f3488882 Remove valid test case for additionalItems da4a5684 Remove valid case for additionalItems validation a247442b Merge pull request #794 from json-schema-org/ether/dynamic-scopes-subschemas 19810ec1 Add a test for respecting dynamic scopes while avoiding the root of each schema e0e431f8 Fix unevaluatedProperties & additionalProperties test case 8da0a7bc Merge pull request #792 from jdesrosiers/more-v1-transistion 94c0b0e1 Fix more "draft/next" occurances that we missed 8fd4fcec Merge pull request #786 from jdesrosiers/a-label-tests be58fa98 Merge pull request #791 from json-schema-org/ether/fix-draft-next-removal acaece38 stop using the term "draft" to mean "version", and fix the remaining mentions of draft-next (now v1) 980e1052 Merge pull request #790 from dylankerr-bis/add-test-sibling-nested-ref-id b5037ed7 Add test for sibling $ref and $id in nested schema to v1 suite b8bb8585 Add more comprehensive tessts for A-labels in the "hostname" format 9b8ef536 Cleanup hostname tests (remove dups, fix descriptions, reorder) 4cf55996 Merge pull request #776 from davishmcclurg/idn-hostname-separators a930db43 Shorten test descriptions to appease ci aab08752 Updates form PR feedback 081a16a7 Test IDN label separators separate labels 658c8cf9 Test IDN label separator in `hostname` format 38c04b83 Test label separator position in hostname formats cabbd65c Merge pull request #735 from zaplapl/main ed74ee74 Add unicode tests to all the dialects fd0df6cf Update unicode tests per PR feedback 60098220 base test implementation for const equality for strings based on evalutaion by individual codepoint ef78a0eb Merge pull request #683 from mwadams/main db365735 Added tests to encoded refs to unknown keywords. 59944b7e Add test for sibling $ref and $id in nested schema to 2019 suite 2196a531 Add test for sibling $ref and $id in nested schema to 2020 suite e99b24c9 Merge pull request #788 from jdesrosiers/next-to-v1 7eac79be Bring tests up-to-date with format changes ae27be13 Bring dynamic reference tests up to date with spec changes 79dc2fd3 Unknown optional vocabs can't be ignored anymore 51f84640 Change draft-next to v1 f4e41b0c Merge pull request #785 from jdesrosiers/uneval-nested-conflicts 60a2633f Merge pull request #778 from koplas/extend-uri-validation-tests bf65ef20 Copy uri tests to other drafts 02a0d89f Merge pull request #743 from Vinit-Pandit/MeastroZI-patch-1 c1545e6c Update unevaluatedProperties.json 7e970016 Merge pull request #764 from Dragonsangel/date_time_validation 8970b1ca Add tests for evaluated property/item nesting conflicts 677a9399 Merge pull request #784 from sangamon/781-date-time-format-extended-year 9672a5b6 Merge pull request #783 from sangamon/780-uuid-format-shifted-dashes 28e49b14 add test for invalid "shifted" dashes in uuid format (#780) aecf038c add test for invalid extended year in date-time format (#781) 15e4505b Merge pull request #765 from JDepooter/unevaluated_items_with_mincontains_0 b8611c40 Extend URI and URI-reference tests 48461fc3 Merge pull request #775 from jviotti/annotations-content-id 9f7dce32 Merge pull request #766 from json-schema-org/ether/bare-email a1e077d4 ensure we do not parse email strings permissively 72c670b0 Delete identifiers in Content annotation tests 675186ba Take into account `$id` for annotation schema location assertions 100e9823 Add tests for unevaluatedItems with minContains = 0 0be5549b Added tests to ensure date and time formats don't accept date-time formats git-subtree-dir: json git-subtree-split: c7257e92580678a086f0b9243a1903ed88bd27f7
Many user have ran into floating point precision errors when validation multipleOf. E.g. #818, #810, #687, #185, #320, #247. Technically it can be argued that it is silly to check if any number is an integer multiple of e.g. 0.1. Ask silly questions, get silly answers. This approach is a bit unhelpful though.
Looking at other implementations in javascript and python, other libraries struggle with this too
Checking if 10.1 is a multiple of 0.1 yields mixed results:
https://jsonschema.dev/ fails validation
https://www.jsonschemavalidator.net/ accepts it
And react-jsonschema-form accepts it.
So there is no real consensus amongst the libraries. Still I find the number of issues a strong indication that the current behaviour of this library is unexpected. This PR's modifies the multipleOf behaviour to allow for float tolerance (epsilon) to be taken into account.