Skip to content

[FIX] JSON.dump for hash with duplicate key#564

Closed
maniSHarma7575 wants to merge 1 commit into
ruby:masterfrom
maniSHarma7575:563-fix-json-dump-for-hash-with-duplicate-keys
Closed

[FIX] JSON.dump for hash with duplicate key#564
maniSHarma7575 wants to merge 1 commit into
ruby:masterfrom
maniSHarma7575:563-fix-json-dump-for-hash-with-duplicate-keys

Conversation

@maniSHarma7575

Copy link
Copy Markdown

Motivation / Background

This Pull Request has been created because to fix #563

The output for the hash is inconsistent when using JSON.dump with the duplicate key. Specifically, if we examine the result below, the key 'a' appears twice in the output JSON for the same key 'a'.

For example:

JSON.dump({ 'a' => 1, :a => 2 })

Actual Result:

"{\"a\":1,\"a\":2}"

Expected result:

"{\"a\":2}"

cc: @byroot @hsbt

@maniSHarma7575

maniSHarma7575 commented Jan 7, 2024

Copy link
Copy Markdown
Author

@hsbt , did you get a chance to check this out?

cc: @byroot

@hsbt

hsbt commented Jan 11, 2024

Copy link
Copy Markdown
Member

I'm not sure why expected behavior is correct. Because result of JSON.dump is not changed from old version.

$ ruby -e "gem 'json', '2.6.3'; require 'json'; puts JSON::VERSION, JSON.dump({ 'a' => 1, :a => 2 })"
2.6.3
{"a":1,"a":2}
$ ruby -e "gem 'json', '2.7.1'; require 'json'; puts JSON::VERSION, JSON.dump({ 'a' => 1, :a => 2 })"
2.7.1
{"a":1,"a":2}

Please let me know if my understanding is wrong.

@maniSHarma7575

maniSHarma7575 commented Jan 11, 2024

Copy link
Copy Markdown
Author

I'm not sure why expected behavior is correct. Because result of JSON.dump is not changed from old version.

$ ruby -e "gem 'json', '2.6.3'; require 'json'; puts JSON::VERSION, JSON.dump({ 'a' => 1, :a => 2 })"
2.6.3
{"a":1,"a":2}
$ ruby -e "gem 'json', '2.7.1'; require 'json'; puts JSON::VERSION, JSON.dump({ 'a' => 1, :a => 2 })"
2.7.1
{"a":1,"a":2}

Please let me know if my understanding is wrong.

@hsbt , if the colon (:) syntax is used to create symbols as keys, the output matches the expected result:

ruby -e "gem 'json', '2.6.3'; require 'json'; puts JSON::VERSION, JSON.dump({ 'a': 1, a: 2 })"
-e:1: warning: key :a is duplicated and overwritten on line 1
2.6.3
{"a":2}

ruby -e "gem 'json', '2.7.1'; require 'json'; puts JSON::VERSION, JSON.dump({ 'a': 1, a: 2 })"
-e:1: warning: key :a is duplicated and overwritten on line 1
2.7.1
{"a":2}

However, when the rocket (=>) syntax is used to associate keys with values in the hash, the JSON output contains duplicate values.

Therefore, the output should be consistent regardless of the syntax used to create the hash.

@byroot

byroot commented Jan 11, 2024

Copy link
Copy Markdown
Member

@maniSHarma7575 it's not a "syntax" difference. One is strings the other is symbols.

However your PR is nowhere near enough, this repo contains 3 alternative implementations, one in Ruby, one in C, one in Java. All 3 would need to be changed.

@byroot

byroot commented Jan 11, 2024

Copy link
Copy Markdown
Member

I'm not sure why expected behavior is correct. Because result of JSON.dump is not changed from old version.

@hsbt he's not trying to fix a regression, but do prevent duplicated keys in the output. Arguably it's a minor bug, but it's true that it's not really expected for a JSON library to allow this.

As you know JSON has multiple standards, most don't say anything about that, but RFC 8259 says keys SHOULD be unique.

And more generally it's unclear how parsers will handle it, some may keep the first value, some the last, etc.

@maniSHarma7575

Copy link
Copy Markdown
Author

@maniSHarma7575 it's not a "syntax" difference. One is strings the other is symbols.

However your PR is nowhere near enough, this repo contains 3 alternative implementations, one in Ruby, one in C, one in Java. All 3 would need to be changed.

@byroot , it's functioning well for all three alternative implementations, as I've implemented the fix in lib/json/common.rb.

https://github.com/flori/json/blob/78643245d2b60f770946e2e252890655d61737fc/lib/json/common.rb#L614

For all the 3 implementations I have tested it:

  • For Java
ruby -v
jruby 9.4.5.0 (3.1.4) 2023-11-02 1abae2700f Java HotSpot(TM) 64-Bit Server VM 25.391-b13 on 1.8.0_391-b13 +jit [arm64-darwin]
rake build
gem build json-java.gemspec
  Successfully built RubyGem
  Name: json
  Version: 2.7.2
  File: json-2.7.2-java.gem
mkdir -p pkg
mv json-2.7.2-java.gem pkg

Created a script json.rb

require "bundler/inline"

gemfile(true) do
  source "https://rubygems.org"
  gem "json", "2.7.2", path: "/Users/manishsharma/Development/personal/opensource/json"
end

require 'json'

pp JSON.dump({ 'a' => 1, :a => 2 })
ruby json.rb
Resolving dependencies...
"{\"a\":2}"

  • For C
ruby -v
ruby 3.2.2 (2023-03-30 revision e51014f9c0) [arm64-darwin22]
rake build
Resolving dependencies...
(in /Users/manishsharma/Development/personal/opensource/json)
Writing version information for 2.7.2
gem build json.gemspec
  Successfully built RubyGem
  Name: json
  Version: 2.7.2
  File: json-2.7.2.gem
gem build json_pure.gemspec
  Successfully built RubyGem
  Name: json_pure
  Version: 2.7.2
  File: json_pure-2.7.2.gem
mkdir -p pkg
mv json-2.7.2.gem pkg
mv json_pure-2.7.2.gem pkg
require "bundler/inline"

gemfile(true) do
  source "https://rubygems.org"
  gem "json", "2.7.2", path: "/Users/manishsharma/Development/personal/opensource/json"
end

require 'json'

pp JSON.dump({ 'a' => 1, :a => 2 })
ruby json.rb
Resolving dependencies...
"{\"a\":2}"

  • For Ruby
require "bundler/inline"

gemfile(true) do
  source "https://rubygems.org"
  gem "json_pure", "2.7.2", path: "/Users/manishsharma/Development/personal/opensource/json"
end

require 'json/pure'

pp JSON.dump({ 'a' => 1, :a => 2 })
ruby json.rb
Resolving dependencies...
"{\"a\":2}"

@maniSHarma7575

maniSHarma7575 commented Jan 16, 2024

Copy link
Copy Markdown
Author

@hsbt and @byroot Waiting for your response.

@maniSHarma7575

Copy link
Copy Markdown
Author

Hi @hsbt Could you please review this?

@maniSHarma7575

Copy link
Copy Markdown
Author

@byroot Waiting for the response on this?

cc: @hsbt

@jhawthorn

Copy link
Copy Markdown
Member

I don't think JSON should perform this validation. It will hurt performance a lot and it's not something we can fully guarantee to users.

This may cover simple cases but not, for example, differently encoded strings:

> JSON.dump({"foo".encode("UTF-16").to_s => 1, "foo".to_s => 2})
=> "{\"foo\":1,\"foo\":2}"

Covering that would have to be quite difficult. I think it's best we leave the responsibility to the user to provide reasonable input.

(As I wrote in the Rails issue) I'd actually prefer { 'a' => 1, :a => 2 } be turned into the questionably valid ("SHOULD" is not quite as strong as "MUST") JSON as "{\"a\":1,\"a\":2}" rather than either "{\"a\":1}" or "{\"a\":2}" as it would allow me to see that there is an issue with my program rather than hiding the error and making an arbitrary choice.

@byroot

byroot commented Oct 17, 2024

Copy link
Copy Markdown
Member

Thanks for the PR but I agree with John here. This is too costly to protect against this.

@byroot

byroot commented Aug 24, 2025

Copy link
Copy Markdown
Member

For the record I found a way to detect duplicate keys in a cheap way: #841

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

JSON.dump inconsistent output for hash

5 participants