
Error reading Parquet file with latest parquet-go #61

Description

@tanyaofei

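  1. Create a file with Python pandas:
```python
import pandas

dataframe = pandas.DataFrame({
    "A": ["a", "b", "c", "d"],
    "B": [2, 3, 4, 1],
    "C": [10, 20, None, None],  # None becomes NaN, so column C has dtype float64
})

# Requires a Parquet engine (pyarrow or fastparquet) to be installed.
dataframe.to_parquet("1.parquet")
```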

This file looks like: [screenshot]

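  2. Read the file with dataframe-go:
```go
package main

import (
	"context"
	"fmt"

	"github.com/rocketlaunchr/dataframe-go/imports"
	"github.com/xitongsys/parquet-go-source/local"
)

func main() {
	ctx := context.Background()
	fr, err := local.NewLocalFileReader("1.parquet") // don't discard the open error
	if err != nil {
		panic(err)
	}
	df, err := imports.LoadFromParquet(ctx, fr)
	if err != nil {
		panic(err)
	}
	fmt.Println(df)
}
```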
  3. The call panics because the series names are not unique:
```
panic: names of series must be unique: 

goroutine 1 [running]:
github.com/rocketlaunchr/dataframe-go.NewDataFrame({0xc0001f8000, 0x3, 0xc000149a10?})
        .../rocketlaunchr/dataframe-go@v0.0.0-20211025052708-a1030444159b/dataframe.go:41 +0x33c
github.com/rocketlaunchr/dataframe-go/imports.LoadFromParquet({0x1497868, 0xc000020080}, {0x1498150?, 0xc00000e798?}, {0xc0000021a0?, 0xc000149f70?, 0x1007599?})
        .../go/pkg/mod/github.com/rocketlaunchr/dataframe-go@v0.0.0-20211025052708-a1030444159b/imports/parquet.go:110 +0x8ae
main.main()
        .../main.go:13 +0x78
```
  4. Following the stack trace, I found some useful information:
  • All series built in `imports.LoadFromParquet` have empty names.

  • `goFieldNameToActual`: every key in this map has the prefix "Schema", but `goName` does not, which may be why no name can be found in the map.


This is the first time I've used Go to read Parquet files. Is this error caused by a breaking change in parquet-go, or by something else?
