Scopus exceeds csv field limit #92

@r-wrobel

Hi, I encountered a bug that is triggered by very long lines in the CSV file.
Python's csv module, which litstudy uses to read the file, limits each field to 131072 characters by default:

Cell In[33], line 2
----> 2 docs_scopus=litstudy.load_scopus_csv("scopus.csv")

File [\site-packages\litstudy\sources\scopus_csv.py:116] in load_scopus_csv(path)
    114 with robust_open(path) as f:
    115     lines = csv.DictReader(f)
--> 116     docs = [ScopusCsvDocument(line) for line in lines]
    117     return DocumentSet(docs)

File \Lib\csv.py:116, in DictReader.__next__(self)
    113 if self.line_num == 0:
    114     # Used only for its side effect.
    115     self.fieldnames
--> 116 row = next(self.reader)
    117 self.line_num = self.reader.line_num
    119 # unlike the basic reader, we prefer not to return blanks,
    120 # because we will typically wind up with a dict full of None
    121 # values

Error: field larger than field limit (131072)
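
The error should be reproducible without a Scopus export. The following is a minimal sketch, assuming the limit is at the value shown in the error message; the column names `Title` and `Abstract` and the synthetic row are placeholders for illustration, not the real Scopus layout.

```python
import csv
import io

# Pin the limit to the value from the error message so the sketch is
# deterministic; field_size_limit() returns the previous limit.
previous_limit = csv.field_size_limit(131072)

# Hypothetical stand-in for the Scopus export: one row whose second
# field is 182667 characters long.
data = io.StringIO("Title,Abstract\r\nExample," + "a" * 182667 + "\r\n")

message = None
try:
    rows = list(csv.DictReader(data))
except csv.Error as exc:
    # The parser rejects any field longer than the configured limit.
    message = str(exc)
finally:
    # Restore whatever limit was active before this sketch ran.
    csv.field_size_limit(previous_limit)

print(message)
```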

You can use the DOI 10.1016/C2013-0-19213-6 for testing. The corresponding line in the complete CSV export from Scopus is 182667 characters long.
I believe a solution is presented at https://stackoverflow.com/a/15063941
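
A possible workaround along those lines is to raise the limit with `csv.field_size_limit` before loading the file. This is only a sketch: the helper name `raise_csv_field_limit` and the synthetic row are hypothetical, and it is not necessarily how litstudy should fix this internally.

```python
import csv
import io
import sys


def raise_csv_field_limit(limit=sys.maxsize):
    """Set the csv field size limit as high as the platform allows.

    Hypothetical helper; returns the limit that was actually applied.
    """
    while True:
        try:
            csv.field_size_limit(limit)
            return limit
        except OverflowError:
            # The limit must fit in a C long, which can be smaller than
            # sys.maxsize (e.g. on Windows), so shrink it and retry.
            limit //= 10


applied_limit = raise_csv_field_limit()

# Hypothetical stand-in for the Scopus export: one row whose second
# field is 182667 characters long.
data = io.StringIO("Title,Abstract\r\nExample," + "a" * 182667 + "\r\n")
rows = list(csv.DictReader(data))
print(len(rows[0]["Abstract"]))
```

Calling the helper once before `litstudy.load_scopus_csv("scopus.csv")` should avoid the error. Note that the limit is a process-wide setting of the csv module, so it also affects any other CSV parsing in the same session.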

Labels: bug (Something isn't working)