Unable to parse PDFs, "Failed to resolve 'test.elicit.org'" #327

Description

@mathcass

When I try to run the "Loading paper text" chapter from the Primer, the recipe fails because the hostname "test.elicit.org" cannot be resolved. Since paper.parse_pdf posts the PDF to this remote service for parsing, loading a paper can't proceed at all.
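The name lookup can be checked on its own, outside of ICE. This is a minimal sketch using only the standard library; the `can_resolve` helper is a hypothetical name for illustration, not part of ICE, and it makes the same `socket.getaddrinfo` call that fails at the bottom of the trace:

```python
import socket


def can_resolve(host: str, port: int = 443) -> bool:
    """Return True if DNS resolution for `host` succeeds, False otherwise.

    Hypothetical helper (not part of ICE). It only checks name resolution;
    it does not open a connection or send any request.
    """
    try:
        # urllib3 reaches this same call in create_connection().
        socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
        return True
    except socket.gaierror:
        # Raised for errors such as "[Errno -2] Name or service not known".
        return False


# Example call: can_resolve("test.elicit.org")
```

If this returns False for "test.elicit.org" on other networks too, the problem is likely the hostname itself rather than a local DNS setup.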

Here's a full trace of what I see:

Full trace
python recipes/paper_hello.py --paper papers/keenan-2018.pdf
/home/cass/src/ice/venv/lib/python3.11/site-packages/pydantic/_migration.py:283: UserWarning: `pydantic.generics:GenericModel` has been moved to `pydantic.BaseModel`.
  warnings.warn(f'`{import_path}` has been moved to `{new_location}`.')
/home/cass/src/ice/venv/lib/python3.11/site-packages/pydantic/_internal/_config.py:334: UserWarning: Valid config keys have changed in V2:
* 'keep_untouched' has been renamed to 'ignored_types'
* 'fields' has been removed
  warnings.warn(message, UserWarning)
Traceback (most recent call last):
  File "/home/cass/src/ice/venv/lib/python3.11/site-packages/urllib3/connection.py", line 198, in _new_conn
    sock = connection.create_connection(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/cass/src/ice/venv/lib/python3.11/site-packages/urllib3/util/connection.py", line 60, in create_connection
    for res in socket.getaddrinfo(host, port, family, socket.SOCK_STREAM):
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/cass/.pyenv/versions/3.11.0/lib/python3.11/socket.py", line 961, in getaddrinfo
    for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
socket.gaierror: [Errno -2] Name or service not known

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/cass/src/ice/venv/lib/python3.11/site-packages/urllib3/connectionpool.py", line 793, in urlopen
    response = self._make_request(
               ^^^^^^^^^^^^^^^^^^^
  File "/home/cass/src/ice/venv/lib/python3.11/site-packages/urllib3/connectionpool.py", line 491, in _make_request
    raise new_e
  File "/home/cass/src/ice/venv/lib/python3.11/site-packages/urllib3/connectionpool.py", line 467, in _make_request
    self._validate_conn(conn)
  File "/home/cass/src/ice/venv/lib/python3.11/site-packages/urllib3/connectionpool.py", line 1099, in _validate_conn
    conn.connect()
  File "/home/cass/src/ice/venv/lib/python3.11/site-packages/urllib3/connection.py", line 616, in connect
    self.sock = sock = self._new_conn()
                       ^^^^^^^^^^^^^^^^
  File "/home/cass/src/ice/venv/lib/python3.11/site-packages/urllib3/connection.py", line 205, in _new_conn
    raise NameResolutionError(self.host, self, e) from e
urllib3.exceptions.NameResolutionError: <urllib3.connection.HTTPSConnection object at 0x751bb09bb4d0>: Failed to resolve 'test.elicit.org' ([Errno -2] Name or service not known)

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/cass/src/ice/venv/lib/python3.11/site-packages/requests/adapters.py", line 589, in send
    resp = conn.urlopen(
           ^^^^^^^^^^^^^
  File "/home/cass/src/ice/venv/lib/python3.11/site-packages/urllib3/connectionpool.py", line 847, in urlopen
    retries = retries.increment(
              ^^^^^^^^^^^^^^^^^^
  File "/home/cass/src/ice/venv/lib/python3.11/site-packages/urllib3/util/retry.py", line 515, in increment
    raise MaxRetryError(_pool, url, reason) from reason  # type: ignore[arg-type]
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='test.elicit.org', port=443): Max retries exceeded with url: /elicit-previews/james/oug-3083-support-parsing-arbitrary-pdfs-using/parse_pdf (Caused by NameResolutionError("<urllib3.connection.HTTPSConnection object at 0x751bb09bb4d0>: Failed to resolve 'test.elicit.org' ([Errno -2] Name or service not known)"))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/cass/src/ice/recipes/paper_hello.py", line 10, in <module>
    recipe.main(answer_for_paper)
  File "/home/cass/src/ice/ice/recipe.py", line 176, in main
    defopt.run(
  File "/home/cass/src/ice/venv/lib/python3.11/site-packages/defopt.py", line 348, in run
    call = bind(
           ^^^^^
  File "/home/cass/src/ice/venv/lib/python3.11/site-packages/defopt.py", line 255, in bind
    call, rest = _bind_or_bind_known(*args, _known=False, **kwargs)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/cass/src/ice/venv/lib/python3.11/site-packages/defopt.py", line 203, in _bind_or_bind_known
    args, rest = parser.parse_args(argv), []
                 ^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/cass/.pyenv/versions/3.11.0/lib/python3.11/argparse.py", line 1862, in parse_args
    args, argv = self.parse_known_args(args, namespace)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/cass/.pyenv/versions/3.11.0/lib/python3.11/argparse.py", line 1895, in parse_known_args
    namespace, args = self._parse_known_args(args, namespace)
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/cass/.pyenv/versions/3.11.0/lib/python3.11/argparse.py", line 2103, in _parse_known_args
    start_index = consume_optional(start_index)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/cass/.pyenv/versions/3.11.0/lib/python3.11/argparse.py", line 2043, in consume_optional
    take_action(action, args, option_string)
  File "/home/cass/.pyenv/versions/3.11.0/lib/python3.11/argparse.py", line 1955, in take_action
    argument_values = self._get_values(action, argument_strings)
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/cass/.pyenv/versions/3.11.0/lib/python3.11/argparse.py", line 2485, in _get_values
    value = self._get_value(action, arg_string)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/cass/.pyenv/versions/3.11.0/lib/python3.11/argparse.py", line 2518, in _get_value
    result = type_func(arg_string)
             ^^^^^^^^^^^^^^^^^^^^^
  File "/home/cass/src/ice/ice/recipe.py", line 181, in <lambda>
    Paper: lambda path: Paper.load(Path(path)),
                        ^^^^^^^^^^^^^^^^^^^^^^
  File "/home/cass/src/ice/ice/paper.py", line 158, in load
    paragraph_dicts = parse_pdf(file)
                      ^^^^^^^^^^^^^^^
  File "/home/cass/src/ice/ice/cache.py", line 28, in sync_wrapper
    result = func(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^
  File "/home/cass/src/ice/ice/paper.py", line 119, in parse_pdf
    r = requests.post(PDF_PARSER_URL, files=files)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/cass/src/ice/venv/lib/python3.11/site-packages/requests/api.py", line 115, in post
    return request("post", url, data=data, json=json, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/cass/src/ice/venv/lib/python3.11/site-packages/requests/api.py", line 59, in request
    return session.request(method=method, url=url, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/cass/src/ice/venv/lib/python3.11/site-packages/requests/sessions.py", line 589, in request
    resp = self.send(prep, **send_kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/cass/src/ice/venv/lib/python3.11/site-packages/requests/sessions.py", line 703, in send
    r = adapter.send(request, **kwargs)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/cass/src/ice/venv/lib/python3.11/site-packages/requests/adapters.py", line 622, in send
    raise ConnectionError(e, request=request)
requests.exceptions.ConnectionError: HTTPSConnectionPool(host='test.elicit.org', port=443): Max retries exceeded with url: /elicit-previews/james/oug-3083-support-parsing-arbitrary-pdfs-using/parse_pdf (Caused by NameResolutionError("<urllib3.connection.HTTPSConnection object at 0x751bb09bb4d0>: Failed to resolve 'test.elicit.org' ([Errno -2] Name or service not known)"))

❓ Is there an alternative that folks recommend for PDF parsing here?
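For context, the kind of local fallback I have in mind looks roughly like the sketch below. It assumes the third-party pypdf package (`pip install pypdf`); the function names `split_paragraphs` and `extract_paragraphs` are hypothetical, and the output is a plain list of strings, not the paragraph dicts that `Paper.load` gets back from `parse_pdf`, so it would still need adapting to ICE.

```python
import re
from pathlib import Path


def split_paragraphs(text: str) -> list[str]:
    """Split text on blank lines and collapse whitespace inside each chunk."""
    chunks = re.split(r"\n\s*\n", text)
    return [" ".join(chunk.split()) for chunk in chunks if chunk.strip()]


def extract_paragraphs(pdf_path: Path) -> list[str]:
    """Extract text locally with pypdf instead of calling a remote parser.

    Assumption: pypdf is installed. The import is kept local so the
    splitting helper above stays usable without it.
    """
    from pypdf import PdfReader

    reader = PdfReader(str(pdf_path))
    paragraphs: list[str] = []
    for page in reader.pages:
        # extract_text() may return an empty string for image-only pages.
        paragraphs.extend(split_paragraphs(page.extract_text() or ""))
    return paragraphs


# Example call: extract_paragraphs(Path("papers/keenan-2018.pdf"))
```

Plain text extraction like this loses section structure, so it is only a stopgap compared with a dedicated parsing service.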
