
gbv/validation-api-ws


Validation API (demo)


Demo of a simple Web API to validate data against predefined criteria

This web service implements a Data Validation API that is being specified as part of project AQinDA. The API allows checking data against application profiles and integrating such checks into data processing workflows. The API is not meant to define the quality criteria of application profiles but to execute defined quality criteria in the form of schema validation or other constraints.

Depending on the configuration, data can be passed via HTTP GET and POST, by URL, or from local files on the server. The result of the analysis is returned as a list of errors in Data Validation Error Format or as a detailed report in data quality report format (not implemented yet).


Installation

The web application is started on http://localhost:7007 by default.

From sources

Requires a basic development toolchain (sudo apt install build-essential) and Python 3 with the venv module.

  1. clone the repository: git clone https://github.com/gbv/validation-api-ws.git && cd validation-api-ws
  2. run make deps to install dependencies
  3. optionally configure the instance
  4. run make start

Via Docker

A Docker image is automatically built and published on GitHub. To run a one-shot instance of the application from the most recent Docker image:

docker run --rm -p 7007:7007 ghcr.io/gbv/validation-api-ws:main

To use a custom configuration, an existing configuration directory or file must be mounted into the container:

test -f config/config.json && docker run --rm -p 7007:7007 --volume ./config:/app/config ghcr.io/gbv/validation-api-ws:main
test -f config.json && docker run --rm -p 7007:7007 --volume ./config.json:/app/config.json ghcr.io/gbv/validation-api-ws:main

Configuration

The default configuration only provides profiles for some base formats. To define application profiles to check against, create a configuration file in JSON format named config.json in the current directory or in the local subdirectory config. Alternatively, the location of the configuration file or directory can be passed with the argument --config at startup. The configuration file must contain a field profiles with a list of profile objects and may contain additional service settings.
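A minimal sketch of the configuration lookup described above. The order (value of --config first, then config.json, then the config directory) and the function name resolve_config are assumptions for illustration, not taken from the service's source.

```python
from pathlib import Path
from typing import Optional


def resolve_config(cli_value: Optional[str], cwd: Path = Path(".")) -> Optional[Path]:
    """Return the configuration location to use, or None for built-in defaults.

    Assumed lookup order (hypothetical, not confirmed by the service):
    1. the value passed with --config (file or directory),
    2. config.json in the current directory,
    3. the local subdirectory config/.
    """
    if cli_value:
        return Path(cli_value)
    candidate = cwd / "config.json"
    if candidate.is_file():
        return candidate
    directory = cwd / "config"
    if directory.is_dir():
        return directory
    return None
```

A caller would pass the raw value of --config, or None when the argument was not given.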

The default configuration contains two profiles that use built-in checks to test whether the input data can be parsed as JSON or XML, respectively:

{
  "port": 7007,
  "files": false,
  "reports": false,
  "downloads": false,
  "profiles": [
    {
      "id": "json",
      "url": "https://json.org/",
      "description": "Check data to be parseable JSON",
      "checks": ["json"]
    },
    {
      "id": "xml",
      "description": "Check data to be well-formed XML",
      "checks": ["xml"]
    }
  ]
}

Service settings

  • title (title of the service) is set to "Validation Service" by default.
  • port (numeric port to run the service) is set to 7007 by default.
  • files (stage directory for data files on the server) is set to false (disabled) by default.
  • reports (directory to store reports in) is set to false (disabled) by default.
  • downloads (cache directory for data retrieved via URL) is set to false (disabled) by default.
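The service settings and their defaults can be pictured as a plain merge. This is a hypothetical sketch (load_settings and DEFAULTS are illustrative names), assuming that any setting missing from the configuration file falls back to the default listed above.

```python
import json
from pathlib import Path

# Documented defaults of the service settings (False means disabled).
DEFAULTS = {
    "title": "Validation Service",
    "port": 7007,
    "files": False,      # stage directory for data files
    "reports": False,    # directory to store reports in
    "downloads": False,  # cache directory for data retrieved via URL
}


def load_settings(path: Path) -> dict:
    """Read a JSON configuration file and fill in defaults for missing settings."""
    with path.open(encoding="utf-8") as f:
        config = json.load(f)
    # The field profiles is mandatory and must hold a list of profile objects.
    if not isinstance(config.get("profiles"), list):
        raise ValueError("configuration must contain a list in field 'profiles'")
    # Values from the file override the defaults.
    return {**DEFAULTS, **config}
```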

Profiles

Each application profile is configured with a JSON object having a unique id, a list of checks, and additional metadata. See the profiles configuration JSON Schema for details.
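Before serving requests, a configuration loader has to reject malformed profiles. The following hypothetical helper checks only the minimal rules stated here (a unique id and a list of checks); it is a sketch and not a replacement for the profiles configuration JSON Schema.

```python
def validate_profiles(profiles: list) -> dict:
    """Index profiles by id, rejecting missing or duplicate ids and missing checks."""
    index = {}
    for profile in profiles:
        pid = profile.get("id")
        if not isinstance(pid, str) or not pid:
            raise ValueError(f"profile without id: {profile!r}")
        if pid in index:
            raise ValueError(f"duplicate profile id: {pid}")
        if not isinstance(profile.get("checks"), list):
            raise ValueError(f"profile {pid} needs a list of checks")
        index[pid] = profile
    return index
```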

Checks

Each check is either a string, referencing a base format or another profile, or a JSON object for a more complex check. So far, only schema checks (against JSON Schema or XML Schema) have been implemented. Additional types of checks are planned.
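Because a string check may name another profile, profiles can build on each other. A sketch of how such references could be flattened, assuming that base format names take precedence over profile ids (in the default configuration, profile json uses the check json) and that circular references are an error; expand_checks is a hypothetical name.

```python
BASE_FORMATS = {"json", "xml"}


def expand_checks(profile_id: str, profiles: dict, seen: tuple = ()) -> list:
    """Flatten a profile's checks into base format names and check objects.

    Strings naming another profile are expanded recursively; cycles raise ValueError.
    """
    if profile_id in seen:
        raise ValueError("circular profile reference: " + " -> ".join(seen + (profile_id,)))
    result = []
    for check in profiles[profile_id]["checks"]:
        if isinstance(check, dict):
            result.append(check)  # complex check object, e.g. a schema check
        elif check in BASE_FORMATS:
            result.append(check)  # built-in base format
        elif check in profiles:
            result.extend(expand_checks(check, profiles, seen + (profile_id,)))
        else:
            raise ValueError(f"unknown check {check!r} in profile {profile_id}")
    return result
```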

Base formats

  • json - validate JSON syntax
  • xml - validate XML syntax (document must be well-formed XML)

Schema checks

A schema check validates data against a schema in a known schema language. The check is configured with two fields:

  • schema - the schema language
  • location - schema file or URL
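Since location may be either a file or a URL, a check has to fetch the schema accordingly. A sketch, assuming that only http(s) URLs are fetched remotely and that relative file paths are resolved against a base directory; read_schema and base_dir are hypothetical names, and the service may resolve relative paths differently.

```python
from pathlib import Path
from urllib.parse import urlparse
from urllib.request import urlopen


def read_schema(location: str, base_dir: Path = Path(".")) -> bytes:
    """Fetch a schema from an http(s) URL or read it from a local file."""
    if urlparse(location).scheme in ("http", "https"):
        with urlopen(location, timeout=10) as response:
            return response.read()
    # Absolute paths replace base_dir when joined; relative ones are resolved under it.
    return (base_dir / location).read_bytes()
```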

The following schema languages are supported:

Script check

Script checks execute a script on the server (not implemented yet).

API call check

Pass data to another web service to be checked (not implemented yet).

Constraint check

Check data against complex constraints specified in AQinDa Constraint Language (yet to be defined).

API

The Data Validation API is still being specified, so details may change. Its core response format is being specified as Data Validation Error Format. This implementation provides one endpoint for each profile, accessible via both GET and POST requests. The additional endpoint to list application profiles is not part of the core Data Validation API: other implementations might provide only one endpoint to validate against a single application profile.

In addition there are optional endpoints to look up and to remove validation reports.

GET /{profile}/validate

Validate data against an application profile and return a list of errors in Data Validation Error Format. Data must be passed via one of these query parameters:

  • data as a string
  • url to download the data from a URL (only if the service is configured with a downloads directory)
  • file to read the data from a local file in the stage directory on the server (only if the service is configured with a files directory)
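The rules for choosing the data source could be enforced as follows. This hypothetical select_input helper assumes that exactly one of the parameters must be given and that file names must not escape the stage directory (for example via ../); neither rule is stated explicitly in this README.

```python
from pathlib import Path


def select_input(params: dict, settings: dict) -> tuple:
    """Return ("data", str), ("url", str) or ("file", Path) for the given query parameters."""
    given = [key for key in ("data", "url", "file") if key in params]
    if len(given) != 1:
        raise ValueError("pass exactly one of: data, url, file")
    key, value = given[0], params[given[0]]
    if key == "url" and not settings.get("downloads"):
        raise ValueError("url parameter requires a downloads directory")
    if key == "file":
        if not settings.get("files"):
            raise ValueError("file parameter requires a files directory")
        stage = Path(settings["files"]).resolve()
        path = (stage / value).resolve()
        # Reject paths such as ../etc/passwd that leave the stage directory (Python 3.9+).
        if not path.is_relative_to(stage):
            raise ValueError("file must be inside the stage directory")
        return key, path
    return key, value
```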

The status code is always 200 if validation could be executed, no matter whether errors have been found or not. For example, validating the string [1,2 against the default profile json results in the following validation response. The error position (after the fourth character on line 1) is referenced with multiple dimensions. Dimension values are always strings.

curl -G http://localhost:7007/json/validate --data-urlencode 'data=[1,2'
[
  {
    "message": "Expecting ',' delimiter",
    "position": {
      "line": "1",
      "linecol": "1:5",
      "offset": "4"
    }
  }
]
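The same GET request can be made from Python with the standard library. A sketch, assuming an instance running at the default address; validate and validate_url are hypothetical helper names.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

BASE = "http://localhost:7007"  # default address of a locally started instance


def validate_url(profile: str, data: str, base: str = BASE) -> str:
    """Build the GET URL, passing the data URL-encoded in query parameter data."""
    return f"{base}/{profile}/validate?{urlencode({'data': data})}"


def validate(profile: str, data: str) -> list:
    """Return the list of errors reported by the service (empty if the data is valid)."""
    # urlopen raises urllib.error.HTTPError for non-2xx status codes.
    with urlopen(validate_url(profile, data), timeout=10) as response:
        return json.load(response)
```

Against a running instance, validate("json", "[1,2") should return the list shown above. An HTTPError would indicate that validation could not be executed at all.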

POST /{profile}/validate

The validation endpoint can also be queried via HTTP POST: data can be passed as the request body or as a file upload (content type multipart/form-data). Additional query parameters are not supported.
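A minimal sketch of the POST variant with a raw request body, assuming the default address; the helper names are hypothetical, and the multipart/form-data file upload is not covered. Like curl -d, urllib sends application/x-www-form-urlencoded as content type when no other type is set.

```python
import json
from urllib.request import Request, urlopen


def validate_request(profile: str, body: bytes, base: str = "http://localhost:7007") -> Request:
    """Prepare a POST request that carries the data to validate as request body."""
    return Request(f"{base}/{profile}/validate", data=body, method="POST")


def post_validate(profile: str, body: bytes) -> list:
    """Send the request and return the list of errors reported by the service."""
    with urlopen(validate_request(profile, body), timeout=10) as response:
        return json.load(response)
```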

GET /profiles

Return a list of application profiles configured at this instance of the validation service. The information is a subset of the profiles configuration, limited to the public fields id (required), title, description, url, and report. Internal information about checks is not included.
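The filtering to public fields could look like this. public_profiles is a hypothetical helper, sketched under the assumption that fields absent from a profile are simply omitted from the output.

```python
PUBLIC_FIELDS = ("id", "title", "description", "url", "report")


def public_profiles(profiles: list) -> list:
    """Keep only the public fields of each profile, dropping checks and anything else."""
    return [
        {key: profile[key] for key in PUBLIC_FIELDS if key in profile}
        for profile in profiles
    ]
```

Applied to the default configuration, the json profile would be listed with id, description, and url only.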

GET /reports/{id}

Return a validation report. This endpoint has neither been specified nor implemented yet.

DELETE /reports/{id}

Delete a validation report. This endpoint has neither been specified nor implemented yet.

Contributing

  • make deps installs Python dependencies in a virtual environment in directory .venv. You may also want to call . .venv/bin/activate to activate the environment.
  • make test runs unit tests
  • make all runs unit tests and integration tests and puts a coverage report into directory htmlcov
  • make lint checks coding style
  • make fix cleans up some coding style violations

To build and run the Docker image locally for testing:

docker image build -t validator .
docker run --rm -p 7007:7007 validator  # default config, or:
test -f config.json && docker run --rm -p 7007:7007 --volume ./config.json:/app/config.json validator

See also https://github.com/gbv/validation-server for a previous implementation in NodeJS. Both implementations may converge.

Maintainers

License

MIT © 2025- Verbundzentrale des GBV (VZG)

This work has been funded by the DFG in the project AQinDa.
