From 1eb91ad4a9e427d0ef3f699b3241e5fa93cb6fca Mon Sep 17 00:00:00 2001
From: Konstantin Baierer
Date: Fri, 3 Jan 2020 18:00:52 +0100
Subject: [PATCH 1/2] define Standard Parameters for dpi, input-level, output-level, #134, OCR-D/core#376

---
 ocrd_tool.md | 78 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 78 insertions(+)

diff --git a/ocrd_tool.md b/ocrd_tool.md
index c3f8239..3b5d576 100644
--- a/ocrd_tool.md
+++ b/ocrd_tool.md
@@ -10,6 +10,84 @@ services](swagger).
 
 To validate a `ocrd-tool.json` file, use `ocrd ocrd-tool /path/to/ocrd-tool.json validate`.
 
+## Standard parameters
+
+There are a number of parameters that all processors MUST support.
+
+### `dpi`
+
+Custom DPI to assume for pixel density of images.
+
+MUST default to 300.
+
+### `input-level`
+
+On what level of typography should input images be processed?
+
+Processors MAY define a `default` value.
+
+`enum` MUST be a list of one or more of:
+
+* `page`
+* `block`
+* `line`
+* `word`
+* `glyph`
+
+### `output-level`
+
+On what level of typography should output images be produced?
+
+Processors MAY define a `default` value.
+
+`enum` MUST be a list of one or more of:
+
+* `page`
+* `block`
+* `line`
+* `word`
+* `glyph`
+
+Whether `input-level` and `output-level` match semantically is up to the
+processor. I.e. if `input-level` and `output-level` are inconsistent according
+to its semantics, processors MUST refuse further processing.
+
+### Sample for standard parameters
+
+Here is a snippet of an `ocrd-tool.json` for a tool that can operate on `page`, `block` or `line` level
+and produce output on `block`, `line` or `glyph` level, e.g. [ocrd-cis-ocropy-segment](https://github.com/cisocrgroup/ocrd_cis/blob/dev/ocrd_cis/ocropy/segment.py):
+
+```hjson
+{
+  [...]
+  "parameter": {
+    "dpi": {
+      "type": "number",
+      "default": 300,
+    },
+    "input-level": {
+      "type": "string",
+      "enum": ["page", "block", "line"],
+      "default": "page"
+    },
+    "output-level": {
+      "type": "array",
+      "item": {
+        "type": "string",
+        "enum": ["block", "line", "glyph"],
+      },
+      "default": "block"
+    }
+  }
+}
+```
+
+Some sample parameters passed by the user and how they are passed on to the processor:
+
+* `{}` --> `{"dpi": 300, "input-level": "page", "output-level": "block"}`
+* `{"dpi": 72}` --> `{"dpi": 72, "input-level": "page", "output-level": "block"}`
+* `{"input-level": "glyph"}` --> `{"dpi": 300, "input-level": "glyph", "output-level": "block"}` (This should in all likelihood be an error, since it is highly unlikely that `output-level` is above the `input-level`, but that is to be handled by the processor)
+
 ## File parameters
 
 To mark a parameter as expecting the address of a file, it must declare the

From 2d82b4e87faac875de7c4a24711af186cead5cd1 Mon Sep 17 00:00:00 2001
From: Konstantin Baierer
Date: Mon, 6 Jan 2020 18:17:22 +0100
Subject: [PATCH 2/2] drop input-level and propose how processors should handle input-/output-level discrepancy

---
 ocrd_tool.md | 23 +++++++----------------
 1 file changed, 7 insertions(+), 16 deletions(-)

diff --git a/ocrd_tool.md b/ocrd_tool.md
index 3b5d576..043e800 100644
--- a/ocrd_tool.md
+++ b/ocrd_tool.md
@@ -20,20 +20,6 @@ Custom DPI to assume for pixel density of images.
 
 MUST default to 300.
 
-### `input-level`
-
-On what level of typography should input images be processed?
-
-Processors MAY define a `default` value.
-
-`enum` MUST be a list of one or more of:
-
-* `page`
-* `block`
-* `line`
-* `word`
-* `glyph`
-
 ### `output-level`
 
 On what level of typography should output images be produced?
@@ -48,10 +34,15 @@ Processors MAY define a `default` value.
 * `word`
 * `glyph`
 
-Whether `input-level` and `output-level` match semantically is up to the
-processor. I.e. if `input-level` and `output-level` are inconsistent according
+Whether the provided data and `output-level` match semantically is up to the
+processor. I.e. if the input data and `output-level` are inconsistent according
 to its semantics, processors MUST refuse further processing.
 
+For example, the user provides an `output-level` of `word`. For this, the
+processor expects text lines in the input. If there are no text lines in the
+input for whatever reason (it might be an empty page or it might not have been
+processed down to line level yet), the processor MUST raise an exception.
+
 ### Sample for standard parameters
 
 Here is a snippet of an `ocrd-tool.json` for a tool that can operate on `page`, `block` or `line` level
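
The defaulting and refusal rules that this series specifies can be illustrated with a small Python sketch. It is not the OCR-D core implementation: `PARAMETER_SPEC`, `resolve_parameters` and `require_text_lines` are hypothetical names, the `"block"` default is borrowed from the sample snippet, and the input's text lines are represented as a plain list purely for illustration.

```python
# Hypothetical sketch, not the OCR-D core API.
# Parameter definitions as they might look after PATCH 2/2 (no `input-level`).
PARAMETER_SPEC = {
    "dpi": {"type": "number", "default": 300},
    "output-level": {
        "type": "string",
        "enum": ["page", "block", "line", "word", "glyph"],
        "default": "block",  # assumption: default taken from the sample snippet
    },
}


def resolve_parameters(user_params, spec=PARAMETER_SPEC):
    """Complete user-supplied parameters with the defaults declared in spec."""
    unknown = set(user_params) - set(spec)
    if unknown:
        raise ValueError(f"unknown parameters: {sorted(unknown)}")
    # Start from the declared defaults, then let user values override them.
    resolved = {
        name: definition["default"]
        for name, definition in spec.items()
        if "default" in definition
    }
    resolved.update(user_params)
    # Reject values that are not listed in a parameter's `enum`.
    for name, value in resolved.items():
        allowed = spec[name].get("enum")
        if allowed is not None and value not in allowed:
            raise ValueError(f"{name!r} must be one of {allowed}, got {value!r}")
    return resolved


def require_text_lines(output_level, text_lines):
    """Refuse processing when `word` output is requested but no lines exist."""
    if output_level == "word" and not text_lines:
        raise ValueError("output-level 'word' requires text lines in the input")
```

With these assumptions, `resolve_parameters({"dpi": 72})` keeps the user's `dpi` and fills in `output-level` from the default, while `require_text_lines("word", [])` raises, mirroring the empty-page case described in the second patch.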