WIP: General improvements #4

@jamesaoverton

We have a good first implementation in ontodev/valve.py. Now that I've had a chance to use it, I'm considering some revisions and clarifications. As always, I want the user to be able to form a simple mental model of how VALVE works, so that it is easy to learn and use and free of edge cases and surprises.

  • add regular expression matches to the grammar, e.g. /foo/, and interpret them as a match function
  • add regular expression substitutions to the grammar, e.g. s/foo/bar/
  • ideally enforce that these are implemented as PCREs
  • generalize the datatype table to be reusable conditions in a hierarchy -- maybe rename to "condition" table
  • generalize datatypes from a tree to a DAG by allowing multiple parents
  • maybe enforce that datatype names are single words
  • maybe rework split(pattern, count, expression, ...) as concat(slot, slot, slot), e.g. concat(cell.label, " & ", gates)
  • drop CURIE and replace with more general concat(prefix.prefix, ":", local_name)
  • a tree with a split is not a tree, it's a directed acyclic graph -- I'd like to distinguish tree from dag (or maybe hierarchy)
  • I'm worried that the current grammar has a lot of ambiguity: double quoted strings vs double quoted datatypes or column names or table names -- maybe this doesn't matter
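To make the two regex bullets concrete, here is a minimal sketch of how a `/foo/` or `s/foo/bar/` literal could be turned into a function. The name `parse_regex_literal` is hypothetical and not part of valve.py. The sketch assumes that a match means "the pattern occurs somewhere in the cell", it ignores escaped slashes and flags, and it uses Python's `re` module, which is close to but not the same as PCRE.

```python
import re


def parse_regex_literal(text):
    """Turn '/foo/' or 's/foo/bar/' into a (kind, function) pair.

    Hypothetical helper for illustration only: escaped slashes and
    trailing flags are not handled.
    """
    if text.startswith("s/") and text.endswith("/"):
        parts = text[2:-1].split("/")
        if len(parts) != 2:
            raise ValueError(f"not a valid substitution: {text!r}")
        pattern, replacement = parts
        compiled = re.compile(pattern)
        # A substitution maps a string to a string.
        return "substitute", lambda value: compiled.sub(replacement, value)
    if len(text) >= 2 and text.startswith("/") and text.endswith("/"):
        compiled = re.compile(text[1:-1])
        # A match is a predicate: string in, boolean out.
        # Assumption: "match" means the pattern occurs anywhere in the value.
        return "match", lambda value: compiled.search(value) is not None
    raise ValueError(f"not a regex literal: {text!r}")


kind, is_foo = parse_regex_literal("/foo/")
_, foo_to_bar = parse_regex_literal("s/foo/bar/")
```

With this reading, `/foo/` behaves like any other predicate in the grammar, while `s/foo/bar/` is a string transformation that would need a separate place in the mental model.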

A condition defines a list of checks. Each check defines a predicate (function) that takes a string and returns a boolean, as well as a bunch of information about the check: name, parents, level, message, etc. For each cell, we go through the list of checks in order and check that the cell satisfies each predicate.
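A minimal sketch of that model, assuming a check is just a predicate plus metadata. The `Check` class, its field defaults, and `validate_cell` are hypothetical names for illustration, not the valve.py data structures.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class Check:
    """One check: a predicate over strings plus descriptive metadata."""
    name: str
    predicate: Callable[[str], bool]
    level: str = "error"  # assumed default, for illustration
    message: str = ""
    parents: List[str] = field(default_factory=list)


def validate_cell(value: str, checks: List[Check]) -> List[Tuple[str, str, str]]:
    """Run the checks in order; report (name, level, message) for each failure."""
    return [(c.name, c.level, c.message) for c in checks if not c.predicate(value)]


checks = [
    Check("nonempty", lambda s: s != "", message="cell is empty"),
    Check("trimmed", lambda s: s == s.strip(), level="warn",
          message="leading or trailing whitespace"),
]
```

Here every check is run and all failures are collected. Stopping at the first failure would be an equally plausible design; the description above does not settle which one is intended.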

A predicate can also be thought of as a set of strings for which the predicate is true. A set of strings can be defined extensionally or intensionally. For an extensionally defined set we have a list of all the strings, so we just look up the string in the set -- this is how in and under work. For an intensionally defined set we have a rule for determining if the string is in the set -- this is how regex matches and list work. Even distinct can be thought of as: this cell is not in the set of other cells in this column.
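The extensional/intensional distinction can be sketched as predicate factories. The function names below are hypothetical and only loosely mirror the VALVE functions mentioned above. The regex case assumes the whole cell must match.

```python
import re
from collections import Counter


def in_set(allowed):
    """Extensional: the set is listed in full, so membership is a lookup."""
    allowed = frozenset(allowed)
    return lambda value: value in allowed


def matches(pattern):
    """Intensional: a rule decides membership (assumes a whole-cell match)."""
    compiled = re.compile(pattern)
    return lambda value: compiled.fullmatch(value) is not None


def distinct(column):
    """The cell's value must not also appear in any other cell of the column."""
    counts = Counter(column)
    return lambda value: counts[value] <= 1


is_ab = in_set(["a", "b"])
is_digits = matches(r"\d+")
is_unique = distinct(["x", "y", "x"])
```

All three return the same kind of thing, a function from string to boolean, which is what lets them sit in one list of checks.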

tree and lookup are a bit different. lookup maps a pair of strings to a boolean. tree does validate a cell, but it also defines a structure that under can use.
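A rough sketch of why these two do not fit the one-string predicate shape. All names here are hypothetical. The structure stores a set of parents per node, so the same code covers a tree and a DAG, and `under` is assumed to include the ancestor itself.

```python
def build_tree(rows):
    """Build a child -> set-of-parents map from (child, parent) pairs.

    An empty parent marks a root. Allowing several rows per child
    makes the structure a DAG rather than a tree.
    """
    parents = {}
    for child, parent in rows:
        parents.setdefault(child, set())
        if parent:
            parents[child].add(parent)
    return parents


def under(parents, ancestor):
    """Predicate: is the value the ancestor or one of its descendants?"""
    def predicate(value):
        seen, stack = set(), [value]
        while stack:
            node = stack.pop()
            if node == ancestor:
                return True
            if node in seen:
                continue
            seen.add(node)
            stack.extend(parents.get(node, ()))
        return False
    return predicate


def lookup(pairs):
    """Two-argument predicate: does this (key, value) pair appear in the table?"""
    table = set(pairs)
    return lambda key, value: (key, value) in table


hierarchy = build_tree([("animal", ""), ("dog", "animal"),
                        ("puppy", "dog"), ("rock", "")])
is_animal = under(hierarchy, "animal")
label_matches = lookup([("a", "1")])
```

`build_tree` produces data rather than a verdict about a cell, and `lookup` needs two strings, which is the sense in which both stand apart from the other checks.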
