Skip to content

FiV0/hooray2

Repository files navigation

Hooray2

Experiments with Datalog.

The goal of this repository is to test some ideas around Datalog.

The two main concepts explored in the repository are Worst-case optimal join (WCOJ) and DBSP, a formal framwork for incremental computation. We also try to combine the two.

Just show me the code already

Standard queries

(require '[hooray.core :as h])

(def node (h/connect {:type :mem :storage :hash :algo :generic}))

(h/transact node [{:db/id :ada :name "Ada" :last-name "Lovelace"}
                  {:db/id :petr :name "Alan" :last-name "Turing"}])

(h/q '{:find [name]
       :where [[?e :name name]
               [?e :last-name "Lovelace"]]}
     (h/db node))
;; => [["Ada"]]

Incremental queries

(with-open [node (h/connect {:type :mem :storage :hash :algo :generic})]

  (h/transact node [{:db/id :adam :name "Adam"}])

  (let [inc-q (h/q-inc node
                       '{:find [name]
                         :where [[?e :name name]]})]

    (try
      (h/transact node [{:db/id :ada :name "Ada"}
                        {:db/id :alan :name "Alan"}
                        [:db/retractEntity :adam]])

      (h/consume-delta! inc-q)

      (finally
        (h/unregister-inc-q node inc-q))))

        )
;; => ([["Ada"] 1] [["Alan"] 1] [["Adam"] -1])

I am still exploring what the incremental API should look like Clojure-wise. Some options below

(def node (h/connect {:type :mem :storage :hash :algo :generic}))
(h/transact node [{:db/id -100
                   :db/ident :name
                   :db/valueType :db.type/string
                   :db/cardinality :db.cardinality/one}])

(def names-query
  '{:find [name]
    :where [[e :name name]]})

;; Pull deltas explicitly.
(with-open [deltas (h/open-deltas node names-query)]
  (h/transact node [{:db/id :ivan :name "Ivan"}])
  (h/take! deltas))
;; => ([["Ivan"] 1])

;; Read deltas from a core.async channel.
(defn- take-with-timeout [ch]
  (let [timeout (async/timeout 1000)
        [value port] (async/alts!! [ch timeout])]
    (if (= port timeout)
      ::timeout
      value)))

(let [delta-ch (h/delta-chan node names-query)]
  (try
    (h/transact node [{:db/id :petr :name "Petr"}])
    (take-with-timeout delta-ch)
    (finally
      (async/close! delta-ch))))
;; => ([["Petr"] 1])

;; Receive deltas in a callback.
(with-out-str
  (with-open [_subscription (h/subscribe node names-query println)]
    (h/transact node [{:db/id :anna :name "Anna"}])))
;; => "([[Anna] 1])\n"

There are currently two incremental join algorithms which I am exploring, :wcoj and :standard. :standard is a the more traditional approach to incremental queries as described in the DBSP paper. Binary joins and reindexing of join keys whenever two relations have a different subset of join keys they overlap in. The :wcoj options explores using a WCOJ per delta join term in an DBSP multi-join expansion (that is a bit of a mouthful, I know). These can be switched with the dynamic var *dbsp-version*.

(binding [h/*dbsp-version* :wcoj]
    (let [inc-q (h/q-inc node
                         '{:find [name]
                           :where [[?e :name name]]})]

      ...
      ))

Join Algorithms

Currently the repository implements 2 join algorithms:

Some others that might be interesting to add for comparison.

  • Hash Join (just standard binary tree join strategy)
  • Yannakakis algorithm (the standard algorithm for non-cyclic queries)
  • Something that combines standard binary joins with WCOJ (see Free Join)
  • TreeTracker Join (very recent)

I have written a couple of blog posts about the GenericJoin algo in this repository:

DBSP

DBSP is a formal framework for incremental view maintenance for query languages (see https://arxiv.org/abs/2203.16684) The folks at Feldera are implementing it for SQL. The idea here is to do it for a datalog engine with an datomic-like API. I also couldn't find any mention of combining WCOJ with DBSP. So a first goal is to make Generic-Join work with DBSP and do so in a WCO way for the incoming deltas (modulo bugs and inefficiencies in the ZSet implementations).

I am currently using two different paths for standard queries and incremental queries. In Theory™ this is not necessary as you can model standard queries also as zsets (the delta is your database state), but this will likely have significant overhead and is making initial exploratory programming a lot harder.

I have written two blog posts about combining WCOJ with DBSP:

Roadmap

  • Standard queries
    • Standard triple pattern
    • Or
    • And
    • Not
    • Or-join
    • Not-join
    • predicates
    • functions
    • aggregates
    • pull
  • Incremental
    • Standard triple pattern
    • Or
    • And
    • Not
    • Or-join
    • Not-join
    • predicates
    • functions
    • aggregates
    • pull

Out of scope for now

  • rules - rules are very powerful, but they also require you to deal with recursive query evaluation and fixed point calculations. So for a first pass, let's forget about them.

Other projects in this space

DBSP / incremental computation

  • zsxf - WIP Datomic API for DBSP, uses transducers for everything
  • clj-3df - Datalog compiled to Differential Dataflow
  • relic - Incremental relational programming based on the Out of the tar pit paper
  • wizard - IVM for Datascript - hooks into datascript

Datalog engines

  • datascript - Immutable database and Datalog query engine for Clojure, ClojureScript and JS
  • xtdb v1 - General-purpose bitemporal database for Datalog & graph queries.
  • asami - Graph database written Clojure(script)

About

Experiments in Datalog.

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Generated from FiV0/clj-gradle