Experiments with Datalog.
The goal of this repository is to test some ideas around Datalog.
The two main concepts explored in the repository are Worst-case optimal join (WCOJ) and DBSP, a formal framwork for incremental computation. We also try to combine the two.
(require '[hooray.core :as h])
(def node (h/connect {:type :mem :storage :hash :algo :generic}))
(h/transact node [{:db/id :ada :name "Ada" :last-name "Lovelace"}
{:db/id :petr :name "Alan" :last-name "Turing"}])
(h/q '{:find [name]
:where [[?e :name name]
[?e :last-name "Lovelace"]]}
(h/db node))
;; => [["Ada"]](with-open [node (h/connect {:type :mem :storage :hash :algo :generic})]
(h/transact node [{:db/id :adam :name "Adam"}])
(let [inc-q (h/q-inc node
'{:find [name]
:where [[?e :name name]]})]
(try
(h/transact node [{:db/id :ada :name "Ada"}
{:db/id :alan :name "Alan"}
[:db/retractEntity :adam]])
(h/consume-delta! inc-q)
(finally
(h/unregister-inc-q node inc-q))))
)
;; => ([["Ada"] 1] [["Alan"] 1] [["Adam"] -1])I am still exploring what the incremental API should look like Clojure-wise. Some options below
(def node (h/connect {:type :mem :storage :hash :algo :generic}))
(h/transact node [{:db/id -100
:db/ident :name
:db/valueType :db.type/string
:db/cardinality :db.cardinality/one}])
(def names-query
'{:find [name]
:where [[e :name name]]})
;; Pull deltas explicitly.
(with-open [deltas (h/open-deltas node names-query)]
(h/transact node [{:db/id :ivan :name "Ivan"}])
(h/take! deltas))
;; => ([["Ivan"] 1])
;; Read deltas from a core.async channel.
(defn- take-with-timeout [ch]
(let [timeout (async/timeout 1000)
[value port] (async/alts!! [ch timeout])]
(if (= port timeout)
::timeout
value)))
(let [delta-ch (h/delta-chan node names-query)]
(try
(h/transact node [{:db/id :petr :name "Petr"}])
(take-with-timeout delta-ch)
(finally
(async/close! delta-ch))))
;; => ([["Petr"] 1])
;; Receive deltas in a callback.
(with-out-str
(with-open [_subscription (h/subscribe node names-query println)]
(h/transact node [{:db/id :anna :name "Anna"}])))
;; => "([[Anna] 1])\n"There are currently two incremental join algorithms which I am exploring,
:wcoj and :standard. :standard is a the more traditional approach to incremental queries
as described in the DBSP paper. Binary joins and reindexing of join keys whenever two relations
have a different subset of join keys they overlap in. The :wcoj options explores using
a WCOJ per delta join term in an DBSP multi-join expansion (that is a bit of a mouthful, I know).
These can be switched with the dynamic var *dbsp-version*.
(binding [h/*dbsp-version* :wcoj]
(let [inc-q (h/q-inc node
'{:find [name]
:where [[?e :name name]]})]
...
))Currently the repository implements 2 join algorithms:
- Leapfrog Triejoin ( https://arxiv.org/pdf/1210.0481.pdf )
- Generic Join (a variation of Generic-Join https://arxiv.org/abs/1310.3314 )
Some others that might be interesting to add for comparison.
- Hash Join (just standard binary tree join strategy)
- Yannakakis algorithm (the standard algorithm for non-cyclic queries)
- Something that combines standard binary joins with WCOJ (see Free Join)
- TreeTracker Join (very recent)
I have written a couple of blog posts about the GenericJoin algo in this repository:
DBSP is a formal framework for incremental view maintenance for query languages (see https://arxiv.org/abs/2203.16684) The folks at Feldera are implementing it for SQL. The idea here is to do it for a datalog engine with an datomic-like API. I also couldn't find any mention of combining WCOJ with DBSP. So a first goal is to make Generic-Join work with DBSP and do so in a WCO way for the incoming deltas (modulo bugs and inefficiencies in the ZSet implementations).
I am currently using two different paths for standard queries and incremental queries. In Theory™ this is not necessary as you can model standard queries also as zsets (the delta is your database state), but this will likely have significant overhead and is making initial exploratory programming a lot harder.
I have written two blog posts about combining WCOJ with DBSP:
- Standard queries
- Standard triple pattern
- Or
- And
- Not
- Or-join
- Not-join
- predicates
- functions
- aggregates
- pull
- Incremental
- Standard triple pattern
- Or
- And
- Not
- Or-join
- Not-join
- predicates
- functions
- aggregates
- pull
Out of scope for now
- rules - rules are very powerful, but they also require you to deal with recursive query evaluation and fixed point calculations. So for a first pass, let's forget about them.
- zsxf - WIP Datomic API for DBSP, uses transducers for everything
- clj-3df - Datalog compiled to Differential Dataflow
- relic - Incremental relational programming based on the Out of the tar pit paper
- wizard - IVM for Datascript - hooks into datascript
- datascript - Immutable database and Datalog query engine for Clojure, ClojureScript and JS
- xtdb v1 - General-purpose bitemporal database for Datalog & graph queries.
- asami - Graph database written Clojure(script)