Databased #10

Open
kousu wants to merge 43 commits into bruzen:databased from kousu:databased

kousu commented Sep 7, 2014

RAII fixes for the databased branch:

Things now shut down, remove themselves, and clean up properly in every process-death scenario I can think of: a websocket.close() or a closed browser window, a Ctrl-C, a kill, or even (eventually) a kill -9 on the server all cause the replication to that client to silence itself and stop wasting CPU. Once all clients have left, the trigger itself is removed, so that Postgres doesn't waste cycles deciding not to broadcast changes.

Also, server.sh, client.sh, and replicate.py can all be pointed at a chosen database on the command line; there's an init.sh script to make installing easy; and you can use the standard PGDATABASE, PGHOST, PGPORT, PGUSER, and PGDATA variables to control which database you're talking to.

This is a good place to merge since it represents a level of reliability that I feel makes this usable in production.

bruzen and others added 30 commits August 2, 2014 19:50
Now it is restartable without having to use pkill or dig through your system monitor.
I should have done this immediately.
forkit.py demonstrates that socat does indeed shut down EXEC processes
when the socket dies (tested with both TCP and Unix domain sockets). So something about
replicate.py is holding it open indefinitely. In fact, replicate.py can outlive socat.

I could rewrite replicate.py as a socket app itself and select() on both the input socket and the output socket inside it, but that's a lot of work and it might not even address the real bug.
Figuring out how to get around this took a long time,
and my solution is not as elegant as I would like.
It came down to misusing a socketpair() like a Unix signal or a …
 because our spoolers already block at select(),
 but I wanted them to block only until input OR a "signal" arrives,
 so I needed to use something that had a file descriptor behind it.
I also now spin off subsidiary threads simply because I wanted to block on two totally separate file descriptors in parallel. I could either rewrite my select() loops into one single loop (which breaks encapsulation and just feels messy) or do this, which also feels messy but at least maintains encapsulation.
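A minimal Python sketch of the socketpair-as-signal trick, assuming a spooler that normally reads from one socket: one end of a second socketpair() is added to the same select() call, so writing a single byte to the other end wakes the loop just like input would. `spool()` and `run_demo()` are hypothetical names for illustration, not the branch's code.

```python
import select
import socket
import threading


def spool(source, wake):
    """Block until `source` has data OR `wake` is poked (hypothetical spooler).

    `wake` is one end of a socketpair() used purely as a file-descriptor-backed
    "signal", so it can sit in the same select() call as the real input.
    """
    received = []
    while True:
        readable, _, _ = select.select([source, wake], [], [])
        if source in readable:
            data = source.recv(4096)
            if not data:                # peer closed the data socket
                return received, "eof"
            received.append(data)
        if wake in readable:
            wake.recv(1)                # drain the wakeup byte
            return received, "stopped"


def run_demo():
    data_r, data_w = socket.socketpair()    # the "real" input stream
    wake_r, wake_w = socket.socketpair()    # the "signal" channel
    out = []
    worker = threading.Thread(target=lambda: out.append(spool(data_r, wake_r)))
    worker.start()
    data_w.sendall(b"row 1\n")
    wake_w.send(b"x")                       # ask the spooler to stop
    worker.join(timeout=5)
    for s in (data_r, data_w, wake_r, wake_w):
        s.close()
    return out[0]
```

The appeal of this design is that the spooler keeps a single select() call; the "signal" is just one more readable descriptor, so no loop has to poll a flag.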
…e replication hooks for you

So now, installing this system should look something like

Get the software
$ pkg_add postgres python-psycopg2 python-sqlalchemy
$ git clone modex
$ cd modex/

Set up the DB
$ cd src/backend/db
$ ./init.sh eutopia #or other model of your choice
$ ./server.sh

in a different session, set up the simulation
$ cd src/backend/models/eutopia
$ ./eutopia postgresql://localhost/eutopia  #but you can use "sqlite:///eutopia.sqlite" if you are having problems with postgres

in a third and fourth and fifth session, set up the web clients
$ python -m SimpleHTTPServer
$ cd src/backend/db; ./replicate_server 8081 postgresql://localhost/eutopia activity_counts #<-- this part is still dodgy
$ browser http://localhost:8000/src/frontend/demo.html

AND IF THIS DOESN'T WORK AS PROMISED IT'S A BUG
Now that I've figured out JS inheritance, doing this was reasonable.
Also a small nicety fix for init.sh,
and make HDRJCacheSet (cum Table) inherit the same way as the other classes.
…dcoding.

This now matches the stated API in commit ff01d36.
Also, make the Map tree fit the same inheritance pattern.
Found a bug: instanceof fails on subclasses. See the links in that
StackOverflow post about JS inheritance for ideas about combatting this.

Found a bug: .delete() is broken. :(
…thly.

This is better than manually tracking all the children it spawns, even though there is only one at the moment.
This made test.psql annoying to use, because client.sh would basically lie
about where it was running from, and test.psql relies on \COPY being able
to use relative paths.
1) run the replication in the main thread, as originally, and
   spawn the stdin-watching process as the subthread
2) write a shutdown() function that makes what's going on really explicit
3) catch the additional signals that should also trigger cleanup (I missed these before)

Also, wrap the OS X workaround in an if and give feedback from it.
Globals are usually bad, but in this one case, where we need one inside
something else that is global (a signal handler), it makes sense.
replicate() originally lived in replicate.pysql, where I chose to use 2-space indents and had to name the table "_table".
The trick was that alivethread() was holding the program (and hence
its file descriptors and hence socat and hence websockify) open after
the main thread had exited.

This is exactly what Thread.daemon solves.

I love easy fixes.
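A small Python experiment illustrating the Thread.daemon fix, assuming a watcher thread that loops forever like alivethread(). `exits_promptly()` and the child script are hypothetical; they run a throwaway interpreter whose main thread returns immediately and check whether the process actually exits.

```python
import subprocess
import sys

# Child program: start a never-ending watcher thread, then let the main
# thread fall off the end of the script.
CHILD = """
import threading, time

def alivethread():
    while True:
        time.sleep(0.05)

threading.Thread(target=alivethread, daemon={daemon}).start()
"""


def exits_promptly(daemon, timeout=5):
    """Return True if the child process exits on its own within `timeout` s."""
    try:
        subprocess.run([sys.executable, "-c", CHILD.format(daemon=daemon)],
                       timeout=timeout)
        return True
    except subprocess.TimeoutExpired:   # run() kills the child before raising
        return False
```

With `daemon=False` the interpreter waits for the thread at exit forever, so the process (and every file descriptor it holds) stays alive; with `daemon=True` it exits as soon as the main thread is done.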
I already made it so that unregister() happens
 if the front-end websocket closes or replicate.py quits (e.g. via Ctrl-C), or
 if replicate.py is killed with kill -TERM, kill -INT, or kill -QUIT
(i.e. soft kills).

This commit makes unregister() also happen
 if replicate.py is killed with kill -KILL or kill -ABRT,
but since those are hard kills, this case is only detected
on the next write to the watched table.
kousu added 13 commits September 6, 2014 19:41
This commit is sort of wordy because it's SQL.
All it does is:
 table.refcount--;
 if table.refcount == 0:
  delete table.trigger
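The same refcount logic as a minimal in-memory Python sketch. `TriggerRegistry` and its methods are hypothetical, the set `triggers` stands in for "this table has a replication trigger installed in Postgres", and creating the trigger on the first registration is an assumption inferred from the PR description rather than shown in this commit.

```python
class TriggerRegistry:
    """In-memory model of per-table client refcounts (hypothetical names).

    `triggers` stands in for the set of tables that currently have a
    replication trigger installed in Postgres.
    """

    def __init__(self):
        self.refcount = {}
        self.triggers = set()

    def register(self, table):
        if self.refcount.get(table, 0) == 0:
            self.triggers.add(table)        # assumed: first client creates the trigger
        self.refcount[table] = self.refcount.get(table, 0) + 1

    def unregister(self, table):
        self.refcount[table] -= 1           # table.refcount--
        if self.refcount[table] == 0:       # if table.refcount == 0:
            del self.refcount[table]
            self.triggers.discard(table)    #   delete table.trigger
```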
This should make switching between a production and a dev server smooth.

Just export PGHOST/PGDATA/PGPORT/PGUSER/PGDATABASE as appropriate.
pg_vars reads what you've set and fills in what you haven't to point to the local server.
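A rough Python equivalent of what pg_vars is described as doing: keep whatever PG* variables you have set and fill in the rest so they point at a local server. The function name mirrors the script, but the concrete defaults (`localhost`, port `5432`) are assumptions for illustration; the user and database fallbacks follow libpq's documented defaults (OS user name, and a database named after the user).

```python
import getpass
import os

PG_KEYS = ("PGHOST", "PGPORT", "PGUSER", "PGDATABASE", "PGDATA")


def pg_vars(environ=None):
    """Keep the PG* variables that are set; fill the rest for a local server.

    The host and port defaults below are assumptions, not the script's values.
    """
    env = dict(os.environ if environ is None else environ)
    env.setdefault("PGHOST", "localhost")
    env.setdefault("PGPORT", "5432")                # Postgres' default port
    if "PGUSER" not in env:
        env["PGUSER"] = getpass.getuser()           # libpq defaults to the OS user
    env.setdefault("PGDATABASE", env["PGUSER"])     # libpq's default dbname is the user name
    # PGDATA only matters when starting a server; leave it out if it is unset.
    return {k: env[k] for k in PG_KEYS if k in env}
```

Switching from dev to production is then just a matter of exporting, say, PGHOST before running the scripts; everything you don't override keeps its local default.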

init.sh only spawns a server if it doesn't find one already running, so that you can init a preexisting db.
I finally figured out what I was driving at with my 'pending queue' idea.
It required throwing out what I did before, since what I did before
didn't handle multisets properly anyhow.

The trick is to have the pending queue store *the deltas*
between the sources and the sink, so that we can keep them
in sync without rescanning all the sources.
This is more expensive than I was initially envisioning,
but I don't see any way around it, and anyway the deltas
plus the cache should by definition add up to only one extra copy of
all the sources, which is on the same order as what map() and where() already do.
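A minimal Python sketch of the "pending queue of deltas" idea for multisets, assuming each source change is reported as a single insert or delete. `PendingSink` and its methods are hypothetical and far simpler than the JS classes in this branch; the point is only that the cache is updated by replaying signed deltas rather than by rescanning the sources, and that duplicates are counted correctly.

```python
from collections import Counter


class PendingSink:
    """Sketch: a cached multiset kept in sync by replaying queued deltas.

    Each pending entry is (element, +1 or -1). Names are hypothetical.
    """

    def __init__(self):
        self.cache = Counter()
        self.pending = []

    def insert(self, elem):
        self.pending.append((elem, +1))

    def delete(self, elem):
        self.pending.append((elem, -1))

    def flush(self):
        for elem, n in self.pending:
            self.cache[elem] += n
        self.pending.clear()
        self.cache = +self.cache    # unary + drops entries with non-positive counts
        return self.cache
```

Because the cache is a Counter rather than a set, inserting "a" twice and deleting it once correctly leaves one "a", which is the multiset case the earlier approach got wrong.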

I've started some tests, but it needs many, many more.
Also explain in a 2am rambling way why _.push(e) is 100% correct
I even followed the unfinalized ES6 iterator spec
 https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Statements/function
This was tedious, but worth it. It will make Not() and Or() a lot easier.
This belongs not in modex, but I accomplished it here first.