Specification

Prototype 1

This document represents the specification of requirements and implementation details for Prototype 1.

General

Features:

One daemon controls multiple tables of key-values. The table is specified by the first path element in a request (table)
Valid HTTP/1.1 response including Keepalive and Content-Length
Accept multiple requests per connection (keepalive)
Non-blocking socket IO, Single threaded
Stream values small blocks) and large (single files) to the socket using sendfile or similar
LRU support (global size limit)
Murmurhash3 hashing algorithm, with exact key check for hash collisions

Requests supported:

GET /table/key HTTP/1.1 Get the value of key in table

PUT /table/key HTTP/1.1 Put (Set or inset) a value (post data) into key in table

DELETE /table/key HTTP/1.1 Delete the key in table

DELETE /table HTTP/1.1 Delete table, effectively purging table

GET /table HTTP/1.1 Get a listing of keys in the table

Request headers supported:

GET (key):

None

GET (list):

X-Start: Can we return a cursor id, and use this? Could keep it O(n)
X-Limit: Maximum keys to return

PUT:

X-Ttl: Time in seconds this entry is valid for
Content-Length (required): Size of POST data

DELETE:

None

Database

Each table will be contained in its own folder, within the db root.

Block File

-----------------------------------------
|     Block 1    |     Block 2    |
|  (4096 bytes)  |  (4096 bytes)  |  ...
-----------------------------------------

Accessing a block is as simple as taking the block identifier (a 0 based number) and multiplying by the block length. Possibly include a header structure for file version and block size?

The block file will be expanded as needed. Up to a maximum of of uint_32 (4,294,967,295) blocks.

Future: Defrag / Optimize routine?

Filesystem

|
|- blockfile.db
|-aa/
|-aa/badfookpsokdfs.key
|-aa/asdsadasdasdas.key
|-ab/
| ...

The block data is contained within blockfile.db. Additional large keys are contained as individual filesystem items. A two identifier value is extracted from the key and used to partition key data between multiple folders. This prevents filesystem issues with large numbers of files in a single directory.

Locking

No threading simplifies locking, however there can be many requests in progress at any time. The non blocking event based processing model processes an event until buffers are full, and then moves onto the next one.

Reading while Writing

While writing to a key, a write flag is used. It is not possible to begin to read a cache entry while it being written. Attempts to read while writing will return a cache miss.

Writing while Reading

Reading clients can be identified by checking the refs count.

Block file

Skip if refs != 0. This should be rare, generally a block can always fit into the networking buffers. Return 404 File not found.

Filesystem

Possibly move the file, mark for deletion and delete when the final client finishes? Large files have a higher cost and are more important to cache.

Writing while Writing

Request will silently succeed, all data will be read and then discarded. The response body (Status of 200 OK) will make the client aware of this action being taken.

* while Deleting

Not possible, unlink is blocking

Deleting while Reading

Flag for when done, delete?

Deleting while Writing

Flag for when done, delete?

LRU

A Least Recently Used list will be maintained globally. This will be done via a linked list between cache elements, and a pointer maintained to both the least recently used (head) and most recently used (tail).

Hard limits will be enforced on database size (sum of value size). When this limit is exceeded the least recently used values will be expired until the database it atleast X% under the limit.

Expiration

For now expiration will be lazy, performed on the GET request as validation. In the future an expiration cursor will be implemented.

Configuration

database-max-size: maximum size of database on disk
database-file-path: directory to store the database
database-lru-clear: percent of database to clear to make room during a LRU cleanup
bind-port: port to bind to
bind-addr: ip address to bind to

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Specification

Prototype 1

General

Database

Block File

Filesystem

Locking

Reading while Writing

Writing while Reading

Block file

Filesystem

Writing while Writing

* while Deleting

Deleting while Reading

Deleting while Writing

LRU

Expiration

Configuration

Uh oh!

Clone this wiki locally