-
Notifications
You must be signed in to change notification settings - Fork 0
Specification
This document represents the specification of requirements and implementation details for Prototype 1.
Features:
- One daemon controls multiple tables of key-values. The table is specified by the first path element in a request (table)
- Valid HTTP/1.1 response including Keepalive and Content-Length
- Accept multiple requests per connection (keepalive)
- Non-blocking socket IO, Single threaded
- Stream values small blocks) and large (single files) to the socket using sendfile or similar
- LRU support (global size limit)
- Murmurhash3 hashing algorithm, with exact key check for hash collisions
Requests supported:
GET /table/key HTTP/1.1 Get the value of key in table
PUT /table/key HTTP/1.1 Put (Set or inset) a value (post data) into key in table
DELETE /table/key HTTP/1.1 Delete the key in table
DELETE /table HTTP/1.1 Delete table, effectively purging table
GET /table HTTP/1.1 Get a listing of keys in the table
Request headers supported:
GET (key):
- None
GET (list):
- X-Start: Can we return a cursor id, and use this? Could keep it O(n)
- X-Limit: Maximum keys to return
PUT:
- X-Ttl: Time in seconds this entry is valid for
- Content-Length (required): Size of POST data
DELETE:
- None
Each table will be contained in its own folder, within the db root.
-----------------------------------------
| Block 1 | Block 2 |
| (4096 bytes) | (4096 bytes) | ...
-----------------------------------------
Accessing a block is as simple as taking the block identifier (a 0 based number) and multiplying by the block length. Possibly include a header structure for file version and block size?
The block file will be expanded as needed. Up to a maximum of of uint_32 (4,294,967,295) blocks.
Future: Defrag / Optimize routine?
|
|- blockfile.db
|-aa/
|-aa/badfookpsokdfs.key
|-aa/asdsadasdasdas.key
|-ab/
| ...
The block data is contained within blockfile.db. Additional large keys are contained as individual filesystem items. A two identifier value is extracted from the key and used to partition key data between multiple folders. This prevents filesystem issues with large numbers of files in a single directory.
No threading simplifies locking, however there can be many requests in progress at any time. The non blocking event based processing model processes an event until buffers are full, and then moves onto the next one.
While writing to a key, a write flag is used. It is not possible to begin to read a cache entry while it being written. Attempts to read while writing will return a cache miss.
Reading clients can be identified by checking the refs count.
Skip if refs != 0. This should be rare, generally a block can always fit into the networking buffers. Return 404 File not found.
Possibly move the file, mark for deletion and delete when the final client finishes? Large files have a higher cost and are more important to cache.
Request will silently succeed, all data will be read and then discarded. The response body (Status of 200 OK) will make the client aware of this action being taken.
Not possible, unlink is blocking
Flag for when done, delete?
Flag for when done, delete?
A Least Recently Used list will be maintained globally. This will be done via a linked list between cache elements, and a pointer maintained to both the least recently used (head) and most recently used (tail).
Hard limits will be enforced on database size (sum of value size). When this limit is exceeded the least recently used values will be expired until the database it atleast X% under the limit.
For now expiration will be lazy, performed on the GET request as validation. In the future an expiration cursor will be implemented.
database-max-size: maximum size of database on disk
database-file-path: directory to store the database
database-lru-clear: percent of database to clear to make room during a LRU cleanup
bind-port: port to bind to
bind-addr: ip address to bind to