database-internals-hashfile

A disk-resident extendible-hash file implemented in C on top of a custom buffer manager (the BF library distributed with the course). Records live in fixed-size blocks paged in and out by the BF layer; the index above them grows dynamically by splitting buckets and doubling the directory when needed.

First homework of the Database Implementation course at the University of Athens (Department of Informatics & Telecommunications), 2021–2022. Three-person team project (see Team below).

What it does

The course-provided API (hash_file.h) lets a client create, open, and close hash-file indices, insert Record { id, name, surname, city } entries, look records up by key, and dump statistics. We implemented the layer that turns that API into actual on-disk extendible hashing.

                      ┌───────────────────────────────────┐
                      │       HT_Handler (in memory)      │
                      │  ┌─────────────────────────────┐  │
                      │  │ file_header { global depth, │  │
                      │  │   2^depth dir entries,      │  │
                      │  │   records-per-block }       │  │
                      │  └─────────────────────────────┘  │
                      │  arr[ hash(key) ] = block_id      │
                      └────────────────┬──────────────────┘
                                       │
                                       ▼  via BF_GetBlock / BF_Block_GetData
                ┌──────────────────────┴──────────────────────────┐
                │                                                 │
        ┌───────┴────────┐  ┌─────────────────┐  ┌────────────────┴───┐
        │   bucket b₀    │  │   bucket b₁     │  │   bucket b₂        │
        │  records[...]  │  │  records[...]   │  │  records[...]      │
        │  block_info {  │  │  block_info {   │  │  block_info {      │
        │   local_depth, │  │   local_depth,  │  │   local_depth,     │
        │   friends,     │  │   friends,      │  │   friends,         │
        │   lowest_index,│  │   lowest_index, │  │   lowest_index,    │
        │   next_block_id│  │   next_block_id │  │   next_block_id    │
        │  }             │  │  }              │  │  }                 │
        └────────────────┘  └─────────────────┘  └────────────────────┘
                ▲
                │  multiple directory slots can point to the same bucket
                │  while local_depth < global_depth (the "friends" count
                │  tracks how many)

When a bucket fills:

  • If local_depth < global_depth, split locally — allocate a new block, redistribute records by the next bit of the hash, and update the directory entries that point to the split bucket.
  • If local_depth == global_depth, double the directory (global_depth++) and then split.

Overflow lists exist via next_block_id for the pathological case where a single hash key maps to too many records to fit in one block — but as the team readme.txt honestly notes, repeated identical IDs cause unbounded recursive directory doubling, since no split can ever separate equal keys. That is a property of the algorithm we were asked to implement, not a bug in our implementation.

What was interesting

  • Page-aware data layout. Block metadata (block_info) is packed at the end of each page (data + BF_BLOCK_SIZE - sizeof(block_info)), leaving the front of the block for Record payload. Same trick for the file header in block 0.
  • An iterator abstraction. Rather than open-coding BF_GetBlock / BF_UnpinBlock pairs every time we walked records, we wrote an iterator that hides block boundaries — caller asks for "next record", the iterator handles page transitions and pin/unpin. Big readability win across HT_InsertEntry, HT_PrintAllEntries, and the split logic.
  • friends and lowest_index per bucket. Each bucket knows how many directory slots currently point to it and which is the lowest. That makes both the split logic and the directory rewrites O(friends) instead of needing to scan the whole directory.

Team

Role Contributor
Iterator + insert path Alexandros Gkiafis (sdi2200284)
File create / open / close Charalampos Rafail Dimitriou (sdi1900354)
PrintAllEntries + statistics Christos Dokoutsidis (sdi2000269)
General-purpose helpers, design, debugging Shared

The split of "who owns which function" is a formality; design, debugging, comments, and tests were done together as a team.

Build & run

The actual project sits in code/. From there:

make ht        # builds the hash-file driver (build/runner)
./build/runner # runs examples/ht_main.c against a fresh data.db

examples/ht_main.c inserts 1000 randomly generated records into a freshly created data.db, prints all entries matching a randomly chosen id, then dumps statistics.

make bf builds a separate driver against the raw buffer-manager API, useful for sanity-checking the BF layer in isolation.

Files

Path Purpose
code/include/bf.h Buffer-manager interface (course-provided)
code/include/hash_file.h Hash-file API the assignment asked us to implement
code/src/hash_file.c Our implementation — directory, splits, iterator, insert, print, stats
code/examples/ht_main.c Driver used to evaluate the implementation
code/lib/libbf.so Pre-built buffer-manager library (course-provided)
DB_askisi_1_2021-2022.pdf The original assignment specification (Greek)

License

MIT — applies to our own code in this repo (code/src/hash_file.c). The BF library, the hash_file.h API contract, and the assignment PDF were distributed by the course and retain their original copyright.

