czhao-dev/systems-debugging-lab


LogForge

LogForge is a C++ multithreaded log analytics engine designed to demonstrate practical systems debugging, profiling, tracing, testing, and performance optimization workflows.

The project processes large synthetic log files and supports common analytics operations such as status-code aggregation, top-IP queries, top-path queries, latency statistics, error filtering, and optional query indexing.

All input data is synthetic and generated by scripts in this repository.


Project Goals

LogForge is built as a portfolio project to demonstrate hands-on experience with modern C++ systems development and professional diagnostic tools.

The project demonstrates:

  • Modern C++ systems programming
  • CMake-based build configuration
  • Multithreaded data processing
  • Interactive debugging with GDB and LLDB
  • Memory debugging with Valgrind Memcheck
  • Fast runtime bug detection with AddressSanitizer, LeakSanitizer, ThreadSanitizer, and UndefinedBehaviorSanitizer
  • System-call tracing with strace and dtruss
  • CPU profiling with perf
  • Function-level profiling with gprof
  • Experiment-based profiling with gprofng
  • Heap profiling with Valgrind Massif and heaptrack
  • Cache and call-graph profiling with Cachegrind and Callgrind
  • Static analysis with clang-tidy and cppcheck
  • Code formatting with clang-format
  • Test coverage with gcov, lcov, or llvm-cov
  • Benchmark automation and performance comparison

The focus is not only to build a working log analyzer, but also to document how real bugs and performance issues can be found, explained, fixed, and measured.


Example Use Cases

LogForge analyzes log files with records such as:

2026-06-19T10:15:21Z 192.168.1.10 GET /api/users 200 34ms
2026-06-19T10:15:22Z 192.168.1.11 POST /api/login 401 12ms
2026-06-19T10:15:23Z 192.168.1.12 GET /api/orders 500 93ms
2026-06-19T10:15:24Z 192.168.1.10 GET /api/products 200 18ms

Supported operations include:

./logforge --input logs/server.log --status-counts
./logforge --input logs/server.log --top-ips 10
./logforge --input logs/server.log --top-paths 10
./logforge --input logs/server.log --latency-stats
./logforge --input logs/server.log --errors-only
./logforge --input logs/server.log --threads 8 --status-counts
./logforge --input logs/server.log --build-index
./logforge --input logs/server.log --query "status=500"

Example output:

Status Counts
-------------
200: 824391
301: 12044
400: 2891
401: 3812
404: 5821
500: 1033

Latency Statistics
------------------
p50: 18 ms
p95: 92 ms
p99: 214 ms
max: 731 ms

Why This Project Exists

Many C++ projects show algorithms or data structures, but fewer demonstrate the debugging, tracing, and profiling workflows used in production systems work.

LogForge is designed to show the full engineering loop:

  1. Build a realistic C++ command-line tool.
  2. Introduce controlled memory, threading, I/O, correctness, and performance issues.
  3. Detect those issues using professional tools.
  4. Explain the root cause.
  5. Fix the implementation.
  6. Measure the improvement.
  7. Document the workflow clearly.

This makes the project useful for demonstrating systems-level debugging, profiling, and performance engineering skills.


Tool Demonstration Matrix

| Category | Tool | Scenario | Demonstrated Skill |
|---|---|---|---|
| Interactive debugging | GDB | Crash in parser, invalid object state, breakpoint inspection | Source-level C++ debugging |
| Interactive debugging | LLDB | Same debugging workflow using LLVM tooling | Cross-platform debugging |
| Memory debugging | Valgrind Memcheck | Memory leak, invalid read, uninitialized value | Deep memory diagnostics |
| Runtime sanitizers | AddressSanitizer | Buffer overflow, use-after-free | Fast memory-error detection |
| Runtime sanitizers | LeakSanitizer | Leaked allocations | Leak detection in sanitizer builds |
| Runtime sanitizers | ThreadSanitizer | Race condition in shared aggregation map | Thread-safety debugging |
| Runtime sanitizers | UndefinedBehaviorSanitizer | Integer overflow, invalid enum, bad shift | Undefined behavior detection |
| System tracing | strace / dtruss | Excessive read() system calls | System-call tracing and I/O analysis |
| CPU profiling | perf | Hot functions, CPU cycles, branch/cache behavior | Linux performance profiling |
| Function profiling | gprof | Flat profile and call graph | Function-level profiling |
| Function profiling | gprofng | Function and call-tree analysis | Modern GNU profiling workflow |
| Heap profiling | Massif | Peak heap usage in index builder | Memory footprint analysis |
| Heap profiling | heaptrack | Allocation hot spots | Allocation profiling |
| Cache profiling | Cachegrind | Cache misses in parser and aggregation | Cache behavior analysis |
| Call-graph profiling | Callgrind | Expensive call paths | Call-path optimization |
| Static analysis | clang-tidy | Bug-prone patterns and modernization suggestions | Static C++ analysis |
| Static analysis | cppcheck | Additional static checks | Lightweight code analysis |
| Formatting | clang-format | Consistent style | Automated formatting |
| Coverage | gcov / lcov / llvm-cov | Unit-test coverage | Test quality measurement |

Repository Structure

LogForge/
├── CMakeLists.txt
├── README.md
├── include/
│   ├── LogRecord.h
│   ├── LogParser.h
│   ├── LogIndex.h
│   ├── QueryEngine.h
│   ├── ThreadPool.h
│   ├── Aggregator.h
│   └── ReportWriter.h
├── src/
│   ├── main.cpp
│   ├── LogParser.cpp
│   ├── LogIndex.cpp
│   ├── QueryEngine.cpp
│   ├── ThreadPool.cpp
│   ├── Aggregator.cpp
│   └── ReportWriter.cpp
├── tests/
│   ├── test_parser.cpp
│   ├── test_aggregator.cpp
│   ├── test_query_engine.cpp
│   └── test_thread_pool.cpp
├── bugs/
│   ├── memory_leak.cpp
│   ├── buffer_overflow.cpp
│   ├── use_after_free.cpp
│   ├── data_race.cpp
│   ├── undefined_behavior.cpp
│   ├── parser_crash.cpp
│   └── syscall_storm.cpp
├── bench/
│   ├── generate_logs.py
│   ├── run_benchmarks.sh
│   └── compare_results.py
├── scripts/
│   ├── run_gdb.sh
│   ├── run_lldb.sh
│   ├── run_valgrind.sh
│   ├── run_asan.sh
│   ├── run_lsan.sh
│   ├── run_tsan.sh
│   ├── run_ubsan.sh
│   ├── run_strace.sh
│   ├── run_perf.sh
│   ├── run_gprof.sh
│   ├── run_gprofng.sh
│   ├── run_massif.sh
│   ├── run_heaptrack.sh
│   ├── run_cachegrind.sh
│   ├── run_callgrind.sh
│   ├── run_clang_tidy.sh
│   ├── run_cppcheck.sh
│   ├── run_format.sh
│   └── run_coverage.sh
├── docs/
│   ├── gdb.md
│   ├── lldb.md
│   ├── valgrind.md
│   ├── asan.md
│   ├── lsan.md
│   ├── tsan.md
│   ├── ubsan.md
│   ├── strace.md
│   ├── perf.md
│   ├── gprof.md
│   ├── gprofng.md
│   ├── massif.md
│   ├── heaptrack.md
│   ├── cachegrind.md
│   ├── callgrind.md
│   ├── static_analysis.md
│   ├── coverage.md
│   └── docker_dev_environment.md
├── docker/
│   ├── Dockerfile
│   └── docker-compose.yml
├── results/
│   ├── before/
│   └── after/
└── logs/
    └── generated sample logs

Core Design

LogForge is organized into several main components.

Log Parser

The parser converts raw log lines into structured records.

struct LogRecord {
    std::string timestamp;
    std::string ip;
    std::string method;
    std::string path;
    int status;
    int latency_ms;
};

The project supports multiple parser implementations for comparison:

  • stringstream parser: simple but slower
  • Manual parser: faster tokenization
  • string_view parser: reduced temporary allocations
  • Buffered reader: reduced system-call overhead
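As a rough illustration of the string_view approach, the sketch below tokenizes one line without creating temporary strings and rejects lines that do not have exactly six fields. The free function `parse_line` and the helpers `next_token` and `parse_int` are hypothetical names chosen for this example; the repository's LogParser interface may look different.

```cpp
#include <cassert>
#include <charconv>
#include <optional>
#include <string>
#include <string_view>

struct LogRecord {
    std::string timestamp;
    std::string ip;
    std::string method;
    std::string path;
    int status = 0;
    int latency_ms = 0;
};

// Returns the next space-delimited token and advances `rest` past it.
// Returns an empty view when no token remains.
inline std::string_view next_token(std::string_view& rest) {
    const auto start = rest.find_first_not_of(' ');
    if (start == std::string_view::npos) {
        rest = {};
        return {};
    }
    rest.remove_prefix(start);
    const auto end = rest.find(' ');
    const std::string_view token = rest.substr(0, end);
    rest.remove_prefix(end == std::string_view::npos ? rest.size() : end);
    return token;
}

// Parses all of `text` as a base-10 int; fails on empty input or trailing junk.
inline bool parse_int(std::string_view text, int& out) {
    if (text.empty()) return false;
    const char* first = text.data();
    const char* last = first + text.size();
    const auto [ptr, ec] = std::from_chars(first, last, out);
    return ec == std::errc{} && ptr == last;
}

// Hypothetical entry point: returns std::nullopt for malformed lines.
inline std::optional<LogRecord> parse_line(std::string_view line) {
    std::string_view rest = line;
    const auto timestamp = next_token(rest);
    const auto ip = next_token(rest);
    const auto method = next_token(rest);
    const auto path = next_token(rest);
    const auto status = next_token(rest);
    auto latency = next_token(rest);

    // Exactly six fields: the sixth must exist and nothing may follow it.
    if (latency.empty() || !next_token(rest).empty()) return std::nullopt;

    // The latency field carries an "ms" suffix, e.g. "34ms".
    if (latency.size() < 3 || latency.substr(latency.size() - 2) != "ms") {
        return std::nullopt;
    }
    latency.remove_suffix(2);

    LogRecord record;
    if (!parse_int(status, record.status)) return std::nullopt;
    if (!parse_int(latency, record.latency_ms)) return std::nullopt;
    record.timestamp = std::string(timestamp);
    record.ip = std::string(ip);
    record.method = std::string(method);
    record.path = std::string(path);
    return record;
}
```

Strings are copied only once, when a line has already passed validation, so malformed lines cost no allocations.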

Aggregation Engine

The aggregation engine computes statistics from parsed records.

Supported aggregations include:

  • HTTP status-code counts
  • Top client IP addresses
  • Top requested paths
  • Error-only filtering
  • Latency percentiles
  • Request-method counts

Example:

./logforge --input logs/1m.log --status-counts --top-paths 10
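The README does not say how latency percentiles are defined. One common choice is the nearest-rank method, sketched below under that assumption; the function name `percentile` is hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Nearest-rank percentile: the smallest sample such that at least p percent
// of all samples are less than or equal to it. Requires 0 < p <= 100.
// Takes the samples by value because std::nth_element reorders them.
inline int percentile(std::vector<int> samples, double p) {
    assert(!samples.empty() && p > 0.0 && p <= 100.0);
    const auto n = static_cast<double>(samples.size());
    // 1-based rank = ceil(p / 100 * n).
    auto rank = static_cast<std::size_t>(std::ceil(p / 100.0 * n));
    if (rank == 0) rank = 1;
    const auto nth = samples.begin() + static_cast<std::ptrdiff_t>(rank - 1);
    // Partial ordering is enough: only the element at `nth` must be in place.
    std::nth_element(samples.begin(), nth, samples.end());
    return *nth;
}
```

std::nth_element runs in linear time on average, so a full sort is only worth it when many percentiles are read from the same data.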

Multithreaded Processing

LogForge supports parallel processing of large input files.

Each worker thread processes a chunk of the input and produces local statistics. The local results are merged at the end.

This design avoids unnecessary contention and provides a useful comparison against a deliberately flawed shared-map implementation.

Bad design (every worker writes to one shared map without synchronization):

global_status_counts[record.status]++;

Better design:

local_stats[thread_id].status_counts[record.status]++;

Final merge:

for (const auto& local : local_stats) {
    merge(global_stats, local);
}

This provides a realistic demonstration of using ThreadSanitizer to detect a race and then redesigning the data flow for correctness and scalability.
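The local-then-merge flow could look roughly like the sketch below. It counts status codes from a plain vector of ints rather than parsed records to stay short. `LocalStats`, `merge`, and `count_statuses` are illustrative names and are not taken from the repository.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <thread>
#include <unordered_map>
#include <vector>

struct LocalStats {
    std::unordered_map<int, std::uint64_t> status_counts;
};

// Adds every count in `local` into `global`.
inline void merge(LocalStats& global, const LocalStats& local) {
    for (const auto& [status, count] : local.status_counts) {
        global.status_counts[status] += count;
    }
}

// Counts status codes with `num_threads` workers. Each worker writes only to
// its own LocalStats slot, so no lock is needed while counting.
inline LocalStats count_statuses(const std::vector<int>& statuses,
                                 std::size_t num_threads) {
    if (num_threads == 0) num_threads = 1;
    std::vector<LocalStats> local_stats(num_threads);
    std::vector<std::thread> workers;
    workers.reserve(num_threads);

    // Ceiling division so the last chunk absorbs any remainder.
    const std::size_t chunk = (statuses.size() + num_threads - 1) / num_threads;

    for (std::size_t t = 0; t < num_threads; ++t) {
        const std::size_t begin = std::min(t * chunk, statuses.size());
        const std::size_t end = std::min(begin + chunk, statuses.size());
        workers.emplace_back([&statuses, &local_stats, t, begin, end] {
            auto& counts = local_stats[t].status_counts;
            for (std::size_t i = begin; i < end; ++i) {
                ++counts[statuses[i]];
            }
        });
    }
    for (auto& worker : workers) worker.join();

    // Single-threaded merge after all workers have finished.
    LocalStats global;
    for (const auto& local : local_stats) merge(global, local);
    return global;
}
```

The merge cost grows with the number of distinct keys per thread, not with the number of records, so it stays small for status codes.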


Optional Query Index

LogForge can build a simple in-memory index for repeated queries.

Supported query examples:

./logforge --input logs/large.log --query "status=500"
./logforge --input logs/large.log --query "ip=192.168.1.10"
./logforge --input logs/large.log --query "path=/api/login"

The index maps selected fields to matching record IDs:

status -> record IDs
ip     -> record IDs
path   -> record IDs

This provides additional opportunities to analyze memory use, allocation behavior, and query performance.
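A minimal sketch of such a field-to-IDs mapping is shown below. The type `SimpleIndex` and its members are hypothetical and only show the shape of the data; the layout in LogIndex.h may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

using RecordId = std::size_t;

// Hypothetical index: one posting list of record IDs per distinct field value.
struct SimpleIndex {
    std::unordered_map<int, std::vector<RecordId>> by_status;
    std::unordered_map<std::string, std::vector<RecordId>> by_ip;
    std::unordered_map<std::string, std::vector<RecordId>> by_path;

    // Registers one record under each indexed field.
    void add(RecordId id, int status, const std::string& ip,
             const std::string& path) {
        by_status[status].push_back(id);
        by_ip[ip].push_back(id);
        by_path[path].push_back(id);
    }

    // Returns the IDs of records with this status, or an empty list.
    const std::vector<RecordId>& find_status(int status) const {
        static const std::vector<RecordId> empty;
        const auto it = by_status.find(status);
        return it == by_status.end() ? empty : it->second;
    }
};
```

The index stores IDs instead of record copies, so each posting costs one machine word regardless of record size.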


Build Instructions

Requirements

Recommended environment:

  • Linux
  • CMake
  • C++17 or newer compiler
  • GCC or Clang
  • Python 3 for log generation scripts

Optional tools:

  • GDB
  • LLDB
  • Valgrind
  • perf
  • gprof
  • gprofng
  • strace
  • dtruss on macOS
  • heaptrack
  • clang-tidy
  • cppcheck
  • clang-format
  • gcov, lcov, or llvm-cov
  • AddressSanitizer-capable compiler
  • ThreadSanitizer-capable compiler
  • UndefinedBehaviorSanitizer-capable compiler

Valgrind, perf, gprof, gprofng, and heaptrack are not available in a stock macOS setup: most of them target Linux, and gprof and gprofng ship with GNU binutils rather than Apple's Clang toolchain. A ready-to-use Docker environment with all of these installed is provided in docker/; see docs/docker_dev_environment.md.


Standard Build

cmake -B build -DCMAKE_BUILD_TYPE=Release
cmake --build build

Run:

./build/logforge --input logs/server.log --status-counts

Debug Build

cmake -B build-debug -DCMAKE_BUILD_TYPE=Debug
cmake --build build-debug

Run:

./build-debug/logforge --input logs/server.log --status-counts

Generating Synthetic Logs

Generate a small test log:

python3 bench/generate_logs.py --records 10000 --output logs/10k.log

Generate a larger benchmark log:

python3 bench/generate_logs.py --records 1000000 --output logs/1m.log

Generate a stress-test log:

python3 bench/generate_logs.py --records 10000000 --output logs/10m.log

All generated logs are synthetic and contain no private or production data.


Debugging and Profiling Workflows

GDB

GDB is used for interactive source-level debugging on Linux.

Example debug build:

cmake -B build-debug -DCMAKE_BUILD_TYPE=Debug
cmake --build build-debug

Start GDB:

gdb --args ./build-debug/logforge --input logs/10k.log --status-counts

Useful commands:

break main
break LogParser::parse_line
run
next
step
continue
print record
backtrace
info locals
info threads
thread apply all backtrace

Example debugging scenario:

Problem:
The parser crashes when processing a malformed log line.

Tool:
GDB

Root cause:
The parser assumes every line contains six fields and accesses a missing token.

Fix:
Added validation before constructing LogRecord.

Result:
Malformed lines are skipped and counted in the error report instead of crashing.

See:

docs/gdb.md

LLDB

LLDB is used as an alternative interactive debugger, especially useful with Clang/LLVM-based toolchains and macOS.

Start LLDB:

lldb -- ./build-debug/logforge --input logs/10k.log --status-counts

Useful commands:

breakpoint set --name main
breakpoint set --name LogParser::parse_line
run
next
step
continue
frame variable
thread backtrace all
expression record.status

Example debugging scenario:

Problem:
The query engine returns no records for status=500.

Tool:
LLDB

Root cause:
The query parser treats the value as a string but the index stores status codes as integers.

Fix:
Added typed query parsing for numeric fields.

Result:
status=500 correctly returns all matching records.
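Typed query parsing might be sketched as follows, assuming queries always have the form field=value and that status is the only numeric field. `Query` and `parse_query` are names invented for this example.

```cpp
#include <cassert>
#include <charconv>
#include <optional>
#include <string>
#include <string_view>
#include <variant>

struct Query {
    std::string field;
    std::variant<int, std::string> value;  // int for numeric fields
};

// Splits "field=value" and converts the value to int for the status field,
// so the lookup key has the same type as the keys stored in the index.
inline std::optional<Query> parse_query(std::string_view text) {
    const auto eq = text.find('=');
    if (eq == std::string_view::npos || eq == 0 || eq + 1 == text.size()) {
        return std::nullopt;  // missing '=', empty field, or empty value
    }
    const std::string_view field = text.substr(0, eq);
    const std::string_view raw = text.substr(eq + 1);

    if (field == "status") {
        int number = 0;
        const char* last = raw.data() + raw.size();
        const auto [ptr, ec] = std::from_chars(raw.data(), last, number);
        if (ec != std::errc{} || ptr != last) return std::nullopt;
        return Query{std::string(field), number};
    }
    return Query{std::string(field), std::string(raw)};
}
```

Rejecting "status=abc" at parse time turns a silent empty result into an error the caller can report.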

See:

docs/lldb.md

Valgrind Memcheck

Valgrind Memcheck is used to detect memory leaks, invalid reads/writes, and uninitialized values.

Example command:

valgrind --leak-check=full --track-origins=yes ./build-debug/logforge --input logs/10k.log --status-counts

Example documented issue:

Problem:
The index builder leaked LogRecord objects when malformed lines were skipped.

Root cause:
A raw pointer was allocated before validation and was not released on the error path.

Fix:
Replaced raw owning pointers with value-based storage or std::unique_ptr.

Result:
Valgrind reported no definitely lost memory after the fix.
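The ownership change can be illustrated with a small sketch. `Record`, `is_valid`, and `add_record` are stand-ins invented for this example and do not mirror the repository's index builder.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

struct Record {
    std::string line;
};

// Stand-in validation; the real parser checks field count and types.
inline bool is_valid(const std::string& line) { return !line.empty(); }

// Leaky pattern, for contrast:
//   Record* record = new Record{line};
//   if (!is_valid(line)) return false;   // `record` is never deleted
//
// With std::unique_ptr the early return releases the allocation automatically.
inline bool add_record(std::vector<std::unique_ptr<Record>>& out,
                       const std::string& line) {
    auto record = std::make_unique<Record>(Record{line});
    if (!is_valid(line)) {
        return false;  // unique_ptr destructor frees the Record here
    }
    out.push_back(std::move(record));
    return true;
}
```

Validating before allocating avoids the wasted allocation entirely, but the smart pointer keeps the error path leak-free even if the order changes later.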

See:

docs/valgrind.md

AddressSanitizer

AddressSanitizer is used for fast memory-error detection during development.

Build:

cmake -B build-asan -DCMAKE_BUILD_TYPE=Debug -DENABLE_ASAN=ON
cmake --build build-asan

Run:

ASAN_OPTIONS=detect_leaks=1 ./build-asan/logforge --input logs/10k.log --status-counts

Example bug scenario:

char method[4];
std::strcpy(method, token.c_str());

This overflows for "POST": the four characters fill the buffer, so the null terminator is written one byte past the end. Longer methods such as "DELETE" overflow further.

Fixed version:

std::string method;

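or, if a fixed-size buffer is required (the copy must then be bounded, because strcpy overflows any fixed buffer given a long enough token):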

std::array<char, 8> method{};
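If the fixed-size buffer is kept, the copy has to respect its capacity. The helper below is a sketch of one way to do that; `copy_bounded` is a hypothetical name.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <string_view>

// Copies at most N - 1 characters of `token` into `out` and always writes a
// null terminator. Returns false if the token had to be truncated.
template <std::size_t N>
bool copy_bounded(std::array<char, N>& out, std::string_view token) {
    static_assert(N > 0, "buffer needs room for the terminator");
    const std::size_t count = std::min(token.size(), N - 1);
    std::copy_n(token.data(), count, out.data());
    out[count] = '\0';
    return count == token.size();
}
```

The return value lets the caller treat an over-long method as a malformed line instead of storing a truncated one.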

See:

docs/asan.md

LeakSanitizer

LeakSanitizer is used to detect leaked allocations in sanitizer builds. It is integrated into AddressSanitizer, which is why the build below reuses ENABLE_ASAN.

Build:

cmake -B build-lsan -DCMAKE_BUILD_TYPE=Debug -DENABLE_ASAN=ON
cmake --build build-lsan

Run:

ASAN_OPTIONS=detect_leaks=1 ./build-lsan/logforge --input logs/10k.log --build-index

Example issue:

Problem:
A temporary query index allocated nodes but failed to release them after an exception.

Tool:
LeakSanitizer

Fix:
Replaced manual allocation with RAII containers.

Result:
LeakSanitizer reported no leaks.

See:

docs/lsan.md

ThreadSanitizer

ThreadSanitizer is used to detect data races in multithreaded processing.

Build:

cmake -B build-tsan -DCMAKE_BUILD_TYPE=Debug -DENABLE_TSAN=ON
cmake --build build-tsan

Run:

./build-tsan/logforge --input logs/1m.log --threads 8 --status-counts

Example race:

std::unordered_map<int, int> status_counts;

void process_record(const LogRecord& record) {
    status_counts[record.status]++;
}

Fixed design:

std::vector<LocalStats> local_stats(num_threads);

Each thread updates its own local statistics, and the final result is merged after all worker threads complete.

See:

docs/tsan.md

UndefinedBehaviorSanitizer

UndefinedBehaviorSanitizer is used to detect undefined behavior such as signed integer overflow, invalid shifts, invalid enum values, and use of null pointers or references bound to null.

Build:

cmake -B build-ubsan -DCMAKE_BUILD_TYPE=Debug -DENABLE_UBSAN=ON
cmake --build build-ubsan

Run:

./build-ubsan/logforge --input logs/10k.log --latency-stats

Example issue:

Problem:
Latency sum overflowed a 32-bit integer on very large input files.

Tool:
UndefinedBehaviorSanitizer

Root cause:
The aggregator used int for total latency.

Fix:
Changed total latency accumulation to int64_t.

Result:
The program correctly handles large logs without signed integer overflow.
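To put the overflow in numbers: a 32-bit int tops out at 2,147,483,647, so at an average of 34 ms per request the sum overflows after roughly 63 million records. A sketch of the widened accumulation follows; the function name `total_latency_ms` is hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Sums latencies in a 64-bit accumulator. Each int is converted to
// std::int64_t before the addition, so the sum cannot overflow 32 bits.
inline std::int64_t total_latency_ms(const std::vector<int>& latencies_ms) {
    std::int64_t total = 0;
    for (const int latency : latencies_ms) {
        total += latency;
    }
    return total;
}
```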

See:

docs/ubsan.md

strace / dtruss

strace is used on Linux to analyze system-call behavior.

Example:

strace -f -c ./build/logforge --input logs/1m.log --reader slow --status-counts

Compare against the buffered reader:

strace -f -c ./build/logforge --input logs/1m.log --reader buffered --status-counts

Example issue:

Problem:
The slow reader performed one read() system call per byte.

Result:
The program spent excessive time in kernel calls.

Fix:
Implemented a buffered reader that reads large blocks at a time.

Result:
The number of read() calls dropped significantly.
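The buffering idea can be sketched independently of the file API. In the example below, `ReadFn` stands in for read(2) so that the number of calls is easy to observe. `BufferedReader` and `ReadFn` are illustrative names, not the repository's reader classes.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Stand-in for read(2): fills up to `cap` bytes and returns how many bytes
// were produced, or 0 at end of input.
using ReadFn = std::function<std::size_t(char* buf, std::size_t cap)>;

class BufferedReader {
public:
    BufferedReader(ReadFn source, std::size_t block_size)
        : source_(std::move(source)), buffer_(block_size) {
        assert(block_size > 0);
    }

    // Reads one line (without the '\n') into `line`.
    // Returns false once the input is exhausted.
    bool getline(std::string& line) {
        line.clear();
        bool consumed_any = false;
        for (;;) {
            if (pos_ == len_) {
                // Refill: one source call per block, not one per byte.
                len_ = source_(buffer_.data(), buffer_.size());
                pos_ = 0;
                if (len_ == 0) return consumed_any;
            }
            consumed_any = true;
            const char c = buffer_[pos_++];
            if (c == '\n') return true;
            line.push_back(c);
        }
    }

private:
    ReadFn source_;
    std::vector<char> buffer_;
    std::size_t pos_ = 0;
    std::size_t len_ = 0;
};
```

With a 64 KiB block, a 100 MB file needs on the order of 1,600 source calls instead of 100 million, which is the kind of difference `strace -c` makes visible.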

On macOS, a similar experiment can be performed with dtruss, which must be run as root and may be restricted by System Integrity Protection.

See:

docs/strace.md

perf

perf is used to profile CPU time and identify hot paths.

Basic statistics:

perf stat ./build/logforge --input logs/1m.log --threads 8 --status-counts

Sampling profile:

perf record -g ./build/logforge --input logs/1m.log --threads 8 --top-paths 10
perf report

Example optimization story:

Before:
std::stringstream dominated parsing time.

After:
Manual parsing with std::string_view reduced temporary allocations and improved throughput.

Before:
A global aggregation map caused lock contention.

After:
Thread-local aggregation improved scalability.

See:

docs/perf.md

gprof

gprof is used for traditional flat-profile and call-graph analysis.

Build:

cmake -B build-gprof -DCMAKE_BUILD_TYPE=Release -DENABLE_GPROF=ON
cmake --build build-gprof

Run:

./build-gprof/logforge --input logs/1m.log --top-paths 10
gprof ./build-gprof/logforge gmon.out > results/gprof.txt

Typical functions to inspect:

parse_line()
process_record()
update_status_counts()
update_path_counts()
compute_latency_stats()

See:

docs/gprof.md

gprofng

gprofng is used for function-level and call-tree profiling.

Collect profile data:

gprofng collect app ./build/logforge --input logs/1m.log --threads 8 --top-paths 10

Display function profile:

gprofng display text -functions test.1.er

Display call tree:

gprofng display text -calltree test.1.er

See:

docs/gprofng.md

Valgrind Massif

Massif is used to analyze heap memory usage over time.

Run:

valgrind --tool=massif ./build/logforge --input logs/1m.log --build-index

Display report:

ms_print massif.out.* > results/massif.txt

Example issue:

Problem:
Building the query index caused high peak memory usage.

Tool:
Valgrind Massif

Root cause:
The index stored duplicated strings for every record.

Fix:
Reused string storage and stored record IDs instead of duplicated records.

Result:
Peak heap usage decreased significantly.
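One way to reuse string storage is to intern each distinct value once and refer to it by a small integer ID. The sketch below shows that technique under the assumption that paths and IPs repeat heavily; `StringPool` is a hypothetical class and not necessarily how the repository implements the fix.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Stores each distinct string once and hands out compact integer IDs.
class StringPool {
public:
    StringPool() = default;
    // Non-copyable: `by_id_` holds pointers into `ids_`.
    StringPool(const StringPool&) = delete;
    StringPool& operator=(const StringPool&) = delete;

    // Returns the ID for `text`, storing the string only on first sight.
    std::uint32_t intern(const std::string& text) {
        const auto next_id = static_cast<std::uint32_t>(by_id_.size());
        const auto [it, inserted] = ids_.try_emplace(text, next_id);
        if (inserted) {
            // Pointers to unordered_map keys stay valid across rehashing.
            by_id_.push_back(&it->first);
        }
        return it->second;
    }

    const std::string& get(std::uint32_t id) const { return *by_id_.at(id); }
    std::size_t size() const { return by_id_.size(); }

private:
    std::unordered_map<std::string, std::uint32_t> ids_;
    std::vector<const std::string*> by_id_;
};
```

A record then holds a 4-byte ID per field instead of its own std::string, so memory scales with the number of distinct values, not the number of records.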

See:

docs/massif.md

heaptrack

heaptrack is used to identify allocation hot spots and allocation-heavy code paths.

Run:

heaptrack ./build/logforge --input logs/1m.log --top-paths 10

Analyze (newer heaptrack releases may write a .zst file instead of .gz; adjust the file name accordingly):

heaptrack --analyze heaptrack.logforge.*.gz

Example issue:

Problem:
The parser performed excessive temporary string allocations.

Tool:
heaptrack

Fix:
Replaced repeated substring copies with std::string_view-based parsing.

Result:
Total allocation count and allocation volume decreased.

See:

docs/heaptrack.md

Cachegrind

Cachegrind is used to analyze instruction and data cache behavior.

Run:

valgrind --tool=cachegrind ./build/logforge --input logs/1m.log --status-counts

Analyze:

cg_annotate cachegrind.out.* > results/cachegrind.txt

Example issue:

Problem:
Aggregation had poor cache locality when records were stored as many separately allocated objects.

Tool:
Cachegrind

Fix:
Changed storage to contiguous vectors and reduced pointer chasing.

Result:
Data cache misses decreased.

See:

docs/cachegrind.md

Callgrind

Callgrind is used for detailed call-graph profiling.

Run:

valgrind --tool=callgrind ./build/logforge --input logs/1m.log --top-paths 10

Analyze:

callgrind_annotate callgrind.out.* > results/callgrind.txt

Optional GUI:

kcachegrind callgrind.out.*

Example issue:

Problem:
Top-path computation spent too much time sorting all paths.

Tool:
Callgrind

Fix:
Replaced full sort with a bounded min-heap for top-K selection.

Result:
The top-K query avoided unnecessary sorting work.
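A bounded min-heap keeps only the k best candidates seen so far, which costs O(n log k) instead of the O(n log n) of a full sort. The sketch below shows the technique with std::priority_queue; `top_k` and `PathCount` are names made up for this example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <queue>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

using PathCount = std::pair<std::uint64_t, std::string>;  // (count, path)

// Returns the k most frequent paths, most frequent first.
inline std::vector<PathCount> top_k(
    const std::unordered_map<std::string, std::uint64_t>& counts,
    std::size_t k) {
    // Min-heap: the weakest of the current top-k candidates sits on top.
    std::priority_queue<PathCount, std::vector<PathCount>,
                        std::greater<PathCount>> heap;

    for (const auto& [path, count] : counts) {
        if (heap.size() < k) {
            heap.emplace(count, path);
        } else if (k > 0 && count > heap.top().first) {
            heap.pop();  // evict the weakest candidate
            heap.emplace(count, path);
        }
    }

    // Popping yields ascending order, so reverse for largest-first output.
    std::vector<PathCount> result;
    result.reserve(heap.size());
    while (!heap.empty()) {
        result.push_back(heap.top());
        heap.pop();
    }
    std::reverse(result.begin(), result.end());
    return result;
}
```

For `--top-paths 10` over many distinct paths, the heap never holds more than ten entries.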

See:

docs/callgrind.md

clang-tidy

clang-tidy is used for static analysis and modernization checks.

Run:

clang-tidy src/*.cpp -- -Iinclude -std=c++17

Or use a helper script:

./scripts/run_clang_tidy.sh

Example checks:

modernize-use-nullptr
modernize-use-override
performance-for-range-copy
performance-unnecessary-value-param
bugprone-use-after-move
readability-const-return-type

Example issue:

Problem:
A function copied LogRecord objects unnecessarily during aggregation.

Tool:
clang-tidy

Fix:
Changed the parameter from LogRecord to const LogRecord&.

Result:
Reduced unnecessary copies and improved code clarity.

See:

docs/static_analysis.md

cppcheck

cppcheck is used as an additional static-analysis pass.

Run:

cppcheck --enable=all --inconclusive --std=c++17 -Iinclude src/

Example issue:

Problem:
A condition in the parser was always true.

Tool:
cppcheck

Fix:
Simplified the condition and added a test for malformed input.

Result:
Cleaner parser logic and better test coverage.

See:

docs/static_analysis.md

clang-format

clang-format is used to keep code style consistent.

Run:

clang-format -i include/*.h src/*.cpp tests/*.cpp bugs/*.cpp

Or:

./scripts/run_format.sh

Recommended project file:

.clang-format

Example style goal:

Consistent formatting across headers, source files, tests, and bug demos.

Test Coverage

Coverage tools are used to measure how much of the code is exercised by unit tests.

GCC coverage build:

cmake -B build-coverage -DCMAKE_BUILD_TYPE=Debug -DENABLE_COVERAGE=ON
cmake --build build-coverage
ctest --test-dir build-coverage

Generate report:

lcov --capture --directory build-coverage --output-file coverage.info
genhtml coverage.info --output-directory coverage_html

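LLVM coverage alternative (requires a Clang build with -fprofile-instr-generate -fcoverage-mapping instead of --coverage; running the instrumented binary writes default.profraw):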

llvm-profdata merge -sparse default.profraw -o coverage.profdata
llvm-cov show ./build-coverage/logforge -instr-profile=coverage.profdata

Example coverage goal:

Parser: high coverage for valid and malformed lines
Aggregator: high coverage for status counts, top-K paths, and latency stats
Query engine: coverage for valid queries, invalid queries, and empty results
Thread pool: basic concurrency behavior tests

See:

docs/coverage.md

CMake Options

LogForge supports several build-time options.

option(ENABLE_ASAN "Enable AddressSanitizer" OFF)
option(ENABLE_TSAN "Enable ThreadSanitizer" OFF)
option(ENABLE_UBSAN "Enable UndefinedBehaviorSanitizer" OFF)
option(ENABLE_GPROF "Enable gprof instrumentation" OFF)
option(ENABLE_COVERAGE "Enable coverage instrumentation" OFF)

Example sanitizer configuration:

if (ENABLE_ASAN)
    add_compile_options(-fsanitize=address -fno-omit-frame-pointer -g)
    add_link_options(-fsanitize=address)
endif()

if (ENABLE_TSAN)
    add_compile_options(-fsanitize=thread -fno-omit-frame-pointer -g)
    add_link_options(-fsanitize=thread)
endif()

if (ENABLE_UBSAN)
    add_compile_options(-fsanitize=undefined -fno-omit-frame-pointer -g)
    add_link_options(-fsanitize=undefined)
endif()

if (ENABLE_GPROF)
    add_compile_options(-pg -g)
    add_link_options(-pg)
endif()

if (ENABLE_COVERAGE)
    add_compile_options(--coverage -O0 -g)
    add_link_options(--coverage)
endif()

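AddressSanitizer and ThreadSanitizer cannot be combined in one binary, and gprof or coverage instrumentation distorts timings, so each of these options should normally be enabled in its own build directory.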


Benchmarking

Run the benchmark script:

./bench/run_benchmarks.sh

Example benchmark matrix:

| Experiment | Comparison |
|---|---|
| Parser performance | stringstream vs manual parser vs string_view parser |
| I/O behavior | one-byte reader vs buffered reader |
| Thread scaling | 1, 2, 4, 8, and 16 threads |
| Aggregation strategy | global lock vs thread-local aggregation |
| Top-K query | full sort vs min-heap |
| Indexing | direct scan vs prebuilt index |
| Memory usage | duplicated strings vs compact record storage |
| Allocation behavior | substring copies vs string_view parsing |
| Cache behavior | pointer-heavy storage vs contiguous vectors |

Example result table:

| Configuration | Records | Threads | Time |
|---|---|---|---|
| stringstream parser | 1,000,000 | 1 | 2.84s |
| string_view parser | 1,000,000 | 1 | 1.37s |
| string_view + buffered I/O | 1,000,000 | 1 | 1.02s |
| string_view + buffered I/O | 1,000,000 | 8 | 0.31s |

Actual numbers depend on machine, compiler, and input size.


Controlled Bug Demos

The bugs/ directory contains intentionally flawed implementations used only for tool demonstrations.

Examples:

| File | Purpose |
|---|---|
| parser_crash.cpp | Demonstrates GDB and LLDB debugging |
| memory_leak.cpp | Demonstrates Valgrind and LeakSanitizer |
| buffer_overflow.cpp | Demonstrates AddressSanitizer |
| use_after_free.cpp | Demonstrates AddressSanitizer and Valgrind |
| data_race.cpp | Demonstrates ThreadSanitizer |
| undefined_behavior.cpp | Demonstrates UndefinedBehaviorSanitizer |
| syscall_storm.cpp | Demonstrates strace/dtruss syscall tracing |

These examples are isolated from the main implementation.

The main executable should pass normal tests and sanitizer checks.


Example Engineering Notes

Each debugging document follows this structure:

1. Problem
2. Tool used
3. Command
4. Key output
5. Root cause
6. Fix
7. Result after fix
8. Lessons learned

Example:

Problem:
Parallel status-code aggregation occasionally produced incorrect counts.

Tool:
ThreadSanitizer

Root cause:
Multiple threads updated a shared unordered_map without synchronization.

Fix:
Replaced shared updates with thread-local aggregation and final merge.

Result:
TSan reported no data races, and throughput improved under higher thread counts.

Suggested Development Workflow

A typical development workflow:

# 1. Format code
./scripts/run_format.sh

# 2. Run static analysis
./scripts/run_clang_tidy.sh
./scripts/run_cppcheck.sh

# 3. Build and run tests
cmake -B build-debug -DCMAKE_BUILD_TYPE=Debug
cmake --build build-debug
ctest --test-dir build-debug

# 4. Run sanitizer builds
./scripts/run_asan.sh
./scripts/run_tsan.sh
./scripts/run_ubsan.sh

# 5. Run memory checks
./scripts/run_valgrind.sh

# 6. Run profiling experiments
./scripts/run_perf.sh
./scripts/run_gprof.sh
./scripts/run_gprofng.sh

# 7. Run benchmark comparison
./bench/run_benchmarks.sh

Skills Demonstrated

This project demonstrates:

  • C++17 programming
  • RAII and safe memory ownership
  • std::thread, worker queues, and thread pools
  • Locking, atomics, and thread-local data structures
  • Hash-map based aggregation
  • Buffered file I/O
  • CLI design
  • CMake build configuration
  • Debug and release build workflows
  • Interactive debugging with GDB and LLDB
  • Sanitizer integration
  • Memory debugging
  • System-call tracing
  • CPU profiling
  • Heap profiling
  • Cache profiling
  • Static analysis
  • Code formatting
  • Test coverage
  • Benchmark automation
  • Measurement-driven optimization
  • Technical documentation

Non-Goals

LogForge is not intended to be a production observability platform.

It does not aim to replace tools such as Elasticsearch, Splunk, Loki, or ClickHouse.

The purpose of this project is to demonstrate C++ systems engineering, debugging, tracing, profiling, and optimization skills in a safe, self-contained codebase.


Confidentiality Note

This project is intentionally unrelated to any prior or current employer’s internal systems, tools, data, workflows, or intellectual property.

All logs are synthetic. The project uses a generic log-processing domain to demonstrate transferable systems programming skills without relying on proprietary information.


Future Improvements

Potential extensions:

  • Memory-mapped file reader
  • Compressed log support
  • JSON log parser
  • Regex-based filtering
  • Persistent on-disk index
  • Interactive query shell
  • Flamegraph generation
  • eBPF-based tracing experiment
  • GitHub Actions CI with sanitizer builds
  • HTML benchmark report generation
  • Web dashboard for benchmark results
  • Fuzz testing with libFuzzer or AFL++
  • Package manager integration with Conan or vcpkg

License

This project is intended for educational and portfolio use.

Before publishing the repository publicly, choose a license such as:

MIT License
Apache License 2.0
BSD 3-Clause License

Summary

LogForge is a C++ multithreaded log analytics engine built to demonstrate practical systems debugging and performance engineering.

It combines a realistic command-line application with controlled debugging labs and profiling experiments using:

GDB
LLDB
Valgrind
AddressSanitizer
LeakSanitizer
ThreadSanitizer
UndefinedBehaviorSanitizer
strace / dtruss
perf
gprof
gprofng
Massif
heaptrack
Cachegrind
Callgrind
clang-tidy
cppcheck
clang-format
gcov / lcov / llvm-cov

The result is a safe, portfolio-friendly project that highlights transferable C++ systems skills.
