LogForge

LogForge is a C++ multithreaded log analytics engine designed to demonstrate practical systems debugging, profiling, tracing, testing, and performance optimization workflows.

The project processes large synthetic log files and supports common analytics operations such as status-code aggregation, top-IP queries, top-path queries, latency statistics, error filtering, and optional query indexing.

All input data is synthetic and generated by scripts in this repository.

Project Goals

LogForge is built as a portfolio project to demonstrate hands-on experience with modern C++ systems development and professional diagnostic tools.

The project demonstrates:

Modern C++ systems programming
CMake-based build configuration
Multithreaded data processing
Interactive debugging with GDB and LLDB
Memory debugging with Valgrind Memcheck
Fast runtime bug detection with AddressSanitizer, LeakSanitizer, ThreadSanitizer, and UndefinedBehaviorSanitizer
System-call tracing with strace and dtruss
CPU profiling with perf
Function-level profiling with gprof
Experiment-based profiling with gprofng
Heap profiling with Valgrind Massif and heaptrack
Cache and call-graph profiling with Cachegrind and Callgrind
Static analysis with clang-tidy and cppcheck
Code formatting with clang-format
Test coverage with gcov, lcov, or llvm-cov
Benchmark automation and performance comparison

The focus is not only to build a working log analyzer, but also to document how real bugs and performance issues can be found, explained, fixed, and measured.

Example Use Cases

LogForge analyzes log files with records such as:

2026-06-19T10:15:21Z 192.168.1.10 GET /api/users 200 34ms
2026-06-19T10:15:22Z 192.168.1.11 POST /api/login 401 12ms
2026-06-19T10:15:23Z 192.168.1.12 GET /api/orders 500 93ms
2026-06-19T10:15:24Z 192.168.1.10 GET /api/products 200 18ms

Supported operations include:

./logforge --input logs/server.log --status-counts
./logforge --input logs/server.log --top-ips 10
./logforge --input logs/server.log --top-paths 10
./logforge --input logs/server.log --latency-stats
./logforge --input logs/server.log --errors-only
./logforge --input logs/server.log --threads 8 --status-counts
./logforge --input logs/server.log --build-index
./logforge --input logs/server.log --query "status=500"

Example output:

Status Counts
-------------
200: 824391
301: 12044
400: 2891
401: 3812
404: 5821
500: 1033

Latency Statistics
------------------
p50: 18 ms
p95: 92 ms
p99: 214 ms
max: 731 ms

Why This Project Exists

Many C++ projects show algorithms or data structures, but fewer demonstrate the debugging, tracing, and profiling workflows used in production systems work.

LogForge is designed to show the full engineering loop:

Build a realistic C++ command-line tool.
Introduce controlled memory, threading, I/O, correctness, and performance issues.
Detect those issues using professional tools.
Explain the root cause.
Fix the implementation.
Measure the improvement.
Document the workflow clearly.

This makes the project useful for demonstrating systems-level debugging, profiling, and performance engineering skills.

Tool Demonstration Matrix

Category	Tool	Scenario	Demonstrated Skill
Interactive debugging	GDB	Crash in parser, invalid object state, breakpoint inspection	Source-level C++ debugging
Interactive debugging	LLDB	Same debugging workflow using LLVM tooling	Cross-platform debugging
Memory debugging	Valgrind Memcheck	Memory leak, invalid read, uninitialized value	Deep memory diagnostics
Runtime sanitizers	AddressSanitizer	Buffer overflow, use-after-free	Fast memory-error detection
Runtime sanitizers	LeakSanitizer	Leaked allocations	Leak detection in sanitizer builds
Runtime sanitizers	ThreadSanitizer	Race condition in shared aggregation map	Thread-safety debugging
Runtime sanitizers	UndefinedBehaviorSanitizer	Integer overflow, invalid enum, bad shift	Undefined behavior detection
System tracing	strace / dtruss	Excessive `read()` system calls	System-call tracing and I/O analysis
CPU profiling	perf	Hot functions, CPU cycles, branch/cache behavior	Linux performance profiling
Function profiling	gprof	Flat profile and call graph	Function-level profiling
Function profiling	gprofng	Function and call-tree analysis	Modern GNU profiling workflow
Heap profiling	Massif	Peak heap usage in index builder	Memory footprint analysis
Heap profiling	heaptrack	Allocation hot spots	Allocation profiling
Cache profiling	Cachegrind	Cache misses in parser and aggregation	Cache behavior analysis
Call-graph profiling	Callgrind	Expensive call paths	Call-path optimization
Static analysis	clang-tidy	Bug-prone patterns and modernization suggestions	Static C++ analysis
Static analysis	cppcheck	Additional static checks	Lightweight code analysis
Formatting	clang-format	Consistent style	Automated formatting
Coverage	gcov / lcov / llvm-cov	Unit-test coverage	Test quality measurement

Repository Structure

LogForge/
├── CMakeLists.txt
├── README.md
├── include/
│   ├── LogRecord.h
│   ├── LogParser.h
│   ├── LogIndex.h
│   ├── QueryEngine.h
│   ├── ThreadPool.h
│   ├── Aggregator.h
│   └── ReportWriter.h
├── src/
│   ├── main.cpp
│   ├── LogParser.cpp
│   ├── LogIndex.cpp
│   ├── QueryEngine.cpp
│   ├── ThreadPool.cpp
│   ├── Aggregator.cpp
│   └── ReportWriter.cpp
├── tests/
│   ├── test_parser.cpp
│   ├── test_aggregator.cpp
│   ├── test_query_engine.cpp
│   └── test_thread_pool.cpp
├── bugs/
│   ├── memory_leak.cpp
│   ├── buffer_overflow.cpp
│   ├── use_after_free.cpp
│   ├── data_race.cpp
│   ├── undefined_behavior.cpp
│   ├── parser_crash.cpp
│   └── syscall_storm.cpp
├── bench/
│   ├── generate_logs.py
│   ├── run_benchmarks.sh
│   └── compare_results.py
├── scripts/
│   ├── run_gdb.sh
│   ├── run_lldb.sh
│   ├── run_valgrind.sh
│   ├── run_asan.sh
│   ├── run_lsan.sh
│   ├── run_tsan.sh
│   ├── run_ubsan.sh
│   ├── run_strace.sh
│   ├── run_perf.sh
│   ├── run_gprof.sh
│   ├── run_gprofng.sh
│   ├── run_massif.sh
│   ├── run_heaptrack.sh
│   ├── run_cachegrind.sh
│   ├── run_callgrind.sh
│   ├── run_clang_tidy.sh
│   ├── run_cppcheck.sh
│   ├── run_format.sh
│   └── run_coverage.sh
├── docs/
│   ├── gdb.md
│   ├── lldb.md
│   ├── valgrind.md
│   ├── asan.md
│   ├── lsan.md
│   ├── tsan.md
│   ├── ubsan.md
│   ├── strace.md
│   ├── perf.md
│   ├── gprof.md
│   ├── gprofng.md
│   ├── massif.md
│   ├── heaptrack.md
│   ├── cachegrind.md
│   ├── callgrind.md
│   ├── static_analysis.md
│   ├── coverage.md
│   └── docker_dev_environment.md
├── docker/
│   ├── Dockerfile
│   └── docker-compose.yml
├── results/
│   ├── before/
│   └── after/
└── logs/
    └── generated sample logs

Core Design

LogForge is organized into several main components.

Log Parser

The parser converts raw log lines into structured records.

struct LogRecord {
    std::string timestamp;
    std::string ip;
    std::string method;
    std::string path;
    int status;
    int latency_ms;
};

The project supports multiple parser implementations for comparison:

stringstream parser: simple but slower
Manual parser: faster tokenization
string_view parser: reduced temporary allocations
Buffered reader: reduced system-call overhead
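As a rough illustration of the string_view approach listed above, the sketch below splits a line into fields without copying any characters. The name split_fields and the single-space delimiter are assumptions made for this example, not the project's actual parser interface.

```cpp
#include <cstddef>
#include <string_view>
#include <vector>

// Splits `line` on single spaces without copying characters.
// The returned views point into `line`, so `line` must outlive them.
std::vector<std::string_view> split_fields(std::string_view line) {
    std::vector<std::string_view> fields;
    std::size_t start = 0;
    while (start <= line.size()) {
        const std::size_t end = line.find(' ', start);
        if (end == std::string_view::npos) {
            fields.push_back(line.substr(start));  // last field
            break;
        }
        fields.push_back(line.substr(start, end - start));
        start = end + 1;  // skip the delimiter
    }
    return fields;
}
```

Because each field is a view into the original buffer, the only allocation is the vector itself, which a real parser could reuse across lines.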

Aggregation Engine

The aggregation engine computes statistics from parsed records.

Supported aggregations include:

HTTP status-code counts
Top client IP addresses
Top requested paths
Error-only filtering
Latency percentiles
Request-method counts

Example:

./logforge --input logs/1m.log --status-counts --top-paths 10
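The latency percentiles mentioned above can be computed in several ways. The following is a minimal sketch using the nearest-rank definition; the function name percentile_nearest_rank and the choice of definition are assumptions for illustration, and the project may use a different method.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Nearest-rank percentile: the value at 1-based rank ceil(pct/100 * N)
// in the sorted samples. The vector is taken by value because the
// function sorts its own copy. Returns 0 for an empty input.
int percentile_nearest_rank(std::vector<int> latencies_ms, double pct) {
    if (latencies_ms.empty()) {
        return 0;
    }
    pct = std::clamp(pct, 0.0, 100.0);
    std::sort(latencies_ms.begin(), latencies_ms.end());
    const double n = static_cast<double>(latencies_ms.size());
    std::size_t rank = static_cast<std::size_t>(std::ceil(pct / 100.0 * n));
    rank = std::clamp<std::size_t>(rank, 1, latencies_ms.size());
    return latencies_ms[rank - 1];
}
```

Sorting once and reading p50, p95, and p99 from the same sorted vector would avoid repeated work when several percentiles are requested together.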

Multithreaded Processing

LogForge supports parallel processing of large input files.

Each worker thread processes a chunk of the input and produces local statistics. The local results are merged at the end.

This design avoids unnecessary contention and provides a useful comparison against a deliberately flawed shared-map implementation.

Bad design (every thread writes to one shared map without synchronization):

global_status_counts[record.status]++;

Better design:

local_stats[thread_id].status_counts[record.status]++;

Final merge:

for (const auto& local : local_stats) {
    merge(global_stats, local);
}

This provides a realistic demonstration of using ThreadSanitizer to detect a race and then redesigning the data flow for correctness and scalability.
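A self-contained sketch of the local-stats-then-merge pattern is shown below. It works on a plain vector of status codes rather than parsed records, and the names StatusCounts and count_statuses_parallel are hypothetical; the real ThreadPool and Aggregator interfaces may look different.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <thread>
#include <unordered_map>
#include <vector>

using StatusCounts = std::unordered_map<int, std::uint64_t>;

// Each worker counts its own chunk into its own map, so no map is ever
// written by two threads. The merge runs on the calling thread after join().
StatusCounts count_statuses_parallel(const std::vector<int>& statuses,
                                     std::size_t num_threads) {
    if (num_threads == 0) {
        num_threads = 1;
    }
    std::vector<StatusCounts> local(num_threads);
    std::vector<std::thread> workers;
    const std::size_t chunk = (statuses.size() + num_threads - 1) / num_threads;

    for (std::size_t t = 0; t < num_threads; ++t) {
        const std::size_t begin = std::min(t * chunk, statuses.size());
        const std::size_t end = std::min(begin + chunk, statuses.size());
        workers.emplace_back([&statuses, &local, t, begin, end] {
            for (std::size_t i = begin; i < end; ++i) {
                ++local[t][statuses[i]];
            }
        });
    }
    for (auto& worker : workers) {
        worker.join();
    }

    StatusCounts total;
    for (const auto& part : local) {
        for (const auto& [status, count] : part) {
            total[status] += count;
        }
    }
    return total;
}
```

The merge cost is proportional to the number of distinct keys per thread, which is tiny for status codes but worth measuring for high-cardinality fields such as paths or IPs.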

Optional Query Index

LogForge can build a simple in-memory index for repeated queries.

Supported query examples:

./logforge --input logs/large.log --query "status=500"
./logforge --input logs/large.log --query "ip=192.168.1.10"
./logforge --input logs/large.log --query "path=/api/login"

The index maps selected fields to matching record IDs:

status -> record IDs
ip     -> record IDs
path   -> record IDs

This provides additional opportunities to analyze memory use, allocation behavior, and query performance.
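One possible shape for such an index is sketched below: each map goes from a field value to the positions of matching records in the record vector. The FieldIndex type and build_index function are assumptions for illustration, not the contents of LogIndex.h.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

struct LogRecord {
    std::string timestamp;
    std::string ip;
    std::string method;
    std::string path;
    int status;
    int latency_ms;
};

// Hypothetical index layout: field value -> IDs (vector positions) of
// the records that carry that value.
struct FieldIndex {
    std::unordered_map<int, std::vector<std::size_t>> by_status;
    std::unordered_map<std::string, std::vector<std::size_t>> by_ip;
    std::unordered_map<std::string, std::vector<std::size_t>> by_path;
};

FieldIndex build_index(const std::vector<LogRecord>& records) {
    FieldIndex index;
    for (std::size_t id = 0; id < records.size(); ++id) {
        const LogRecord& r = records[id];
        index.by_status[r.status].push_back(id);
        index.by_ip[r.ip].push_back(id);
        index.by_path[r.path].push_back(id);
    }
    return index;
}
```

Storing IDs instead of record copies keeps the index small, at the cost of one extra lookup into the record vector per match.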

Build Instructions

Requirements

Recommended environment:

Linux
CMake 3.20 or newer (required by the ctest --test-dir option used below)
C++17 or newer compiler
GCC or Clang
Python 3 for log generation scripts

Optional tools:

GDB
LLDB
Valgrind
perf
gprof
gprofng
strace
dtruss on macOS
heaptrack
clang-tidy
cppcheck
clang-format
gcov, lcov, or llvm-cov
AddressSanitizer-capable compiler
ThreadSanitizer-capable compiler
UndefinedBehaviorSanitizer-capable compiler

Valgrind, perf, gprof, gprofng, and heaptrack are either Linux-only or, in the case of gprof and gprofng, not part of Apple's Clang toolchain, so they cannot be run natively on a typical macOS setup. A ready-to-use Docker environment with all of these tools installed is provided in docker/; see docs/docker_dev_environment.md.

Standard Build

cmake -B build -DCMAKE_BUILD_TYPE=Release
cmake --build build

Run:

./build/logforge --input logs/server.log --status-counts

Debug Build

cmake -B build-debug -DCMAKE_BUILD_TYPE=Debug
cmake --build build-debug

Run:

./build-debug/logforge --input logs/server.log --status-counts

Generating Synthetic Logs

Generate a small test log:

python3 bench/generate_logs.py --records 10000 --output logs/10k.log

Generate a larger benchmark log:

python3 bench/generate_logs.py --records 1000000 --output logs/1m.log

Generate a stress-test log:

python3 bench/generate_logs.py --records 10000000 --output logs/10m.log

All generated logs are synthetic and contain no private or production data.

Debugging and Profiling Workflows

GDB

GDB is used for interactive source-level debugging on Linux.

Example debug build:

cmake -B build-debug -DCMAKE_BUILD_TYPE=Debug
cmake --build build-debug

Start GDB:

gdb --args ./build-debug/logforge --input logs/10k.log --status-counts

Useful commands:

break main
break LogParser::parse_line
run
next
step
continue
print record
backtrace
info locals
info threads
thread apply all backtrace

Example debugging scenario:

Problem:
The parser crashes when processing a malformed log line.

Tool:
GDB

Root cause:
The parser assumes every line contains six fields and accesses a missing token.

Fix:
Added validation before constructing LogRecord.

Result:
Malformed lines are skipped and counted in the error report instead of crashing.
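The kind of validation described in this scenario might look like the sketch below, which checks the field count and the numeric fields before a record is built. The names parse_int_strict and parse_line_checked are hypothetical, and the tokenization via std::istringstream is chosen for brevity rather than speed.

```cpp
#include <charconv>
#include <optional>
#include <sstream>
#include <string>
#include <system_error>
#include <vector>

struct LogRecord {
    std::string timestamp;
    std::string ip;
    std::string method;
    std::string path;
    int status;
    int latency_ms;
};

// Succeeds only if the whole string is a decimal integer.
bool parse_int_strict(const std::string& text, int& out) {
    const char* first = text.data();
    const char* last = first + text.size();
    const auto [ptr, ec] = std::from_chars(first, last, out);
    return ec == std::errc{} && ptr == last;
}

// Returns std::nullopt for malformed lines instead of indexing past
// the end of the token list.
std::optional<LogRecord> parse_line_checked(const std::string& line) {
    std::istringstream in(line);
    std::vector<std::string> tokens;
    for (std::string token; in >> token;) {
        tokens.push_back(token);
    }
    if (tokens.size() != 6) {
        return std::nullopt;  // the original crash: a missing token
    }

    // The latency field carries an "ms" suffix, e.g. "34ms".
    std::string& latency = tokens[5];
    if (latency.size() < 3 ||
        latency.compare(latency.size() - 2, 2, "ms") != 0) {
        return std::nullopt;
    }
    latency.resize(latency.size() - 2);

    LogRecord record{};
    if (!parse_int_strict(tokens[4], record.status) ||
        !parse_int_strict(latency, record.latency_ms)) {
        return std::nullopt;
    }
    record.timestamp = tokens[0];
    record.ip = tokens[1];
    record.method = tokens[2];
    record.path = tokens[3];
    return record;
}
```

A caller can then count the std::nullopt results to produce the malformed-line total for the error report.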

See:

docs/gdb.md

LLDB

LLDB is used as an alternative interactive debugger. It is especially useful with Clang/LLVM-based toolchains and on macOS.

Start LLDB:

lldb -- ./build-debug/logforge --input logs/10k.log --status-counts

Useful commands:

breakpoint set --name main
breakpoint set --name LogParser::parse_line
run
next
step
continue
frame variable
thread backtrace all
expression record.status

Example debugging scenario:

Problem:
The query engine returns no records for status=500.

Tool:
LLDB

Root cause:
The query parser treats the value as a string but the index stores status codes as integers.

Fix:
Added typed query parsing for numeric fields.

Result:
status=500 correctly returns all matching records.
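Typed query parsing of this kind could be sketched as follows. The Query struct, the use of std::variant, and the rule that only the status field is numeric are all assumptions made for this example.

```cpp
#include <charconv>
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>
#include <system_error>
#include <variant>

// Hypothetical query representation: numeric fields keep an int,
// everything else keeps the raw text.
struct Query {
    std::string field;
    std::variant<int, std::string> value;
};

// Parses "field=value". Returns std::nullopt if the text has no '=',
// an empty side, or a non-numeric value for the numeric field.
std::optional<Query> parse_query(std::string_view text) {
    const std::size_t eq = text.find('=');
    if (eq == std::string_view::npos || eq == 0 || eq + 1 == text.size()) {
        return std::nullopt;
    }
    const std::string_view field = text.substr(0, eq);
    const std::string_view raw = text.substr(eq + 1);

    if (field == "status") {
        int code = 0;
        const char* last = raw.data() + raw.size();
        const auto [ptr, ec] = std::from_chars(raw.data(), last, code);
        if (ec != std::errc{} || ptr != last) {
            return std::nullopt;
        }
        return Query{std::string(field), code};
    }
    return Query{std::string(field), std::string(raw)};
}
```

With the value already typed, the lookup against an integer-keyed status index no longer compares a string against an int.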

See:

docs/lldb.md

Valgrind Memcheck

Valgrind Memcheck is used to detect memory leaks, invalid reads/writes, and uninitialized values.

Example command:

valgrind --leak-check=full --track-origins=yes ./build-debug/logforge --input logs/10k.log --status-counts

Example documented issue:

Problem:
The index builder leaked LogRecord objects when malformed lines were skipped.

Root cause:
A raw pointer was allocated before validation and was not released on the error path.

Fix:
Replaced raw owning pointers with value-based storage or std::unique_ptr.

Result:
Valgrind reported no definitely lost memory after the fix.
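The std::unique_ptr variant of this fix might look like the sketch below. The helper names looks_valid and add_if_valid, and the validation rule itself, are hypothetical and only serve to show the ownership pattern.

```cpp
#include <memory>
#include <string>
#include <utility>
#include <vector>

struct LogRecord {
    std::string timestamp;
    std::string ip;
    std::string method;
    std::string path;
    int status;
    int latency_ms;
};

// Hypothetical validation rule, used only for this illustration.
bool looks_valid(const LogRecord& r) {
    return r.status >= 100 && r.status <= 599 && r.latency_ms >= 0;
}

// The record is owned by a unique_ptr from the moment it is allocated,
// so the early return on the error path cannot leak it.
bool add_if_valid(std::vector<std::unique_ptr<LogRecord>>& store,
                  LogRecord candidate) {
    auto record = std::make_unique<LogRecord>(std::move(candidate));
    if (!looks_valid(*record)) {
        return false;  // `record` is destroyed here automatically
    }
    store.push_back(std::move(record));
    return true;
}
```

Storing LogRecord values directly in a std::vector<LogRecord> would remove the per-record allocation altogether, which is the value-based option mentioned in the fix.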

See:

docs/valgrind.md

AddressSanitizer

AddressSanitizer is used for fast memory-error detection during development.

Build:

cmake -B build-asan -DCMAKE_BUILD_TYPE=Debug -DENABLE_ASAN=ON
cmake --build build-asan

Run:

ASAN_OPTIONS=detect_leaks=1 ./build-asan/logforge --input logs/10k.log --status-counts

Example bug scenario:

char method[4];
std::strcpy(method, token.c_str());

This overflows for "POST": the four characters fill the buffer, and std::strcpy then writes the null terminator one byte past its end.

Fixed version:

std::string method;

or, combined with a bounded copy that checks the token length first:

std::array<char, 8> method{};
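A larger fixed buffer is only safe if the copy itself is bounded. The sketch below shows one way to do that; the name copy_method and the capacity constant are assumptions for this example.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <string_view>

constexpr std::size_t kMethodCapacity = 8;

// Copies `token` into a fixed buffer and always leaves room for the
// null terminator. Returns false (with an empty buffer) if the token
// does not fit, instead of writing past the end.
bool copy_method(std::string_view token,
                 std::array<char, kMethodCapacity>& out) {
    out.fill('\0');
    if (token.size() >= out.size()) {
        return false;
    }
    std::copy(token.begin(), token.end(), out.begin());
    return true;
}
```

With a capacity of 8, the longest common method names such as "OPTIONS" (seven characters) still fit, while anything longer is rejected rather than truncated silently.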

See:

docs/asan.md

LeakSanitizer

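LeakSanitizer is used to detect leaked allocations in sanitizer builds. It ships as part of AddressSanitizer, which is why the build below uses ENABLE_ASAN rather than a separate option.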

Build:

cmake -B build-lsan -DCMAKE_BUILD_TYPE=Debug -DENABLE_ASAN=ON
cmake --build build-lsan

Run:

ASAN_OPTIONS=detect_leaks=1 ./build-lsan/logforge --input logs/10k.log --build-index

Example issue:

Problem:
A temporary query index allocated nodes but failed to release them after an exception.

Tool:
LeakSanitizer

Fix:
Replaced manual allocation with RAII containers.

Result:
LeakSanitizer reported no leaks.

See:

docs/lsan.md

ThreadSanitizer

ThreadSanitizer is used to detect data races in multithreaded processing.

Build:

cmake -B build-tsan -DCMAKE_BUILD_TYPE=Debug -DENABLE_TSAN=ON
cmake --build build-tsan

Run:

./build-tsan/logforge --input logs/1m.log --threads 8 --status-counts

Example race:

std::unordered_map<int, int> status_counts;

void process_record(const LogRecord& record) {
    status_counts[record.status]++;
}

Fixed design:

std::vector<LocalStats> local_stats(num_threads);

Each thread updates its own local statistics, and the final result is merged after all worker threads complete.

See:

docs/tsan.md

UndefinedBehaviorSanitizer

UndefinedBehaviorSanitizer is used to detect undefined behavior such as signed integer overflow, invalid shifts, invalid enum values, and null-pointer dereferences.

Build:

cmake -B build-ubsan -DCMAKE_BUILD_TYPE=Debug -DENABLE_UBSAN=ON
cmake --build build-ubsan

Run:

./build-ubsan/logforge --input logs/10k.log --latency-stats

Example issue:

Problem:
Latency sum overflowed a 32-bit integer on very large input files.

Tool:
UndefinedBehaviorSanitizer

Root cause:
The aggregator used int for total latency.

Fix:
Changed total latency accumulation to int64_t.

Result:
The program correctly handles large logs without signed integer overflow.
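To see why a 32-bit total is too small: with 10 million records, an average latency above roughly 215 ms already pushes the sum past INT_MAX (2,147,483,647). A minimal sketch of the widened accumulation follows; the function name total_latency_ms is hypothetical.

```cpp
#include <cstdint>
#include <vector>

// Accumulates into a 64-bit total. Each `int` sample is widened before
// the addition, so the sum cannot overflow for any realistic log size.
std::int64_t total_latency_ms(const std::vector<int>& latencies_ms) {
    std::int64_t total = 0;
    for (const int value : latencies_ms) {
        total += value;
    }
    return total;
}
```

The same care applies to std::accumulate: the type of its initial value decides the accumulator type, so passing a plain 0 would reintroduce the int overflow.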

See:

docs/ubsan.md

strace / dtruss

strace is used on Linux to analyze system-call behavior.

Example:

strace -f -c ./build/logforge --input logs/1m.log --reader slow --status-counts

Compare against the buffered reader:

strace -f -c ./build/logforge --input logs/1m.log --reader buffered --status-counts

Example issue:

Problem:
The slow reader performed one read() system call per byte.

Impact:
The program spent excessive time in kernel calls.

Fix:
Implemented a buffered reader that reads large blocks at a time.

Result:
The number of read() calls dropped significantly.

On macOS, a similar experiment can be performed with dtruss.

See:

docs/strace.md

perf

perf is used to profile CPU time and identify hot paths.

Basic statistics:

perf stat ./build/logforge --input logs/1m.log --threads 8 --status-counts

Sampling profile:

perf record -g ./build/logforge --input logs/1m.log --threads 8 --top-paths 10
perf report

Example optimization story:

Before:
std::stringstream dominated parsing time.

After:
Manual parsing with std::string_view reduced temporary allocations and improved throughput.

Before:
A mutex-protected global aggregation map caused lock contention.

After:
Thread-local aggregation improved scalability.

See:

docs/perf.md

gprof

gprof is used for traditional flat-profile and call-graph analysis.

Build:

cmake -B build-gprof -DCMAKE_BUILD_TYPE=Release -DENABLE_GPROF=ON
cmake --build build-gprof

Run:

./build-gprof/logforge --input logs/1m.log --top-paths 10
gprof ./build-gprof/logforge gmon.out > results/gprof.txt

Typical functions to inspect:

parse_line()
process_record()
update_status_counts()
update_path_counts()
compute_latency_stats()

See:

docs/gprof.md

gprofng

gprofng is used for function-level and call-tree profiling.

Collect profile data:

gprofng collect app ./build/logforge --input logs/1m.log --threads 8 --top-paths 10

Display function profile:

gprofng display text -functions test.1.er

Display call tree:

gprofng display text -calltree test.1.er

See:

docs/gprofng.md

Valgrind Massif

Massif is used to analyze heap memory usage over time.

Run:

valgrind --tool=massif ./build/logforge --input logs/1m.log --build-index

Display report:

ms_print massif.out.* > results/massif.txt

Example issue:

Problem:
Building the query index caused high peak memory usage.

Tool:
Valgrind Massif

Root cause:
The index stored duplicated strings for every record.

Fix:
Reused string storage and stored record IDs instead of duplicated records.

Result:
Peak heap usage decreased significantly.
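One common way to reuse string storage is an interning pool: each distinct value is stored once and records refer to it by a small integer ID. The StringPool class below is a hypothetical sketch of that idea, not the project's actual index code.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Hands out one ID per distinct string. Storage grows with the number
// of distinct values (here kept in both the map key and the vector),
// not with the number of records.
class StringPool {
public:
    std::uint32_t intern(const std::string& text) {
        const auto it = ids_.find(text);
        if (it != ids_.end()) {
            return it->second;  // already stored: reuse the ID
        }
        const auto id = static_cast<std::uint32_t>(strings_.size());
        strings_.push_back(text);
        ids_.emplace(text, id);
        return id;
    }

    const std::string& get(std::uint32_t id) const { return strings_.at(id); }

    std::size_t size() const { return strings_.size(); }

private:
    std::unordered_map<std::string, std::uint32_t> ids_;
    std::vector<std::string> strings_;
};
```

For log data with a few hundred distinct paths across millions of records, a record then needs only a 4-byte ID per path instead of its own heap-allocated string.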

See:

docs/massif.md

heaptrack

heaptrack is used to identify allocation hot spots and allocation-heavy code paths.

Run:

heaptrack ./build/logforge --input logs/1m.log --top-paths 10

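Analyze (recent heaptrack releases may write a .zst file instead of .gz, so adjust the file pattern if needed):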

heaptrack --analyze heaptrack.logforge.*.gz

Example issue:

Problem:
The parser performed excessive temporary string allocations.

Tool:
heaptrack

Fix:
Replaced repeated substring copies with std::string_view-based parsing.

Result:
Total allocation count and allocation volume decreased.

See:

docs/heaptrack.md

Cachegrind

Cachegrind is used to analyze instruction and data cache behavior.

Run:

valgrind --tool=cachegrind ./build/logforge --input logs/1m.log --status-counts

Analyze:

cg_annotate cachegrind.out.* > results/cachegrind.txt

Example issue:

Problem:
Aggregation had poor cache locality when records were stored as many separately allocated objects.

Tool:
Cachegrind

Fix:
Changed storage to contiguous vectors and reduced pointer chasing.

Result:
Data cache misses decreased.

See:

docs/cachegrind.md

Callgrind

Callgrind is used for detailed call-graph profiling.

Run:

valgrind --tool=callgrind ./build/logforge --input logs/1m.log --top-paths 10

Analyze:

callgrind_annotate callgrind.out.* > results/callgrind.txt

Optional GUI:

kcachegrind callgrind.out.*

Example issue:

Problem:
Top-path computation spent too much time sorting all paths.

Tool:
Callgrind

Fix:
Replaced full sort with a bounded min-heap for top-K selection.

Result:
The top-K query avoided unnecessary sorting work.
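A bounded min-heap keeps only the K best candidates seen so far, which brings the selection cost from O(N log N) for a full sort down to O(N log K). The sketch below assumes the per-path counts already sit in a hash map; the names PathCount and top_k_paths are hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <queue>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

using PathCount = std::pair<std::uint64_t, std::string>;  // (count, path)

// Returns up to `k` entries with the highest counts, largest first.
// The heap's top is always the smallest of the current candidates, so
// a new entry only has to beat that one to get in.
std::vector<PathCount> top_k_paths(
    const std::unordered_map<std::string, std::uint64_t>& counts,
    std::size_t k) {
    std::priority_queue<PathCount, std::vector<PathCount>,
                        std::greater<PathCount>> heap;
    for (const auto& [path, count] : counts) {
        if (heap.size() < k) {
            heap.emplace(count, path);
        } else if (k > 0 && count > heap.top().first) {
            heap.pop();
            heap.emplace(count, path);
        }
    }

    std::vector<PathCount> result;
    result.reserve(heap.size());
    while (!heap.empty()) {
        result.push_back(heap.top());
        heap.pop();
    }
    std::reverse(result.begin(), result.end());  // ascending -> descending
    return result;
}
```

For a query such as --top-paths 10, the heap never holds more than ten entries regardless of how many distinct paths the log contains.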

See:

docs/callgrind.md

clang-tidy

clang-tidy is used for static analysis and modernization checks.

Run:

clang-tidy src/*.cpp -- -Iinclude -std=c++17

Or use a helper script:

./scripts/run_clang_tidy.sh

Example checks:

modernize-use-nullptr
modernize-use-override
performance-for-range-copy
performance-unnecessary-value-param
bugprone-use-after-move
readability-const-return-type

Example issue:

Problem:
A function copied LogRecord objects unnecessarily during aggregation.

Tool:
clang-tidy

Fix:
Changed the parameter from LogRecord to const LogRecord&.

Result:
Reduced unnecessary copies and improved code clarity.

See:

docs/static_analysis.md

cppcheck

cppcheck is used as an additional static-analysis pass.

Run:

cppcheck --enable=all --inconclusive --std=c++17 -Iinclude src/

Example issue:

Problem:
A condition in the parser was always true.

Tool:
cppcheck

Fix:
Simplified the condition and added a test for malformed input.

Result:
Cleaner parser logic and better test coverage.

See:

docs/static_analysis.md

clang-format

clang-format is used to keep code style consistent.

Run:

clang-format -i include/*.h src/*.cpp tests/*.cpp

Or:

./scripts/run_format.sh

Recommended project file:

.clang-format

Example style goal:

Consistent formatting across headers, source files, tests, and bug demos.

Test Coverage

Coverage tools are used to measure how much of the code is exercised by unit tests.

GCC coverage build:

cmake -B build-coverage -DCMAKE_BUILD_TYPE=Debug -DENABLE_COVERAGE=ON
cmake --build build-coverage
ctest --test-dir build-coverage

Generate report:

lcov --capture --directory build-coverage --output-file coverage.info
genhtml coverage.info --output-directory coverage_html

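LLVM coverage alternative (requires a Clang build instrumented with -fprofile-instr-generate -fcoverage-mapping rather than the --coverage flags set by ENABLE_COVERAGE):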

llvm-profdata merge -sparse default.profraw -o coverage.profdata
llvm-cov show ./build-coverage/logforge -instr-profile=coverage.profdata

Example coverage goal:

Parser: high coverage for valid and malformed lines
Aggregator: high coverage for status counts, top-K paths, and latency stats
Query engine: coverage for valid queries, invalid queries, and empty results
Thread pool: basic concurrency behavior tests

See:

docs/coverage.md

CMake Options

LogForge supports several build-time options.

option(ENABLE_ASAN "Enable AddressSanitizer" OFF)
option(ENABLE_TSAN "Enable ThreadSanitizer" OFF)
option(ENABLE_UBSAN "Enable UndefinedBehaviorSanitizer" OFF)
option(ENABLE_GPROF "Enable gprof instrumentation" OFF)
option(ENABLE_COVERAGE "Enable coverage instrumentation" OFF)

Example sanitizer configuration:

if (ENABLE_ASAN)
    add_compile_options(-fsanitize=address -fno-omit-frame-pointer -g)
    add_link_options(-fsanitize=address)
endif()

if (ENABLE_TSAN)
    add_compile_options(-fsanitize=thread -fno-omit-frame-pointer -g)
    add_link_options(-fsanitize=thread)
endif()

if (ENABLE_UBSAN)
    add_compile_options(-fsanitize=undefined -fno-omit-frame-pointer -g)
    add_link_options(-fsanitize=undefined)
endif()

if (ENABLE_GPROF)
    add_compile_options(-pg -g)
    add_link_options(-pg)
endif()

if (ENABLE_COVERAGE)
    add_compile_options(--coverage -O0 -g)
    add_link_options(--coverage)
endif()

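AddressSanitizer and ThreadSanitizer cannot be enabled in the same binary, so they need separate builds. Profiling and coverage instrumentation should normally get their own build directories as well, so that they do not skew sanitizer or benchmark runs.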

Benchmarking

Run the benchmark script:

./bench/run_benchmarks.sh

Example benchmark matrix:

Experiment	Comparison
Parser performance	`stringstream` vs manual parser vs `string_view` parser
I/O behavior	one-byte reader vs buffered reader
Thread scaling	1, 2, 4, 8, and 16 threads
Aggregation strategy	global lock vs thread-local aggregation
Top-K query	full sort vs min-heap
Indexing	direct scan vs prebuilt index
Memory usage	duplicated strings vs compact record storage
Allocation behavior	substring copies vs `string_view` parsing
Cache behavior	pointer-heavy storage vs contiguous vectors

Example result table:

Configuration	Records	Threads	Time
stringstream parser	1,000,000	1	2.84s
string_view parser	1,000,000	1	1.37s
string_view + buffered I/O	1,000,000	1	1.02s
string_view + buffered I/O	1,000,000	8	0.31s

These timings are illustrative. Actual numbers depend on machine, compiler, and input size.

Controlled Bug Demos

The bugs/ directory contains intentionally flawed implementations used only for tool demonstrations.

Examples:

File	Purpose
`parser_crash.cpp`	Demonstrates GDB and LLDB debugging
`memory_leak.cpp`	Demonstrates Valgrind and LeakSanitizer
`buffer_overflow.cpp`	Demonstrates AddressSanitizer
`use_after_free.cpp`	Demonstrates AddressSanitizer and Valgrind
`data_race.cpp`	Demonstrates ThreadSanitizer
`undefined_behavior.cpp`	Demonstrates UndefinedBehaviorSanitizer
`syscall_storm.cpp`	Demonstrates strace/dtruss syscall tracing

These examples are isolated from the main implementation.

The main executable should pass normal tests and sanitizer checks.

Example Engineering Notes

Each debugging document follows this structure:

1. Problem
2. Tool used
3. Command
4. Key output
5. Root cause
6. Fix
7. Result after fix
8. Lessons learned

Example:

Problem:
Parallel status-code aggregation occasionally produced incorrect counts.

Tool:
ThreadSanitizer

Root cause:
Multiple threads updated a shared unordered_map without synchronization.

Fix:
Replaced shared updates with thread-local aggregation and final merge.

Result:
TSan reported no data races, and throughput improved under higher thread counts.

Suggested Development Workflow

A typical development workflow:

# 1. Format code
./scripts/run_format.sh

# 2. Run static analysis
./scripts/run_clang_tidy.sh
./scripts/run_cppcheck.sh

# 3. Build and run tests
cmake -B build-debug -DCMAKE_BUILD_TYPE=Debug
cmake --build build-debug
ctest --test-dir build-debug

# 4. Run sanitizer builds
./scripts/run_asan.sh
./scripts/run_tsan.sh
./scripts/run_ubsan.sh

# 5. Run memory checks
./scripts/run_valgrind.sh

# 6. Run profiling experiments
./scripts/run_perf.sh
./scripts/run_gprof.sh
./scripts/run_gprofng.sh

# 7. Run benchmark comparison
./bench/run_benchmarks.sh

Skills Demonstrated

This project demonstrates:

C++17 programming
RAII and safe memory ownership
std::thread, worker queues, and thread pools
Locking, atomics, and thread-local data structures
Hash-map based aggregation
Buffered file I/O
CLI design
CMake build configuration
Debug and release build workflows
Interactive debugging with GDB and LLDB
Sanitizer integration
Memory debugging
System-call tracing
CPU profiling
Heap profiling
Cache profiling
Static analysis
Code formatting
Test coverage
Benchmark automation
Measurement-driven optimization
Technical documentation

Non-Goals

LogForge is not intended to be a production observability platform.

It does not aim to replace tools such as Elasticsearch, Splunk, Loki, or ClickHouse.

The purpose of this project is to demonstrate C++ systems engineering, debugging, tracing, profiling, and optimization skills in a safe, self-contained codebase.

Confidentiality Note

This project is intentionally unrelated to any prior or current employer’s internal systems, tools, data, workflows, or intellectual property.

All logs are synthetic. The project uses a generic log-processing domain to demonstrate transferable systems programming skills without relying on proprietary information.

Future Improvements

Potential extensions:

Memory-mapped file reader
Compressed log support
JSON log parser
Regex-based filtering
Persistent on-disk index
Interactive query shell
Flamegraph generation
eBPF-based tracing experiment
GitHub Actions CI with sanitizer builds
HTML benchmark report generation
Web dashboard for benchmark results
Fuzz testing with libFuzzer or AFL++
Package manager integration with Conan or vcpkg

License

This project is intended for educational and portfolio use.

Choose a license before publishing, such as:

MIT License
Apache License 2.0
BSD 3-Clause License

Summary

LogForge is a C++ multithreaded log analytics engine built to demonstrate practical systems debugging and performance engineering.

It combines a realistic command-line application with controlled debugging labs and profiling experiments using:

GDB
LLDB
Valgrind
AddressSanitizer
LeakSanitizer
ThreadSanitizer
UndefinedBehaviorSanitizer
strace / dtruss
perf
gprof
gprofng
Massif
heaptrack
Cachegrind
Callgrind
clang-tidy
cppcheck
clang-format
gcov / lcov / llvm-cov

The result is a safe, portfolio-friendly project that highlights transferable C++ systems skills.
