LogForge is a C++ multithreaded log analytics engine designed to demonstrate practical systems debugging, profiling, tracing, testing, and performance optimization workflows.
The project processes large synthetic log files and supports common analytics operations such as status-code aggregation, top-IP queries, top-path queries, latency statistics, error filtering, and optional query indexing.
All input data is synthetic and generated by scripts in this repository.
LogForge is built as a portfolio project to demonstrate hands-on experience with modern C++ systems development and professional diagnostic tools.
The project demonstrates:
- Modern C++ systems programming
- CMake-based build configuration
- Multithreaded data processing
- Interactive debugging with GDB and LLDB
- Memory debugging with Valgrind Memcheck
- Fast runtime bug detection with AddressSanitizer, LeakSanitizer, ThreadSanitizer, and UndefinedBehaviorSanitizer
- System-call tracing with strace and dtruss
- CPU profiling with perf
- Function-level profiling with gprof
- Experiment-based profiling with gprofng
- Heap profiling with Valgrind Massif and heaptrack
- Cache and call-graph profiling with Cachegrind and Callgrind
- Static analysis with clang-tidy and cppcheck
- Code formatting with clang-format
- Test coverage with gcov, lcov, or llvm-cov
- Benchmark automation and performance comparison
The focus is not only to build a working log analyzer, but also to document how real bugs and performance issues can be found, explained, fixed, and measured.
LogForge analyzes log files with records such as:
2026-06-19T10:15:21Z 192.168.1.10 GET /api/users 200 34ms
2026-06-19T10:15:22Z 192.168.1.11 POST /api/login 401 12ms
2026-06-19T10:15:23Z 192.168.1.12 GET /api/orders 500 93ms
2026-06-19T10:15:24Z 192.168.1.10 GET /api/products 200 18ms
Supported operations include:
./logforge --input logs/server.log --status-counts
./logforge --input logs/server.log --top-ips 10
./logforge --input logs/server.log --top-paths 10
./logforge --input logs/server.log --latency-stats
./logforge --input logs/server.log --errors-only
./logforge --input logs/server.log --threads 8 --status-counts
./logforge --input logs/server.log --build-index
./logforge --input logs/server.log --query "status=500"
Example output:
Status Counts
-------------
200: 824391
301: 12044
400: 2891
401: 3812
404: 5821
500: 1033
Latency Statistics
------------------
p50: 18 ms
p95: 92 ms
p99: 214 ms
max: 731 ms
Many C++ projects show algorithms or data structures, but fewer demonstrate the debugging, tracing, and profiling workflows used in production systems work.
LogForge is designed to show the full engineering loop:
- Build a realistic C++ command-line tool.
- Introduce controlled memory, threading, I/O, correctness, and performance issues.
- Detect those issues using professional tools.
- Explain the root cause.
- Fix the implementation.
- Measure the improvement.
- Document the workflow clearly.
This makes the project useful for demonstrating systems-level debugging, profiling, and performance engineering skills.
| Category | Tool | Scenario | Demonstrated Skill |
|---|---|---|---|
| Interactive debugging | GDB | Crash in parser, invalid object state, breakpoint inspection | Source-level C++ debugging |
| Interactive debugging | LLDB | Same debugging workflow using LLVM tooling | Cross-platform debugging |
| Memory debugging | Valgrind Memcheck | Memory leak, invalid read, uninitialized value | Deep memory diagnostics |
| Runtime sanitizers | AddressSanitizer | Buffer overflow, use-after-free | Fast memory-error detection |
| Runtime sanitizers | LeakSanitizer | Leaked allocations | Leak detection in sanitizer builds |
| Runtime sanitizers | ThreadSanitizer | Race condition in shared aggregation map | Thread-safety debugging |
| Runtime sanitizers | UndefinedBehaviorSanitizer | Integer overflow, invalid enum, bad shift | Undefined behavior detection |
| System tracing | strace / dtruss | Excessive read() system calls | System-call tracing and I/O analysis |
| CPU profiling | perf | Hot functions, CPU cycles, branch/cache behavior | Linux performance profiling |
| Function profiling | gprof | Flat profile and call graph | Function-level profiling |
| Function profiling | gprofng | Function and call-tree analysis | Modern GNU profiling workflow |
| Heap profiling | Massif | Peak heap usage in index builder | Memory footprint analysis |
| Heap profiling | heaptrack | Allocation hot spots | Allocation profiling |
| Cache profiling | Cachegrind | Cache misses in parser and aggregation | Cache behavior analysis |
| Call-graph profiling | Callgrind | Expensive call paths | Call-path optimization |
| Static analysis | clang-tidy | Bug-prone patterns and modernization suggestions | Static C++ analysis |
| Static analysis | cppcheck | Additional static checks | Lightweight code analysis |
| Formatting | clang-format | Consistent style | Automated formatting |
| Coverage | gcov / lcov / llvm-cov | Unit-test coverage | Test quality measurement |
LogForge/
├── CMakeLists.txt
├── README.md
├── include/
│ ├── LogRecord.h
│ ├── LogParser.h
│ ├── LogIndex.h
│ ├── QueryEngine.h
│ ├── ThreadPool.h
│ ├── Aggregator.h
│ └── ReportWriter.h
├── src/
│ ├── main.cpp
│ ├── LogParser.cpp
│ ├── LogIndex.cpp
│ ├── QueryEngine.cpp
│ ├── ThreadPool.cpp
│ ├── Aggregator.cpp
│ └── ReportWriter.cpp
├── tests/
│ ├── test_parser.cpp
│ ├── test_aggregator.cpp
│ ├── test_query_engine.cpp
│ └── test_thread_pool.cpp
├── bugs/
│ ├── memory_leak.cpp
│ ├── buffer_overflow.cpp
│ ├── use_after_free.cpp
│ ├── data_race.cpp
│ ├── undefined_behavior.cpp
│ ├── parser_crash.cpp
│ └── syscall_storm.cpp
├── bench/
│ ├── generate_logs.py
│ ├── run_benchmarks.sh
│ └── compare_results.py
├── scripts/
│ ├── run_gdb.sh
│ ├── run_lldb.sh
│ ├── run_valgrind.sh
│ ├── run_asan.sh
│ ├── run_lsan.sh
│ ├── run_tsan.sh
│ ├── run_ubsan.sh
│ ├── run_strace.sh
│ ├── run_perf.sh
│ ├── run_gprof.sh
│ ├── run_gprofng.sh
│ ├── run_massif.sh
│ ├── run_heaptrack.sh
│ ├── run_cachegrind.sh
│ ├── run_callgrind.sh
│ ├── run_clang_tidy.sh
│ ├── run_cppcheck.sh
│ ├── run_format.sh
│ └── run_coverage.sh
├── docs/
│ ├── gdb.md
│ ├── lldb.md
│ ├── valgrind.md
│ ├── asan.md
│ ├── lsan.md
│ ├── tsan.md
│ ├── ubsan.md
│ ├── strace.md
│ ├── perf.md
│ ├── gprof.md
│ ├── gprofng.md
│ ├── massif.md
│ ├── heaptrack.md
│ ├── cachegrind.md
│ ├── callgrind.md
│ ├── static_analysis.md
│ ├── coverage.md
│ └── docker_dev_environment.md
├── docker/
│ ├── Dockerfile
│ └── docker-compose.yml
├── results/
│ ├── before/
│ └── after/
└── logs/
    └── generated sample logs
LogForge is organized into several main components.
The parser converts raw log lines into structured records.
struct LogRecord {
    std::string timestamp;
    std::string ip;
    std::string method;
    std::string path;
    int status;
    int latency_ms;
};
The project supports multiple parser implementations for comparison:
- stringstream parser: simple but slower
- Manual parser: faster tokenization
- string_view parser: reduced temporary allocations
- Buffered reader: reduced system-call overhead
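As a rough illustration of the string_view approach, the sketch below tokenizes a line without allocating and validates the field count before building a record. It is not the project's actual parser: `next_token`, `parse_int`, the `std::optional` return type, and the default member initializers are assumptions made for this example.

```cpp
#include <charconv>
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>

struct LogRecord {
    std::string timestamp;
    std::string ip;
    std::string method;
    std::string path;
    int status = 0;
    int latency_ms = 0;
};

// Hypothetical helper: returns the next space-delimited token and advances `rest` past it.
inline std::string_view next_token(std::string_view& rest) {
    const std::size_t pos = rest.find(' ');
    const std::string_view token = rest.substr(0, pos);
    rest = (pos == std::string_view::npos) ? std::string_view{} : rest.substr(pos + 1);
    return token;
}

// Hypothetical helper: accepts only text that is a complete integer.
inline bool parse_int(std::string_view text, int& out) {
    const char* const last = text.data() + text.size();
    const auto [ptr, ec] = std::from_chars(text.data(), last, out);
    return ec == std::errc{} && ptr == last;
}

// Returns std::nullopt for malformed lines instead of reading a missing token.
inline std::optional<LogRecord> parse_line(std::string_view line) {
    std::string_view fields[6];
    for (std::string_view& field : fields) {
        field = next_token(line);
        if (field.empty()) return std::nullopt;  // fewer than six fields
    }
    if (!line.empty()) return std::nullopt;      // more than six fields

    std::string_view latency = fields[5];
    if (latency.size() < 3 || latency.substr(latency.size() - 2) != "ms") return std::nullopt;
    latency.remove_suffix(2);                    // drop the "ms" unit

    LogRecord record;
    if (!parse_int(fields[4], record.status) || !parse_int(latency, record.latency_ms)) {
        return std::nullopt;
    }
    // Strings are copied only once, when the validated record is built.
    record.timestamp = std::string(fields[0]);
    record.ip = std::string(fields[1]);
    record.method = std::string(fields[2]);
    record.path = std::string(fields[3]);
    return record;
}
```

A caller can skip and count lines for which `parse_line` returns an empty optional, which matches the malformed-line handling described in the debugging scenarios.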
The aggregation engine computes statistics from parsed records.
Supported aggregations include:
- HTTP status-code counts
- Top client IP addresses
- Top requested paths
- Error-only filtering
- Latency percentiles
- Request-method counts
Example:
./logforge --input logs/1m.log --status-counts --top-paths 10
LogForge supports parallel processing of large input files.
Each worker thread processes a chunk of the input and produces local statistics. The local results are merged at the end.
This design avoids unnecessary contention and provides a useful comparison against a deliberately flawed shared-map implementation.
Bad design:
global_status_counts[record.status]++;
Better design:
local_stats[thread_id].status_counts[record.status]++;
Final merge:
for (const auto& local : local_stats) {
    merge(global_stats, local);
}
This provides a realistic demonstration of using ThreadSanitizer to detect a race and then redesigning the data flow for correctness and scalability.
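The chunk-then-merge design can be sketched end to end as follows. This is a simplified illustration rather than the project's implementation: `count_statuses` is a hypothetical function, a plain vector of status codes stands in for parsed records, and raw `std::thread` objects are used instead of the project's thread pool.

```cpp
#include <algorithm>
#include <cstddef>
#include <thread>
#include <unordered_map>
#include <vector>

// Per-thread statistics; each worker owns exactly one instance.
struct LocalStats {
    std::unordered_map<int, long long> status_counts;
};

// Folds one worker's counts into the global result. Runs on a single thread.
inline void merge(LocalStats& global, const LocalStats& local) {
    for (const auto& [status, count] : local.status_counts) {
        global.status_counts[status] += count;
    }
}

// Hypothetical entry point: `statuses` stands in for the parsed records.
inline LocalStats count_statuses(const std::vector<int>& statuses, std::size_t num_threads) {
    num_threads = std::max<std::size_t>(1, num_threads);
    std::vector<LocalStats> local_stats(num_threads);
    std::vector<std::thread> workers;
    const std::size_t chunk = (statuses.size() + num_threads - 1) / num_threads;

    for (std::size_t t = 0; t < num_threads; ++t) {
        workers.emplace_back([&, t] {
            const std::size_t begin = std::min(statuses.size(), t * chunk);
            const std::size_t end = std::min(statuses.size(), begin + chunk);
            for (std::size_t i = begin; i < end; ++i) {
                // Each thread writes only to its own slot, so no lock is needed.
                local_stats[t].status_counts[statuses[i]]++;
            }
        });
    }
    for (std::thread& worker : workers) worker.join();

    // All workers have finished, so the merge sees no concurrent writers.
    LocalStats global;
    for (const auto& local : local_stats) merge(global, local);
    return global;
}
```

Because the merge happens only after every `join()`, there is no point at which two threads touch the same map, which is the property ThreadSanitizer checks for.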
LogForge can build a simple in-memory index for repeated queries.
Supported query examples:
./logforge --input logs/large.log --query "status=500"
./logforge --input logs/large.log --query "ip=192.168.1.10"
./logforge --input logs/large.log --query "path=/api/login"
The index maps selected fields to matching record IDs:
status -> record IDs
ip -> record IDs
path -> record IDs
This provides additional opportunities to analyze memory use, allocation behavior, and query performance.
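One plausible shape for such an index is a hash map per field, keyed by the field value and holding the IDs of matching records. The sketch below is an assumption about the layout, not the contents of LogIndex.h: the class interface, `RecordId`, and the `lookup` helper are hypothetical.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

struct LogRecord {
    std::string timestamp;
    std::string ip;
    std::string method;
    std::string path;
    int status;
    int latency_ms;
};

using RecordId = std::size_t;  // assumed: position of the record in the parsed input

class LogIndex {
public:
    // Registers one record under each indexed field.
    void add(RecordId id, const LogRecord& record) {
        by_status_[record.status].push_back(id);
        by_ip_[record.ip].push_back(id);
        by_path_[record.path].push_back(id);
    }

    const std::vector<RecordId>& by_status(int status) const { return lookup(by_status_, status); }
    const std::vector<RecordId>& by_ip(const std::string& ip) const { return lookup(by_ip_, ip); }
    const std::vector<RecordId>& by_path(const std::string& path) const { return lookup(by_path_, path); }

private:
    // Returns the ID list for `key`, or a shared empty list when nothing matches.
    template <typename Map, typename Key>
    static const std::vector<RecordId>& lookup(const Map& map, const Key& key) {
        static const std::vector<RecordId> empty;
        const auto it = map.find(key);
        return it == map.end() ? empty : it->second;
    }

    std::unordered_map<int, std::vector<RecordId>> by_status_;
    std::unordered_map<std::string, std::vector<RecordId>> by_ip_;
    std::unordered_map<std::string, std::vector<RecordId>> by_path_;
};
```

Storing IDs rather than copies of the records keeps the index small, at the cost of one extra lookup into the record storage per hit.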
Recommended environment:
- Linux
- CMake
- C++17 or newer compiler
- GCC or Clang
- Python 3 for log generation scripts
Optional tools:
- GDB
- LLDB
- Valgrind
- perf
- gprof
- gprofng
- strace
- dtruss on macOS
- heaptrack
- clang-tidy
- cppcheck
- clang-format
- gcov, lcov, or llvm-cov
- AddressSanitizer-capable compiler
- ThreadSanitizer-capable compiler
- UndefinedBehaviorSanitizer-capable compiler
Valgrind, perf, and heaptrack are treated as Linux-only in this project, and gprof and gprofng are not part of
Apple's Clang toolchain, so none of these tools can be run natively on macOS here. A ready-to-use Docker environment with all
of them installed is provided in docker/; see docs/docker_dev_environment.md.
Release build:
cmake -B build -DCMAKE_BUILD_TYPE=Release
cmake --build build
Run:
./build/logforge --input logs/server.log --status-counts
Debug build:
cmake -B build-debug -DCMAKE_BUILD_TYPE=Debug
cmake --build build-debug
Run:
./build-debug/logforge --input logs/server.log --status-counts
Generate a small test log:
python3 bench/generate_logs.py --records 10000 --output logs/10k.log
Generate a larger benchmark log:
python3 bench/generate_logs.py --records 1000000 --output logs/1m.log
Generate a stress-test log:
python3 bench/generate_logs.py --records 10000000 --output logs/10m.log
All generated logs are synthetic and contain no private or production data.
GDB is used for interactive source-level debugging on Linux.
Example debug build:
cmake -B build-debug -DCMAKE_BUILD_TYPE=Debug
cmake --build build-debug
Start GDB:
gdb --args ./build-debug/logforge --input logs/10k.log --status-counts
Useful commands:
break main
break LogParser::parse_line
run
next
step
continue
print record
backtrace
info locals
info threads
thread apply all backtrace
Example debugging scenario:
Problem:
The parser crashes when processing a malformed log line.
Tool:
GDB
Root cause:
The parser assumes every line contains six fields and accesses a missing token.
Fix:
Added validation before constructing LogRecord.
Result:
Malformed lines are skipped and counted in the error report instead of crashing.
See:
docs/gdb.md
LLDB is used as an alternative interactive debugger, especially useful with Clang/LLVM-based toolchains and macOS.
Start LLDB:
lldb -- ./build-debug/logforge --input logs/10k.log --status-counts
Useful commands:
breakpoint set --name main
breakpoint set --name LogParser::parse_line
run
next
step
continue
frame variable
thread backtrace all
expression record.status
Example debugging scenario:
Problem:
The query engine returns no records for status=500.
Tool:
LLDB
Root cause:
The query parser treats the value as a string but the index stores status codes as integers.
Fix:
Added typed query parsing for numeric fields.
Result:
status=500 correctly returns all matching records.
See:
docs/lldb.md
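The typed-query fix in the scenario above could look roughly like this. It is a sketch under assumptions: the `Query` struct, the `parse_query` function, and the choice of `std::variant` are hypothetical and not taken from QueryEngine.h. Only the three fields shown in the query examples are accepted.

```cpp
#include <charconv>
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>
#include <variant>

// Hypothetical query representation: numeric fields carry an int, others a string.
struct Query {
    std::string field;
    std::variant<int, std::string> value;
};

// Parses "field=value". Returns std::nullopt for unknown fields or malformed values.
inline std::optional<Query> parse_query(std::string_view text) {
    const std::size_t eq = text.find('=');
    if (eq == std::string_view::npos || eq == 0 || eq + 1 == text.size()) {
        return std::nullopt;  // missing '=', empty field, or empty value
    }
    const std::string_view field = text.substr(0, eq);
    const std::string_view value = text.substr(eq + 1);

    if (field == "status") {
        // Status codes are stored as integers, so the value must parse as one.
        int number = 0;
        const char* const last = value.data() + value.size();
        const auto [ptr, ec] = std::from_chars(value.data(), last, number);
        if (ec != std::errc{} || ptr != last) return std::nullopt;
        return Query{std::string(field), number};
    }
    if (field == "ip" || field == "path") {
        return Query{std::string(field), std::string(value)};
    }
    return std::nullopt;
}
```

With the value typed at parse time, a lookup for status=500 compares an int against the integer keys of the index instead of comparing a string that can never match.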
Valgrind Memcheck is used to detect memory leaks, invalid reads/writes, and uninitialized values.
Example command:
valgrind --leak-check=full --track-origins=yes ./build-debug/logforge --input logs/10k.log --status-counts
Example documented issue:
Problem:
The index builder leaked LogRecord objects when malformed lines were skipped.
Root cause:
A raw pointer was allocated before validation and was not released on the error path.
Fix:
Replaced raw owning pointers with value-based storage or std::unique_ptr.
Result:
Valgrind reported no definitely lost memory after the fix.
See:
docs/valgrind.md
AddressSanitizer is used for fast memory-error detection during development.
Build:
cmake -B build-asan -DCMAKE_BUILD_TYPE=Debug -DENABLE_ASAN=ON
cmake --build build-asan
Run:
ASAN_OPTIONS=detect_leaks=1 ./build-asan/logforge --input logs/10k.log --status-counts
Example bug scenario:
char method[4];
std::strcpy(method, token.c_str());
This overflows for "POST": the four characters plus the null terminator need five bytes, one more than the buffer holds.
Fixed version:
std::string method;
or, when a fixed-size buffer is required (still paired with a bounded copy):
std::array<char, 8> method{};
See:
docs/asan.md
LeakSanitizer is used to detect leaked allocations in sanitizer builds. It runs as part of AddressSanitizer, which is why the build below reuses ENABLE_ASAN.
Build:
cmake -B build-lsan -DCMAKE_BUILD_TYPE=Debug -DENABLE_ASAN=ON
cmake --build build-lsan
Run:
ASAN_OPTIONS=detect_leaks=1 ./build-lsan/logforge --input logs/10k.log --build-index
Example issue:
Problem:
A temporary query index allocated nodes but failed to release them after an exception.
Tool:
LeakSanitizer
Fix:
Replaced manual allocation with RAII containers.
Result:
LeakSanitizer reported no leaks.
See:
docs/lsan.md
ThreadSanitizer is used to detect data races in multithreaded processing.
Build:
cmake -B build-tsan -DCMAKE_BUILD_TYPE=Debug -DENABLE_TSAN=ON
cmake --build build-tsan
Run:
./build-tsan/logforge --input logs/1m.log --threads 8 --status-counts
Example race:
std::unordered_map<int, int> status_counts;
void process_record(const LogRecord& record) {
    status_counts[record.status]++;  // unsynchronized write from several threads
}
Fixed design:
std::vector<LocalStats> local_stats(num_threads);
Each thread updates its own local statistics, and the final result is merged after all worker threads complete.
See:
docs/tsan.md
UndefinedBehaviorSanitizer is used to detect undefined behavior such as signed integer overflow, invalid shifts, invalid enum values, and null reference usage.
Build:
cmake -B build-ubsan -DCMAKE_BUILD_TYPE=Debug -DENABLE_UBSAN=ON
cmake --build build-ubsan
Run:
./build-ubsan/logforge --input logs/10k.log --latency-stats
Example issue:
Problem:
Latency sum overflowed a 32-bit integer on very large input files.
Tool:
UndefinedBehaviorSanitizer
Root cause:
The aggregator used int for total latency.
Fix:
Changed total latency accumulation to int64_t.
Result:
The program correctly handles large logs without signed integer overflow.
See:
docs/ubsan.md
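To make the overflow concrete: a 32-bit signed total tops out at 2,147,483,647 ms, which an average latency of 34 ms (the first sample record) would reach after roughly 63 million records. A minimal sketch of the widened accumulator follows; the function name `total_latency_ms` is hypothetical.

```cpp
#include <cstdint>
#include <vector>

// Sums latencies in a 64-bit accumulator so large inputs cannot overflow int.
inline std::int64_t total_latency_ms(const std::vector<int>& latencies_ms) {
    std::int64_t total = 0;
    for (const int latency : latencies_ms) {
        total += latency;  // each int is widened to int64_t before the addition
    }
    return total;
}
```

Only the accumulator needs to be 64-bit; the per-record latency_ms field can stay an int.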
strace is used on Linux to analyze system-call behavior.
Example:
strace -f -c ./build/logforge --input logs/1m.log --reader slow --status-counts
Compare against the buffered reader:
strace -f -c ./build/logforge --input logs/1m.log --reader buffered --status-counts
Example issue:
Problem:
The slow reader performed one read() system call per byte.
Impact:
The program spent excessive time in kernel calls.
Fix:
Implemented a buffered reader that reads large blocks at a time.
Result:
The number of read() calls dropped significantly.
On macOS, a similar experiment can be performed with dtruss.
See:
docs/strace.md
perf is used to profile CPU time and identify hot paths.
Basic statistics:
perf stat ./build/logforge --input logs/1m.log --threads 8 --status-counts
Sampling profile:
perf record -g ./build/logforge --input logs/1m.log --threads 8 --top-paths 10
perf report
Example optimization story:
Before:
std::stringstream dominated parsing time.
After:
Manual parsing with std::string_view reduced temporary allocations and improved throughput.
Before:
A global aggregation map caused lock contention.
After:
Thread-local aggregation improved scalability.
See:
docs/perf.md
gprof is used for traditional flat-profile and call-graph analysis.
Build:
cmake -B build-gprof -DCMAKE_BUILD_TYPE=Release -DENABLE_GPROF=ON
cmake --build build-gprof
Run:
./build-gprof/logforge --input logs/1m.log --top-paths 10
gprof ./build-gprof/logforge gmon.out > results/gprof.txt
Typical functions to inspect:
parse_line()
process_record()
update_status_counts()
update_path_counts()
compute_latency_stats()
See:
docs/gprof.md
gprofng is used for function-level and call-tree profiling.
Collect profile data:
gprofng collect app ./build/logforge --input logs/1m.log --threads 8 --top-paths 10
Display function profile:
gprofng display text -functions test.1.er
Display call tree:
gprofng display text -calltree test.1.er
See:
docs/gprofng.md
Massif is used to analyze heap memory usage over time.
Run:
valgrind --tool=massif ./build/logforge --input logs/1m.log --build-index
Display report:
ms_print massif.out.* > results/massif.txt
Example issue:
Problem:
Building the query index caused high peak memory usage.
Tool:
Valgrind Massif
Root cause:
The index stored duplicated strings for every record.
Fix:
Reused string storage and stored record IDs instead of duplicated records.
Result:
Peak heap usage decreased significantly.
See:
docs/massif.md
heaptrack is used to identify allocation hot spots and allocation-heavy code paths.
Run:
heaptrack ./build/logforge --input logs/1m.log --top-paths 10
Analyze:
heaptrack --analyze heaptrack.logforge.*.gz
Example issue:
Problem:
The parser performed excessive temporary string allocations.
Tool:
heaptrack
Fix:
Replaced repeated substring copies with std::string_view-based parsing.
Result:
Total allocation count and allocation volume decreased.
See:
docs/heaptrack.md
Cachegrind is used to analyze instruction and data cache behavior.
Run:
valgrind --tool=cachegrind ./build/logforge --input logs/1m.log --status-counts
Analyze:
cg_annotate cachegrind.out.* > results/cachegrind.txt
Example issue:
Problem:
Aggregation had poor cache locality when records were stored as many separately allocated objects.
Tool:
Cachegrind
Fix:
Changed storage to contiguous vectors and reduced pointer chasing.
Result:
Data cache misses decreased.
See:
docs/cachegrind.md
Callgrind is used for detailed call-graph profiling.
Run:
valgrind --tool=callgrind ./build/logforge --input logs/1m.log --top-paths 10
Analyze:
callgrind_annotate callgrind.out.* > results/callgrind.txt
Optional GUI:
kcachegrind callgrind.out.*
Example issue:
Problem:
Top-path computation spent too much time sorting all paths.
Tool:
Callgrind
Fix:
Replaced full sort with a bounded min-heap for top-K selection.
Result:
The top-K query avoided unnecessary sorting work.
See:
docs/callgrind.md
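The bounded min-heap idea from the scenario above can be sketched as follows. The names `top_k_paths` and `PathCount` are hypothetical, and the input is assumed to be a map of already aggregated path counts. The heap never holds more than k entries, so selection costs O(n log k) instead of the O(n log n) of a full sort.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <queue>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

using PathCount = std::pair<long long, std::string>;  // (count, path)

// Returns the k most frequent paths, highest count first.
inline std::vector<PathCount> top_k_paths(
    const std::unordered_map<std::string, long long>& counts, std::size_t k) {
    // std::greater turns priority_queue into a min-heap: top() is the smallest kept count.
    std::priority_queue<PathCount, std::vector<PathCount>, std::greater<PathCount>> heap;
    for (const auto& [path, count] : counts) {
        if (heap.size() < k) {
            heap.emplace(count, path);
        } else if (k > 0 && count > heap.top().first) {
            heap.pop();                 // evict the current smallest entry
            heap.emplace(count, path);
        }
    }
    std::vector<PathCount> result;
    result.reserve(heap.size());
    while (!heap.empty()) {
        result.push_back(heap.top());   // popped in ascending order
        heap.pop();
    }
    std::reverse(result.begin(), result.end());
    return result;
}
```

Paths with equal counts at the cut-off are kept or dropped depending on map iteration order, so a real implementation may want an explicit tie-breaking rule.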
clang-tidy is used for static analysis and modernization checks.
Run:
clang-tidy src/*.cpp -- -Iinclude -std=c++17
Or use a helper script:
./scripts/run_clang_tidy.sh
Example checks:
modernize-use-nullptr
modernize-use-override
performance-for-range-copy
performance-unnecessary-value-param
bugprone-use-after-move
readability-const-return-type
Example issue:
Problem:
A function copied LogRecord objects unnecessarily during aggregation.
Tool:
clang-tidy
Fix:
Changed the parameter from LogRecord to const LogRecord&.
Result:
Reduced unnecessary copies and improved code clarity.
See:
docs/static_analysis.md
cppcheck is used as an additional static-analysis pass.
Run:
cppcheck --enable=all --inconclusive --std=c++17 -Iinclude src/
Example issue:
Problem:
A condition in the parser was always true.
Tool:
cppcheck
Fix:
Simplified the condition and added a test for malformed input.
Result:
Cleaner parser logic and better test coverage.
See:
docs/static_analysis.md
clang-format is used to keep code style consistent.
Run:
clang-format -i include/*.h src/*.cpp tests/*.cpp
Or:
./scripts/run_format.sh
Recommended project file:
.clang-format
Example style goal:
Consistent formatting across headers, source files, tests, and bug demos.
Coverage tools are used to measure how much of the code is exercised by unit tests.
GCC coverage build:
cmake -B build-coverage -DCMAKE_BUILD_TYPE=Debug -DENABLE_COVERAGE=ON
cmake --build build-coverage
ctest --test-dir build-coverage
Generate report:
lcov --capture --directory build-coverage --output-file coverage.info
genhtml coverage.info --output-directory coverage_html
LLVM coverage alternative:
llvm-profdata merge -sparse default.profraw -o coverage.profdata
llvm-cov show ./build-coverage/logforge -instr-profile=coverage.profdata
Example coverage goal:
Parser: high coverage for valid and malformed lines
Aggregator: high coverage for status counts, top-K paths, and latency stats
Query engine: coverage for valid queries, invalid queries, and empty results
Thread pool: basic concurrency behavior tests
See:
docs/coverage.md
LogForge supports several build-time options.
option(ENABLE_ASAN "Enable AddressSanitizer" OFF)
option(ENABLE_TSAN "Enable ThreadSanitizer" OFF)
option(ENABLE_UBSAN "Enable UndefinedBehaviorSanitizer" OFF)
option(ENABLE_GPROF "Enable gprof instrumentation" OFF)
option(ENABLE_COVERAGE "Enable coverage instrumentation" OFF)
Example sanitizer configuration:
if (ENABLE_ASAN)
add_compile_options(-fsanitize=address -fno-omit-frame-pointer -g)
add_link_options(-fsanitize=address)
endif()
if (ENABLE_TSAN)
add_compile_options(-fsanitize=thread -fno-omit-frame-pointer -g)
add_link_options(-fsanitize=thread)
endif()
if (ENABLE_UBSAN)
add_compile_options(-fsanitize=undefined -fno-omit-frame-pointer -g)
add_link_options(-fsanitize=undefined)
endif()
if (ENABLE_GPROF)
add_compile_options(-pg -g)
add_link_options(-pg)
endif()
if (ENABLE_COVERAGE)
add_compile_options(--coverage -O0 -g)
add_link_options(--coverage)
endif()
AddressSanitizer and ThreadSanitizer cannot be combined in one binary, so these options, along with the profiling and coverage modes, should normally be enabled in separate builds.
Run the benchmark script:
./bench/run_benchmarks.sh
Example benchmark matrix:
| Experiment | Comparison |
|---|---|
| Parser performance | stringstream vs manual parser vs string_view parser |
| I/O behavior | one-byte reader vs buffered reader |
| Thread scaling | 1, 2, 4, 8, and 16 threads |
| Aggregation strategy | global lock vs thread-local aggregation |
| Top-K query | full sort vs min-heap |
| Indexing | direct scan vs prebuilt index |
| Memory usage | duplicated strings vs compact record storage |
| Allocation behavior | substring copies vs string_view parsing |
| Cache behavior | pointer-heavy storage vs contiguous vectors |
Example result table:
| Configuration | Records | Threads | Time |
|---|---|---|---|
| stringstream parser | 1,000,000 | 1 | 2.84s |
| string_view parser | 1,000,000 | 1 | 1.37s |
| string_view + buffered I/O | 1,000,000 | 1 | 1.02s |
| string_view + buffered I/O | 1,000,000 | 8 | 0.31s |
Actual numbers depend on machine, compiler, and input size.
The bugs/ directory contains intentionally flawed implementations used only for tool demonstrations.
Examples:
| File | Purpose |
|---|---|
| parser_crash.cpp | Demonstrates GDB and LLDB debugging |
| memory_leak.cpp | Demonstrates Valgrind and LeakSanitizer |
| buffer_overflow.cpp | Demonstrates AddressSanitizer |
| use_after_free.cpp | Demonstrates AddressSanitizer and Valgrind |
| data_race.cpp | Demonstrates ThreadSanitizer |
| undefined_behavior.cpp | Demonstrates UndefinedBehaviorSanitizer |
| syscall_storm.cpp | Demonstrates strace/dtruss syscall tracing |
These examples are isolated from the main implementation.
The main executable should pass normal tests and sanitizer checks.
Each debugging document follows this structure:
1. Problem
2. Tool used
3. Command
4. Key output
5. Root cause
6. Fix
7. Result after fix
8. Lessons learned
Example:
Problem:
Parallel status-code aggregation occasionally produced incorrect counts.
Tool:
ThreadSanitizer
Root cause:
Multiple threads updated a shared unordered_map without synchronization.
Fix:
Replaced shared updates with thread-local aggregation and final merge.
Result:
TSan reported no data races, and throughput improved under higher thread counts.
A typical development workflow:
# 1. Format code
./scripts/run_format.sh
# 2. Run static analysis
./scripts/run_clang_tidy.sh
./scripts/run_cppcheck.sh
# 3. Build and run tests
cmake -B build-debug -DCMAKE_BUILD_TYPE=Debug
cmake --build build-debug
ctest --test-dir build-debug
# 4. Run sanitizer builds
./scripts/run_asan.sh
./scripts/run_tsan.sh
./scripts/run_ubsan.sh
# 5. Run memory checks
./scripts/run_valgrind.sh
# 6. Run profiling experiments
./scripts/run_perf.sh
./scripts/run_gprof.sh
./scripts/run_gprofng.sh
# 7. Run benchmark comparison
./bench/run_benchmarks.sh
This project demonstrates:
- C++17 programming
- RAII and safe memory ownership
- std::thread, worker queues, and thread pools
- Locking, atomics, and thread-local data structures
- Hash-map based aggregation
- Buffered file I/O
- CLI design
- CMake build configuration
- Debug and release build workflows
- Interactive debugging with GDB and LLDB
- Sanitizer integration
- Memory debugging
- System-call tracing
- CPU profiling
- Heap profiling
- Cache profiling
- Static analysis
- Code formatting
- Test coverage
- Benchmark automation
- Measurement-driven optimization
- Technical documentation
LogForge is not intended to be a production observability platform.
It does not aim to replace tools such as Elasticsearch, Splunk, Loki, or ClickHouse.
The purpose of this project is to demonstrate C++ systems engineering, debugging, tracing, profiling, and optimization skills in a safe, self-contained codebase.
This project is intentionally unrelated to any prior or current employer’s internal systems, tools, data, workflows, or intellectual property.
All logs are synthetic. The project uses a generic log-processing domain to demonstrate transferable systems programming skills without relying on proprietary information.
Potential extensions:
- Memory-mapped file reader
- Compressed log support
- JSON log parser
- Regex-based filtering
- Persistent on-disk index
- Interactive query shell
- Flamegraph generation
- eBPF-based tracing experiment
- GitHub Actions CI with sanitizer builds
- HTML benchmark report generation
- Web dashboard for benchmark results
- Fuzz testing with libFuzzer or AFL++
- Package manager integration with Conan or vcpkg
This project is intended for educational and portfolio use.
Choose a license before publishing publicly, such as:
MIT License
Apache License 2.0
BSD 3-Clause License
LogForge is a C++ multithreaded log analytics engine built to demonstrate practical systems debugging and performance engineering.
It combines a realistic command-line application with controlled debugging labs and profiling experiments using:
GDB
LLDB
Valgrind
AddressSanitizer
LeakSanitizer
ThreadSanitizer
UndefinedBehaviorSanitizer
strace / dtruss
perf
gprof
gprofng
Massif
heaptrack
Cachegrind
Callgrind
clang-tidy
cppcheck
clang-format
gcov / lcov / llvm-cov
The result is a safe, portfolio-friendly project that highlights transferable C++ systems skills.