com-480-data-visualization/StartupAgent

Project of Data Visualization (COM-480)

Student's name    SCIPER
Haodong Zheng     387661
Youliang Zhu      415773
Yueyang Pan       350575

The AI Value Chain Investor Dashboard is the final COM-480 Milestone 3 visualization. It compares companies across the AI stack: physical infrastructure, compute and silicon, AI infrastructure software, foundation models, and AI-enabled applications.

Audience | Usage | Setup | Data pipeline | Repository layout | Milestone 3

Target Audience

The dashboard is designed for people who need to compare AI companies across layers, regions, and business maturity:

  • Emerging fund managers deciding where to allocate capital across the AI value chain.
  • Angel investors looking for capital-efficient private and public opportunities.
  • High-skill job seekers evaluating where valuable technical work is being created.

Final Visualization And Intended Usage

The dashboard helps users move from a market-level overview to company-level evidence. It starts with the AI value-chain structure, then supports filtering by layer and geography, comparison of valuation and efficiency metrics, and detailed inspection of each company's source-backed trend data.

Typical use cases:

  • Fund managers: use the value-chain and geography views to see where companies sit in the AI stack by region. This makes it easier to compare, for example, US compute and model companies against Chinese infrastructure suppliers or European application/software companies, then decide where different asset strategies may fit.
  • Angel investors: use the capital-efficiency view to compare how much enterprise or market value each company has created per dollar of disclosed capital raised. They can also inspect GitHub star trends and open-source signals to identify companies gaining developer attention before that attention is fully reflected in financial metrics.
  • Job seekers: use the density view to find companies with high value creation per employee. This highlights organizations that appear to generate unusually high market or private value with concentrated teams, which can serve as a rough proxy for technical leverage and talent density.

Each company also has a detail view with richer trend data, source links, confidence labels, and missing-data indicators. Missing metrics are kept as null rather than guessed.
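The capital-efficiency and density ratios described above, together with the rule that missing metrics stay null, can be sketched roughly as follows. This is a minimal illustration and not the dashboard's actual code: the `safe_ratio` helper, the field names, and the figures are all hypothetical.

```python
from typing import Optional


def safe_ratio(numerator: Optional[float], denominator: Optional[float]) -> Optional[float]:
    """Return numerator / denominator, or None if an input is missing or the denominator is not positive."""
    if numerator is None or denominator is None or denominator <= 0:
        return None
    return numerator / denominator


# Hypothetical record; field names and figures are made up for illustration.
company = {
    "company_id": "example_co",
    "value_usd": 2_000_000_000.0,
    "capital_raised_usd": 500_000_000.0,
    "employees": None,  # not disclosed, so it stays null instead of being guessed
}

# Value created per dollar of disclosed capital raised.
capital_efficiency = safe_ratio(company["value_usd"], company["capital_raised_usd"])
# Value created per employee; None here because the headcount is missing.
value_per_employee = safe_ratio(company["value_usd"], company["employees"])

print(capital_efficiency)  # 4.0
print(value_per_employee)  # None
```

Returning `None` rather than a default such as 0 keeps a missing ratio distinguishable from a genuinely low one, which is what a missing-data indicator needs.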

Setup

Install Node dependencies for the Vite, TypeScript, and D3 frontend:

npm install

Set up the Python data-pipeline environment:

python3.11 -m venv .agent_venv
.agent_venv/bin/pip install -e src

Optional provider credentials are documented in .env_example. Copy it only if you plan to refresh live data:

cp .env_example .env

Do not commit .env or real API keys.

Run The Website

Start the local development server:

npm run dev

Open the local URL printed by Vite.

Build the static site:

npm run build

Preview the production build locally:

npm run preview

The static build output is written to dist/.

Data Pipeline

The frontend reads the frozen Milestone 3 snapshot at data/snapshot-milestone3.json. Seed files are not used directly by the dashboard; they are validated, enriched, stored in SQLite, and exported into generated JSON snapshots.
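As a rough illustration of the store-then-export step, the sketch below writes a few observations to SQLite and groups them into a JSON-serializable snapshot. It is only a sketch under stated assumptions: the `observations` table, its columns, the `export_snapshot` helper, and the sample rows are hypothetical and do not reflect the project's real schema.

```python
import json
import sqlite3

# An in-memory database stands in for the real metric store.
# Table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE observations (company_id TEXT, metric TEXT, value REAL, source_url TEXT)"
)
conn.executemany(
    "INSERT INTO observations VALUES (?, ?, ?, ?)",
    [
        ("example_co", "employees", 120.0, "https://example.com/about"),
        ("example_co", "arr_usd", None, None),  # missing metric stored as NULL
    ],
)


def export_snapshot(connection: sqlite3.Connection) -> dict:
    """Group stored observations by company into a JSON-serializable dict."""
    snapshot: dict = {}
    rows = connection.execute(
        "SELECT company_id, metric, value, source_url "
        "FROM observations ORDER BY company_id, metric"
    )
    for company_id, metric, value, source_url in rows:
        snapshot.setdefault(company_id, {})[metric] = {"value": value, "source": source_url}
    return snapshot


snapshot = export_snapshot(conn)
# SQL NULL comes back as Python None and is serialized as JSON null.
print(json.dumps(snapshot, indent=2))
```

Keeping the database as the place where observations accumulate and treating JSON as a derived artifact means a snapshot can be regenerated at any time without re-fetching sources.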

Validate seed files and write the base generated data:

npm run data:update

Initialize the SQLite metric store if needed:

npm run data:init-db

Refresh one public company by company_id or ticker:

npm run data:enrich-public -- --company nvidia

Refresh one private company by company_id:

npm run data:enrich-private -- --company openai

Extract public-company customer concentration for one company:

npm run data:extract-public-customers -- --company nvidia

Refresh deterministic non-financial signals such as open-source activity:

npm run data:enrich-signals -- --company mistral_ai

Freeze the current enriched outputs for Milestone 3:

.agent_venv/bin/python -m pipeline snapshot --name milestone3

Run pipeline tests:

npm run data:test

Pipeline Notes

  • Public companies are enriched from structured sources such as yfinance, AkShare, SEC EDGAR, CNInfo/HKEX filing routes, Financial Datasets, and FMP when credentials are available.
  • Private companies are enriched from deterministic public-web and source-backed signals first. Optional LLM extraction is available for founding facts, funding, ARR, and commercial relationships when structured sources are incomplete.
  • data/metrics.sqlite is the source of truth for acquired observations. JSON files under data/ are generated frontend and review artifacts.
  • Source fetches and LLM outputs are cached under data/cache/ where applicable.
  • Low-confidence, missing, or failed observations are written to data/review/ instead of being silently filled.
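The last note, routing weak observations to review instead of filling them in, could look something like the sketch below. The `Observation` shape, the confidence labels, and the `split_for_review` helper are assumptions made for illustration; the pipeline's real types and thresholds may differ.

```python
from typing import Optional, TypedDict


class Observation(TypedDict):
    company_id: str
    metric: str
    value: Optional[float]
    confidence: str  # hypothetical labels: "high", "medium", "low"


def split_for_review(
    observations: list[Observation],
) -> tuple[list[Observation], list[Observation]]:
    """Separate accepted observations from those that need manual review."""
    accepted: list[Observation] = []
    review: list[Observation] = []
    for obs in observations:
        # Missing or low-confidence values are flagged, never silently filled.
        if obs["value"] is None or obs["confidence"] == "low":
            review.append(obs)
        else:
            accepted.append(obs)
    return accepted, review


# Made-up sample observations.
sample: list[Observation] = [
    {"company_id": "example_co", "metric": "employees", "value": 120.0, "confidence": "high"},
    {"company_id": "example_co", "metric": "arr_usd", "value": None, "confidence": "high"},
    {"company_id": "example_co", "metric": "funding_usd", "value": 5e7, "confidence": "low"},
]

accepted, review = split_for_review(sample)
print([o["metric"] for o in accepted])  # ['employees']
print([o["metric"] for o in review])    # ['arr_usd', 'funding_usd']
```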

Repository Layout

  • src/dashboard/ - D3 dashboard views, interaction logic, data loading, primitives, and callouts.
  • src/main.ts - frontend entrypoint loaded by Vite.
  • src/data/scripts/ - Python data pipeline CLI, source adapters, schema logic, storage, extractors, prompts, and tests.
  • src/data/seeds/ - acquisition-class seed CSVs for public, private, and manual-review companies.
  • data/ - generated machine-readable outputs, SQLite metrics store, frozen Milestone 3 snapshot, and review files.
  • doc/ - milestone requirements, data/schema documentation, process-book notes, and screencast planning.
  • report/ - final process book source and PDF.
  • public/maps/ - static geographic map assets used by the dashboard.
  • .env_example - documented provider keys and feature toggles.

Development Checks

Run the frontend build:

npm run build

Run the data-pipeline unit tests:

npm run data:test

Milestone 3 Deliverables

Milestone 3 requires a GitHub repository with clean code, data, process book PDF, and setup/usage README; a screencast of at most 2 minutes; and a process book of at most 8 pages.

Repository deliverable links:
