A Japanese version of this README is available here: README.ja.md
A toolkit and data hub for fetching, processing, and visualizing corporate information in Japan. This project automatically collects data from official government sources to provide up-to-date datasets and interactive dashboards.
- Company Creation & Termination Dashboard - Visualize daily changes on a map of Japan.
- Historical Trend of Company Numbers - Track the total number of companies over time.
- Monthly Statistics by Prefecture - View monthly creation and termination data for each prefecture.
- Automated Data Updates: A GitHub Actions workflow runs daily to fetch the latest corporate registration changes.
- Interactive Dashboards: Visualize company creation and termination data across Japan with interactive, map-based dashboards.
- Comprehensive Datasets: Provides ready-to-use CSV files for company information, government agencies, patents, and more.
- JavaScript/Deno Modules: Includes reusable modules (GBizINFO.js, SPARQL.js) for querying the gBizINFO SPARQL endpoint directly.
All generated data is available as CSV files in the data/ directory. Key datasets include:
- Daily Changes: Daily records of new (_created.csv) and terminated (_terminated.csv) companies.
- Change Summary: A summary of creations and terminations by prefecture and month (data/diff_summary.csv).
- Entity Lists: Comprehensive lists of national agencies (jpgovs.csv), local governments (localgovs.csv), and foreign companies (foreigns.csv).
- City-Level Data: Detailed company information, patents, and trademarks for specific cities (e.g., data/18207/ for Sabae City).
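As a rough illustration of consuming one of these CSV files, the sketch below parses CSV text and totals a numeric column per key. The column names (`prefecture`, `month`, `created`, `terminated`) and the sample rows are hypothetical, chosen only for the example; check the header row of the actual file before relying on them.

```javascript
// Parse CSV text into an array of rows (arrays of strings).
// Handles quoted fields and doubled quotes inside quoted fields.
function parseCsv(text) {
  const rows = [];
  let row = [];
  let field = "";
  let inQuotes = false;
  for (let i = 0; i < text.length; i++) {
    const c = text[i];
    if (inQuotes) {
      if (c === '"' && text[i + 1] === '"') {
        field += '"'; // escaped quote
        i++;
      } else if (c === '"') {
        inQuotes = false;
      } else {
        field += c;
      }
    } else if (c === '"') {
      inQuotes = true;
    } else if (c === ",") {
      row.push(field);
      field = "";
    } else if (c === "\n") {
      row.push(field);
      rows.push(row);
      row = [];
      field = "";
    } else if (c !== "\r") {
      field += c;
    }
  }
  // Flush the last row when the text does not end with a newline.
  if (field !== "" || row.length > 0) {
    row.push(field);
    rows.push(row);
  }
  return rows;
}

// Turn rows into objects keyed by the header row.
function toRecords(rows) {
  const [header, ...body] = rows;
  return body.map((r) =>
    Object.fromEntries(header.map((h, i) => [h, r[i] ?? ""]))
  );
}

// Sum a numeric column, grouped by another column.
function sumBy(records, keyCol, valueCol) {
  const totals = new Map();
  for (const r of records) {
    totals.set(r[keyCol], (totals.get(r[keyCol]) ?? 0) + Number(r[valueCol]));
  }
  return totals;
}

// Made-up sample data with hypothetical column names.
const sampleCsv =
  "prefecture,month,created,terminated\n" +
  "Fukui,2024-01,10,4\n" +
  "Fukui,2024-02,7,3\n" +
  '"Tokyo",2024-01,100,80\n';

const createdByPrefecture = sumBy(toRecords(parseCsv(sampleCsv)), "prefecture", "created");
```

In Deno, the sample string could be replaced with the contents of a real file, for example `await Deno.readTextFile("data/diff_summary.csv")`, with the column names adjusted to match that file's header.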
This project uses a combination of web scraping and API calls to gather data:
- Daily Difference Files: A scheduled GitHub Action scrapes the National Tax Agency's website to download daily files of corporate changes (creations, terminations, etc.).
- SPARQL Queries: Deno scripts query the gBizINFO SPARQL endpoint to retrieve structured data on government agencies, foreign companies, and basic corporate information.
- Data Processing: The raw data is processed and converted into clean, version-controlled CSV files.
- Visualization: The static HTML pages in the repository use these CSV files to generate the interactive dashboards and lists.
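The SPARQL query and data processing steps above can be sketched roughly as follows. This is not the repository's GBizINFO.js or SPARQL.js code; it is a minimal illustration that assumes the endpoint follows the standard SPARQL 1.1 Protocol (query passed as a `query` parameter) and returns the standard SPARQL JSON results format. The function names and the endpoint argument are hypothetical.

```javascript
// Build a SPARQL Protocol GET URL. `endpoint` is supplied by the caller;
// no specific gBizINFO URL is assumed here.
function buildQueryUrl(endpoint, query) {
  const url = new URL(endpoint);
  url.searchParams.set("query", query);
  return url.toString();
}

// Quote a CSV field only when it contains a comma, quote, or line break.
function csvField(value) {
  const s = String(value ?? "");
  return /[",\r\n]/.test(s) ? `"${s.replaceAll('"', '""')}"` : s;
}

// Convert a SPARQL JSON results object into CSV text.
// Unbound variables become empty fields.
function sparqlJsonToCsv(result) {
  const vars = result.head.vars;
  const rows = result.results.bindings.map((binding) =>
    vars.map((v) => csvField(binding[v]?.value)).join(",")
  );
  return [vars.map(csvField).join(","), ...rows].join("\n") + "\n";
}

// Run a query and return the result as CSV text (performs a network request).
async function queryToCsv(endpoint, query) {
  const res = await fetch(buildQueryUrl(endpoint, query), {
    headers: { Accept: "application/sparql-results+json" },
  });
  if (!res.ok) throw new Error(`SPARQL request failed: ${res.status}`);
  return sparqlJsonToCsv(await res.json());
}
```

Keeping the conversion (`sparqlJsonToCsv`) separate from the network call makes it possible to test the CSV output against a saved JSON response without contacting the endpoint.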
- gBizINFO: A service operated by the Ministry of Economy, Trade and Industry (METI) of Japan, accessed via its SPARQL endpoint.
- National Tax Agency Corporate Number Publication Site: Provides daily difference files on corporate registrations.
MIT License — see LICENSE.