Language: English | Русский
This project converts official SOLIDWORKS API documentation from .chm files into a local Markdown knowledge base and a compact navigation index for Codex/LLM agents.
The goal is to make it faster to write VBA macros and C# add-in applications without loading tens of thousands of documentation files into the model context.
This public repository contains the pipeline scripts, an example generated markdown/ corpus, and a ready-to-use llm_index/ based on SOLIDWORKS API 2024. Original .chm files and intermediate extracted HTML files are not published.
The project code is released under the MIT License and can be used in commercial projects. Generated Markdown may contain material derived from official SOLIDWORKS API documentation; see NOTICE.md.
API_HTML/- HTML files extracted from CHM, without images/CSS/JS.markdown/- Markdown documentation, one logical API topic per file.llm_index/- compact lookup and graph indexes for fast navigation.AGENTS.md- instructions for Codex on how to use the index without wasting context.
.
├── api/ # source .chm files for a specific SOLIDWORKS API version
├── extracted/ # full CHM extraction, generated
├── API_HTML/ # only .htm/.html files, generated
├── markdown/ # Markdown documentation, published as a SOLIDWORKS API 2024 example
├── llm_index/ # Codex/LLM index, published as a SOLIDWORKS API 2024 example
├── scripts/
│ ├── unpack_chm_recursive.sh
│ ├── collect_html.py
│ ├── convert_sw_api_docs.py
│ └── build_llm_index.py
├── AGENTS.md
├── NOTICE.md
├── LICENSE
├── requirements.txt
└── README.md
Python:
python3 -m pip install -r requirements.txtTo extract CHM files, you need extract_chmLib from chmlib.
Debian/Ubuntu:
sudo apt install libchm-binFedora:
sudo dnf install chmlibArch Linux:
sudo pacman -S chmlibCheck the command:
which extract_chmLib
extract_chmLib 2>&1 | headIf your distribution package does not provide extract_chmLib, install chmlib from source or use any compatible CHM extractor that supports this command shape:
extract_chmLib input.chm output_directoryPut .chm files for the target SOLIDWORKS API version into:
api/
Example:
api/sldworksapi.chm
api/sldworksapivb6.chm
api/swconst.chm
api/cworksapi.chm
./scripts/unpack_chm_recursive.sh api extractedThe script runs multiple passes because extracted CHM files can contain nested .chm files.
python3 scripts/collect_html.py extracted -o API_HTML --cleanThis step removes images, CSS, JS, and other non-HTML files while preserving the directory structure.
python3 scripts/convert_sw_api_docs.py API_HTML -o markdownThe converter is specialized for the SOLIDWORKS/Innovasys help format:
- removes navigation and service markup;
- preserves headings, sections, tables, and lists;
- converts Syntax blocks into fenced code blocks;
- adds YAML metadata;
- preserves relationships through Markdown links.
python3 scripts/build_llm_index.py markdown -o llm_indexThe index is not meant to be a RAG database. It is a cheap navigation layer:
- find a symbol first;
- inspect its interface or related topics;
- open only the Markdown files that are actually needed.
find markdown -type f -name '*.md' | wc -l
find llm_index -maxdepth 1 -type f | sort
rg 'IModelDoc2\\.Save' llm_index/symbols.tsvOpen this repository as a workspace, or add it as a second folder in a multi-root workspace next to your VBA/C# project.
Codex should be able to see:
AGENTS.mdllm_index/markdown/
Example prompt:
Use the SOLIDWORKS API knowledge base in this workspace.
Follow AGENTS.md.
I am writing a VBA macro. Find the correct API docs before writing code.
For a C# add-in:
Follow AGENTS.md.
I am writing a C# SOLIDWORKS add-in.
Prefer .NET API docs and C# syntax blocks.
Find a symbol:
rg 'IModelDoc2\\.Save' llm_index/symbols.tsvFind members of an interface:
rg '"interface": "IModelDoc2"' llm_index/interface_members.jsonlOpen the selected document:
sed -n '1,220p' markdown/sldworksapi/<file>.mdmanifest.json- corpus summary.symbols.tsv- cheapest API symbol lookup.documents.tsv- fast document lookup.symbols.jsonl- structured symbol lookup.documents.jsonl- compact card for each Markdown file.nodes.jsonl- graph nodes.edges/- shardedhas_member,member_of, andlinks_tograph edges by source module.edges_manifest.json- edge shard counts and type summaries.interface_members.jsonl- interface/object -> methods/properties.modules.json- API module statistics.
Usually commit:
scripts/README.mdREADME.ru.mdAGENTS.mdLICENSENOTICE.mdrequirements.txtmarkdown/, if you intentionally publish a generated example corpus for a specific API versionllm_index/, if you intentionally publish a ready-to-use index for the included Markdown corpus
Keep large/reproducible artifacts out of Git or publish them separately:
api/extracted/API_HTML/
If you do not want generated data in Git history, publish markdown/ and llm_index/ as release artifacts instead.