This repository generates structured JSON data for doTERRA products, including prices, usage information, and benefit data. The output is used by the website https://oil-calculator-dowellness.web.app/, which consumes the raw JSON files directly from GitHub.
The pipeline transforms raw doTERRA product data into enriched, structured knowledge.
The final outputs are:
- doterra_products.json → raw product and price data
- encyclopedia.json → enriched product knowledge (benefits, usage, compounds, evidence, and references)
- pip.json → promotional image URLs
doTERRA website
↓
1.oil_scraper.py
↓
doterra_products.json
↓
2.deepseek_enrich.py
↓
encyclopedia.json + PIP.json
↓
(optional) PIP verification
↓
5.image_scraper.py
↓
pip.json (promo images)
↓
merge_promo.py
↓
final encyclopedia.json
pip install -r requirements.txt
python 1.oil_scraper.py
Output: doterra_products.json
python 2.deepseek_enrich.py
Outputs:
- encyclopedia.json (usage, benefits, compounds, evidence)
- PIP.json (product information page links)
python 5.image_scraper.py
Output: pip.json (contains the "promo" field with direct image URLs)
Use an external LLM with web search to check and correct the PIP URLs in PIP.json.
python merge_promo.py
This script adds both:
- references.PIP (from the verified PIP.json)
- references.promo (from pip.json)
Result: final encyclopedia.json with all references.
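The merge step can be pictured with a minimal sketch like the one below. It is not the actual contents of merge_promo.py. The function name `merge_references` and the JSON shapes are assumptions made for illustration: encyclopedia.json is taken to be keyed by product name, PIP.json to map product name to a URL, and pip.json to map product name to an object with a "promo" field.

```python
def merge_references(encyclopedia, pip_links, promo_images):
    """Return a copy of `encyclopedia` with references.PIP and references.promo added.

    Assumed shapes (hypothetical, not confirmed by the repository):
      encyclopedia: {product_name: {..., "references": {...}}}
      pip_links:    {product_name: "https://..."}
      promo_images: {product_name: {"promo": "https://..."}}
    """
    merged = {}
    for name, entry in encyclopedia.items():
        # Copy the entry and its references so the inputs are left untouched.
        new_entry = dict(entry)
        references = dict(entry.get("references", {}))

        # Add the verified PIP link when one exists for this product.
        if name in pip_links:
            references["PIP"] = pip_links[name]

        # Add the promo image URL when present and non-empty.
        promo = promo_images.get(name, {}).get("promo")
        if promo:
            references["promo"] = promo

        new_entry["references"] = references
        merged[name] = new_entry
    return merged
```

In this sketch, products without a PIP link or promo image keep their existing references unchanged, so a partially populated PIP.json or pip.json does not remove data.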
- doterra_products.json
- encyclopedia.json
These two files power https://oil-calculator-dowellness.web.app/
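Because the website reads these files as-is, it may be worth confirming that they parse before pushing. The helper below is a small sketch of such a check. The name `find_invalid_json` is hypothetical and not part of this repository.

```python
import json
from pathlib import Path


def find_invalid_json(paths):
    """Return {path: reason} for every file that is missing or not valid JSON."""
    problems = {}
    for path in paths:
        try:
            json.loads(Path(path).read_text(encoding="utf-8"))
        except FileNotFoundError:
            problems[str(path)] = "file not found"
        except json.JSONDecodeError as exc:
            problems[str(path)] = f"invalid JSON: {exc.msg} (line {exc.lineno})"
    return problems
```

Calling `find_invalid_json(["doterra_products.json", "encyclopedia.json"])` returns an empty dict when both files exist and parse cleanly.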
- The pipeline is repeatable and incremental.
- Existing entries in encyclopedia.json are skipped unless you delete the file and re-run step 3.
- merge_promo.py is the current merge script (replaces the old merge_PIP.py).
- The JSON is optimized for consumption by the website, not for human readability.
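The incremental behaviour noted above (existing encyclopedia.json entries are skipped) could be sketched as follows. This is an illustration under stated assumptions, not the logic of the actual enrichment script. The helper names `load_existing` and `pending_products` are hypothetical, and the sketch assumes each product is an object with a "name" key and that encyclopedia.json is keyed by that name.

```python
import json
from pathlib import Path


def load_existing(path):
    """Load a JSON object from `path`, or return {} if the file does not exist."""
    file = Path(path)
    if not file.exists():
        return {}
    return json.loads(file.read_text(encoding="utf-8"))


def pending_products(products, encyclopedia):
    """Return the products that do not yet have an encyclopedia entry.

    Assumes each product is a dict with a "name" key and that the
    encyclopedia is keyed by the same name (illustrative assumption).
    """
    return [p for p in products if p["name"] not in encyclopedia]
```

With this shape, deleting encyclopedia.json makes `load_existing` return an empty dict, so every product becomes pending again on the next run.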