doTERRA Oil Data Pipeline

This repository generates structured JSON data for doTERRA products, including prices, usage information, and benefit data. The output feeds the website https://oil-calculator-dowellness.web.app/, which consumes the raw JSON files directly from GitHub.

Overview

The pipeline transforms raw doTERRA product data into enriched, structured knowledge.

The final outputs are:

doterra_products.json → raw product and price data
encyclopedia.json → enriched product knowledge (benefits, usage, compounds, evidence, and references)
pip.json → promotional image URLs

Data Flow (High-Level)

doTERRA website
↓
1.oil_scraper.py
↓
doterra_products.json
↓
2.deepseek_enrich.py
↓
encyclopedia.json + PIP.json
↓
(optional) PIP verification
↓
5.image_scraper.py
↓
pip.json (promo images)
↓
merge_promo.py
↓
final encyclopedia.json
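
As a rough illustration of the flow above, the scripts could be chained from a small runner. This is only a sketch: the script names come from the diagram, but `build_commands` and `run_pipeline` are hypothetical helpers that are not part of this repository, and the optional PIP verification step is left out because it is manual.

```python
import subprocess
import sys

# Script order taken from the data-flow diagram; the optional
# PIP verification step is manual and therefore not listed here.
PIPELINE_STEPS = [
    "1.oil_scraper.py",
    "2.deepseek_enrich.py",
    "5.image_scraper.py",
    "merge_promo.py",
]


def build_commands(python=sys.executable, steps=PIPELINE_STEPS):
    """Return one argv list per pipeline step, in execution order."""
    return [[python, step] for step in steps]


def run_pipeline(commands, runner=subprocess.run):
    """Run each command in order and stop at the first failure."""
    for argv in commands:
        print("Running:", " ".join(argv))
        # check=True makes subprocess.run raise CalledProcessError
        # when a script exits with a non-zero status.
        runner(argv, check=True)


# Typical use from the repository root:
#   run_pipeline(build_commands())
```

Passing `runner` as a parameter keeps the ordering logic easy to dry-run without actually launching the scrapers.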

Step-by-Step Workflow

1) Install dependencies

pip install -r requirements.txt

2) Scrape product data

python 1.oil_scraper.py

Output: doterra_products.json

3) Enrich product data with LLM

python 2.deepseek_enrich.py

Outputs:

encyclopedia.json (usage, benefits, compounds, evidence)
PIP.json (product information page links)

4) Scrape promotional images

python 5.image_scraper.py

Output: pip.json (contains the "promo" field with direct image URLs)

5) (Optional) Verify PIP links

Use an external LLM with web search to check and correct the PIP URLs in PIP.json.
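
Before handing the links to an LLM, it may help to flag entries that are obviously malformed. The sketch below is a cheap structural pre-check, not a replacement for the web-search verification described above. It assumes PIP.json maps product names to URL strings, which is a guess about the file layout; `malformed_links` is a hypothetical helper.

```python
from urllib.parse import urlparse


def malformed_links(pip_links):
    """Return the names whose URL lacks an http(s) scheme or a host.

    Assumes pip_links looks like {product_name: url_string}; the real
    layout of PIP.json may differ.
    """
    bad = []
    for name, url in pip_links.items():
        parts = urlparse(url if isinstance(url, str) else "")
        if parts.scheme not in ("http", "https") or not parts.netloc:
            bad.append(name)
    return bad


# Illustrative data only; these are not real product links.
sample = {
    "Lavender": "https://example.com/lavender.pdf",
    "Lemon": "example.com/lemon.pdf",
    "Peppermint": "",
}
print(malformed_links(sample))
```

A link that passes this check can still point to the wrong page, so the LLM verification step remains useful.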

6) Merge links into encyclopedia

python merge_promo.py

This script adds both:

references.PIP (from the verified PIP.json)
references.promo (from pip.json)

Result: final encyclopedia.json with all references.
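
The merge can be pictured roughly as follows. This is a minimal sketch and not the code in merge_promo.py: it assumes all three files are keyed by product name and that each pip.json entry carries a "promo" field, and `merge_references` is a hypothetical name.

```python
import copy


def merge_references(encyclopedia, pip_links, promo_images):
    """Return a copy of encyclopedia with references.PIP and
    references.promo filled in where a source value exists.

    Assumed shapes (hypothetical):
      encyclopedia: {name: {..., "references": {...}}}
      pip_links:    {name: "https://..."}
      promo_images: {name: {"promo": "https://..."}}
    """
    merged = copy.deepcopy(encyclopedia)  # leave the input untouched
    for name, entry in merged.items():
        refs = entry.setdefault("references", {})
        if name in pip_links:
            refs["PIP"] = pip_links[name]
        promo = promo_images.get(name, {}).get("promo")
        if promo:
            refs["promo"] = promo
    return merged


# Illustrative data only.
encyclopedia = {"Lavender": {"benefits": ["calming"]}, "Lemon": {}}
pip_links = {"Lavender": "https://example.com/lavender-pip.pdf"}
promo_images = {"Lavender": {"promo": "https://example.com/lavender.jpg"}}

final = merge_references(encyclopedia, pip_links, promo_images)
print(final["Lavender"]["references"])
```

Entries without a matching link simply end up with an empty references object, so the merge can be re-run as more links become available.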

Files Used by the Website

doterra_products.json
encyclopedia.json

These two files power https://oil-calculator-dowellness.web.app/

Notes

The pipeline is repeatable and incremental.
Existing entries in encyclopedia.json are skipped unless you delete the file and re-run step 3.
merge_promo.py is the current merge script (replaces the old merge_PIP.py).
The JSON output is optimized for consumption by the website, not for human readability.
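
The incremental behaviour noted above (existing encyclopedia.json entries are skipped) could look something like this. It is a sketch under assumptions: products are taken to be a list of objects with a "name" field, the encyclopedia is taken to be keyed by that name, and `load_json` and `pending_products` are hypothetical helpers rather than functions from 2.deepseek_enrich.py.

```python
import json
from pathlib import Path


def load_json(path, default):
    """Load JSON from path, or return default if the file is missing."""
    p = Path(path)
    if not p.exists():
        return default
    with p.open(encoding="utf-8") as fh:
        return json.load(fh)


def pending_products(products, encyclopedia):
    """Return the products that have no encyclopedia entry yet."""
    return [p for p in products if p["name"] not in encyclopedia]


# Illustrative data only.
products = [{"name": "Lavender"}, {"name": "Lemon"}]
encyclopedia = {"Lavender": {"benefits": ["calming"]}}
print(pending_products(products, encyclopedia))
```

Deleting encyclopedia.json makes `load_json` fall back to an empty default, so every product counts as pending again, which matches the re-run behaviour described in the notes.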

