fjramaker/OilUpdaterHK

doTERRA Oil Data Pipeline

This repository generates structured JSON data for doTERRA products, including prices, usage information, and benefit data. The output is used by the website https://oil-calculator-dowellness.web.app/, which consumes the raw JSON files directly from GitHub.


Overview

The pipeline transforms raw doTERRA product data into enriched, structured knowledge.

The final outputs are:

  • doterra_products.json → raw product and price data
  • encyclopedia.json → enriched product knowledge (benefits, usage, compounds, evidence, and references)
  • pip.json → promotional image URLs

Data Flow (High-Level)

doTERRA website
↓
1.oil_scraper.py
↓
doterra_products.json
↓
2.deepseek_enrich.py
↓
encyclopedia.json + PIP.json
↓
(optional) PIP verification
↓
5.image_scraper.py
↓
pip.json (promo images)
↓
merge_promo.py
↓
final encyclopedia.json
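
The scripted stages of this flow could be chained from a single runner. The following is only a sketch: the names `PIPELINE_SCRIPTS`, `build_commands`, and `run_pipeline` are hypothetical and not part of the repository, and the optional PIP verification is a manual step that the sketch does not cover.

```python
import subprocess
import sys

# Script names as listed in this README, in execution order.
# The optional PIP verification is manual and is not represented here.
PIPELINE_SCRIPTS = [
    "1.oil_scraper.py",
    "2.deepseek_enrich.py",
    "5.image_scraper.py",
    "merge_promo.py",
]


def build_commands(scripts=PIPELINE_SCRIPTS, python=sys.executable):
    """Return one argument list per script, ready for subprocess.run."""
    return [[python, script] for script in scripts]


def run_pipeline(commands):
    """Run each command in order and stop at the first failure."""
    for command in commands:
        # check=True raises CalledProcessError on a non-zero exit code.
        subprocess.run(command, check=True)
```

Calling `run_pipeline(build_commands())` from the repository root would run the four scripts back to back. If the PIP links need checking, run the stages individually and pause before `merge_promo.py`.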


Step-by-Step Workflow

1) Install dependencies

pip install -r requirements.txt

2) Scrape product data

python 1.oil_scraper.py

Output: doterra_products.json

3) Enrich product data with LLM

python 2.deepseek_enrich.py

Outputs:

  • encyclopedia.json (usage, benefits, compounds, evidence)
  • PIP.json (product information page links)

4) Scrape promotional images

python 5.image_scraper.py

Output: pip.json (contains the "promo" field with direct image URLs)

5) (Optional) Verify PIP links

Use an external LLM with web search to check and correct the PIP URLs in PIP.json.

6) Merge links into encyclopedia

python merge_promo.py

This script adds both:

  • references.PIP (from the verified PIP.json)
  • references.promo (from pip.json)

Result: final encyclopedia.json with all references.
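
To illustrate what the merge amounts to, here is a minimal sketch. It is not the code in merge_promo.py. It assumes that all three files are keyed by product name, that PIP.json maps a name to a URL, and that each pip.json entry carries a "promo" field. The function name `merge_references` and the sample data are hypothetical.

```python
import copy


def merge_references(encyclopedia, pip_links, promo_images):
    """Return a copy of the encyclopedia with references.PIP and references.promo set.

    Assumed shapes (not confirmed by the README):
      encyclopedia: {product_name: {..., "references": {...}}}
      pip_links:    {product_name: "https://..."}
      promo_images: {product_name: {"promo": "https://..."}}
    """
    merged = copy.deepcopy(encyclopedia)  # leave the input untouched
    for name, entry in merged.items():
        references = entry.setdefault("references", {})
        if name in pip_links:
            references["PIP"] = pip_links[name]
        promo = promo_images.get(name, {}).get("promo")
        if promo:
            references["promo"] = promo
    return merged


# Made-up sample data for illustration only.
sample = merge_references(
    {"Lavender": {"benefits": ["calming"]}},
    {"Lavender": "https://example.com/lavender-pip.pdf"},
    {"Lavender": {"promo": "https://example.com/lavender.jpg"}},
)
```

Products without a matching link are left unchanged, so a partial PIP.json or pip.json does not remove references that already exist.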

Files Used by the Website

  • doterra_products.json
  • encyclopedia.json

These two files power https://oil-calculator-dowellness.web.app/
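
For consumers other than the website, the same files can be read over HTTP. The sketch below assumes the files sit at the repository root and that the default branch is named `main`. Neither is stated in this README, so adjust `BRANCH` as needed. The helper names `raw_url` and `fetch_json` are hypothetical.

```python
import json
from urllib.request import urlopen

OWNER = "fjramaker"
REPO = "OilUpdaterHK"
BRANCH = "main"  # assumption: the default branch name is not given in the README


def raw_url(filename, owner=OWNER, repo=REPO, branch=BRANCH):
    """Build the raw.githubusercontent.com URL for a file at the repository root."""
    return f"https://raw.githubusercontent.com/{owner}/{repo}/{branch}/{filename}"


def fetch_json(filename):
    """Download and parse one of the pipeline's JSON outputs."""
    with urlopen(raw_url(filename), timeout=30) as response:
        return json.load(response)
```

For example, `fetch_json("doterra_products.json")` would return the parsed product data, provided the assumed branch and path are right.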

Notes

  • The pipeline is repeatable and incremental.
  • Existing entries in encyclopedia.json are skipped unless you delete the file and re-run step 3.
  • merge_promo.py is the current merge script (replaces the old merge_PIP.py).
  • The JSON is optimized for consumption by the website, not for human reading.
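
The incremental behavior described in these notes can be pictured as a filter over the product list. This is a sketch, not the logic in 2.deepseek_enrich.py: it assumes that products are dicts with a "name" key and that encyclopedia.json is keyed by that name. `load_encyclopedia` and `pending_products` are hypothetical helpers.

```python
import json
import os


def load_encyclopedia(path="encyclopedia.json"):
    """Return existing entries, or an empty dict if the file is missing or was deleted."""
    if not os.path.exists(path):
        return {}
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def pending_products(products, encyclopedia):
    """Keep only the products that have no encyclopedia entry yet."""
    return [p for p in products if p["name"] not in encyclopedia]


# Made-up sample data: only Peppermint still needs enrichment.
todo = pending_products(
    [{"name": "Lavender"}, {"name": "Peppermint"}],
    {"Lavender": {"benefits": ["calming"]}},
)
```

With the file deleted, `load_encyclopedia` returns an empty dict, so every product counts as pending and step 3 regenerates all entries.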

About

This tool automates the extraction of product pricing and categorization from the official doTERRA Hong Kong price list.
