Express backend API for Instagram scraping via Apify, post classification, and insight generation.
View the app here
View the frontend-repo here
This repo is configured for Docker-based deployment.
- Node.js 18+
- Environment variable OPENROUTER_API_KEY
- Environment variable APIFY_TOKEN
Optional environment variables:
- PORT (default: 3000)
- OPENROUTER_MODEL (default: google/gemma-4-26b-a4b-it)
- DEFAULT_ACCOUNTS (comma-separated handles, default: plaeto.schools)
- DEFAULT_MAX_POSTS (default: 2)
- APIFY_INSTAGRAM_ACTOR (default: apify/instagram-post-scraper)
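As an illustration of how the required and optional variables above could be read at startup, here is a minimal sketch. The function name loadConfig and the returned property names are hypothetical and are not taken from this repo; only the variable names and defaults come from the list above.

```javascript
// Hypothetical helper (not from this repo): read configuration from an
// env-like object and apply the documented defaults.
function loadConfig(env = process.env) {
  const required = ["OPENROUTER_API_KEY", "APIFY_TOKEN"];
  const missing = required.filter((name) => !env[name]);
  if (missing.length > 0) {
    throw new Error(`Missing required environment variables: ${missing.join(", ")}`);
  }
  return {
    port: Number(env.PORT) || 3000,
    openRouterApiKey: env.OPENROUTER_API_KEY,
    apifyToken: env.APIFY_TOKEN,
    openRouterModel: env.OPENROUTER_MODEL || "google/gemma-4-26b-a4b-it",
    // DEFAULT_ACCOUNTS is a comma-separated list of handles.
    defaultAccounts: (env.DEFAULT_ACCOUNTS || "plaeto.schools")
      .split(",")
      .map((handle) => handle.trim())
      .filter(Boolean),
    defaultMaxPosts: Number(env.DEFAULT_MAX_POSTS) || 2,
    apifyInstagramActor: env.APIFY_INSTAGRAM_ACTOR || "apify/instagram-post-scraper",
  };
}
```

Failing fast on a missing key is a design choice in this sketch; the service itself may handle missing variables differently.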
npm install
npm start
The process binds to PORT, which your platform should set automatically.
Use the included Dockerfile as the runtime source.
docker build -t eidos-backend .
docker run -p 3000:3000 --env-file .env eidos-backend
Important:
- Runtime command must be npm start (or node index.js).
- Do not use node test_analyze.js as the service start command; it is only a one-off client test script.
Service info and route list.
Basic liveness response.
Returns supported intent and format categories.
Classifies a single caption with optional image context.
Request body:
{
"caption": "A sample Instagram caption",
"imageUrl": "https://example.com/image.jpg",
"categories": {
"intent": ["Promotional", "Educational"],
"format": ["Trend", "Tutorial"]
}
}
Notes:
- categories is optional. If not provided, the default categories are used.
Response body:
{
"classification": {
"intent": "Promotional",
"format": "Trend"
},
"rawResponse": "{\n \"intent\": \"Promotional\",\n \"format\": \"Trend\"\n}"
}
The service uses two Apify actors to fetch Instagram account details and posts:
- coderx/instagram-profile-scraper-bio-posts to fetch the account's follower count.
- The user-defined APIFY_INSTAGRAM_ACTOR (defaults to apify/instagram-post-scraper) to fetch Instagram posts.
Request sent to the coderx/instagram-profile-scraper-bio-posts actor:
{
"usernames": ["plaeto.schools"]
}
The scraper returns structured profile data, from which the service extracts the followersCount field.
For each account, the following request is sent to the posts scraper actor:
{
"dataDetailLevel": "basicData",
"resultsLimit": 5,
"skipPinnedPosts": false,
"username": ["plaeto.schools"]
}
- dataDetailLevel: Set to basicData for standard post details
- resultsLimit: Number of posts to retrieve (passed from the maxPosts parameter)
- skipPinnedPosts: Whether to skip pinned posts
- username: Array of Instagram handles to scrape
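The request above can be assembled per account from the handle and the maxPosts value. The sketch below shows one way to do that; the helper name buildPostsActorInput is hypothetical and not taken from this repo.

```javascript
// Hypothetical helper (not from this repo): build the input object that is
// sent to the posts scraper actor for a single account.
function buildPostsActorInput(account, maxPosts) {
  return {
    dataDetailLevel: "basicData", // standard post details
    resultsLimit: maxPosts,       // number of posts to retrieve
    skipPinnedPosts: false,       // pinned posts are included
    username: [account],          // the actor expects an array of handles
  };
}
```

For example, buildPostsActorInput("plaeto.schools", 5) produces the request body shown above.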
The actor returns an array of post objects with the following structure:
[
{
"inputUrl": "https://www.instagram.com/p/DLNsnpUTdVS/",
"id": "3660778310592222546",
"type": "Image",
"shortCode": "DLNsnpUTdVS",
"caption": "Your phone isn't rotting your brain...",
"hashtags": [],
"mentions": [],
"url": "https://www.instagram.com/p/DLNsnpUTdVS/",
"commentsCount": 230,
"firstComment": "Amen.",
"latestComments": [...],
"dimensionsHeight": 1350,
"dimensionsWidth": 1080,
"displayUrl": "https://scontent-dfw5-3.cdninstagram.com/v/t51.2885-15/...",
"images": [],
"alt": "Photo by National Geographic...",
"likesCount": 73473,
"timestamp": "2025-06-22T19:00:10.000Z",
"childPosts": [],
"ownerFullName": "National Geographic",
"ownerUsername": "natgeo",
"ownerId": "787132",
"isCommentsDisabled": false
}
]
Key fields extracted and normalized:
- url / inputUrl → link: Post URL
- displayUrl / images[0] → img: Cover image
- type / productType → type: Normalized to post or reel
- likesCount → likes: Like count
- commentsCount → comments: Comment count
- caption → caption: Post caption text
- timestamp → date: ISO 8601 date
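The field mapping above could be implemented along these lines. This is a sketch rather than the service's actual code: the function name normalizePost is hypothetical, and the rule used to decide between post and reel (treating video or "clips" items as reels) is an assumption, since this README does not spell it out.

```javascript
// Hypothetical normalizer (not from this repo): map a raw actor item to the
// compact shape described above.
function normalizePost(raw) {
  const images = Array.isArray(raw.images) ? raw.images : [];
  // Assumption: items whose type or productType mentions video, reel, or
  // clips are treated as reels; everything else is a post.
  const kind = `${raw.productType || ""} ${raw.type || ""}`.toLowerCase();
  const isReel = /clips|reel|video/.test(kind);
  const parsed = raw.timestamp ? new Date(raw.timestamp) : null;
  const validDate = parsed !== null && !Number.isNaN(parsed.getTime());
  return {
    link: raw.url || raw.inputUrl || null,
    img: raw.displayUrl || images[0] || null,
    type: isReel ? "reel" : "post",
    likes: Number(raw.likesCount) || 0,
    comments: Number(raw.commentsCount) || 0,
    caption: raw.caption || "",
    date: validDate ? parsed.toISOString() : null,
  };
}
```

Falling back to null or 0 for missing fields keeps later aggregation simple; whether the service does the same is not stated here.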
POST /api/analyze runs an end-to-end scrape, classification, and analytics pass.
Request body:
{
"accounts": ["plaeto.schools", "another.brand"],
"maxPosts": 3,
"includeAiOverview": true,
"generateExcel": true,
"categories": {
"intent": ["Promotional", "Educational"],
"format": ["Trend", "Tutorial"]
}
}
Notes:
- accounts is optional; falls back to DEFAULT_ACCOUNTS.
- maxPosts must be between 1 and 25.
- If maxPosts is higher than the number of available posts for an account, the service returns all available posts without failing.
- categories is optional; falls back to default categories if not provided.
- One analysis run is allowed at a time.
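The constraints in these notes can be sketched as a small validation step plus a single-run guard. Both function names are hypothetical and not taken from this repo. The sketch also assumes that an omitted maxPosts falls back to DEFAULT_MAX_POSTS, and it does not model how the service actually reports a rejected request (for example, which status code it returns).

```javascript
// Hypothetical request parsing (not from this repo): apply fallbacks and
// enforce the documented 1..25 range for maxPosts.
function parseAnalyzeRequest(body = {}, defaults = { accounts: ["plaeto.schools"], maxPosts: 2 }) {
  const accounts =
    Array.isArray(body.accounts) && body.accounts.length > 0
      ? body.accounts
      : defaults.accounts;
  const maxPosts = body.maxPosts === undefined ? defaults.maxPosts : Number(body.maxPosts);
  if (!Number.isInteger(maxPosts) || maxPosts < 1 || maxPosts > 25) {
    throw new RangeError("maxPosts must be an integer between 1 and 25");
  }
  return { accounts, maxPosts };
}

// Hypothetical single-run guard: a second call is rejected while the first
// task is still in flight, and the flag is cleared even if the task throws.
let running = false;
async function runExclusive(task) {
  if (running) {
    throw new Error("An analysis run is already in progress");
  }
  running = true;
  try {
    return await task();
  } finally {
    running = false;
  }
}
```

An in-memory flag like this only protects a single process; a deployment with several instances would need a shared lock instead.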
Additional behavior:
- Before extracting posts, the service sends the provided (or default) categories to the LLM (OpenRouter) and asks it to define each category in one short line. These definitions are then passed into the classifier when labeling posts to give the model clearer, consistent criteria.
- The generated definitions are included in the final analysis payload under the field categoryDefenitions (note the spelling used by the service). The categoryDefenitions object has the shape { intent: { ... }, format: { ... } } and appears immediately before rawData in the response.
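To make the two-step flow above concrete, the sketch below builds a definition request from the categories and then renders returned definitions as criteria text for a classifier prompt. Everything here is illustrative: the function names and the prompt wording are hypothetical, and the service's real prompts are not shown in this README.

```javascript
// Hypothetical prompt builder (not from this repo): ask the model for a
// one-line definition of every category.
function buildDefinitionPrompt(categories) {
  const lines = Object.entries(categories).map(
    ([group, names]) => `${group}: ${names.join(", ")}`
  );
  return [
    "Define each category below in one short line.",
    'Reply with JSON shaped as { "intent": { ... }, "format": { ... } }.',
    ...lines,
  ].join("\n");
}

// Hypothetical formatter: turn a categoryDefenitions-shaped object into
// plain-text criteria that can be appended to a classifier prompt.
function formatDefinitions(categoryDefenitions) {
  return Object.entries(categoryDefenitions)
    .flatMap(([group, defs]) =>
      Object.entries(defs).map(([name, text]) => `${group} / ${name}: ${text}`)
    )
    .join("\n");
}
```

Passing the same definitions to every classification call is what gives the model consistent criteria across posts.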
POST /api/analyze supports Server-Sent Events (SSE) progress streaming.
Enable streaming in either of two ways:
- Add "stream": true in the request JSON body.
- Or send the header Accept: text/event-stream.
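A request handler could detect either opt-in with a check like the following. The helper name wantsStream is hypothetical; the sketch assumes an Express-style request object whose JSON body has already been parsed and whose header names are lowercased.

```javascript
// Hypothetical helper (not from this repo): decide whether the caller asked
// for an SSE response, via the body flag or the Accept header.
function wantsStream(req) {
  const body = req.body || {};
  const accept = (req.headers && req.headers.accept) || "";
  return body.stream === true || accept.toLowerCase().includes("text/event-stream");
}
```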
When streaming is enabled, the response is SSE (not a single JSON response). The API sends progress events during execution, then a final event with the full analysis output.
Each progress update is sent as:
event: progress
data: { ... }
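On the wire, each event in this format is an event line, a data line, and a blank line. A minimal sketch of a formatter follows; the name formatSseEvent is hypothetical and not taken from this repo.

```javascript
// Hypothetical formatter (not from this repo): serialize one SSE event.
// JSON.stringify without indentation emits no raw newlines, so a single
// data: line is enough for any payload.
function formatSseEvent(event, payload) {
  return `event: ${event}\ndata: ${JSON.stringify(payload)}\n\n`;
}
```

In an Express handler the resulting string would typically be passed to res.write() after the Content-Type header has been set to text/event-stream.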
Progress payload examples:
- While extracting posts via Apify:
{
"stage": "extracting_posts",
"message": "Extracting posts...",
"account": "plaeto.schools"
}
- While fetching category definitions:
{
"stage": "fetching_category_definitions",
"message": "Fetching category definitions..."
}
- While analyzing individual posts:
{
"stage": "analyzing_post",
"message": "plaeto.schools | post 1 | https://www.instagram.com/p/ABC123/",
"account": "plaeto.schools",
"postNumber": 1,
"link": "https://www.instagram.com/p/ABC123/"
}
- While generating analytics from collected posts:
{
"stage": "analyzing_data",
"message": "analysing data"
}
At completion, the API streams:
event: final
data: { ...full analyze payload... }
event: done
data: { "message": "analysis complete" }
The final event contains the same structure as the non-streaming JSON response (fields like runId, createdAt, accounts, maxPosts, rawData, analysis, aiOverview, excelPath, errors).
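A client that reads the stream needs to split it into events and pick out the final payload. The sketch below parses already-received SSE text into records; the function name parseSseEvents is hypothetical, and it assumes the text contains only complete events. A real streaming client would also have to buffer partial chunks until a blank line arrives.

```javascript
// Hypothetical client-side parser (not from this repo): split SSE text into
// { event, data } records, decoding each data payload as JSON.
function parseSseEvents(text) {
  return text
    .split(/\r?\n\r?\n/)
    .map((block) => block.trim())
    .filter(Boolean)
    .map((block) => {
      let event = "message"; // SSE default when no event: line is present
      const dataLines = [];
      for (const line of block.split(/\r?\n/)) {
        if (line.startsWith("event:")) {
          event = line.slice("event:".length).trim();
        } else if (line.startsWith("data:")) {
          dataLines.push(line.slice("data:".length).trim());
        }
      }
      const data = dataLines.length > 0 ? JSON.parse(dataLines.join("\n")) : null;
      return { event, data };
    });
}
```

With the parsed records, the analysis output is the data of the record whose event is final, and the done record signals that the stream can be closed.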
curl -N -X POST http://localhost:8080/api/analyze \
-H "Content-Type: application/json" \
-H "Accept: text/event-stream" \
-d '{
"accounts": ["plaeto.schools"],
"maxPosts": 2,
"includeAiOverview": false,
"generateExcel": false,
"stream": true
}'
Response body (non-streaming):
{
"runId": "1713600000000",
"createdAt": "2026-04-20T12:00:00.000Z",
"accounts": [
"plaeto.schools",
"another.brand"
],
"maxPosts": 3,
"categoryDefenitions": {
"intent": {
"Promotional": "Content aiming to sell or promote a product or service",
"Educational": "Content intended to teach or inform"
},
"format": {
"Trend": "Content following a current trend",
"Tutorial": "Instructional content showing how to do something"
}
},
"rawData": {
"plaeto.schools": [
{
"link": "https://www.instagram.com/p/...",
"img": "https://...",
"type": "post",
"likes": 1500,
"comments": 45,
"caption": "Example caption...",
"date": "2026-04-18T10:00:00.000Z",
"intent": "Educational",
"format": "Tutorial"
}
],
"another.brand": []
},
"analysis": {
"global_insights": {
"intent_insights": {
"Educational": {
"global_relative_performance_average": {
"likes": "10.50%",
"comments": "5.00%"
},
"global_relative_performance_median": {
"likes": "8.00%",
"comments": "2.50%"
},
"account_relative_win_rate": {
"likes": "50.00%",
"comments": "25.00%"
}
}
},
"format_insights": {
"Tutorial": {
"global_relative_performance_average": {
"likes": "15.00%",
"comments": "N/A"
},
"global_relative_performance_median": {
"likes": "12.00%",
"comments": "N/A"
},
"account_relative_win_rate": {
"likes": "100.00%",
"comments": "0.00%"
}
}
}
},
"additional_insights": {
"topPerformer": {
"account": "plaeto.schools",
"frequency": "2 days"
},
"reelsPerformanceOverPosts": "15.20%",
"timeOfDayEngagement": {
"10:00 to 12:00": {
"avgLikes": 1500,
"avgComments": 45
}
}
},
"account_analysis": {
"plaeto.schools": {
"followersCount": 45000,
"averageLikesComments": {
"avgLikes": 1500,
"avgComments": 45
},
"totalPosts": 3,
"intentDistribution": {
"Educational": {
"no_of_posts": 1,
"category_total_likes": 1500,
"category_total_comments": 45,
"category_avg_likes": 1500,
"category_avg_comments": 45,
"relative_performance": {
"likes": "0.00%",
"comments": "0.00%"
}
}
},
"formatDistribution": {
"Tutorial": {
"no_of_posts": 1,
"category_total_likes": 1500,
"category_total_comments": 45,
"category_avg_likes": 1500,
"category_avg_comments": 45,
"relative_performance": {
"likes": "0.00%",
"comments": "0.00%"
}
}
},
"averageTimeBetweenPostsReadable": "2 days"
}
}
},
"aiOverview": null,
"excelPath": ".../outputs/global_insights_1713600000000.xlsx",
"errors": []
}
Returns the latest completed analysis payload.
Downloads the latest generated Excel file (if generateExcel was true).
- Build method: Dockerfile
- Runtime command (inside container): npm start
- Container port: 8080