DonutDataOrganization/dataGenix


Backend Documentation

Project Structure

app/
    main.py
    api/
        workspaces.py        # All workspace API endpoints
        logs.py              # Log file API endpoints
    core/
        config.py            # App settings and configuration
        openai.py            # OpenAI client setup
    db/
        dynamodb.py          # Database session management
        redis.py             # Redis client
    schemas/
        workspaces.py        # Pydantic models for API
    services/
        analyzer.py          # Application analysis logic (OpenAI)
        ingestion.py         # SQS and log ingestion services
        log_parser.py        # Log parsing and field extraction
        workspace.py         # Workspace business logic
    utils/
        logger.py            # Logging setup
        search_text.py       # Searchable text utilities

API Endpoints

Workspace Management

  • POST /api/workspace/ — Create a new workspace
  • GET /api/workspace/ — List all workspaces for the user
  • GET /api/workspace/{workspace_id} — Get details of a specific workspace
  • PUT /api/workspace/{workspace_id} — Update workspace name or description
  • DELETE /api/workspace/{workspace_id} — Delete a workspace and all its logs
  • GET /api/workspace/verified/{workspace_id} — Get workspace verification status

Log File Management

  • POST /api/log/upload/{workspace_id} — Upload and parse a log file for a workspace
  • GET /api/log/{workspace_id} — List all log files for a workspace
  • DELETE /api/log/{workspace_id}/{log_file_id} — Delete a specific log file from a workspace
  • POST /api/log/search/{workspace_id} — Search the logs for a workspace

Health & Root

  • GET / — API root, lists available endpoints
  • GET /healthcheck — Health check endpoint

AWS Setup

Create a Kinesis stream for log ingestion:

aws kinesis create-stream \
  --stream-name LogStream \
  --stream-mode-details StreamMode=ON_DEMAND \
  --region us-east-1

Once the stream is created, attach the LogStreamQueue Lambda to it as a consumer by creating a Lambda event source mapping for the stream.
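The event source mapping can also be created from Python with boto3. This is a minimal sketch, assuming the function is named LogStreamQueue and the stream is the LogStream stream in us-east-1 created above; the account ID, the batch size, and the helper names are hypothetical placeholders. The request parameters are built by plain functions so they can be reviewed before anything is sent to AWS.

```python
"""Sketch: parameters for attaching the LogStreamQueue Lambda to LogStream.

Assumptions (not from the project): account ID 123456789012, BatchSize 100,
and reading only records written after the mapping exists (LATEST).
"""


def kinesis_stream_arn(region: str, account_id: str, stream_name: str) -> str:
    # Standard ARN format for a Kinesis data stream
    return f"arn:aws:kinesis:{region}:{account_id}:stream/{stream_name}"


def event_source_mapping_params(function_name: str, stream_arn: str,
                                batch_size: int = 100) -> dict:
    # Keyword arguments for Lambda's CreateEventSourceMapping API call
    return {
        "FunctionName": function_name,
        "EventSourceArn": stream_arn,
        "StartingPosition": "LATEST",  # required for Kinesis event sources
        "BatchSize": batch_size,
    }
```

With boto3 installed and credentials configured, the mapping would then be created with `boto3.client("lambda", region_name="us-east-1").create_event_source_mapping(**event_source_mapping_params("LogStreamQueue", kinesis_stream_arn("us-east-1", "123456789012", "LogStream")))`.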

Clear All Workspaces from DynamoDB

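aws dynamodb scan --table-name WORKSPACE_TABLE --projection-expression "pk, sk" --output json --query "Items" \
| jq -c '[.[] | {DeleteRequest: {Key: {pk: .pk, sk: .sk}}}] as $reqs
    | range(0; $reqs | length; 25) as $i
    | {RequestItems: {WORKSPACE_TABLE: ($reqs | .[$i:$i + 25])}}' \
| split -l 1 - tmp_batch_

# Each tmp_batch_* file holds one BatchWriteItem request with at most 25 deletes
for file in tmp_batch_*; do
  [ -e "$file" ] || continue   # no batch files exist if the table was empty
  aws dynamodb batch-write-item --request-items "file://$file"
done

rm -f tmp_batch_*

This scans WORKSPACE_TABLE, collects the pk/sk key of every item, and deletes the items in batches of up to 25, the maximum number of requests a single BatchWriteItem call accepts. Replace WORKSPACE_TABLE with the actual table name in both places it appears (the --table-name argument and the RequestItems key built by jq). If a batch-write-item response lists UnprocessedItems, those items were not deleted and the request has to be retried.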
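The same batching can be done in Python, for example when driving the deletion from a script with boto3's `batch_write_item`, which takes the same `RequestItems` structure. This is a sketch: `delete_batches` is a hypothetical helper, `WORKSPACE_TABLE` stays a placeholder, and the items are assumed to be in DynamoDB's typed JSON form as returned by a scan projected on `pk, sk`.

```python
"""Sketch: group DynamoDB keys into BatchWriteItem delete requests."""
from typing import Any, Dict, Iterator, List

MAX_BATCH = 25  # BatchWriteItem accepts at most 25 put/delete requests per call


def delete_batches(items: List[Dict[str, Any]], table_name: str,
                   batch_size: int = MAX_BATCH) -> Iterator[Dict[str, Any]]:
    # Keep only the key attributes; BatchWriteItem deletes by primary key
    requests = [
        {"DeleteRequest": {"Key": {"pk": item["pk"], "sk": item["sk"]}}}
        for item in items
    ]
    for start in range(0, len(requests), batch_size):
        # Each yielded dict is one complete RequestItems payload
        yield {"RequestItems": {table_name: requests[start:start + batch_size]}}
```

Yielding one payload at a time keeps memory flat for large tables and makes it straightforward to retry a single batch whose response contains UnprocessedItems.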


Workspace Lifecycle

Creation & Verification Flow

  1. Workspace Creation

    • User enters workspace details and selects type (manual or auto-stream).

    • Frontend sends details to the ECS backend.

      • Manual Workspace: Backend creates a DynamoDB entry.

      • Auto-Stream Workspace:

        • Backend configures a Kinesis ingestion policy for the provided AWS account and log group.
        • Workspace entry is created in DynamoDB.
        • Success response is returned to the frontend.
  2. CloudFormation Stack Deployment

    • User deploys a CloudFormation stack in their AWS account that:

      • Creates a CloudWatch Log Group for the service.
      • Attaches a Subscription Filter forwarding logs to the backend Kinesis stream.
      • Creates an IAM Role for the validation Lambda.
  3. Verification

    • User triggers Verify Setup in the frontend.

    • Backend invokes LogValidationLambda:

      • Assumes the IAM role in the user’s AWS account.
      • Inserts a validation log into the log group.
    • If setup is correct, the validation log flows into the backend ingestion pipeline.

  4. Validation Result

    • The ECS backend consumes the validation log.

    • It routes the log to ValidationFn, which marks the workspace as verified in DynamoDB.

    • Frontend polls verification status:

      • If verified → user proceeds.
      • If not verified after 5 polls → user is prompted to redeploy the CloudFormation stack.
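The frontend's polling rule in step 4 can be sketched as follows. It is written in Python to match the backend and is not the project's frontend code: only the 5-poll limit and the GET /api/workspace/verified/{workspace_id} endpoint come from this document, while `wait_for_verification`, the 3-second interval, and a boolean status response are assumptions.

```python
"""Sketch: poll a workspace's verification status at most 5 times.

`fetch_status` stands in for a call to GET /api/workspace/verified/{workspace_id}.
"""
import time
from typing import Callable

MAX_POLLS = 5


def wait_for_verification(workspace_id: str,
                          fetch_status: Callable[[str], bool],
                          interval_s: float = 3.0,
                          sleep: Callable[[float], None] = time.sleep) -> bool:
    for attempt in range(1, MAX_POLLS + 1):
        if fetch_status(workspace_id):
            return True  # verified: the user can proceed
        if attempt < MAX_POLLS:
            sleep(interval_s)  # wait before the next poll
    # Not verified after MAX_POLLS: prompt the user to redeploy the stack
    return False
```

Injecting `fetch_status` and `sleep` keeps the retry rule separate from the HTTP client, so it can be exercised without a running backend.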

Diagrams

[Figure: Workspace Creation & Verification – Sequence Diagram]

[Figure: Workspace Creation & Verification – Architecture Diagram]


Log Ingestion Pipeline

End-to-End Flow

  1. Log Streaming

    • The user's CloudWatch log group streams logs to the backend Kinesis stream through a subscription filter.
  2. Kinesis Processing

    • KinesisSQSLambda is the target for the Kinesis stream.

    • It:

      • Polls records from Kinesis.
      • Decompresses the gzip-compressed CloudWatch Logs payloads.
      • Adds metadata (aws_id, log_group).
      • Pushes logs to SQS.
  3. SQS Consumption

    • The ECS backend polls messages from SQS.

    • Routes each log by type:

      • Validation Log → routed to ValidationFn → updates DynamoDB workspace verification status.
      • Service Log → routed to ProcessLogsFn.
  4. Log Processing

    • ProcessLogsFn:

      • Sends the raw log to ParseLogsFn for structured parsing.

      • Looks up the workspace_id in the Redis cache, keyed by (aws_id, log_group).

        • If cache hit → use workspace_id.
        • If cache miss → query DynamoDB, then update Redis.
      • Stores processed logs under the workspace in DynamoDB.
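Step 2 can be sketched as a pure function over the Lambda event. This is not the project's KinesisSQSLambda: it relies only on the documented CloudWatch Logs subscription format (gzip-compressed JSON with messageType, owner, logGroup, logStream and logEvents, base64-encoded inside a Lambda Kinesis event), while the function names and the output message shape are assumptions. Sending to SQS is left out so the function runs without AWS.

```python
"""Sketch of the Kinesis-to-SQS decoding step (not the project's code)."""
import base64
import gzip
import json
from typing import Any, Dict, List


def decode_record(data_b64: str) -> Dict[str, Any]:
    # base64 -> gzip bytes -> JSON document
    return json.loads(gzip.decompress(base64.b64decode(data_b64)))


def to_sqs_messages(event: Dict[str, Any]) -> List[str]:
    messages = []
    for record in event.get("Records", []):
        payload = decode_record(record["kinesis"]["data"])
        # CONTROL_MESSAGE records only check that the destination is reachable
        if payload.get("messageType") != "DATA_MESSAGE":
            continue
        for log_event in payload["logEvents"]:
            messages.append(json.dumps({
                "aws_id": payload["owner"],  # source AWS account ID
                "log_group": payload["logGroup"],
                "log_stream": payload["logStream"],
                "timestamp": log_event["timestamp"],
                "message": log_event["message"],
            }))
    return messages
```

Each returned string could be used as the MessageBody of an SQS `send_message` call.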
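The workspace lookup in step 4 follows the cache-aside pattern: read Redis first, fall back to DynamoDB on a miss, then populate Redis. A minimal sketch, in which the Redis key format, the one-hour TTL, and the `query_dynamodb` callable are hypothetical; the `Cache` protocol covers only the `get` and `set(..., ex=...)` methods that a redis-py client provides.

```python
"""Sketch: cache-aside lookup of workspace_id by (aws_id, log_group)."""
from typing import Callable, Optional, Protocol, Union


class Cache(Protocol):
    def get(self, key: str) -> Optional[Union[bytes, str]]: ...
    def set(self, key: str, value: str, ex: Optional[int] = None) -> object: ...


def cache_key(aws_id: str, log_group: str) -> str:
    # Hypothetical key layout
    return f"workspace:{aws_id}:{log_group}"


def resolve_workspace_id(cache: Cache, aws_id: str, log_group: str,
                         query_dynamodb: Callable[[str, str], Optional[str]],
                         ttl_s: int = 3600) -> Optional[str]:
    key = cache_key(aws_id, log_group)
    cached = cache.get(key)
    if cached is not None:
        # Cache hit: redis-py returns bytes unless decode_responses=True
        return cached.decode() if isinstance(cached, bytes) else cached
    # Cache miss: query DynamoDB, then store the result in Redis for later logs
    workspace_id = query_dynamodb(aws_id, log_group)
    if workspace_id is not None:
        cache.set(key, workspace_id, ex=ttl_s)
    return workspace_id
```

The TTL bounds how long a stale mapping can survive if a workspace is deleted; misses for unknown (aws_id, log_group) pairs are not cached here, so they always reach DynamoDB.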

Diagrams

[Figure: Log Ingestion Pipeline – Sequence Diagram]

[Figure: Log Ingestion Pipeline – Architecture Diagram]
