Normalize and persist articles #6

@justinmadison

Summary

Read the raw JSON article files, normalize them, and store them in a database.

Motivation

JustInsight will need articles formatted and stored to facilitate processing.

Scope

None

Acceptance Criteria

- [ ] A test that loads an article and verifies the schema
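
A minimal sketch of what "verifies the schema" could mean in practice. The field names come from this issue; the expected types (notably `timestamp` as a `datetime`) and the helper name `schema_errors` are assumptions made for illustration, not a decided design.

```python
from datetime import datetime, timezone

# Assumed schema: field names are from this issue, the types are a guess.
EXPECTED_SCHEMA = {
    "title": str,
    "body": str,
    "timestamp": datetime,
    "source": str,
}


def schema_errors(doc):
    """Return a list of human-readable schema problems; empty means valid."""
    errors = []
    for field, expected in EXPECTED_SCHEMA.items():
        if field not in doc:
            errors.append(f"missing field: {field}")
        elif not isinstance(doc[field], expected):
            errors.append(
                f"{field}: expected {expected.__name__}, "
                f"got {type(doc[field]).__name__}"
            )
    return errors


def test_article_matches_schema():
    # Stand-in for one document read back from the articles collection.
    article = {
        "title": "Example headline",
        "body": "Example body text.",
        "timestamp": datetime(2024, 1, 2, tzinfo=timezone.utc),
        "source": "example-feed",
    }
    assert schema_errors(article) == []
```

Returning a list of problems rather than a bare boolean makes a failing smoke test easier to read, since the assertion message names the offending fields.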

Additional Context

  1. Create /etl/normalize.py to:
     • Read raw JSON files from ./data/raw/
     • Extract and clean fields: title, body, timestamp, source
     • Insert into MongoDB collection articles
  2. Add connection config via environment variables
  3. Write a smoke test that loads one article and verifies the schema
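
The steps above could be sketched roughly as follows. This is a starting point under stated assumptions, not the final implementation: the environment variable names `MONGO_URI` and `MONGO_DB`, their defaults, the one-article-per-file layout, the accepted timestamp formats, and all function names are hypothetical. The cleaning step here collapses all whitespace, which would also flatten paragraph breaks in `body`; that may or may not be desired.

```python
import json
import os
from datetime import datetime, timezone
from pathlib import Path

RAW_DIR = Path("./data/raw")
REQUIRED_FIELDS = ("title", "body", "timestamp", "source")


def clean_text(value):
    """Collapse runs of whitespace and trim the ends; None becomes ''."""
    return " ".join(str(value or "").split())


def parse_timestamp(value):
    """Parse epoch seconds or an ISO 8601 string into an aware UTC datetime."""
    if isinstance(value, (int, float)):
        return datetime.fromtimestamp(value, tz=timezone.utc)
    parsed = datetime.fromisoformat(str(value).replace("Z", "+00:00"))
    if parsed.tzinfo is None:
        # Assumption: naive timestamps are already in UTC.
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed.astimezone(timezone.utc)


def normalize_article(raw):
    """Extract and clean the four fields named in this issue."""
    missing = [field for field in REQUIRED_FIELDS if field not in raw]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    return {
        "title": clean_text(raw["title"]),
        "body": clean_text(raw["body"]),
        "timestamp": parse_timestamp(raw["timestamp"]),
        "source": clean_text(raw["source"]),
    }


def load_raw_articles(raw_dir=RAW_DIR):
    """Yield one parsed JSON object per *.json file (assumed one article each)."""
    for path in sorted(Path(raw_dir).glob("*.json")):
        with path.open(encoding="utf-8") as fh:
            yield json.load(fh)


def get_collection():
    """Connect using environment variables (hypothetical names and defaults)."""
    from pymongo import MongoClient  # imported lazily so tests need no driver

    client = MongoClient(os.environ.get("MONGO_URI", "mongodb://localhost:27017"))
    return client[os.environ.get("MONGO_DB", "justinsight")]["articles"]


def run(collection, raw_dir=RAW_DIR):
    """Normalize every raw article and insert the batch; return the count."""
    docs = [normalize_article(raw) for raw in load_raw_articles(raw_dir)]
    if docs:
        collection.insert_many(docs)
    return len(docs)
```

A real run would be `run(get_collection())`. Because `run` only needs an object with an `insert_many` method, a smoke test can pass a small in-memory fake instead of a live MongoDB instance.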
