5 changes: 5 additions & 0 deletions .env.example
Original file line number Diff line number Diff line change
@@ -0,0 +1,5 @@
DB_HOST=35.199.115.174
DB_PORT=3306
DB_NAME=looqbox-challenge
DB_USER=looqbox-challenge
DB_PASSWORD=
5 changes: 5 additions & 0 deletions .gitignore
@@ -0,0 +1,5 @@
__pycache__/
*.py[cod]
.env
.venv/
venv/
32 changes: 32 additions & 0 deletions README_SOLUTION.md
@@ -0,0 +1,32 @@
# Looqbox Data Challenge - Solution

This solution contains the SQL answers, reusable Python code, and generated artifacts for the Looqbox technical challenge.

## Files

- `sql/answers.sql`: SQL queries for the three questions.
- `src/db.py`: database configuration and creation of the SQLAlchemy engine.
- `src/data_access.py`: reusable function `retrieve_data(product_code, store_code, date)`.
- `src/generate_outputs.py`: runs the SQL answers, the case 2 transformation, the IMDB chart, and the PDF generation.
- `output/`: generated CSVs, the PNG chart, and the final PDF.

## How to run

Create a `.env` file from `.env.example` and fill in `DB_PASSWORD`.

```powershell
py -m pip install -r requirements.txt
py src\generate_outputs.py
```

The database schema is named `looqbox-challenge`, with a hyphen. The Python code connects directly to that schema; the SQL files quote the name with backticks where needed.
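
The backtick rule matters whenever the schema name is interpolated into raw SQL. A minimal sketch, assuming a hypothetical helper `quote_identifier` (not part of this PR) that applies MySQL's identifier-quoting rule: wrap the name in backticks and double any backtick it contains.

```python
def quote_identifier(name: str) -> str:
    """Quote a MySQL identifier (hypothetical helper, not part of this PR)."""
    # MySQL escapes a backtick inside a quoted identifier by doubling it.
    return "`" + name.replace("`", "``") + "`"


# Unquoted, the hyphen in looqbox-challenge would be parsed as a minus sign.
use_statement = f"USE {quote_identifier('looqbox-challenge')};"
print(use_statement)
```

When the schema is passed as the `database` argument of the connection URL, as `src/db.py` does, no quoting is needed because the name never appears inside SQL text.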

## Notes

- Case 1 uses parameterized SQL and validates the dates before querying.
- Case 2 keeps the client's two queries unchanged and applies the requested date filter in pandas.
- Case 3 expands the comma-separated genres in the IMDB table and compares the top genres by average revenue.
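
The case 3 step (split the comma-separated genres, then average revenue per genre) can be sketched as follows. This is only an illustration using the standard library: the PR's own implementation is in `src/generate_outputs.py`, which is not shown in this diff, and the sample rows and their layout are hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical sample rows: (title, comma-separated genres, revenue in millions).
movies = [
    ("Movie A", "Action,Adventure", 100.0),
    ("Movie B", "Action, Drama", 50.0),
    ("Movie C", "Drama", 30.0),
]

revenue_by_genre: dict[str, list[float]] = defaultdict(list)
for _title, genres, revenue in movies:
    # A movie contributes its full revenue to each of its genres.
    for genre in genres.split(","):
        revenue_by_genre[genre.strip()].append(revenue)

# Rank genres by mean revenue, highest first.
ranking = sorted(
    ((genre, mean(values)) for genre, values in revenue_by_genre.items()),
    key=lambda item: item[1],
    reverse=True,
)
print(ranking)
```

Because a multi-genre movie is counted once per genre, the per-genre means are not additive; they are only meaningful for comparing genres against each other.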

While solving the challenge, I made occasional use of an AI/LLM tool to review specific snippets, organize the documentation text, and check specific doubts during development.

The logic, the test runs, the interpretation of results, and the main structure of the solution are my own work, based on my knowledge of Python, SQL, and data analysis.
6 changes: 6 additions & 0 deletions requirements.txt
@@ -0,0 +1,6 @@
pandas>=3.0.0
PyMySQL>=1.1.0
SQLAlchemy>=2.0.0
matplotlib>=3.8.0
reportlab>=4.0.0
python-dotenv>=1.0.0
34 changes: 34 additions & 0 deletions sql/answers.sql
@@ -0,0 +1,34 @@
USE `looqbox-challenge`;

-- 1) What are the 10 most expensive products in the company?
SELECT
    PRODUCT_COD,
    PRODUCT_NAME,
    PRODUCT_VAL,
    DEP_NAME,
    DEP_COD,
    SECTION_NAME,
    SECTION_COD
FROM data_product
ORDER BY PRODUCT_VAL DESC
LIMIT 10;

-- 2) What sections do the 'BEBIDAS' and 'PADARIA' departments have?
SELECT DISTINCT
    DEP_NAME,
    SECTION_COD,
    SECTION_NAME
FROM data_product
WHERE DEP_NAME IN ('BEBIDAS', 'PADARIA')
ORDER BY DEP_NAME, SECTION_NAME;

-- 3) What was the total sale of products (in $) of each Business Area in the first quarter of 2019?
SELECT
    sc.BUSINESS_NAME,
    ROUND(SUM(ps.SALES_VALUE), 2) AS TOTAL_SALES_VALUE
FROM data_product_sales AS ps
INNER JOIN data_store_cad AS sc
    ON CAST(ps.STORE_CODE AS UNSIGNED) = sc.STORE_CODE
WHERE ps.DATE BETWEEN '2019-01-01' AND '2019-03-31'
GROUP BY sc.BUSINESS_NAME
ORDER BY TOTAL_SALES_VALUE DESC;
65 changes: 65 additions & 0 deletions src/data_access.py
@@ -0,0 +1,65 @@
from __future__ import annotations

from datetime import date as Date
from typing import Sequence

import pandas as pd
from sqlalchemy import Engine, text

from db import get_engine


def _parse_iso_date(value: str) -> Date:
    try:
        return Date.fromisoformat(value)
    except ValueError as exc:
        raise ValueError(f"Invalid date '{value}'. Expected ISO format YYYY-MM-DD.") from exc


def retrieve_data(
    product_code: int | None = None,
    store_code: int | str | None = None,
    date: Sequence[str] | None = None,
    engine: Engine | None = None,
) -> pd.DataFrame:
    """Retrieve rows from data_product_sales using optional, parameterized filters.

    Parameters are optional to keep the function flexible for other teams. When
    `date` is provided, it must be a two-item interval: [start_date, end_date].
    """
    filters: list[str] = []
    params: dict[str, object] = {}

    if product_code is not None:
        if not isinstance(product_code, int):
            raise TypeError("product_code must be an integer.")
        filters.append("PRODUCT_CODE = :product_code")
        params["product_code"] = product_code

    if store_code is not None:
        # The store code is always bound as a string.
        filters.append("STORE_CODE = :store_code")
        params["store_code"] = str(store_code)

    if date is not None:
        if len(date) != 2:
            raise ValueError("date must contain exactly two values: [start_date, end_date].")
        start_date = _parse_iso_date(date[0])
        end_date = _parse_iso_date(date[1])
        if start_date > end_date:
            raise ValueError("start_date cannot be greater than end_date.")
        filters.append("DATE BETWEEN :start_date AND :end_date")
        params["start_date"] = start_date
        params["end_date"] = end_date

    # Only fixed SQL fragments are concatenated; user values go through bind parameters.
    query = "SELECT * FROM data_product_sales"
    if filters:
        query += " WHERE " + " AND ".join(filters)
    query += " ORDER BY DATE, STORE_CODE, PRODUCT_CODE"

    # Dispose of the engine only when this function created it.
    owns_engine = engine is None
    engine = engine or get_engine()
    try:
        return pd.read_sql_query(text(query), engine, params=params)
    finally:
        if owns_engine:
            engine.dispose()
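
The optional-filter pattern in `retrieve_data` can be exercised without access to the MySQL server. The sketch below is an illustration under stated assumptions, not part of the PR: `build_query` is a hypothetical helper that mirrors the filter logic, an in-memory SQLite table stands in for `data_product_sales`, the sample rows are invented, and `STORE_CODE` is assumed to be stored as text.

```python
import sqlite3
from datetime import date as Date


def build_query(product_code=None, store_code=None, date=None):
    """Return (sql, params) using the same optional-filter pattern as retrieve_data."""
    filters, params = [], {}
    if product_code is not None:
        filters.append("PRODUCT_CODE = :product_code")
        params["product_code"] = product_code
    if store_code is not None:
        filters.append("STORE_CODE = :store_code")
        params["store_code"] = str(store_code)
    if date is not None:
        start, end = (Date.fromisoformat(d) for d in date)
        if start > end:
            raise ValueError("start_date cannot be greater than end_date.")
        filters.append("DATE BETWEEN :start_date AND :end_date")
        # ISO strings sort chronologically, which suits the TEXT column used here.
        params["start_date"] = start.isoformat()
        params["end_date"] = end.isoformat()
    sql = "SELECT * FROM data_product_sales"
    if filters:
        sql += " WHERE " + " AND ".join(filters)
    return sql + " ORDER BY DATE, STORE_CODE, PRODUCT_CODE", params


# In-memory stand-in for the real table, with invented rows.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE data_product_sales "
    "(STORE_CODE TEXT, PRODUCT_CODE INTEGER, DATE TEXT, SALES_VALUE REAL)"
)
conn.executemany(
    "INSERT INTO data_product_sales VALUES (?, ?, ?, ?)",
    [
        ("1", 18, "2019-01-01", 10.0),
        ("1", 18, "2019-02-01", 20.0),
        ("2", 18, "2019-01-15", 5.0),
    ],
)

sql, params = build_query(product_code=18, store_code=1, date=["2019-01-01", "2019-01-31"])
rows = conn.execute(sql, params).fetchall()
print(rows)
```

Only the row for store `1` in January matches all three filters. Calling `build_query()` with no arguments yields the unfiltered query and an empty parameter dictionary.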
43 changes: 43 additions & 0 deletions src/db.py
@@ -0,0 +1,43 @@
from __future__ import annotations

import os
from dataclasses import dataclass

from dotenv import load_dotenv
from sqlalchemy import Engine, create_engine
from sqlalchemy.engine import URL


load_dotenv()


@dataclass(frozen=True)
class DatabaseConfig:
    # Defaults are evaluated once, when the class is defined (after load_dotenv() has run).
    host: str = os.getenv("DB_HOST", "35.199.115.174")
    port: int = int(os.getenv("DB_PORT", "3306"))
    database: str = os.getenv("DB_NAME", "looqbox-challenge")
    user: str = os.getenv("DB_USER", "looqbox-challenge")
    password: str | None = os.getenv("DB_PASSWORD")

    @classmethod
    def from_env(cls) -> "DatabaseConfig":
        config = cls()
        if not config.password:
            raise RuntimeError(
                "DB_PASSWORD is required. Copy .env.example to .env and fill the password, "
                "or export DB_PASSWORD before running the scripts."
            )
        return config


def get_engine(config: DatabaseConfig | None = None) -> Engine:
    config = config or DatabaseConfig.from_env()
    # URL.create takes each component separately, so the password and the
    # hyphenated database name need no manual escaping.
    url = URL.create(
        "mysql+pymysql",
        username=config.user,
        password=config.password,
        host=config.host,
        port=config.port,
        database=config.database,
    )
    return create_engine(url, pool_pre_ping=True)
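
One consequence of the dataclass defaults above is that the environment is read a single time, when the module is imported. If the variables may change afterwards (in tests, for example), a possible alternative is `field(default_factory=...)`, which reads the environment each time an instance is created. A minimal sketch, assuming a hypothetical class `LazyDatabaseConfig` with placeholder host names; it is not part of this PR.

```python
from __future__ import annotations

import os
from dataclasses import dataclass, field


@dataclass(frozen=True)
class LazyDatabaseConfig:
    """Hypothetical variant of DatabaseConfig that reads the environment per instance."""

    host: str = field(default_factory=lambda: os.getenv("DB_HOST", "localhost"))
    port: int = field(default_factory=lambda: int(os.getenv("DB_PORT", "3306")))
    password: str | None = field(default_factory=lambda: os.getenv("DB_PASSWORD"))


# Placeholder host names, used only to show that each instance re-reads the environment.
os.environ["DB_HOST"] = "db.internal.example"
first = LazyDatabaseConfig()
os.environ["DB_HOST"] = "replica.internal.example"
second = LazyDatabaseConfig()
print(first.host, second.host)
```

The trade-off is that two instances created at different times may disagree, so the import-time behaviour in the PR is a reasonable choice for a one-shot script.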