Data Case Study with Pandas - Famous Paintings

Overview

This project involves analysing the Famous Paintings dataset by uploading it to a PostgreSQL database using Python and solving SQL queries to sharpen SQL skills.

Create your environment

python3.11.1 -m venv my_env1    # <-- choose yoor python version and name of your environment

Open your VS Code integrated terminal and run:

pip install pandas
pip install sqlalchemy          # <-toolkit that simplifies the connection between Python and SQL databases
pip install psycopg2-binary     # <-database adapter for the Python

Download & unzip the dataset

In VS Code terminal run:

kaggle datasets download \
  -d mexwell/famous-paintings\
  -p data/amous-paintings \
  --unzip

-d specifies the dataset slug; -p sets the target directory; --unzip extracts the files in place

Load the data in Python: In your .py or Jupyter file, use pandas to read whichever CSV(s) you need.
For example (adjust the filename to match what was unzipped):

df = pd.read_csv('data/famous-paintings.csv')
print(df.head())

Conecting python to Postgresql database

conn_string = 'postgresql://user:password@localhost:0000/database'
engine = create_engine(conn_string)

db = create_engine(conn_string)
conn = db.connect()

try:
    conn = engine.connect()
    print(" Connection successful!")
except Exception as e:
    print(" Connection failed:", e)

from sqlalchemy import text

result = conn.execute(text("SELECT version();"))

for row in result:
    print(row)

Optional: Using .env for Security (Recommended):

pip install python-dotenv

Create a .env file:

DB_USER=
DB_PASS=
DB_NAME=
DB_HOST=
DB_PORT=

Update Python code:

import os
from dotenv import load_dotenv
from sqlalchemy import create_engine

load_dotenv()

conn_string = f"postgresql://{os.getenv('DB_USER')}:{os.getenv('DB_PASS')}@{os.getenv('DB_HOST')}:{os.getenv('DB_PORT')}/{os.getenv('DB_NAME')}"
engine = create_engine(conn_string)
conn = engine.connect()

Creating a loop

files = ['artist', 'canvas_size', 'image_link', 'museum_hours', 'museum', 'product_size', 'subject', 'work']

for file in files:
    df = pd.read_csv(f'/Users/user/Desktop/art/{file}.csv')
    df.to_sql(file, con=engine, schema='public', if_exists='replace', index=False)
    print(f" Uploaded: {file}")

Case Study in the 'art' Database PostgreSQL

Problem Statements:

List all the paintings which are not displayed on any museums?
Are there museums without any paintings?
How many paintings have an asking price of more than their regular price?
Identify the paintings whose asking price is less than 50% of their regular price?
Which canva size costs the most?
Which canvas is largest by physical size (width)?
Identify the museums with invalid city information in the given dataset.
Museum_Hours table has 1 invalid city information. Identify and remove it.
Fetch the top 10 most famous painting subjects.
Identify the museums which are open on both Sunday and Monday. Display museum name & city.
How many museums open every single day.
Which is the top 5 most popular museums? (Popularity is defined based on the most no of paintings in a museum).
Who are the top 5 most popular artists? (Popularity is defined based on the most no of paintings done by an aartist).
Display the 3 least popular paintings.
Which museum is open for the longest during the day?
Which museum has the most No of most popular painting style?
Identify the artists whose paintings are displayed in multiple countries

Acknowledgments

Data Source: Kaggle’s Famous Art
Inspiration: @techtfq

Name		Name	Last commit message	Last commit date
Latest commit History 24 Commits
README.md		README.md
art_project.ipynb		art_project.ipynb
art_ptoject _part1.sql		art_ptoject _part1.sql
artist.csv		artist.csv
canvas_size.csv		canvas_size.csv
image_link.csv		image_link.csv
museum.csv		museum.csv
museum_hours.csv		museum_hours.csv
product_size.csv		product_size.csv
requirements.txt		requirements.txt
subject.csv		subject.csv
work.csv		work.csv

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Data Case Study with Pandas - Famous Paintings

Overview

Create your environment

Conecting python to Postgresql database

Optional: Using .env for Security (Recommended):

Creating a loop

Case Study in the 'art' Database PostgreSQL

Problem Statements:

Acknowledgments

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Data Case Study with Pandas - Famous Paintings

Overview

Create your environment

Conecting python to Postgresql database

Optional: Using .env for Security (Recommended):

Creating a loop

Case Study in the 'art' Database PostgreSQL

Problem Statements:

Acknowledgments

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages