This project involves analysing the Famous Paintings dataset by uploading it to a PostgreSQL database using Python and solving SQL queries to sharpen SQL skills.
python3.11.1 -m venv my_env1 # <-- choose yoor python version and name of your environment- Open your VS Code integrated terminal and run:
pip install pandas
pip install sqlalchemy # <-toolkit that simplifies the connection between Python and SQL databases
pip install psycopg2-binary # <-database adapter for the Python- Download & unzip the dataset
- In VS Code terminal run:
kaggle datasets download \
-d mexwell/famous-paintings\
-p data/amous-paintings \
--unzip-d specifies the dataset slug; -p sets the target directory; --unzip extracts the files in place
-
Load the data in Python: In your .py or Jupyter file, use pandas to read whichever CSV(s) you need.
-
For example (adjust the filename to match what was unzipped):
df = pd.read_csv('data/famous-paintings.csv')
print(df.head())conn_string = 'postgresql://user:password@localhost:0000/database'
engine = create_engine(conn_string)
db = create_engine(conn_string)
conn = db.connect()
try:
conn = engine.connect()
print(" Connection successful!")
except Exception as e:
print(" Connection failed:", e)
from sqlalchemy import text
result = conn.execute(text("SELECT version();"))
for row in result:
print(row)pip install python-dotenv- Create a .env file:
DB_USER=
DB_PASS=
DB_NAME=
DB_HOST=
DB_PORT=- Update Python code:
import os
from dotenv import load_dotenv
from sqlalchemy import create_engine
load_dotenv()
conn_string = f"postgresql://{os.getenv('DB_USER')}:{os.getenv('DB_PASS')}@{os.getenv('DB_HOST')}:{os.getenv('DB_PORT')}/{os.getenv('DB_NAME')}"
engine = create_engine(conn_string)
conn = engine.connect()files = ['artist', 'canvas_size', 'image_link', 'museum_hours', 'museum', 'product_size', 'subject', 'work']
for file in files:
df = pd.read_csv(f'/Users/user/Desktop/art/{file}.csv')
df.to_sql(file, con=engine, schema='public', if_exists='replace', index=False)
print(f" Uploaded: {file}")- List all the paintings which are not displayed on any museums?
- Are there museums without any paintings?
- How many paintings have an asking price of more than their regular price?
- Identify the paintings whose asking price is less than 50% of their regular price?
- Which canva size costs the most?
- Which canvas is largest by physical size (width)?
- Identify the museums with invalid city information in the given dataset.
- Museum_Hours table has 1 invalid city information. Identify and remove it.
- Fetch the top 10 most famous painting subjects.
- Identify the museums which are open on both Sunday and Monday. Display museum name & city.
- How many museums open every single day.
- Which is the top 5 most popular museums? (Popularity is defined based on the most no of paintings in a museum).
- Who are the top 5 most popular artists? (Popularity is defined based on the most no of paintings done by an aartist).
- Display the 3 least popular paintings.
- Which museum is open for the longest during the day?
- Which museum has the most No of most popular painting style?
- Identify the artists whose paintings are displayed in multiple countries
- Data Source: Kaggle’s Famous Art
- Inspiration: @techtfq