✔ Simple and Lightweight: No unnecessary dependencies.
✔ AWS-Compatible: Works seamlessly with AWS services.
A lightweight, no-fuss library for obfuscating specified fields in CSV, JSON and Parquet files stored in an S3 bucket. Designed for users who need a simple yet effective way to mask Personally Identifiable Information (PII), and tested for use in AWS Lambda.
🚀 Prerequisites
AWS Credentials: Ensure credentials are available via the AWS CLI or environment variables.
Boto3: Assumes the AWS SDK for Python (Boto3) is already included in your project.
📦 Installation
Ensure your virtual environment (venv) is activated.
Download the latest package wheel.
Install the package using:
pip install <path/to/the/package/wheel>
📖 Usage
🛠 Importing the Function
from GDPR_obfuscation.main import obfuscate_file
⚡ Invoking the Function
Pass an existing or newly created S3 client along with a payload specifying the file and fields to obfuscate.
🔹 Example:
import boto3
from GDPR_obfuscation.main import obfuscate_file
s3_client = boto3.client('s3')
payload = { "file_to_obfuscate": "s3://my_ingestion_bucket/new_data/file1.csv", "pii_fields": ["name", "email_address"] }
masked_data = obfuscate_file(s3_client, payload)
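The "file_to_obfuscate" value is an S3 URI that combines the bucket name and the object key. As a minimal sketch of how such a URI can be split into its two parts (the helper name split_s3_uri is hypothetical and not part of this package, whose internal parsing may differ):

```python
def split_s3_uri(uri: str) -> tuple[str, str]:
    """Split 's3://bucket/path/to/key' into (bucket, key).

    Hypothetical helper for illustration only; not part of GDPR_obfuscation.
    """
    prefix = "s3://"
    if not uri.startswith(prefix):
        raise ValueError(f"Not an S3 URI: {uri!r}")
    # Everything up to the first '/' is the bucket; the rest is the object key.
    bucket, _, key = uri[len(prefix):].partition("/")
    if not bucket or not key:
        raise ValueError(f"S3 URI must contain a bucket and a key: {uri!r}")
    return bucket, key


payload = {
    "file_to_obfuscate": "s3://my_ingestion_bucket/new_data/file1.csv",
    "pii_fields": ["name", "email_address"],
}
bucket, key = split_s3_uri(payload["file_to_obfuscate"])
```

Plain string splitting is used here rather than a URL parser so that object keys containing characters such as ? or # are kept intact.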
🔍 How It Works
The function checks the file type and then reads the file from the specified S3 location. Currently supported file types are CSV, JSON and Parquet.
It obfuscates the values of the specified PII fields with ***.
The obfuscated data is returned as an in-memory object, suitable for operations such as S3 PutObject (put_object in Boto3).
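The steps above can be sketched with the standard library alone. This is an illustrative approximation, not the package's actual implementation: the function names (detect_file_type, mask_csv, mask_json) are hypothetical, the JSON input is assumed to be an array of flat objects, the output is assumed to be UTF-8 bytes, and Parquet is left out because it needs a third-party reader.

```python
import csv
import io
import json
from pathlib import PurePosixPath

MASK = "***"
SUPPORTED_TYPES = {"csv", "json", "parquet"}


def detect_file_type(key: str) -> str:
    """Return the lowercase file extension of an object key, if supported."""
    suffix = PurePosixPath(key).suffix.lower().lstrip(".")
    if suffix not in SUPPORTED_TYPES:
        raise ValueError(f"Unsupported file type: {suffix!r}")
    return suffix


def mask_csv(raw: bytes, pii_fields: list[str]) -> bytes:
    """Replace the values of the given columns with MASK in CSV data."""
    reader = csv.DictReader(io.StringIO(raw.decode("utf-8"), newline=""))
    out = io.StringIO(newline="")
    writer = csv.DictWriter(
        out, fieldnames=reader.fieldnames or [], lineterminator="\n"
    )
    writer.writeheader()
    for row in reader:
        for field in pii_fields:
            if field in row:  # fields absent from the file are ignored
                row[field] = MASK
        writer.writerow(row)
    return out.getvalue().encode("utf-8")


def mask_json(raw: bytes, pii_fields: list[str]) -> bytes:
    """Replace the given keys with MASK; assumes a JSON array of flat objects."""
    records = json.loads(raw)
    for record in records:
        for field in pii_fields:
            if field in record:
                record[field] = MASK
    return json.dumps(records).encode("utf-8")
```

If the returned object is bytes, as assumed in this sketch, it can be passed directly as the Body argument of s3_client.put_object(Bucket=..., Key=..., Body=...).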