IWDAS/GDPR-obfuscation

GDPR-obfuscation

✔ Simple and Lightweight: No unnecessary dependencies.

✔ AWS-Compatible: Works seamlessly with AWS services.

A lightweight, no-fuss library for obfuscating specified fields in CSV, JSON and Parquet files stored in an S3 bucket. Designed for users who need a simple yet effective way to mask Personally Identifiable Information (PII), and tested on AWS Lambda.

🚀 Prerequisites

AWS Credentials: Ensure credentials are available via the AWS CLI or environment variables.

Boto3: Assumes the AWS SDK for Python (Boto3) is already included in your project.

📦 Installation

Ensure your virtual environment (venv) is activated.

Download the latest package wheel.

Install the package using:

pip install <path/to/the/package/wheel>

📖 Usage

🛠 Importing the Function

from GDPR_obfuscation.main import obfuscate_file

⚡ Invoking the Function

Pass an existing or newly created S3 client along with a payload specifying the file and fields to obfuscate.

🔹 Example:

import boto3
from GDPR_obfuscation import obfuscate_file

Create an S3 client

s3_client = boto3.client('s3')

Define the payload with the file location and PII fields to obfuscate

payload = { "file_to_obfuscate": "s3://my_ingestion_bucket/new_data/file1.csv", "pii_fields": ["name", "email_address"] }

Perform obfuscation

masked_data = obfuscate_file(s3_client, payload)
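Before calling the function, it can help to confirm that the payload is well formed. The sketch below is a hypothetical pre-flight check and is not part of GDPR_obfuscation. The helper names `split_s3_uri` and `check_payload` are assumptions made for illustration. Only the payload keys `file_to_obfuscate` and `pii_fields` come from the example above.

```python
def split_s3_uri(uri: str) -> tuple[str, str]:
    """Split 's3://bucket/key' into (bucket, key), raising ValueError otherwise."""
    prefix = "s3://"
    if not uri.startswith(prefix):
        raise ValueError(f"not an S3 URI: {uri!r}")
    # Everything up to the first '/' is the bucket; the rest is the object key.
    bucket, _, key = uri[len(prefix):].partition("/")
    if not bucket or not key:
        raise ValueError(f"S3 URI must name a bucket and a key: {uri!r}")
    return bucket, key


def check_payload(payload: dict) -> tuple[str, str, list[str]]:
    """Hypothetical validation helper; returns (bucket, key, pii_fields)."""
    bucket, key = split_s3_uri(payload["file_to_obfuscate"])
    fields = payload["pii_fields"]
    if not fields or not all(isinstance(f, str) for f in fields):
        raise ValueError("pii_fields must be a non-empty list of strings")
    return bucket, key, list(fields)
```

Running `check_payload(payload)` on the example payload yields the bucket `my_ingestion_bucket`, the key `new_data/file1.csv`, and the two field names.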

🔍 How It Works

The function checks the file type and then reads the file from the specified S3 location. Currently supported file types are CSV, JSON and Parquet.
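The README does not say how the file type is determined. One plausible approach, shown here purely as an assumption, is to look at the extension of the object key. The names `SUPPORTED_TYPES` and `detect_file_type` are hypothetical and do not come from the library.

```python
from pathlib import PurePosixPath

# Assumed mapping from extension to format; the library may detect types differently.
SUPPORTED_TYPES = {".csv": "csv", ".json": "json", ".parquet": "parquet"}


def detect_file_type(key: str) -> str:
    """Return 'csv', 'json' or 'parquet' based on the key's extension."""
    suffix = PurePosixPath(key).suffix.lower()
    if suffix not in SUPPORTED_TYPES:
        raise ValueError(f"unsupported file type: {suffix or '(no extension)'}")
    return SUPPORTED_TYPES[suffix]
```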

It replaces the value of each specified PII field with ***.
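To make the masking step concrete, here is a minimal sketch for the CSV case using only the standard library. It is an illustration of the idea, not the library's actual implementation. The function name `mask_csv` is an assumption, and it expects every row to have the same columns as the header.

```python
import csv
import io

MASK = "***"


def mask_csv(text: str, pii_fields: list[str]) -> str:
    """Return the CSV text with every value in pii_fields replaced by MASK."""
    reader = csv.DictReader(io.StringIO(text))
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames or [], lineterminator="\n")
    writer.writeheader()
    for row in reader:
        for field in pii_fields:
            # Fields that are not columns in the file are silently skipped.
            if field in row:
                row[field] = MASK
        writer.writerow(row)
    return out.getvalue()
```

Column order and non-PII values are preserved. Only the named columns are overwritten.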

The obfuscated data is returned as an in-memory object, suitable for operations such as S3 PutObject.
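Writing the result back to S3 might look like the sketch below. It assumes the returned object is `bytes` or a binary file-like object, which the README does not confirm. `upload_masked` and `RecordingClient` are hypothetical names. The stand-in client exists only so the example runs without AWS access. With a real Boto3 client the same `put_object(Bucket=..., Key=..., Body=...)` call applies.

```python
class RecordingClient:
    """Stand-in for a Boto3 S3 client; records calls instead of contacting AWS."""

    def __init__(self):
        self.calls = []

    def put_object(self, **kwargs):
        self.calls.append(kwargs)
        return {}


def upload_masked(s3_client, masked_data, bucket: str, key: str) -> None:
    """Write masked_data to s3://bucket/key using the client's put_object."""
    # Assumption: masked_data is bytes or a binary file-like object.
    s3_client.put_object(Bucket=bucket, Key=key, Body=masked_data)


client = RecordingClient()
upload_masked(client, b"id,name\n1,***\n", "my_processed_bucket", "masked/file1.csv")
```

The bucket and key used here are placeholders. Writing to a separate location keeps the original file untouched.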
