This work has been accepted to the 26th ACM Internet Measurement Conference (IMC) 2026.
@article{ghani2026pixelconfig,
title={PixelConfig: Longitudinal Measurement and Reverse-Engineering of Meta Pixel Configurations},
author={Ghani, Abdullah and Vekaria, Yash and Shafiq, Zubair},
journal={arXiv preprint arXiv:2603.09380},
year={2026}
}
Tracking pixels are widely used to optimize online ad campaigns through personalization, re-targeting, and conversion tracking. While prior research has primarily focused on detecting the prevalence of tracking pixels, limited attention has been given to variations in their configurations across websites. A tracking pixel may be configured differently on different websites.
This project proposes a differential analysis framework to reverse-engineer tracking pixel configurations. Using this framework, we investigate three types of Meta Pixel configurations:
- Activity tracking: What a user is doing on a website.
- Identity tracking: Who a user is or what device they are associated with.
- Tracking restrictions: Mechanisms to limit the sharing of potentially sensitive information.
Using data from the Internet Archive’s Wayback Machine, we analyze and compare Meta Pixel configurations on approximately 18,000 health-related websites versus a control group of the top 10,000 websites from 2017 to 2024. This repository contains the scripts, core processed data, and analysis notebooks to reproduce our findings.
.
├── Analysis/
│ ├── Plots/ # Directory intended for plots generated by analysis notebooks.
│ ├── comparisonPlots.ipynb # Jupyter notebook for generating feature adoption comparison plots.
│ ├── hashes.txt # Text file with SHA-256 hashes and their recovered plaintext values.
│ ├── key_categorizations.csv # CSV for categorizing keys (e.g., PII, health-related).
│ ├── pixelPresence.ipynb # Notebook for analyzing Meta Pixel presence.
│ ├── processSnapshots.py # Script to parse/process raw Pixel configuration snapshots.
│ └── unwantedDataAnalysis.ipynb # Notebook for analyzing blacklisted/sensitive keys.
├── Configurations/ # Pre-processed Meta Pixel configuration data.
│ ├── health-configurations.csv
│ └── top-10k-configurations.csv
├── Data Collection/ # Scripts for main data collection (snapshots, Pixel IDs, configurations).
│ ├── extractPixelIDs.py
│ ├── fetchConfigurationSnapshots.py
│ └── fetchWebsiteSnapshots.py
├── Pixel History/ # Pre-processed data on Pixel ID presence over time.
│ ├── final_pixel_history_health.csv
│ └── final_pixel_history_top10k.csv
├── Websites/ # Contains final website lists and health curation scripts/data.
│ ├── health_websites.csv # Final curated list of health websites.
│ ├── tranco_top_10k.csv # Input list of Top 10k websites.
│ └── health-websites/ # Scripts and intermediate files for generating health_websites.csv.
│   ├── google-search-health-organizations.py
│   ├── input-cms-hospitals.csv
│   ├── openai-map-health-organizations.py
│   └── output-cms-hospitals.csv
└── requirements.txt # Python package requirements.
This repository includes pre-processed data files that are the direct output of our data collection and initial processing scripts. These files can be used to reproduce the analyses presented in the paper or for further investigation.
The primary data files are:
- top-10k-configurations.csv
- health-configurations.csv
- final_pixel_history_top10k.csv
- final_pixel_history_health.csv
Below is a detailed description of their structure and content.
- Files:
  - top-10k-configurations.csv: Processed Meta Pixel configuration data for the Top 10k website cohort.
  - health-configurations.csv: Processed Meta Pixel configuration data for the Health website cohort.
- Source: These files are the primary output of the processSnapshots.py script (after being saved to CSV). Each row represents a unique Meta Pixel configuration script snapshot found on a website at a specific time.
- Columns:
  - plugin_name: (List of strings) A list of plugin names loaded by the Pixel configuration (e.g., ['unwanteddata', 'inferredevents']). Normalized to lowercase without underscores.
  - opt_in_info: (List of tuples) Information about instance.optIn calls. Each tuple is ('config_name', 'enabled_status_as_string'), e.g., [('automaticsetup', 'true'), ('firstpartycookies', 'true')]. Config names are normalized.
  - config_set_info: (List of tuples) Information from config.set calls. Each tuple is ('config_name', dictionary_of_config_data), e.g., [('automaticmatching', {'selectedMatchKeys': ['em', 'ph']})]. Config names are normalized. The second element is a Python dictionary parsed from the original JSON.
  - fbq_set_info: (List of tuples) Information from fbq.set calls. Each tuple is ('config_name', list_of_config_data), e.g., [('estrules', [{'condition': {...}, 'derived_event_name': '...'}])]. Config names are normalized. The second element is a Python list parsed from the original JSON array.
  - timestamp: (String, ISO 8601 format or similar) The timestamp of the Wayback Machine snapshot for this configuration script (e.g., YYYY-MM-DD HH:MM:SS).
  - website: (String) The domain name of the website where this Pixel configuration was found.
  - pixel_id: (String) The Meta Pixel ID associated with this configuration.
  - year: (Integer) The year extracted from the timestamp.
- Notes on Data Types when reading CSV:
  - Columns like plugin_name, opt_in_info, config_set_info, and fbq_set_info store complex data structures (lists, and lists of tuples whose elements can be dictionaries or lists). When read from a CSV, these will typically be strings. You will need to parse them back into their Python object forms (e.g., using ast.literal_eval, taking care with empty or malformed cells) for analysis.
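The parsing step described in the note above can be sketched as follows. This is a minimal illustration that uses only the standard library; the helper names parse_cell and load_configurations and the sample row are assumptions made for this example and are not part of the repository. The same parse_cell function could be applied to pandas columns instead.

```python
import ast
import csv
import io

# Columns documented above as holding stringified Python lists.
LIST_COLUMNS = ("plugin_name", "opt_in_info", "config_set_info", "fbq_set_info")


def parse_cell(value):
    """Parse a stringified Python literal; blank or malformed cells become []."""
    if not isinstance(value, str) or not value.strip():
        return []
    try:
        return ast.literal_eval(value)
    except (ValueError, SyntaxError):
        # Covers cells such as "nan" that are not valid Python literals.
        return []


def load_configurations(handle):
    """Read configuration rows from an open CSV file and parse the list columns."""
    rows = []
    for row in csv.DictReader(handle):
        for column in LIST_COLUMNS:
            if column in row:
                row[column] = parse_cell(row[column])
        rows.append(row)
    return rows


# Made-up stand-in for a row of Configurations/health-configurations.csv.
SAMPLE = '''plugin_name,opt_in_info,config_set_info,website,pixel_id
"['unwanteddata']","[('automaticsetup', 'true')]","[('automaticmatching', {'selectedMatchKeys': ['em', 'ph']})]",example.com,123456789
'''

rows = load_configurations(io.StringIO(SAMPLE))
print(rows[0]["config_set_info"][0][1]["selectedMatchKeys"])
```

To read a real file, pass an open handle, for example open(path, newline="", encoding="utf-8"), in place of the in-memory sample.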
- Files:
  - final_pixel_history_top10k.csv: Historical and live Meta Pixel ID presence for the Top 10k website cohort.
  - final_pixel_history_health.csv: Historical and live Meta Pixel ID presence for the Health website cohort.
- Source: These files are the primary output of the extractPixelIDs.py script (specifically, the pixelHistoryComplete.csv file, renamed per cohort). Each row represents a unique website.
- Columns:
  - website: (String) The domain name of the website.
  - Monthly Columns (e.g., 202502, 202501, ..., 201701):
    - A series of columns, one for each month from January 2017 up to the latest month covered by the data collection (e.g., February 2025).
    - The value in each monthly cell represents the Meta Pixel IDs found on that website during that specific month based on Wayback Machine snapshots.
    - Data Format: The cell value is a string representation of a Python list of strings (Pixel IDs), e.g., ['239130639218562']. An empty cell or a string like [] (or nan if read by pandas and then saved) indicates no Pixel IDs were found or no snapshot was available/processed for that month.
  - live:
    - The value in this cell represents the Meta Pixel IDs found on that website during the live crawl performed by extractPixelIDs.py.
    - Data Format: Similar to monthly columns, it's a string representation of a Python list of strings (Pixel IDs), e.g., ['12345', '67890'].
  - Unnamed: 0: (Integer) An index column, likely added by pandas when saving the CSV without index=False. Can typically be ignored or dropped.
- Example Row Structure (for pixelHistory files):
  website,202403,202402,202401,...,live
  example.com,[],['123456789'],['123456789'],...,['123456789','987654321']
  hiramhealthandrehab.com,,['239130639218562'],,,...,[]
- Notes on Data Types when reading CSV:
  - The monthly columns and the live column store string representations of lists. These need to be parsed (e.g., using ast.literal_eval) into actual Python lists of strings for analysis. Be cautious with empty strings or NaN values.
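A minimal sketch of reading a Pixel history file with the caveats above in mind. The helper names parse_pixel_ids and load_pixel_history and the sample rows are assumptions for illustration; the sketch treats empty cells, the string nan, and [] all as "no Pixel IDs", and it skips any column that is neither a six-digit month nor live (such as a leftover index column).

```python
import ast
import csv
import io
import re

MONTH_COLUMN = re.compile(r"^\d{6}$")  # e.g. 202403


def parse_pixel_ids(cell):
    """Return the Pixel IDs stored in one cell as a list of strings ([] if none)."""
    if cell is None or cell.strip().lower() in ("", "nan", "[]"):
        return []
    try:
        ids = ast.literal_eval(cell)
    except (ValueError, SyntaxError):
        return []
    if not isinstance(ids, (list, tuple)):
        return []
    return [str(pixel_id) for pixel_id in ids]


def load_pixel_history(handle):
    """Map website -> {column: [pixel ids]} for the monthly columns and 'live'."""
    history = {}
    for row in csv.DictReader(handle):
        history[row["website"]] = {
            column: parse_pixel_ids(value)
            for column, value in row.items()
            if column == "live" or MONTH_COLUMN.match(column or "")
        }
    return history


# Made-up rows shaped like the pixel history files described above.
SAMPLE = '''website,202402,202401,live
example.com,[],"['123456789']","['123456789', '987654321']"
hiramhealthandrehab.com,"['239130639218562']",,[]
'''

history = load_pixel_history(io.StringIO(SAMPLE))
print(history["example.com"]["live"])
```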
This section outlines the methodology and scripts used to collect the data for this research, including historical website snapshots, Meta Pixel IDs, and Meta Pixel configuration script snapshots. The data collection process is divided into three main steps, executed sequentially.
Before you begin the data collection process, please ensure you have the following:
- Python 3.11: The scripts are written in Python 3.
- Required Python Packages: Install all necessary packages by running:
  pip install -r requirements.txt
- Google Chrome: A recent version of the Google Chrome browser must be installed. The scripts use webdriver-manager to automatically download the appropriate ChromeDriver.
- Input Website Lists:
  - tranco_top_10k.csv: A CSV file containing a list of top 10,000 websites. This file must have a header row with a column named website listing the domain names (e.g., example.com).
  - health_websites.csv: A CSV file containing a list of health-related websites. This file must also have a header row with a column named website.
  Place these CSV files in the same directory as the scripts or update the paths within the scripts accordingly.
- Script: fetchWebsiteSnapshots.py
- Purpose: To download historical HTML snapshots of websites from the Internet Archive's Wayback Machine. The script targets bi-annual snapshots (January and July, or closest available) for each website, starting from 2017.
- Input:
  - A CSV file listing websites (e.g., tranco_top_10k.csv or health_websites.csv).
- Key Operations:
  - Reads the list of websites from the specified CSV.
  - For each website, queries the Wayback Machine CDX Server API for available snapshots between 2017 and the current date.
  - Filters these snapshots to select approximately two per year (one for the first half, one for the second half).
  - Uses Selenium with a headless Chrome browser to navigate to each selected Wayback Machine URL.
  - Waits for 30 seconds after the page loads to allow for dynamic content rendering (including potential tracking pixels).
  - Saves the full HTML source of the page.
  - Tracks progress in a separate CSV file, allowing the script to be stopped and resumed.
- Configuration (within fetchWebsiteSnapshots.py):
  - WEBSITES_FILE: (Line ~165) Set this variable to the path of your input CSV file.
    - Example for Tranco Top 10k: WEBSITES_FILE = 'tranco_top_10k.csv'
    - Example for health websites: WEBSITES_FILE = 'health_websites.csv'
  - mount_path: (Line ~91) Defines the base output directory. The script creates subdirectories for each website.
    - Important: If processing tranco_top_10k.csv, modify this path (for example, mount_path = "./final-snapshots/top10k-snapshots/") to keep outputs separate.
  - progress_file: (Line ~166) Name of the CSV file for tracking progress (e.g., "final-progress.csv"). Use a different progress file name for each website list (e.g., top10k-progress.csv, health-progress.csv).
- How to Run:
  - Modify the WEBSITES_FILE, mount_path (if necessary), and progress_file variables within fetchWebsiteSnapshots.py.
  - Execute the script from your terminal:
    python fetchWebsiteSnapshots.py
- Expected Output:
  - A directory structure containing HTML snapshots:
    - Example: ./final-snapshots/cms-snapshots/healthdomain.com/20200101123456.html
    - Example: ./final-snapshots/top10k-snapshots/example.com/20200701000000.html
  - A progress CSV file (e.g., final-progress.csv) logging successfully processed websites.
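The snapshot-selection step (roughly two snapshots per year) can be sketched as below. This is an illustration under stated assumptions, not the script's actual logic: it assumes the CDX query has already produced a list of 14-digit Wayback timestamps, and it keeps the earliest snapshot in each half-year as a stand-in for "January and July, or closest available". The function names select_biannual and wayback_url are hypothetical.

```python
from datetime import datetime


def select_biannual(timestamps, start_year=2017):
    """Keep the earliest snapshot in each half-year from start_year onward."""
    chosen = {}
    for ts in sorted(timestamps):
        taken = datetime.strptime(ts, "%Y%m%d%H%M%S")
        if taken.year < start_year:
            continue
        half = 1 if taken.month <= 6 else 2
        # setdefault keeps the first (earliest) timestamp seen for this half-year.
        chosen.setdefault((taken.year, half), ts)
    return [chosen[key] for key in sorted(chosen)]


def wayback_url(timestamp, website):
    """Build the Wayback Machine URL of one archived page."""
    return f"https://web.archive.org/web/{timestamp}/http://{website}/"


# Made-up timestamps for one website.
available = [
    "20161201000000",
    "20200315101500",
    "20200102123456",
    "20200809000000",
    "20201130000000",
    "20210705080000",
]
selected = select_biannual(available)
print(selected)
print(wayback_url(selected[0], "example.com"))
```

Sorting before selection means the input order of CDX records does not matter.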
- Script: extractPixelIDs.py
- Purpose: To extract Meta Pixel IDs from the historical HTML snapshots (collected in Step 1) and from a fresh, live crawl of the websites. It then consolidates this information.
- Input:
  - The directory containing historical website snapshots (output from Step 1).
  - A CSV file listing websites for the live crawl (e.g., tranco_top_10k.csv or health_websites.csv).
- Key Operations:
  - Wayback Snapshots Processing:
    - Scans the directory of historical snapshots.
    - For each HTML file, uses regular expressions to find Pixel IDs (patterns: <script src=".../config/PIXEL_ID"> and fbq("init", "PIXEL_ID");).
    - Outputs a pixelHistory.csv file mapping websites to Pixel IDs found per month.
  - Live Website Crawling:
    - Crawls the live version of websites listed in the specified CSV.
    - Saves the HTML of the live pages.
    - Extracts Pixel IDs from these live pages using the same regex patterns.
    - Outputs results to pixelHistoryLive.csv.
    - Logs progress in downloaded_websites.txt.
  - Merging:
    - Combines the data from pixelHistory.csv (Wayback) and pixelHistoryLive.csv (live) into a single pixelHistoryComplete.csv.
- Configuration (within extractPixelIDs.py):
  - BASE_FOLDER: (Line ~146) Path to the directory containing the historical snapshots from Step 1.
    - Example for Tranco Top 10k snapshots: BASE_FOLDER = "./final-snapshots/top10k-snapshots"
    - Example for health snapshots: BASE_FOLDER = "./final-snapshots/cms-snapshots"
  - WEBSITES_PATH: (Line ~148) Path to the input CSV file for the live crawl.
    - Example: WEBSITES_PATH = 'tranco_top_10k.csv'
  - live_folder_path: (Line ~144) Directory to save HTML of live crawled websites (e.g., 'live_websites'). Consider using distinct names if running for different lists (e.g., live_websites_top10k, live_websites_health).
  - Output CSV files are named pixelHistory.csv, pixelHistoryLive.csv, and pixelHistoryComplete.csv. If running for different datasets, rename these outputs after each run to avoid overwriting them.
- How to Run:
  - Ensure BASE_FOLDER points to the correct output from Step 1.
  - Set WEBSITES_PATH to the desired list for live crawling.
  - Modify live_folder_path if desired.
  - Execute the script:
    python extractPixelIDs.py
- Expected Output:
  - pixelHistory.csv: Historical Pixel ID data.
  - live_websites/ (or your live_folder_path): HTML files from live crawls.
  - pixelHistoryLive.csv: Live Pixel ID data.
  - downloaded_websites.txt: Log for live crawl.
  - pixelHistoryComplete.csv: Merged historical and live Pixel ID data.
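The two Pixel ID patterns named in this step could be matched along the following lines. The regular expressions here are assumptions written for illustration from the pattern descriptions above; the expressions in extractPixelIDs.py may differ. The function name extract_pixel_ids and the sample HTML are likewise hypothetical.

```python
import re

# Assumed pattern for <script src=".../config/PIXEL_ID">.
CONFIG_SRC = re.compile(r"connect\.facebook\.net/signals/config/(\d+)")
# Assumed pattern for fbq("init", "PIXEL_ID"); with either quote style.
FBQ_INIT = re.compile(r"""fbq\(\s*['"]init['"]\s*,\s*['"](\d+)['"]""")


def extract_pixel_ids(html):
    """Return de-duplicated Pixel IDs, script-src matches first, then fbq init calls."""
    found = CONFIG_SRC.findall(html) + FBQ_INIT.findall(html)
    # dict.fromkeys removes duplicates while preserving order.
    return list(dict.fromkeys(found))


# Made-up page fragment.
html = '''
<script src="https://connect.facebook.net/signals/config/239130639218562?v=2.9"></script>
<script>fbq("init", "239130639218562"); fbq('init', '12345');</script>
'''

print(extract_pixel_ids(html))
```

Because Wayback Machine snapshots rewrite URLs by prefixing them, matching on the connect.facebook.net path fragment rather than the full src attribute keeps the first pattern usable on both archived and live HTML.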
- Script: fetchConfigurationSnapshots.py
- Purpose: To download historical snapshots of Meta Pixel configuration scripts from the Wayback Machine, using the Pixel IDs identified in Step 2.
- Input:
  - The pixelHistoryComplete.csv file generated in Step 2.
- Key Operations:
  - Reads pixelHistoryComplete.csv.
  - For each website and each unique Pixel ID associated with it:
    - Constructs the URL for the Meta Pixel configuration script (e.g., https://connect.facebook.net/signals/config/PIXEL_ID).
    - Queries the Wayback Machine CDX Server API for all archived versions of this script.
    - Filters to select the earliest configuration script snapshot for each month.
    - Downloads and saves these configuration scripts. (Note: They are saved with an .html extension but typically contain JavaScript code).
    - Uses a temporary checkpoint.txt for resumable CDX record fetching per Pixel ID.
- Configuration (within fetchConfigurationSnapshots.py):
  - PIXEL_HISTORY_PATH: (Line ~128) Path to the pixelHistoryComplete.csv file.
    - Example: PIXEL_HISTORY_PATH = "pixelHistoryComplete.csv" (If you renamed outputs from Step 2, adjust accordingly).
  - OUTPUT_FOLDER: (Line ~129) Base directory to save the downloaded Pixel configuration scripts.
    - Example: OUTPUT_FOLDER = "allPixelConfigs" (Consider allPixelConfigs_top10k, allPixelConfigs_health if running for different datasets).
- How to Run:
  - Ensure PIXEL_HISTORY_PATH points to the correct pixelHistoryComplete.csv from Step 2.
  - Set OUTPUT_FOLDER as desired.
  - Execute the script:
    python fetchConfigurationSnapshots.py
- Expected Output:
  - A directory structure containing Pixel configuration scripts:
    - Example: allPixelConfigs/example.com/123456789012345/20200115102030.html
  - A temporary checkpoint.txt will be created and deleted during processing for each Pixel ID.
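The URL construction, the earliest-per-month filter, and the output layout from this step can be sketched as follows. The function names are hypothetical and the timestamps are made up; the sketch assumes CDX records have already been reduced to 14-digit timestamp strings, which sort chronologically as plain strings.

```python
from pathlib import Path


def config_script_url(pixel_id):
    """URL of the Meta Pixel configuration script for one Pixel ID."""
    return f"https://connect.facebook.net/signals/config/{pixel_id}"


def earliest_per_month(timestamps):
    """Keep the earliest snapshot timestamp within each calendar month."""
    earliest = {}
    for ts in timestamps:
        month = ts[:6]  # YYYYMM prefix of the 14-digit timestamp
        if month not in earliest or ts < earliest[month]:
            earliest[month] = ts
    return [earliest[month] for month in sorted(earliest)]


def output_path(output_folder, website, pixel_id, timestamp):
    """Mirror the <output>/<website>/<pixel_id>/<timestamp>.html layout."""
    return Path(output_folder) / website / pixel_id / f"{timestamp}.html"


# Made-up CDX timestamps for one configuration script.
snapshots = ["20200120000000", "20200115102030", "20200301000000", "20200305000000"]
kept = earliest_per_month(snapshots)
print(config_script_url("123456789012345"))
print(kept)
print(output_path("allPixelConfigs", "example.com", "123456789012345", kept[0]).as_posix())
```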
If you are processing both tranco_top_10k.csv and health_websites.csv (or other lists):
- It is highly recommended to run the entire 3-step pipeline separately for each list.
- Carefully manage your output directories and intermediate file names to prevent data from one run overwriting another.
  - For fetchWebsiteSnapshots.py: Modify mount_path and progress_file.
  - For extractPixelIDs.py: Modify BASE_FOLDER (to point to the correct Step 1 output), live_folder_path, and manually rename the output CSVs (pixelHistory.csv, pixelHistoryLive.csv, pixelHistoryComplete.csv) after each run.
  - For fetchConfigurationSnapshots.py: Modify PIXEL_HISTORY_PATH (to point to the correct pixelHistoryComplete.csv) and OUTPUT_FOLDER.
This section describes the scripts used to process the collected data and perform the analyses presented in the paper.
- Purpose: This script is the primary data processing engine. It parses the downloaded Meta Pixel configuration script snapshots (from fetchConfigurationSnapshots.py) to extract structured information about their settings and features. The output is a Pandas DataFrame that serves as the basis for subsequent analyses.
- Input:
  - The directory containing the downloaded Pixel configuration script snapshots (e.g., allPixelConfigs/). This directory should have a structure like: <base_folder>/<website_domain>/<pixel_id>/<timestamp>.html.
- Key Operations:
  - File Iteration: Traverses the input directory, processing each HTML file (which contains a Pixel configuration script). It extracts the website, pixel_id, and timestamp from the file/folder path.
  - Configuration Code Extraction (extractConfigurationCode):
    - Reads the content of each HTML file.
    - Isolates the relevant JavaScript code block that defines the Pixel's configuration, typically starting with fbq.registerPlugin and ending before any comments or subsequent code.
  - Core Parsing Logic (parse_pixel_code):
    - If configuration code is found, it uses regular expressions to parse different types of Pixel configuration statements:
      - fbq.loadPlugin("PLUGIN_NAME");: Identifies loaded plugins.
      - instance.optIn("PIXEL_ID", "CONFIG_NAME", true/false);: Captures opt-in settings for specific configurations (e.g., 'UnwantedData', 'AutomaticMatching').
      - config.set("PIXEL_ID" or null, "CONFIG_NAME", {JSON_DATA});: Extracts detailed JSON-formatted configuration data for features.
      - fbq.set("CONFIG_NAME", "PIXEL_ID", [LIST_DATA]);: Parses settings like 'estRules' which are defined as lists.
    - Normalizes configuration names (lowercase, remove underscores).
    - Stores parsed elements (plugins, opt-ins, config.set, fbq.set) into separate lists.
  - DataFrame Creation & Matching (makeConfigDataframe, parse_dataframe, manualMatch, returnMatch):
    - The initial parsed data is structured into a temporary DataFrame.
    - A matching logic (manualMatch, returnMatch) attempts to align related configuration parts (e.g., a plugin load with its corresponding opt-in or config.set). This seems to handle variations in naming conventions (e.g., "jsonldmicrodata" vs "microdatajsonld", "cookie" vs "firstpartycookies").
    - The goal is to create a more unified representation of each configuration setting.
  - Aggregation (aggregate_source_code_info):
    - For each processed configuration script, the detailed parsed and matched DataFrame is aggregated into a single row. This row summarizes:
      - plugin_names: A list of unique plugin names found.
      - opt_in_info: A list of (config_name, enabled_status) tuples from instance.optIn.
      - config_set_info: A list of (config_name, json_data) tuples from config.set.
      - fbq_set_info: A list of (config_name, list_data) tuples from fbq.set.
    - The timestamp, website, and pixel_id are added to this aggregated row.
  - Final DataFrame Construction:
    - All aggregated rows (one per successfully parsed configuration script) are collected into a final Pandas DataFrame.
    - This DataFrame is sorted by timestamp.
- Configuration (within processSnapshots.py):
  - folder_path: (Line ~229) Path to the input directory containing the configuration script snapshots (e.g., "allPixelConfigs").
- How to Run:
  - Ensure the folder_path variable points to the correct directory containing the output from fetchConfigurationSnapshots.py.
  - Execute the script:
    python processSnapshots.py
- Expected Output:
  - The script will print progress using tqdm as it processes websites, Pixel IDs, and HTML files.
  - The primary output is a Pandas DataFrame named final_aggregated_df (in memory at the end of the script). Note: The provided script does not explicitly save this DataFrame to a file (e.g., a CSV or pickle). You would typically add a line like final_aggregated_df.to_csv('processed_pixel_configs.csv', index=False) or final_aggregated_df.to_pickle('processed_pixel_configs.pkl') at the end of the script to persist the results for further analysis.
  - The DataFrame final_aggregated_df will have the following columns:
    - plugin_name: List of plugin names (e.g., ['unwanteddata', 'inferredevents']).
    - opt_in_info: List of tuples, e.g., [('automaticsetup', 'true'), ('firstpartycookies', 'true')].
    - config_set_info: List of tuples, where the second element is a dictionary (parsed JSON), e.g., [('automaticmatching', {'selectedMatchKeys': ['em', 'ph']})].
    - fbq_set_info: List of tuples, where the second element is a list (parsed JSON array), e.g., [('estrules', [{'condition': {...}, 'derived_event_name': '...'}])].
    - timestamp: Pandas datetime object representing the snapshot time.
    - website: String, the domain name of the website.
    - pixel_id: String, the Meta Pixel ID.
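To make the parsing step more concrete, here is a simplified sketch covering three of the four statement types listed under Core Parsing Logic. It is not the code in processSnapshots.py: the regular expressions, the function names normalize and parse_pixel_config, and the sample script are assumptions for illustration. One deliberate difference is that the config.set payload is decoded with json.JSONDecoder.raw_decode starting at the opening brace, which avoids matching nested braces with a regular expression.

```python
import json
import re

LOAD_PLUGIN = re.compile(r'fbq\.loadPlugin\("([^"]+)"\)')
OPT_IN = re.compile(r'instance\.optIn\("(\d+)",\s*"([^"]+)",\s*(true|false)\)')
CONFIG_SET = re.compile(r'config\.set\((?:"\d+"|null),\s*"([^"]+)",\s*')


def normalize(name):
    """Lowercase a configuration name and drop underscores."""
    return name.lower().replace("_", "")


def parse_pixel_config(code):
    """Extract plugins, opt-ins, and config.set payloads from configuration code."""
    decoder = json.JSONDecoder()
    plugins = [normalize(name) for name in LOAD_PLUGIN.findall(code)]
    opt_ins = [(normalize(name), flag) for _, name, flag in OPT_IN.findall(code)]
    config_sets = []
    for match in CONFIG_SET.finditer(code):
        try:
            # Decode one JSON value beginning right after the config name.
            payload, _ = decoder.raw_decode(code, match.end())
        except json.JSONDecodeError:
            continue  # payload is not strict JSON; skipped in this sketch
        config_sets.append((normalize(match.group(1)), payload))
    return {
        "plugin_name": plugins,
        "opt_in_info": opt_ins,
        "config_set_info": config_sets,
    }


# Made-up configuration script fragment.
sample = '''
fbq.loadPlugin("inferredEvents");
fbq.loadPlugin("unwanted_data");
instance.optIn("123456789", "AutomaticSetup", true);
config.set("123456789", "automaticMatching", {"selectedMatchKeys": ["em", "ph"]});
'''

parsed = parse_pixel_config(sample)
print(parsed)
```

The returned dictionary uses the same column names as the final DataFrame, so a list of such dictionaries can be passed to pandas.DataFrame if desired.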
This section details the Jupyter Notebooks used for specific analyses and plot generation, building upon the processed data from processSnapshots.py.
General Instructions for Jupyter Notebooks:
- Ensure you have Jupyter Notebook or JupyterLab installed (pip install notebook jupyterlab).
- These notebooks are designed to be run cell by cell, in sequential order from top to bottom.
- Path Configuration: Before running, carefully review the initial cells of each notebook to set the correct paths to your input data files (e.g., the processed configuration DataFrames from processSnapshots.py, Pixel ID history files).
- These notebooks can be run locally or potentially on platforms like Google Colaboratory (you might need to upload data files or connect to Google Drive).
- Purpose: This Jupyter Notebook generates plots comparing the adoption rates of different Meta Pixel features and configurations over time for the Top 10k and Health website cohorts. Many of the website adoption graphs presented in the research paper are produced by this notebook.
- Input:
  - The processed Pixel configuration DataFrame for Top 10k websites (output from processSnapshots.py, e.g., processed_pixel_configs_top10k.csv or .pkl).
  - The processed Pixel configuration DataFrame for Health websites (output from processSnapshots.py, e.g., processed_pixel_configs_health.csv or .pkl).
- Key Analyses & Outputs:
  - Temporal plots showing adoption trends of features like:
    - Automatic Events (e.g., AutomaticSetup, InferredEvents)
    - Identity Tracking features (e.g., FirstPartyCookies, AutomaticMatching)
    - Tracking Restriction features (e.g., ProtectedDataMode / Core Setup)
  - Comparisons of these trends between the Top 10k and Health website cohorts.
  - The notebook will display plots inline and may include cells to save these plots to image files.
- How to Run:
  - Open comparisonPlots.ipynb in Jupyter Notebook or JupyterLab.
  - In the initial cells, carefully update the file paths to point to your processed configuration DataFrames for both Top 10k and Health websites.
  - Run all cells sequentially from top to bottom.
- Dependencies (beyond requirements.txt): Plotting libraries such as matplotlib and seaborn. These are usually present in a standard data science environment, but are worth adding to requirements.txt.
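As a rough illustration of the kind of adoption metric such plots are built on, the sketch below computes, per year, the share of websites with a given opt-in enabled. It is an assumption-laden example, not the notebook's code: the function name adoption_by_year is hypothetical, the rows are made up, and the denominator (websites with any configuration snapshot that year) is a choice made here that may differ from the normalization used in the paper.

```python
from collections import defaultdict


def adoption_by_year(rows, feature):
    """Share of websites per year with `feature` opted in at least once."""
    seen = defaultdict(set)       # year -> websites with any configuration
    adopters = defaultdict(set)   # year -> websites with the feature enabled
    for row in rows:
        seen[row["year"]].add(row["website"])
        if (feature, "true") in row["opt_in_info"]:
            adopters[row["year"]].add(row["website"])
    return {year: len(adopters[year]) / len(seen[year]) for year in sorted(seen)}


# Made-up rows with already-parsed opt_in_info lists.
rows = [
    {"website": "a.com", "year": 2020, "opt_in_info": [("automaticsetup", "true")]},
    {"website": "b.com", "year": 2020, "opt_in_info": [("automaticsetup", "false")]},
    {"website": "a.com", "year": 2021, "opt_in_info": [("automaticsetup", "true")]},
    {"website": "b.com", "year": 2021,
     "opt_in_info": [("automaticsetup", "true"), ("firstpartycookies", "true")]},
]

print(adoption_by_year(rows, "automaticsetup"))
```

Running the same function once per cohort gives two series that can be plotted on shared axes.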
- Purpose: This notebook focuses on the analysis of Meta Pixel's "Unwanted Data" filtering mechanism, specifically examining blacklisted_keys and sensitive_keys. It also investigates the nature of data being filtered, including examples of sensitive health-related information.
- Input:
  - The processed Pixel configuration DataFrame for Top 10k websites.
  - The processed Pixel configuration DataFrame for Health websites.
  - hashes.txt: A text file containing SHA-256 hashes (presumably of sensitive_keys) and their corresponding plaintext values where these could be recovered (e.g., via CrackStation). This file is crucial for understanding the nature of hashed sensitive keys.
- Key Analyses & Outputs:
  - Identification and quantification of blacklisted_keys and sensitive_keys across websites.
  - Analysis of the hashes.txt file to understand what plaintext parameters are being hashed as sensitive_keys.
  - Examples of potentially sensitive parameters or event data identified (e.g., related to "OCD").
  - Plots showing the adoption of these filtering mechanisms over time for both Top 10k and Health websites.
  - Analysis of website overlap in the usage of common blacklisted_keys or sensitive_keys.
  - Plots corresponding to these analyses as presented in the paper.
- How to Run:
  - Open unwantedDataAnalysis.ipynb.
  - Ensure hashes.txt is in the expected location or update the path in the notebook if it's loaded from a specific path.
  - In the initial cells, update the file paths to your processed configuration DataFrames.
  - Run all cells sequentially.
- Note on hashes.txt: The quality of analysis for sensitive_keys depends significantly on the completeness and accuracy of hashes.txt.
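SHA-256 is a one-way hash, so plaintexts are recovered by hashing candidate strings and comparing digests rather than by decrypting. The sketch below shows that lookup idea with a local candidate list. It assumes keys are hashed as unsalted UTF-8 strings, which the source does not confirm; the function names and the candidate words are hypothetical.

```python
import hashlib


def sha256_hex(text):
    """Hex SHA-256 digest of a UTF-8 string."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def recover_plaintexts(hashed_keys, candidates):
    """Map each hash to a candidate whose digest matches it, or None if no match."""
    lookup = {sha256_hex(word): word for word in candidates}
    return {digest: lookup.get(digest) for digest in hashed_keys}


# Made-up hashed keys: one that is in the candidate list and one that is not.
hashed_keys = [sha256_hex("email"), sha256_hex("unknown-key")]
candidates = ["email", "phone", "diagnosis"]

recovered = recover_plaintexts(hashed_keys, candidates)
for digest, plaintext in recovered.items():
    print(digest[:12], plaintext)
```

Hashes that map to None stay unresolved, which is why the completeness of hashes.txt bounds what the sensitive_keys analysis can say.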
- Purpose: This notebook analyzes the prevalence of Meta Pixels on websites. It determines how many websites have at least one Meta Pixel installed and, of those, for how many the corresponding configuration scripts were successfully fetched and archived. It generates the Pixel presence plot shown in the paper.
- Input:
  - The final Pixel ID history file for Top 10k websites (e.g., pixelHistoryComplete_top10k.csv from extractPixelIDs.py).
  - The final Pixel ID history file for Health websites (e.g., pixelHistoryComplete_health.csv from extractPixelIDs.py).
  - The directory containing the downloaded Pixel configuration script snapshots for Top 10k websites (e.g., allPixelConfigs_top10k/).
  - The directory containing the downloaded Pixel configuration script snapshots for Health websites (e.g., allPixelConfigs_health/).
- Key Analyses & Outputs:
  - Calculation of the number of websites with at least one Pixel ID detected over time.
  - Calculation of the number of websites for which at least one configuration script was found and archived by the Wayback Machine.
  - Generation of a plot (similar to Figure 3 in the paper) showing these trends for both Top 10k and Health websites.
- How to Run:
  - Open pixelPresence.ipynb.
  - In the initial cells, update the file paths to your pixelHistoryComplete.csv files and the base directories for the allPixelConfigs data for both Top 10k and Health cohorts.
  - Run all cells sequentially.
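The first calculation above (websites with at least one Pixel ID per month) can be sketched as follows. This is an illustration rather than the notebook's code: it assumes the history file has already been parsed into a mapping of website to {column: list of Pixel IDs}, and the function name websites_with_pixel and the sample data are hypothetical.

```python
def websites_with_pixel(history):
    """Count websites with at least one Pixel ID in each monthly column."""
    counts = {}
    for columns in history.values():
        for month, pixel_ids in columns.items():
            if month == "live":
                continue  # the live crawl is not part of the monthly series
            counts[month] = counts.get(month, 0) + (1 if pixel_ids else 0)
    return dict(sorted(counts.items()))


# Made-up parsed history for three websites.
history = {
    "example.com": {"202401": ["123"], "202402": ["123"], "live": ["123"]},
    "b.org": {"202401": [], "202402": ["9"], "live": []},
    "c.net": {"202401": [], "202402": [], "live": []},
}

print(websites_with_pixel(history))
```

Months in which no website has a Pixel ID still appear with a count of 0, which keeps the time axis continuous when plotting.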