diff --git a/.github/workflows/deploy-web.yml b/.github/workflows/deploy-web.yml
new file mode 100644
index 0000000..6c9e2a9
--- /dev/null
+++ b/.github/workflows/deploy-web.yml
@@ -0,0 +1,54 @@
+name: Deploy Web Playground
+
+on:
+  push:
+    branches: [ main ]
+    paths:
+      - 'web/**'
+      - 'attacks/**'
+      - 'analysis/**'
+  workflow_dispatch:
+
+permissions:
+  contents: read
+  pages: write
+  id-token: write
+
+concurrency:
+  group: pages
+  cancel-in-progress: false
+
+jobs:
+  build:
+    name: Build web playground
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@v4
+      - name: Set up Node.js
+        uses: actions/setup-node@v4
+        with:
+          node-version: '20'
+          cache: 'npm'
+          cache-dependency-path: web/package-lock.json
+      - name: Install dependencies
+        run: npm ci
+        working-directory: web
+      - name: Build
+        run: npm run build
+        working-directory: web
+      - name: Upload pages artifact
+        uses: actions/upload-pages-artifact@v3
+        with:
+          path: web/dist
+
+  deploy:
+    name: Deploy to GitHub Pages
+    needs: build
+    runs-on: ubuntu-latest
+    environment:
+      name: github-pages
+      url: ${{ steps.deployment.outputs.page_url }}
+    steps:
+      - name: Deploy to GitHub Pages
+        id: deployment
+        uses: actions/deploy-pages@v4
diff --git a/web/.gitignore b/web/.gitignore
new file mode 100644
index 0000000..5ccae52
--- /dev/null
+++ b/web/.gitignore
@@ -0,0 +1,18 @@
+# Dependencies
+node_modules
+
+# Build output
+dist
+
+# Generated at build time by vite.config.ts
+public/pg_modules
+
+# Editor directories and files
+.vscode/*
+!.vscode/extensions.json
+*.swp
+*.swo
+*~
+
+# OS files
+.DS_Store
diff --git a/web/index.html b/web/index.html
new file mode 100644
index 0000000..e0f738b
--- /dev/null
+++ b/web/index.html
@@ -0,0 +1,12 @@
+
+
+
+ + +{requiredColumns.join(", ")}
+
+ + Analyze code memorization using AST-based tree edit distance and CodeBLEU metrics. + These methods parse source code into abstract syntax trees and measure structural + similarity, providing more meaningful comparisons than raw text matching. +
+ ++ Browser-based code similarity analysis requires tree-sitter WebAssembly support, + which is currently under development. In the meantime, you can use the full + PrivacyGuard library locally for code similarity analysis. +
+ +
+{`# Install first (shell command): pip install privacyguard
+
+from privacyguard.attacks.code_similarity import CodeSimilarityAttack
+from privacyguard.analysis.code_similarity import TreeEditDistanceNode
+
+attack = CodeSimilarityAttack(
+ reference_code=reference_snippets,
+ generated_code=generated_snippets,
+ language="python",
+)
+result = attack.run_attack()
+
+analysis = TreeEditDistanceNode()
+output = analysis.run_analysis(result)`}
+
+
+ + See the full documentation on{" "} + + GitHub + . +
++ Compute epsilon using the f-DP (functional Differential Privacy) canary + analysis. Provide the canary counts below and click Calculate. +
+ ++ Test whether a model leaks information about training labels through its + predictions. +
+ +
+ You need two CSVs: one with target model predictions{" "}
+ and one with calibration model predictions. The target
+ CSV must include is_member, predictions, and{" "}
+ label columns. The calibration CSV must include{" "}
+ is_member and predictions columns. See the{" "}
+
+ PrivacyGuard documentation
+ {" "}
+ for details on generating these inputs.
+
Audit ML model privacy in your browser
++ All computation runs locally in your browser via WebAssembly. Your data never leaves this page. +
+{attack.description}
+ + ))} ++ State-of-the-art MIA using shadow model statistics. Computes + likelihood ratios from pre-computed shadow model scores. +
+ +
+ Follow the{" "}
+
+ LiRA tutorial notebook
+ {" "}
+ to generate shadow model scores. The output CSV should contain{" "}
+ is_member, score_orig,{" "}
+ score_mean, and score_std columns.
+
+ Higher-power attack using fewer shadow models via reference model + comparison. +
+ ++ Follow the{" "} + + RMIA tutorial notebook + {" "} + to generate reference model scores. You will need two CSVs: one + with member/holdout data and one with population data. +
++ Baseline MIA using calibrated prediction confidence scores. +
+ ++ Computes exact match, longest common subsequence (word/char), and + edit distance between model generations and target text. +
+ ++ Estimates the probability that a model has memorized specific + content, based on per-token log-probabilities. +
+ +
+ Extract per-token log-probabilities from a HuggingFace model and
+ save as CSV with a prediction_logprobs column:
+
+ {PROB_MEM_SNIPPET}
+
+