SURGE: Exploring LLMs, Detection, and GAN-Augmented Models

A SURGE through the world of large language models (LLMs), text-detection, and GAN-based attention methods.

Repository: https://github.com/HARSHITJAIS14/SURGE

🗂️ Repository Structure


harshitjais14-surge/
├── README.md
├── Week1/
│   └── LLMIntro-2024-07-22-1743.excalidraw
├── Week2/
│   └── Code/
│       ├── mpi.ipynb
│       └── mpi\_120.csv
├── Week3/
│   └── LLMTextDetectionSurvey.pdf
├── Week4/
│   ├── bert-pretrainingdetector.ipynb
│   ├── description.md
│   └── merged\_dataset.csv
├── Week6/
│   ├── gan-detection.ipynb
│   └── gan\_bertattention.ipynb
└── Week7/
└── GANBERT\_pytorch.ipynb

📖 Week-by-Week Summary

Week 1: Introduction to LLMs

In Week 1, we dove into the fundamentals and workflow of large language models—from pretraining to supervised fine-tuning, and finally to reinforcement learning (including RLHF). Along the way, we covered key concepts such as base architectures, dataset creation, instruction tuning, and common pitfalls like hallucinations. We also did a hands-on quickstart with the OpenAI API.

Topics Covered:

LLM architecture & pretraining stages
Supervised Fine-Tuning (SFT)
Reinforcement Learning & RLHF
Prompting basics
Hallucination in LLMs
OpenAI API usage

Resources:

Artifacts:

Diagrammed workflow in Excalidraw: Week1/LLMIntro-2024-07-22-1743.excalidraw

Week 2: MPI Personality Evaluation

Implemented a Machine Personality Inventory tool using Big Five Personality Factors (OCEAN) for assessment of ChatGPT model on text data.
Notebook mpi.ipynb walks through data preprocessing, feature extraction, and personality inference.
Dataset sample in mpi_120.csv.

Week 3: Survey on LLM Text Detection

Read a literature survey of LLM text-detection approaches (PDF in Week3) which gave a full view over the works done over LLM Text Detection till 2023.
Summarized methods, benchmarks, and open challenges in a concise report and added it to the merged pdf.

Week 4: BERT-Based Pretraining Detector

Developed a detector to distinguish pretrained vs. fine-tuned text using BERT.
Notebook bert-pretrainingdetector.ipynb includes model training and evaluation.
merged_dataset.csv is a smaller version of the CHEAT dataset; description.md details dataset construction.

Week 5: Exploring Datasets

Learnt about a lot of dataset for LLM Text Detection and listed some of the major datasets and their papers in the Week5 Directory.

Week 6: GAN-Based Detection & Attention

Explored GAN-based methods for generating and detecting synthetic text/images.
gan-detection.ipynb builds a basic GAN over word co-occurence matrix as shown in the pdf of the research paper gan_detection_compressed.pdf for adversarial examples.
gan_bertattention.ipynb adds a BERT-attention module to enhance detection robustness.

Week 7: GAN-BERT in PyTorch

Full PyTorch implementation of GANBERT paper: integrating GAN-generated data into BERT training loops.
Notebook GANBERT_pytorch.ipynb demonstrates training and performance analysis over the short version of RAID dataset.

Name		Name	Last commit message	Last commit date
Latest commit History 14 Commits
Abstract		Abstract
Reports		Reports
Week1		Week1
Week10		Week10
Week2		Week2
Week3		Week3
Week4		Week4
Week5		Week5
Week6		Week6
Week7		Week7
Week8		Week8
.DS_Store		.DS_Store
.gitignore		.gitignore
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

SURGE: Exploring LLMs, Detection, and GAN-Augmented Models

🗂️ Repository Structure

📖 Week-by-Week Summary

Week 1: Introduction to LLMs

Week 2: MPI Personality Evaluation

Week 3: Survey on LLM Text Detection

Week 4: BERT-Based Pretraining Detector

Week 5: Exploring Datasets

Week 6: GAN-Based Detection & Attention

Week 7: GAN-BERT in PyTorch

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

SURGE: Exploring LLMs, Detection, and GAN-Augmented Models

🗂️ Repository Structure

📖 Week-by-Week Summary

Week 1: Introduction to LLMs

Week 2: MPI Personality Evaluation

Week 3: Survey on LLM Text Detection

Week 4: BERT-Based Pretraining Detector

Week 5: Exploring Datasets

Week 6: GAN-Based Detection & Attention

Week 7: GAN-BERT in PyTorch

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages