| language | en |
|---|---|
| license | apache-2.0 |
| tags | |
| base_model | unsloth/Qwen2.5-3B-Instruct |
| pipeline_tag | text-generation |
| datasets | |

INPUT: RAW_TEXT ▶ PROCESS: LORA_COMPILER ▶ OUTPUT: JSON_OBJECT
Extracting structured data from unstructured recruitment documents (job descriptions) is a core bottleneck in automated HR technologies. While general-purpose Large Language Models (LLMs) can perform this task, their production deployment is hindered by three major issues: structural hallucinations (invalid JSON), high inference latency, and substantial API usage costs.
JobSense is a specialized 3B-parameter model developed to address these constraints. Built by fine-tuning Qwen2.5-3B-Instruct on the structured dataset mantraraval/jd-extraction-dataset, JobSense functions as a deterministic compiler for job descriptions. It combines Parameter-Efficient Fine-Tuning (PEFT) via Low-Rank Adaptation (LoRA) with grammar-constrained decoding (logit filters) to enforce strict schema alignment, reaching 99.6% JSON conformance on the held-out evaluation set at a fraction of the cost and latency of commercial API models.
graph TD
JD[Unstructured Job Description] --> Ingestion[Inference Pipeline]
Ingestion --> Prompt[Structure Prompt: ChatML Wrapper]
Prompt --> Base[Qwen2.5-3B-Instruct Base Model]
Base --> LoRA[JobSense PEFT/LoRA Adapter]
LoRA --> OutputTokens[Generated Logits]
OutputTokens --> GrammarFilter[Logit Bias Processor / Schema Constraints]
GrammarFilter --> JSON[Validated JSON Payload]
style JD fill:#10161a,stroke:#3880b8,stroke-width:2px,color:#fff
style Ingestion fill:#182026,stroke:#303e48,stroke-width:1px,color:#a7b6c2
style Prompt fill:#182026,stroke:#303e48,stroke-width:1px,color:#a7b6c2
style Base fill:#182026,stroke:#303e48,stroke-width:1px,color:#a7b6c2
style LoRA fill:#106ba3,stroke:#3880b8,stroke-width:1px,color:#fff
style OutputTokens fill:#182026,stroke:#303e48,stroke-width:1px,color:#a7b6c2
style GrammarFilter fill:#0f2b1d,stroke:#0f9960,stroke-width:2px,color:#fff
style JSON fill:#0f2b1d,stroke:#0f9960,stroke-width:2px,color:#fff
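
The grammar filter stage in the diagram can be pictured as a mask applied to next-token logits before a token is chosen. The sketch below is a toy illustration and not JobSense's actual processor: the vocabulary, the scores, and the helper names `mask_logits` and `greedy_pick` are hypothetical, and a real implementation works on tokenizer IDs against a full JSON grammar.

```python
import math

def mask_logits(logits, allowed):
    """Return a copy of `logits` where disallowed tokens are set to -inf."""
    return {tok: (score if tok in allowed else -math.inf)
            for tok, score in logits.items()}

def greedy_pick(logits):
    """Pick the highest-scoring token (greedy decoding)."""
    return max(logits, key=logits.get)

# Hypothetical next-token scores while the model fills in "seniority".
raw_logits = {"expert": 3.1, "senior": 2.7, "lead": 1.9, "junior": 0.4, "mid": 0.2}

# The schema only permits these four values for the field.
allowed_values = {"junior", "mid", "senior", "lead"}

constrained = mask_logits(raw_logits, allowed_values)
print(greedy_pick(raw_logits))    # unconstrained choice
print(greedy_pick(constrained))   # choice after the schema mask
```

Because masked tokens receive a score of negative infinity, they can never be selected, which is why constrained decoding rules out off-schema values regardless of what the model would otherwise prefer.
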
The system is hosted as an interactive dashboard on Hugging Face Spaces (Space: mantraraval/jobsense-app), allowing real-time evaluation of the model's extraction capabilities.
To establish empirical validity, JobSense was evaluated against leading general-purpose open models and commercial APIs.
- Test Dataset: Held-out split of jd-extraction-dataset ($N=500$ distinct job descriptions, manually verified and annotated).
- Metrics:
  - JSON Conformance Rate: The percentage of generated model outputs that parsed as syntactically valid JSON matching the target schema.
  - Skill F1-Score: Harmonic mean of precision and recall for extracted technologies and capabilities.
  - Seniority Match Accuracy: Absolute matching accuracy on a 4-tier seniority scale (junior, mid, senior, lead).
  - Throughput: Average tokens generated per second under identical hardware configurations.

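
The first two metrics can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the evaluation harness used for the reported numbers: the helper names are hypothetical, schema conformance is approximated by checking for required keys only, and skill names are matched case-insensitively.

```python
import json

def is_conformant(raw_output, required_keys):
    """True if the output parses as a JSON object containing all required keys."""
    try:
        obj = json.loads(raw_output)
    except json.JSONDecodeError:
        return False
    return isinstance(obj, dict) and required_keys.issubset(obj)

def skill_f1(predicted, gold):
    """F1 over skill names, compared case-insensitively (an assumed matching rule)."""
    pred = {s.strip().lower() for s in predicted}
    ref = {s.strip().lower() for s in gold}
    true_pos = len(pred & ref)
    if true_pos == 0:
        return 0.0
    precision = true_pos / len(pred)
    recall = true_pos / len(ref)
    return 2 * precision * recall / (precision + recall)

# Two toy model outputs: one valid object, one truncated string.
outputs = ['{"role": "Backend Developer", "seniority": "senior"}', '{"role": ']
required = {"role", "seniority"}
conformance = sum(is_conformant(o, required) for o in outputs) / len(outputs)
f1 = skill_f1(["FastAPI", "MongoDB", "Redis"], ["fastapi", "mongodb", "httpx"])

print(f"JSON conformance: {conformance:.0%}")
print(f"Skill F1: {f1:.3f}")
```
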
| Model | Parameters | JSON Conformance (%) | Skill F1-Score (%) | Seniority Accuracy (%) | Throughput (tok/sec) | Relative Cost / 1M tokens |
|---|---|---|---|---|---|---|
| Llama-3-8B-Instruct | 8B | 89.4% | 76.2% | 81.0% | 45.2 tok/s | 1.00x |
| Claude 3 Haiku | - | 97.8% | 82.5% | 84.1% | API Dependent | 2.50x |
| GPT-4o-mini | - | 98.2% | 83.1% | 85.6% | API Dependent | 1.50x |
| Qwen2.5-3B-Instruct (Zero-Shot) | 3B | 84.1% | 71.5% | 74.3% | 78.4 tok/s | 0.38x |
| JobSense (Ours) | 3B | 99.6% | 89.4% | 91.2% | 112.5 tok/s | 0.38x |

Hardware Config: Local benchmarks executed on a single NVIDIA A10G (24GB GDDR6 VRAM) utilizing vLLM optimization with FP16 precision.
JobSense translates unstructured text into a validated, typed schema. Below are the structural guidelines enforced by the model:
| Field | Type | Extraction Target & Constraints |
|---|---|---|
| role | string | Canonical title of the job opening (e.g. "Backend Developer"). |
| sub_role | string | Niche specialization or specific tech alignment (e.g. "FastAPI Backend Dev"). |
| seniority | string | Normalizes to: junior · mid · senior · lead |
| skills | array | A list of structured Skill Objects (defined below). |
| experience | string | Explicit or implicit years of experience required (e.g. "6 to 8 years"). |
| location | string | Core target location of the job. |
| location_type | string | Normalizes to: city · region · country · remote |
| work_mode | string | Normalizes to: hybrid · remote · onsite |
| joining | string | Normalizes to: immediate · notice_period · flexible |
| salary | string | Normalized compensation details or qualitative state (e.g. "competitive"). |

Each Skill Object has the following fields:

| Field | Type | Extraction Target & Constraints |
|---|---|---|
| name | string | Canonical name of the tool, language, or capability. |
| importance | string | Normalizes to: required · preferred · contextual |
| category | string | Normalizes to technical domains (e.g., backend, frontend, database, devops). |

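
For downstream consumers, the enumerated values in the tables above can be checked with a small standard-library validator. The sketch below is illustrative only: `validate_extraction` is a hypothetical helper, every field is treated as optional because the card does not state which are mandatory, and `category` is left unchecked since its value list is open-ended.

```python
ENUMS = {
    "seniority": {"junior", "mid", "senior", "lead"},
    "location_type": {"city", "region", "country", "remote"},
    "work_mode": {"hybrid", "remote", "onsite"},
    "joining": {"immediate", "notice_period", "flexible"},
}
SKILL_IMPORTANCE = {"required", "preferred", "contextual"}
STRING_FIELDS = ("role", "sub_role", "seniority", "experience", "location",
                 "location_type", "work_mode", "joining", "salary")

def validate_extraction(record):
    """Return a list of problems; an empty list means the record passes."""
    problems = []
    for field in STRING_FIELDS:
        if field in record and not isinstance(record[field], str):
            problems.append(f"{field}: expected string")
    for field, allowed in ENUMS.items():
        value = record.get(field)
        if isinstance(value, str) and value not in allowed:
            problems.append(f"{field}: {value!r} not in {sorted(allowed)}")
    skills = record.get("skills", [])
    if not isinstance(skills, list):
        return problems + ["skills: expected array"]
    for i, skill in enumerate(skills):
        if not isinstance(skill, dict) or not isinstance(skill.get("name"), str):
            problems.append(f"skills[{i}]: expected object with a string name")
        elif skill.get("importance") not in SKILL_IMPORTANCE:
            problems.append(f"skills[{i}]: invalid importance {skill.get('importance')!r}")
    return problems

record = {
    "role": "Backend Developer",
    "seniority": "senior",
    "work_mode": "hybrid",
    "skills": [{"name": "FastAPI", "importance": "required", "category": "backend"}],
}
print(validate_extraction(record))
print(validate_extraction({**record, "seniority": "principal"}))
```
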
We are looking for an experienced Backend Developer to lead our team.
FastAPI is mandatory. MongoDB, httpx, Uvicorn are preferred.
Hybrid role in Delhi. 6-8 years exp. Immediate joiners preferred.
Competitive salary.
{
"role": "Backend Developer",
"sub_role": "FastAPI Backend Dev",
"seniority": "senior",
"skills": [
{ "name": "FastAPI", "importance": "required", "category": "backend" },
{ "name": "MongoDB", "importance": "preferred", "category": "database" },
{ "name": "httpx", "importance": "preferred", "category": "networking" }
],
"experience": "6 to 8 years",
"location": "Delhi",
"location_type": "city",
"work_mode": "hybrid",
"joining": "immediate",
"salary": "competitive"
}

The fine-tuning dataset mantraraval/jd-extraction-dataset was curated to balance industry domain coverage and minimize geographic bias:
- Core Volumes: 5,200 curated, high-quality document-schema pairs.
- Curation Pipeline: Real-world scraped job descriptions were filtered for quality, anonymized, and hand-annotated. Synthesized edge cases (e.g., job descriptions with contradictory location constraints or unspecified experience criteria) were added to train the model's fallback logic.
The adapter was trained using Low-Rank Adaptation (LoRA) on the Unsloth framework:
| Parameter | Configuration Value |
|---|---|
| Base Model | unsloth/Qwen2.5-3B-Instruct |
| Target Modules | All Linear Layers (q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj) |
| LoRA Rank ($r$) | 16 |
| LoRA Alpha ($\alpha$) | 32 |
| Learning Rate | |
| Sequence Length | 2,048 tokens |
| Batch Size | 64 (Global batch size via Gradient Accumulation) |
| Weight Decay | 0.01 |
| Precision | Native Mixed Precision (FP16 / BF16) |
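
To give a sense of scale for this configuration, the adapter size can be estimated by hand. The sketch below is a back-of-the-envelope calculation, not output from the training run: the layer dimensions are assumptions taken from the published Qwen2.5-3B configuration rather than from this card, and `lora_params` is a hypothetical helper.

```python
# Qwen2.5-3B dimensions (assumed from the published model config, not from this card).
HIDDEN = 2048          # hidden_size
INTERMEDIATE = 11008   # intermediate_size of the MLP
KV_DIM = 256           # num_key_value_heads (2) * head_dim (128)
NUM_LAYERS = 36

RANK = 16
ALPHA = 32

# (in_features, out_features) for each targeted linear layer.
TARGETS = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}

def lora_params(in_features, out_features, rank):
    """LoRA adds A (rank x in) and B (out x rank) per adapted layer."""
    return rank * (in_features + out_features)

per_layer = sum(lora_params(i, o, RANK) for i, o in TARGETS.values())
total = per_layer * NUM_LAYERS
scaling = ALPHA / RANK  # factor applied to the low-rank update B @ A

print(f"trainable adapter parameters: {total:,}")
print(f"LoRA scaling (alpha / r): {scaling}")
```

Under these assumptions the adapter trains roughly 30 million parameters, about one percent of the 3B base model, which is what makes the fine-tune parameter-efficient.
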
For quick prototyping or lightweight application pipelines:
pip install gradio_client

from gradio_client import Client
client = Client("mantraraval/jobsense-app")
result = client.predict(
    text="We are seeking a senior front-end specialist with 5+ years of experience in React. Remote US.",
    api_name="/extract_jd",
)
print(result)

For secure local deployments running directly on consumer or enterprise GPU hardware:
pip install transformers peft accelerate torch

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel
BASE_MODEL = "unsloth/Qwen2.5-3B-Instruct"
ADAPTER_MODEL = "mantraraval/jobsense"
# 1. Initialize core tokenizer and base model
tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL, trust_remote_code=True)
base_model = AutoModelForCausalLM.from_pretrained(
    BASE_MODEL,
    torch_dtype=torch.float16,
    device_map="auto"
)
# 2. Attach PEFT adapter layer
model = PeftModel.from_pretrained(base_model, ADAPTER_MODEL)
model.eval()
# 3. Design structured prompt with ChatML templates
job_description = "Seeking a Node.js Developer. 3 years exp, hybrid work in Noida, immediate joiner."
messages = [
    {"role": "system", "content": "You are a recruitment extraction engine. Extract structured details to JSON."},
    {"role": "user", "content": job_description}
]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
# 4. Generate structured prediction
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=700,
        do_sample=False,  # Greedy decoding for determinism; temperature has no effect when sampling is off
    )
# Decode only the newly generated tokens (skip the prompt)
response_tokens = outputs[0][inputs.input_ids.shape[1]:]
print(tokenizer.decode(response_tokens, skip_special_tokens=True))

For high-concurrency production API deployment, first merge the weights:
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
base_model = AutoModelForCausalLM.from_pretrained(
    "unsloth/Qwen2.5-3B-Instruct",
    torch_dtype=torch.float16,
    device_map="cpu"
)
model = PeftModel.from_pretrained(base_model, "mantraraval/jobsense")
merged_model = model.merge_and_unload()
# Export merged checkpoint
merged_model.save_pretrained("./jobsense-merged")
tokenizer = AutoTokenizer.from_pretrained("unsloth/Qwen2.5-3B-Instruct")
tokenizer.save_pretrained("./jobsense-merged")

Once merged, launch a high-throughput vLLM server that enforces strict schema conformance using Outlines:
pip install vllm outlines

from vllm import LLM, SamplingParams
# Initialize vLLM deployment
llm = LLM(model="./jobsense-merged", tensor_parallel_size=1)
# Define expected JSON Schema target structure
json_schema = """
{
  "type": "object",
  "properties": {
    "role": {"type": "string"},
    "sub_role": {"type": "string"},
    "seniority": {"type": "string", "enum": ["junior", "mid", "senior", "lead"]},
    "experience": {"type": "string"},
    "location": {"type": "string"},
    "location_type": {"type": "string", "enum": ["city", "region", "country", "remote"]},
    "work_mode": {"type": "string", "enum": ["hybrid", "remote", "onsite"]},
    "joining": {"type": "string", "enum": ["immediate", "notice_period", "flexible"]},
    "salary": {"type": "string"}
  },
  "required": ["role", "seniority", "experience", "work_mode"]
}
"""
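# Structured-output constraint. Assumes a vLLM release that exposes
# GuidedDecodingParams; other releases may name this API differently.
from vllm.sampling_params import GuidedDecodingParams

sampling_params = SamplingParams(
    temperature=0.0,
    max_tokens=700,
    guided_decoding=GuidedDecodingParams(json=json_schema),  # Constrains output to the schema
)
# Run batch generation. `prompt` is the ChatML-formatted string built with
# tokenizer.apply_chat_template in the local inference example.
outputs = llm.generate([prompt], sampling_params)
print(outputs[0].outputs[0].text)

Error analysis was conducted on the 10.6-point gap between the reported 89.4% Skill F1-Score and a perfect score: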
- Over-segmentation / Synonym Mismatch: When job descriptions specify skills with non-standard naming schemes (e.g., "MERN stack" alongside "MongoDB, Express, React, Node"), the model sometimes duplicates skills in the JSON output, or fails to group them contextually. Downstream deduplication layers are recommended.
- Context Truncation Limits: Documents exceeding 3,000 tokens may experience context truncation, resulting in incomplete schema generations or empty lists.
- Linguistic Bias: The fine-tuning dataset is exclusively English. Non-English or code-switched job descriptions will result in lower structural precision.
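
The downstream deduplication layer recommended for the first limitation could look like the following. This is a minimal sketch under stated assumptions: the alias table and the `dedupe_skills` helper are hypothetical, only simple synonyms are merged, and bundle terms such as "MERN stack" would need a separate expansion step.

```python
# Hypothetical alias table; a production system would maintain a larger one.
ALIASES = {
    "node": "Node.js",
    "nodejs": "Node.js",
    "node.js": "Node.js",
    "mongo": "MongoDB",
    "mongodb": "MongoDB",
    "reactjs": "React",
    "react.js": "React",
    "react": "React",
}
IMPORTANCE_RANK = {"required": 0, "preferred": 1, "contextual": 2}

def dedupe_skills(skills):
    """Merge skills that share a canonical name, keeping the strongest importance."""
    merged = {}
    for skill in skills:
        raw_name = skill["name"].strip()
        canonical = ALIASES.get(raw_name.lower(), raw_name)
        key = canonical.lower()
        candidate = {**skill, "name": canonical}
        current = merged.get(key)
        if current is None:
            merged[key] = candidate
            continue
        # Lower rank means stronger importance; unknown labels sort last.
        new_rank = IMPORTANCE_RANK.get(candidate.get("importance"), 3)
        old_rank = IMPORTANCE_RANK.get(current.get("importance"), 3)
        if new_rank < old_rank:
            merged[key] = candidate
    return list(merged.values())

skills = [
    {"name": "MongoDB", "importance": "preferred", "category": "database"},
    {"name": "mongo", "importance": "required", "category": "database"},
    {"name": "NodeJS", "importance": "preferred", "category": "backend"},
]
deduped = dedupe_skills(skills)
print([(s["name"], s["importance"]) for s in deduped])
```

Keeping the strongest importance label when duplicates collide is a design choice made here for illustration; a pipeline that prefers the first mention could simply skip the rank comparison.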
@misc{raval2025jobsense,
author = {Mantra Raval},
title = {JobSense: Structured Information Extraction from Job Descriptions},
year = {2025},
publisher = {Hugging Face},
url = {https://huggingface.co/mantraraval/jobsense}
}