17 changes: 17 additions & 0 deletions I2ML_Machine_Learning/2023-11-05T18-49_export.csv
@@ -0,0 +1,17 @@
,Question,Answer,Satisfaction
0,What is the name of the candidate?,sanjay bhaskar,Yes
1,What is the phone number of the candidate?,945 - 244 - 7079,Yes
2,What is the education details of the candidate?,"master of science , information systems",Yes
3,What are the skills of the candidate?,"programming languages : python , r , java , sql , c , c + + , html , matlab",Yes
4,Does the candidate have data science skillset?,"data science methods and tools , statistics , data management and database design , big data systems and advanced intelligent analytics , data engineering bachelor of engineering , electronics and telecommunication visvesvaraya technological university , bangalore , in june 2019 skills programming languages : python , r , java , sql , c , c + + , html , matlab database : azure sql server , ms sql server , postgresql , mysql , cassandra , mongodb , hadoop , hive technology & tools : tableau , power bi , git , r - studio , anaconda , advanced microsoft excel , powerpoint , arduino , labview , netbeans , miro , scikit learn , tensorflow , streamlit , pytorch , nltk , spacy , corenlp , snowflake , gcp , amazon s3 , docker , ec2 , fastapi , airflow work experience analytics specialist intern , havas media , boston june 2023 – august 2023 • conducted data analysis and provided actionable insights for puma and fidelity investments , resulting in improved campaign performance and optimization strategies . • utilized tableau and power bi to track key metrics , visualize data , and identify opportunities for campaign enhancement , offering valuable insights and recommending next steps for more effective marketing strategies . • collaborated with cross - functional teams to develop data - driven recommendations for campaign improvement , including audience targeting , messaging optimization , and budget allocation , contributing to the achievement of client objectives . 
business analyst , mckinsey & company january 2020 - august 2022 • designed dashboards to segment visual analysts into profiles , helping the global consultants to increase the engagements and collaborate with the analysts by 30 % • performed data extraction and enhanced data quality using sql to generate business kpi ’ s for the stakeholders , and visualized using tableau dashboard for various lines of business • performed a variety of activities as a design traffic planner , including revising design requests , rerouting , confirming , elaborating on requirements , and accurately assigning requests with 99 % accuracy • piloted and worked with the automation team to auto [SEP]",Yes
5,Does the candidate have data science skillset?,"data science methods and tools , statistics , data management and database design , big data systems and advanced intelligent analytics , data engineering bachelor of engineering , electronics and telecommunication visvesvaraya technological university , bangalore , in june 2019 skills programming languages : python , r , java , sql , c , c + + , html , matlab database : azure sql server , ms sql server , postgresql , mysql , cassandra , mongodb , hadoop , hive technology & tools : tableau , power bi , git , r - studio , anaconda , advanced microsoft excel , powerpoint , arduino , labview , netbeans , miro , scikit learn , tensorflow , streamlit , pytorch , nltk , spacy , corenlp , snowflake , gcp , amazon s3 , docker , ec2 , fastapi , airflow work experience analytics specialist intern , havas media , boston june 2023 – august 2023 • conducted data analysis and provided actionable insights for puma and fidelity investments , resulting in improved campaign performance and optimization strategies . • utilized tableau and power bi to track key metrics , visualize data , and identify opportunities for campaign enhancement , offering valuable insights and recommending next steps for more effective marketing strategies . • collaborated with cross - functional teams to develop data - driven recommendations for campaign improvement , including audience targeting , messaging optimization , and budget allocation , contributing to the achievement of client objectives . 
business analyst , mckinsey & company january 2020 - august 2022 • designed dashboards to segment visual analysts into profiles , helping the global consultants to increase the engagements and collaborate with the analysts by 30 % • performed data extraction and enhanced data quality using sql to generate business kpi ’ s for the stakeholders , and visualized using tableau dashboard for various lines of business • performed a variety of activities as a design traffic planner , including revising design requests , rerouting , confirming , elaborating on requirements , and accurately assigning requests with 99 % accuracy • piloted and worked with the automation team to auto [SEP]",No
6,What is the candidate's professional experience?,"master of science , information systems northeastern university , boston , ma expected may 2024 coursework : application engineering and development , data science methods and tools , statistics , data management and database design , big data systems and advanced intelligent analytics , data engineering bachelor of engineering , electronics and telecommunication visvesvaraya technological university , bangalore , in june 2019 skills programming languages : python , r , java , sql , c , c + + , html , matlab database : azure sql server , ms sql server , postgresql , mysql , cassandra , mongodb , hadoop , hive technology & tools : tableau , power bi , git , r - studio , anaconda , advanced microsoft excel , powerpoint , arduino , labview , netbeans , miro , scikit learn , tensorflow , streamlit , pytorch , nltk , spacy , corenlp , snowflake , gcp , amazon s3 , docker , ec2 , fastapi , airflow work experience analytics specialist",No
7,What is the name of the candidate?,sanjay bhaskar,Yes
8,What is the name of the candidate?\,sanjay bhaskar,Yes
9,What is the phone number of the candidate?,945 - 244 - 7079,Yes
10,Did the candidate work as an analytics specialist?,,Yes
11,Did the candidate work as an analytics specialist?,,No
12,Did the candidate work as an intern?,"work experience analytics specialist intern , havas media , boston june 2023 – august 2023",No
13,Did the candidate work as an intern?,"work experience analytics specialist intern , havas media , boston june 2023 – august 2023",Yes
14,What is the candidate ,,Yes
15,What is the candidate's leadership experience?,[SEP],Yes
70 changes: 68 additions & 2 deletions I2ML_Machine_Learning/README.md
@@ -1,4 +1,70 @@
# Skunks Skool
Skunks Skool Tutorials
# Resume Enhancement Web Application
App Link: https://huggingface.co/spaces/sanjay11/resumesimilarity

![Screenshot 1](screenshots/MicrosoftTeams-image.png)
## Introduction

### Purpose
This project is a web application that helps people improve their resumes. Using Natural Language Processing (NLP) techniques, it analyzes a resume, suggests improvements, and answers user questions based on the resume content.

### Target Audience
The primary audience for this app includes job seekers, career advisors, and anyone interested in refining their resume to better match job descriptions.

## Technology Used

- **Python:** Chosen for its strong support in data science and NLP libraries.
- **Streamlit:** An efficient framework for building interactive web apps entirely in Python.
- **spaCy:** Used for NLP tasks, particularly keyword extraction and text analysis.
- **PyPDF2:** Used to handle PDF file reading and text extraction.
- **Transformers (BERT):** Employed for its state-of-the-art performance in NLP tasks, particularly in question answering.

### Resume Question Answering Model
1. **User Uploads Resume:** Users start by uploading a resume.
2. **Question Input:** After uploading the resume, users can input questions related to the resume content.
3. **Model Processing:** The BERT-based model reads the question together with the resume text and selects the span of the resume most likely to contain the answer. Input beyond the model's 512-token limit is truncated.
4. **Answer Display:** The model returns answers to the user's questions, displayed in the Streamlit application.
5. **Feedback Collection:** Users are encouraged to provide feedback on the quality of the answers. They can indicate whether the answers were satisfactory or not.
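
The feedback step can be sketched independently of Streamlit. This is a minimal sketch, not the app's actual logging code: the helper names `record_feedback`, `satisfaction_rate`, and `export_csv` are hypothetical. It assumes one rating per question/answer pair, so changing a rating overwrites the earlier one instead of adding a second row.

```python
import csv
import io


def record_feedback(log, question, answer, satisfaction):
    """Store one rating per (question, answer) pair; a later rating replaces the earlier one."""
    log[(question, answer)] = satisfaction
    return log


def satisfaction_rate(log):
    """Return the fraction of logged answers rated 'Yes' (0.0 for an empty log)."""
    if not log:
        return 0.0
    return sum(1 for rating in log.values() if rating == "Yes") / len(log)


def export_csv(log):
    """Serialize the log with Question, Answer and Satisfaction columns."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["Question", "Answer", "Satisfaction"])
    for (question, answer), rating in log.items():
        writer.writerow([question, answer, rating])
    return buffer.getvalue()
```

Keying the log on the question/answer pair is one possible design choice; a log that should keep every rating change would use a list instead.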

![Screenshot 2](screenshots/2.png)

## Application Overview

The application provides various features:

- **Text Extraction from PDF Resumes:** Extracts the text of user-uploaded resumes in PDF format.
- **Question Answering:** Uses a BERT model to answer questions based on the resume content.
- **Keyword Extraction:** Extracts keywords from resumes and job descriptions to suggest improvements.
- **Resume and Job Description Analysis:** Compares the two documents for matching keywords.

### NLP Techniques

#### BERT for Question Answering
- The app uses the `bert-large-uncased-whole-word-masking-finetuned-squad` checkpoint, a BERT model pre-trained on a large text corpus and fine-tuned on the SQuAD question-answering dataset. Given a question and the resume text, it scores every token as a possible start and end of the answer and returns the text between the highest-scoring positions.
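
An extractive QA model outputs one start score and one end score per token. If the two maxima are taken independently, the end can fall before the start, which yields an empty answer. A common alternative is to pick the pair that maximizes the combined score subject to start <= end. The sketch below works on plain lists of scores; `best_span` and `max_answer_len` are hypothetical names, not part of the app or of the Transformers library.

```python
def best_span(start_logits, end_logits, max_answer_len=30):
    """Return (start, end) token indices, end inclusive, that maximize
    start_logits[start] + end_logits[end] with start <= end."""
    if not start_logits or len(start_logits) != len(end_logits):
        raise ValueError("logit lists must be non-empty and of equal length")
    best = (0, 0)
    best_score = float("-inf")
    for start, start_score in enumerate(start_logits):
        # Only consider ends at or after the start, within the length limit.
        last = min(start + max_answer_len, len(end_logits))
        for end in range(start, last):
            score = start_score + end_logits[end]
            if score > best_score:
                best_score = score
                best = (start, end)
    return best
```

With tensors from a real model, the same function could be fed `start_logits[0].tolist()` and `end_logits[0].tolist()`; the returned end index is inclusive, so the token slice would be `input_ids[start:end + 1]`.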

#### Keyword Extraction and Resume Analysis
- The `extract_keywords_for_sections` function uses spaCy to identify keywords in both the resume and the job description. The app then suggests improvements and identifies potential project ideas based on these keywords.

#### spaCy for Keyword Extraction and Pattern Matching
- spaCy tokenizes the resume and job description texts and tags each token with its part of speech. Pattern matching over those tokens then identifies skills, technologies, and project ideas. The extracted keywords drive the resume analysis and the suggested enhancements.
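
The source of `extract_keywords_for_sections` is not shown here, so the following is only an illustration of the matching-and-comparison idea. It swaps spaCy for plain regular expressions and assumes a small, hypothetical skill vocabulary (`SKILL_TERMS`); `find_skills` and `compare` are likewise hypothetical names.

```python
import re

# Hypothetical skill vocabulary; the real app derives keywords with spaCy.
SKILL_TERMS = ["python", "sql", "power bi", "tableau", "scikit learn", "c++"]


def find_skills(text, terms=SKILL_TERMS):
    """Return the set of vocabulary terms that occur in text as whole words or phrases."""
    normalized = re.sub(r"\s+", " ", text.lower())
    found = set()
    for term in terms:
        # Lookarounds instead of \b so terms ending in symbols (e.g. "c++") still match.
        pattern = r"(?<![a-z0-9+#])" + re.escape(term) + r"(?![a-z0-9+#])"
        if re.search(pattern, normalized):
            found.add(term)
    return found


def compare(resume_text, job_text, terms=SKILL_TERMS):
    """Split the job description's skills into those matched and those missing in the resume."""
    resume_skills = find_skills(resume_text, terms)
    job_skills = find_skills(job_text, terms)
    return {
        "matched": sorted(job_skills & resume_skills),
        "missing": sorted(job_skills - resume_skills),
    }
```

The `missing` list is what a suggestion step could surface to the user as keywords worth adding to the resume.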

## User Interface and Interaction

- Developed using Streamlit, the app provides a clean and interactive interface.
- Users can upload their resumes, input job descriptions, ask questions, and receive tailored suggestions.
- The interface is designed to be user-friendly, allowing seamless navigation through various features.

![Screenshot 3](screenshots/1.png)
![Screenshot 4](screenshots/3.png)

## Challenges and Solutions

- One of the primary challenges was ensuring the accuracy of NLP tasks.
- This was addressed by carefully selecting the models: a BERT checkpoint fine-tuned on SQuAD for question answering and spaCy for keyword extraction.
- Another challenge involved creating an intuitive user interface, which was resolved using Streamlit's straightforward framework.

## Future Enhancements

Future plans include integrating more advanced NLP models for broader language support, enhancing the app's scalability, and incorporating real-time resume editing features.

## Conclusion

- This project successfully demonstrates the use of cutting-edge NLP techniques in a practical application.
- It offers valuable assistance in resume enhancement, catering to the needs of job seekers and career professionals alike.
97 changes: 97 additions & 0 deletions I2ML_Machine_Learning/bertimproved.py
@@ -0,0 +1,97 @@
import streamlit as st
from transformers import BertForQuestionAnswering, BertTokenizer
import torch
from io import BytesIO
import PyPDF2
import pandas as pd

# Initialize session state to store the log of QA pairs and satisfaction responses
if 'qa_log' not in st.session_state:
    st.session_state.qa_log = []

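def extract_text_from_pdf(pdf_file):
    """Return the concatenated text of every page in the uploaded PDF."""
    pdf_reader = PyPDF2.PdfReader(BytesIO(pdf_file.read()))
    text = ""
    for page in pdf_reader.pages:
        # Guard against pages with no extractable text (e.g. scanned images)
        text += page.extract_text() or ""
    return text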

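def answer_question(question, context, model, tokenizer):
    """Return the answer span that the QA model extracts from the context."""
    # Encode the question/context pair. Only the context (second sequence)
    # is truncated, so resume text past the 512-token limit is dropped.
    inputs = tokenizer.encode_plus(
        question,
        context,
        add_special_tokens=True,
        return_tensors="pt",
        truncation="only_second",
        max_length=512,
    )
    # Inference only, so gradient tracking is not needed
    with torch.no_grad():
        outputs = model(**inputs, return_dict=True)
    answer_start_scores = outputs.start_logits
    answer_end_scores = outputs.end_logits
    # Highest-scoring start token and (exclusive) end position.
    # If the end precedes the start, the slice below is empty and the
    # function returns an empty string.
    answer_start = int(torch.argmax(answer_start_scores))
    answer_end = int(torch.argmax(answer_end_scores)) + 1
    input_ids = inputs["input_ids"].tolist()[0]
    answer = tokenizer.convert_tokens_to_string(
        tokenizer.convert_ids_to_tokens(input_ids[answer_start:answer_end])
    )
    return answer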

st.title("Resume Question Answering")

uploaded_file = st.file_uploader("Upload your resume (PDF format only)", type=["pdf"])

if uploaded_file is not None:
    resume_text = extract_text_from_pdf(uploaded_file)
    st.write("Resume Text:")
    st.write(resume_text)

    user_question = st.text_input("Ask a question based on your resume:")

    if user_question:
        # Note: the model and tokenizer are reloaded on every Streamlit rerun
        model = BertForQuestionAnswering.from_pretrained("bert-large-uncased-whole-word-masking-finetuned-squad")
        tokenizer = BertTokenizer.from_pretrained("bert-large-uncased-whole-word-masking-finetuned-squad")

        answer = answer_question(user_question, resume_text, model, tokenizer)
        st.write("Answer:")
        st.write(answer)

        # Ask for user feedback on satisfaction
        satisfaction = st.radio('Are you satisfied with the answer?', ('Yes', 'No'), key='satisfaction')

        # Log the interaction. Streamlit reruns the script whenever a widget
        # changes, so switching the radio button appends another row for the
        # same question and answer.
        st.session_state.qa_log.append({
            'Question': user_question,
            'Answer': answer,
            'Satisfaction': satisfaction
        })

        # Display the log in a table format
        st.write("Interaction Log:")
        log_df = pd.DataFrame(st.session_state.qa_log)
        st.dataframe(log_df)



# Conclusion:

# In conclusion, our "Resume Question Answering" language model, powered by BERT and integrated into a user-friendly Streamlit application,
# has demonstrated significant promise in revolutionizing the resume evaluation process. Through the successful execution of this project,
# we have achieved several key milestones and laid a foundation for future developments:

# Efficient Resume Analysis: Our model has showcased the ability to efficiently process uploaded resumes and respond to user queries with a high degree of accuracy. Users can easily extract valuable information from resumes, such as candidate names, contact details, and more.
# User Feedback Integration: The incorporation of a feedback mechanism has been pivotal in our project. We have gathered user-generated feedback on answer quality and user satisfaction, enabling us to refine and enhance the model continuously.
# Iterative Development: Our commitment to iterative development based on user feedback ensures that our model will only get better over time. This ongoing improvement process positions our model as a dynamic tool that adapts to user needs.
# Data Privacy and Security: We have prioritized data privacy and security, ensuring the protection of user-uploaded resumes and personal information. Users can confidently use our application without concerns about data breaches.

# Future Scope:

# The future of the "Resume Question Answering" project holds exciting prospects and opportunities for further advancements:

# Advanced NLP Models: As NLP technology continues to evolve, we plan to explore and integrate more advanced models such as GPT-3, RoBERTa, and their successors to enhance the accuracy and capabilities of our system.
# Multi-Language Support: Expanding the language capabilities of our model to accommodate a broader range of languages and resume formats will be a key focus, making it accessible to a global audience.
# Scalability and Performance: Enhancing the model's scalability to handle a larger number of users and improve its overall performance will be a critical consideration.
# Enhanced User Interface: We will continuously improve the Streamlit application's user interface to make it even more user-friendly and intuitive.
# Industry-Specific Versions: Creating specialized versions of our model for various industries, such as healthcare, technology, or finance, to provide tailored solutions for specific job roles and requirements.
# Collaboration and Integration: Exploring collaborations with job boards, HR platforms, and recruitment agencies to integrate our technology and streamline the hiring process.
# Research and Innovation: Staying at the forefront of NLP research and technology advancements to maintain our model's state-of-the-art status.