Skip to content

Davygupta47/Codsoft4

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

3 Commits
 
 
 
 

Repository files navigation

CodSoft Team,

I wanted to take a moment to express my sincere gratitude for the remote internship experience in machine learning. Working independently on each project provided a unique opportunity to develop my technical skills and gain hands-on exposure to real-world applications. The guidance and support I received were invaluable in helping me navigate challenges and deepen my understanding.

Thank you for this opportunity, which has greatly contributed to my growth. I look forward to applying what I've learned in future projects.

Dwaipayan Dasgupta

Codsoft4

Intership Project

SMS Spam Detection

This project aims to build an AI model to classify SMS messages as spam or legitimate (ham). Using natural language processing (NLP) techniques, the model analyzes message content and applies machine learning algorithms like Naive Bayes, Logistic Regression, and Support Vector Machines to identify spam.

Introduction Spam detection in SMS messages is a common use case of NLP in filtering unwanted or potentially harmful messages. This project builds a classifier using various machine learning models to predict whether a message is spam or legitimate based on text features.

Dataset The dataset used in this project includes SMS messages with two columns:

Label: The target variable, either ham (legitimate) or spam. Message: The SMS text content. The dataset can be downloaded from sources like Kaggle or used as a custom dataset if available.

Features The project leverages the following steps for feature extraction:

TF-IDF Vectorization: The text content of each message is transformed into a numerical representation using TF-IDF (Term Frequency-Inverse Document Frequency). This captures the importance of words in each message relative to the whole dataset. Modeling The following machine learning algorithms are applied:

Naive Bayes: A probabilistic model that is effective for text classification. Logistic Regression: A linear model for binary classification, commonly used as a baseline for NLP tasks. Support Vector Machine (SVM): A robust classification model that works well with high-dimensional data like TF-IDF features. Model Evaluation Each model is evaluated using:

Accuracy: The proportion of correctly predicted messages. Precision, Recall, and F1-score: To balance false positives and false negatives. Confusion Matrix: To visualize the model’s performance on legitimate and spam messages. Usage Data Preparation: Preprocess the data by handling missing values and encoding labels. Model Training and Evaluation: Train each model and evaluate its performance with relevant metrics and visualizations.

Results The model results include metrics and confusion matrices for each algorithm, providing insights into the most effective model for spam detection.

Sample Confusion Matrix:

image

image

image

Contributing Contributions are welcome! Feel free to open an issue or submit a pull request.

About

Intership Project

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors