krsna24/WordFrequencyCounter

WordFrequencyCounter

Semantic Profiling

NLPFreq is a Command-Line Interface (CLI) tool for semantic profiling. It analyzes raw text, files, or websites, and provides data visualization, exploration, and export capabilities.

Note: NLPFreq has not been tested on Windows-based systems yet.

📔 Table of Contents

  1. Requirements
  2. Features
  3. Installation
  4. Usage
  5. FAQ
  6. Acknowledgements

🌟 Requirements

Install the dependencies listed in requirements.txt before installing NLPFreq:

pip install -r requirements.txt

🎯 Features

  • Data Loading: Load text data from various sources, including raw input, files, and websites, with interactive prompts for user input.
  • Text Preprocessing: Tokenize and clean the text data, removing punctuation and converting words to lowercase.
  • Metrics Generation: Calculate and display key metrics, including character count with and without spaces, sentence count, word count, and paragraph count.
  • Morphological Analysis: Generate a detailed table of word morphology, including word rank, original form, lemmatized form, part-of-speech (POS) tag, percentage occurrence, and count.
  • Export Functionality: Optionally export the generated metrics, frequency tables, and visualizations to files.
  • Word Cloud Visualization: Create and display a word cloud visualization of the processed text data.
  • Word Frequency Chart: Generate and visualize the frequency of the top 20 words in the text.
  • Interactive Commands: Utilize command-line interface commands to perform actions such as displaying metrics, limiting results, searching for specific words, and generating visualizations.
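
The preprocessing and metrics steps listed above can be sketched with the standard library alone. This is a minimal illustration, not NLPFreq's actual implementation: the function names `preprocess` and `basic_metrics`, the sentence-splitting rule (split on `.`, `!`, `?`), and the paragraph rule (blank-line separated) are all assumptions made for the example.

```python
import re
import string


def preprocess(text):
    """Lowercase the text, strip ASCII punctuation, and split on whitespace.

    Hypothetical helper; NLPFreq's own tokenizer may behave differently.
    """
    table = str.maketrans("", "", string.punctuation)
    return text.lower().translate(table).split()


def basic_metrics(text):
    """Compute the counts named in the feature list using simple heuristics."""
    # Assumption: a sentence ends at one or more of '.', '!' or '?'.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    # Assumption: paragraphs are separated by at least one blank line.
    paragraphs = [p for p in re.split(r"\n\s*\n", text) if p.strip()]
    return {
        "chars_with_spaces": len(text),
        "chars_without_spaces": len("".join(text.split())),
        "sentences": len(sentences),
        "words": len(preprocess(text)),
        "paragraphs": len(paragraphs),
    }


sample = "Hello, world! Hello again.\n\nA second paragraph."
tokens = preprocess(sample)
metrics = basic_metrics(sample)
print(tokens)
print(metrics)
```

A regex-based sentence splitter miscounts abbreviations such as "e.g."; a real NLP library would likely handle those cases better.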

🧰 Installation

Install from PyPI

You can install the nlpfreq package directly from PyPI using the following command:

pip install nlpfreq

Build from Source

  1. Clone the project:
git clone <repository_url>
cd NLPFreq
  2. Build the package:
python3 setup.py sdist bdist_wheel
  3. Install the package:
pip install dist/*.tar.gz
  4. Run the tool:
nlpfreq

🧰 Usage

Show Metrics

Display metrics generated from the loaded data:

nlpfreq show-metrics --export
  • --export: Save metrics to files.
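
The README does not state which file format `--export` writes. As a rough sketch of what saving metrics could look like, the example below assumes JSON output; the function name `export_metrics` and the file name `metrics.json` are hypothetical.

```python
import json
import tempfile
from pathlib import Path


def export_metrics(metrics, path):
    """Write a metrics dictionary to a JSON file and return its path.

    Hypothetical helper; the format NLPFreq actually uses is not documented here.
    """
    path = Path(path)
    path.write_text(json.dumps(metrics, indent=2), encoding="utf-8")
    return path


# Write to a temporary directory and read the file back to confirm the round trip.
with tempfile.TemporaryDirectory() as tmp:
    out = export_metrics({"words": 7, "sentences": 3}, Path(tmp) / "metrics.json")
    restored = json.loads(out.read_text(encoding="utf-8"))

print(restored)
```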

Limit Analysis

Limit the analysis to the n most frequent words:

nlpfreq limit --n <n>
  • --n: Specify the number of top words to display.
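
To illustrate what limiting to the top n words involves, here is a small sketch built on `collections.Counter`. It produces the rank, count, and percentage columns mentioned in the feature list, but omits lemmas and POS tags. The function name `frequency_table` and the row layout are assumptions, not NLPFreq's internals.

```python
from collections import Counter


def frequency_table(tokens, n=None):
    """Return the n most frequent tokens with rank, count, and percentage.

    Hypothetical helper; passing n=None returns every distinct token.
    """
    counts = Counter(tokens)
    total = sum(counts.values())
    rows = []
    for rank, (word, count) in enumerate(counts.most_common(n), start=1):
        rows.append({
            "rank": rank,
            "word": word,
            "count": count,
            "percent": round(100 * count / total, 2),
        })
    return rows


tokens = "the cat and the dog and the bird".split()
top_two = frequency_table(tokens, n=2)
for row in top_two:
    print(row)
```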

Search Word

Search for a specific word in the morphological data:

nlpfreq search-word --word <word>
  • --word: Specify the word to search for.
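
A word lookup of this kind can be sketched as a search over a frequency ranking. The example below is an assumption-laden illustration: `search_word` is a hypothetical name, matching is case-insensitive by choice, and it looks up exact tokens only, without the lemma or POS information that NLPFreq's morphological table includes.

```python
from collections import Counter


def search_word(tokens, word):
    """Return rank and count for a word, or None if it does not occur.

    Hypothetical helper; assumes tokens are already lowercased.
    """
    counts = Counter(tokens)
    key = word.lower()
    if key not in counts:
        return None
    # Rank is the 1-based position in the frequency-ordered word list.
    ranked = [w for w, _ in counts.most_common()]
    return {"word": key, "rank": ranked.index(key) + 1, "count": counts[key]}


tokens = "the cat and the dog and the bird".split()
found = search_word(tokens, "And")
missing = search_word(tokens, "fish")
print(found)
print(missing)
```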

Generate Word Cloud

Generate and display a word cloud visualization:

nlpfreq generate-wordcloud --export
  • --export: Save the word cloud image.

Generate Word Frequency Plot

Generate and display a word frequency plot:

nlpfreq generate-wordfreq-plot --export
  • --export: Save the word frequency plot.

❓ FAQ

Q: What is NLPFreq?

A: NLPFreq is a tool designed for in-depth analysis of textual data, focusing on extracting meaning and linguistic insights. It provides features like word frequency, morphology, and metrics generation, enhancing data exploration and visualization.

Q: Why Develop NLPFreq as a Semantic Profiler?

A: NLPFreq was created for the ADSA subject in the fifth semester of college. The goal was to offer a versatile NLP tool, empowering users to analyze and profile text efficiently. The tool's features aim to deepen understanding and exploration of linguistic aspects within textual data.

Q: Why Did NLPFreq Evolve from a Word Frequency Counter?

A: Originally conceived as a word frequency counter, NLPFreq's development took a different direction. The decision to expand its capabilities was driven by the desire to create a more comprehensive tool for natural language processing. The project evolved to encompass semantic profiling, offering a richer set of features such as morphology analysis, metrics generation, and enhanced data visualization. This shift aimed to provide users with a more powerful and versatile solution for exploring and understanding textual data beyond simple word frequency analysis.

Acknowledgements

[Acknowledge contributors, libraries used, or any other relevant acknowledgments here.]