Simple Web Crawler Service

A concurrent web crawler service that searches for keywords in web pages and their linked pages.

Features

- Concurrent web crawling using multiple threads
- Case-insensitive keyword search
- Supports both relative and absolute URLs
- RESTful API endpoints
- Real-time status tracking
- Handles special characters in search terms
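
As an illustration of the case-insensitive search and the handling of special characters, here is a minimal sketch of one way to match a keyword literally. The class name `KeywordMatcher` and its method are hypothetical and are not taken from the project's source; the actual implementation may work differently.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not taken from the project's source.
final class KeywordMatcher {
    private final Pattern pattern;

    KeywordMatcher(String keyword) {
        // Pattern.quote treats the keyword as a literal, so characters such as
        // '.', '+' or '(' are matched verbatim instead of as regex syntax.
        this.pattern = Pattern.compile(
                Pattern.quote(keyword),
                Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE);
    }

    /** Returns true if the page text contains the keyword, ignoring case. */
    boolean matches(String pageText) {
        return pattern.matcher(pageText).find();
    }
}
```

Quoting the keyword first means a search term such as `c++ (beta)` is looked up as written rather than being interpreted as a regular expression.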

API Endpoints

1. Start a New Crawl

POST /crawl
Body: {"keyword": "your-search-term"}
Constraints:
- Keyword must be 4-32 characters long
Returns: Search object with ID and initial status
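
The length constraint can be sketched as a small check that runs before a crawl is started. This is an assumption-laden example: the bounds are taken as inclusive, and the class name `KeywordValidator` is hypothetical.

```java
// Hypothetical validator. The 4-32 limit comes from the constraint above;
// treating both bounds as inclusive is an assumption.
final class KeywordValidator {
    static final int MIN_LENGTH = 4;
    static final int MAX_LENGTH = 32;

    /** Returns true if the keyword is non-null and 4 to 32 characters long. */
    static boolean isValid(String keyword) {
        if (keyword == null) {
            return false;
        }
        int length = keyword.length();
        return length >= MIN_LENGTH && length <= MAX_LENGTH;
    }
}
```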

2. Check Crawl Status

GET /crawl/:id
Returns: Search object with current status and found URLs
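
For reference, the two calls could be built from a Java client roughly as follows. This is a sketch using the JDK's `java.net.http` API (Java 11+). The host and port `localhost:4567` are an assumption based on the port mapping in the Docker section, and `CrawlRequests` is a hypothetical helper name.

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Hypothetical client helper. The service address is an assumption.
final class CrawlRequests {
    private static final String SERVICE = "http://localhost:4567";

    /** Builds the POST /crawl request that starts a search for the keyword. */
    static HttpRequest start(String keyword) {
        // Minimal JSON escaping: backslashes and double quotes only.
        String escaped = keyword.replace("\\", "\\\\").replace("\"", "\\\"");
        String body = "{\"keyword\": \"" + escaped + "\"}";
        return HttpRequest.newBuilder(URI.create(SERVICE + "/crawl"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();
    }

    /** Builds the GET /crawl/:id request that reads the current status. */
    static HttpRequest status(String id) {
        return HttpRequest.newBuilder(URI.create(SERVICE + "/crawl/" + id))
                .GET()
                .build();
    }
}
```

Either request can be sent with `HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString())`. A client would typically repeat the status request until the returned status is `done`.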

Response Format

{
    "id": "unique-id",
    "urls": ["array-of-found-urls"],
    "status": "active|done"
}
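
One way to model this response in Java is shown below. It is only a sketch: the field names follow the format above, while the class name `SearchResult`, the `Status` enum and the hand-written serialization are illustrative assumptions (a real service would more likely use a JSON library).

```java
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;

// Hypothetical model of the response object.
final class SearchResult {
    enum Status { ACTIVE, DONE }

    final String id;
    final List<String> urls;
    final Status status;

    SearchResult(String id, List<String> urls, Status status) {
        this.id = id;
        this.urls = List.copyOf(urls); // defensive, immutable copy
        this.status = status;
    }

    /** Serializes to the documented shape; escapes only quotes and backslashes. */
    String toJson() {
        String urlArray = urls.stream()
                .map(u -> "\"" + escape(u) + "\"")
                .collect(Collectors.joining(",", "[", "]"));
        return "{\"id\":\"" + escape(id) + "\",\"urls\":" + urlArray
                + ",\"status\":\"" + status.name().toLowerCase(Locale.ROOT) + "\"}";
    }

    private static String escape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }
}
```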

Building and Running

Make sure you have Java and Maven installed, then build the project:
```
mvn clean package
```

Run the application:

Manually:

export BASE_URL=[base-url]
java -jar target/backend-test-1.0-SNAPSHOT.jar

Or, without building first, use the run script:

./run.sh [base-url]

Docker Support

You can also run the application using Docker:

docker build . -t blur/backend
docker run -e BASE_URL=[base-url] -p 4567:4567 blur/backend

Technical Details

- Built with Java and Spark Framework
- Uses concurrent data structures for thread safety
- Implements smart URL normalization
- Filters out non-HTML resources (images, videos, etc.)
- Proper error handling and logging
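
The URL normalization, resource filtering and thread-safe bookkeeping listed above could look roughly like the following. This is a sketch under stated assumptions, not the project's code: the class name `LinkFrontier`, the list of skipped extensions and the choice to drop URL fragments are all hypothetical.

```java
import java.net.URI;
import java.util.Locale;
import java.util.Optional;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch; none of these names come from the project's source.
final class LinkFrontier {
    // Assumed list of extensions treated as non-HTML resources.
    private static final Set<String> SKIPPED_EXTENSIONS = Set.of(
            ".png", ".jpg", ".jpeg", ".gif", ".svg", ".mp4", ".avi", ".pdf", ".css", ".js");

    // Thread-safe set: add() returns true for exactly one of several racing threads.
    private final Set<URI> visited = ConcurrentHashMap.newKeySet();

    /** Resolves a relative or absolute href against the page URL and drops the fragment. */
    static Optional<URI> normalize(URI page, String href) {
        try {
            URI resolved = page.resolve(href.trim()).normalize();
            String text = resolved.toString();
            int hash = text.indexOf('#');
            return Optional.of(hash < 0 ? resolved : URI.create(text.substring(0, hash)));
        } catch (IllegalArgumentException malformed) {
            return Optional.empty(); // href is not a valid URI, skip it
        }
    }

    /** Returns false for URLs whose path ends in a known non-HTML extension. */
    static boolean looksLikeHtml(URI url) {
        String path = url.getPath();
        if (path == null) {
            return false; // opaque URIs such as mailto: links have no path
        }
        String lower = path.toLowerCase(Locale.ROOT);
        return SKIPPED_EXTENSIONS.stream().noneMatch(lower::endsWith);
    }

    /** Marks the URL as visited; returns true only for the first caller. */
    boolean claim(URI url) {
        return visited.add(url);
    }
}
```

A worker thread would normalize each link it finds, skip it if `looksLikeHtml` returns false, and fetch it only if `claim` returns true, so no page is crawled twice even when several threads discover the same link.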
