4 changes: 4 additions & 0 deletions README.md
@@ -40,3 +40,7 @@ echo $text; // consist
## Tests
A verification list of 29,000 words and their expected stems can be run (after
```composer install``` via ```phpunit```).

## Contributions (Paras Lehana)

* *External file support for protected words:* You can now add protected words to this stemmer. Protected words are never stemmed. For example, 'training' has been added as a protected word so that it is not stemmed to 'train'. Add one word per line to the file src/protwords.txt (see the comments in that file for further instructions).
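
The file format described above can be sketched as a small standalone parser. This is only an illustration under stated assumptions: `load_protected_words()` is a hypothetical helper, not a function of this library, and the temporary file merely stands in for src/protwords.txt.

```php
<?php
// Hypothetical helper (not part of the library): parse a protected-words
// file using the rules described for src/protwords.txt.
function load_protected_words($path) {
    $lines = @file($path, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
    if ($lines === false) {
        return array(); // Missing or unreadable file: no protected words.
    }
    $words = array();
    foreach ($lines as $line) {
        $line = trim($line);
        // Skip blank lines and comment lines that start with '#'.
        if ($line === '' || $line[0] === '#') {
            continue;
        }
        $words[] = $line;
    }
    return $words;
}

// Demo with a temporary file standing in for src/protwords.txt.
$path = tempnam(sys_get_temp_dir(), 'protwords');
file_put_contents($path, "# a comment line\n\ntraining\n  # an indented comment\n");
$protected = load_protected_words($path);
unlink($path);

var_dump(in_array('training', $protected, true)); // bool(true)
```

Trimming each line before the checks means that indented comments and whitespace-only lines are skipped as well, and that stray spaces around a word do not prevent it from matching.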
1 change: 1 addition & 0 deletions _config.yml
@@ -0,0 +1 @@
theme: jekyll-theme-minimal
8 changes: 8 additions & 0 deletions src/Porter2.php
@@ -161,6 +161,14 @@ protected static function step1b($word) {
'exceed',
'succeed',
);

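// Load user-defined protected words; @ suppresses the warning if the file is unreadable.
$line_arr = @file(__DIR__ . '/protwords.txt', FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);

// file() returns false on failure, so fall back to an empty list.
foreach ($line_arr ?: array() as $line) {
$line = trim($line);
// Ignore blank lines and comment lines starting with '#'.
if ($line === '' || $line[0] === '#') continue;
$exceptions[] = $line;
}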

if (in_array($word, $exceptions)) {
return $word;
}
10 changes: 10 additions & 0 deletions src/protwords.txt
@@ -0,0 +1,10 @@
# Add protected words here, one per line.
# Lines starting with a hash (#) are ignored, so you can use them for comments.
# I usually add the current date before adding a protected word.
# Protected words: words listed here are skipped by the stemmer. For example, if 'training' is listed here, the stemmer returns 'training' rather than 'train'.
# Blank lines are also ignored.
# Demo: 'training' below, added on Jan 10, 2019, is protected. Likewise, add your own words, one per line.

# Added on Jan 10, 2019

training