A parsing model for Brazilian Portuguese following the Universal Dependencies (UD) framework.
Note: This is a fork of the original Portparser.v2 Hugging Face Space by NILC-ICMC-USP. The original project includes a Streamlit web interface (`app.py`), which is still present in this repository but is not required for using the parser programmatically.
```bash
# Using uv (recommended)
uv pip install -e .

# Or using pip
pip install -e .

# For the Streamlit UI (not required for API usage)
uv pip install -e ".[ui]"

# For development/testing
uv pip install -e ".[dev]"
```

The simplest way to parse text:
```python
from portparser_v2 import parse

# Parse raw text (with automatic sentence segmentation)
conllu_output = parse("O Brasil é um país tropical. Ele fica na América do Sul.")
print(conllu_output)
```

`parse` is a quick parsing function that returns the CoNLL-U content directly as a string.
```python
from portparser_v2 import parse

# With sentence segmentation (default)
result = parse("O Brasil é um país tropical.")

# Without segmentation (assumes one sentence per line)
result = parse("Primeira frase.\nSegunda frase.", segment=False)
```

Parameters:
- `text`: Input text to parse
- `segment`: Whether to automatically segment sentences (default: `True`)

Returns: Parsed CoNLL-U content as a string.
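Because `parse` returns plain CoNLL-U text, its output can be inspected with ordinary string handling. The sketch below is not part of `portparser_v2`: `Token`, `conllu_tokens` and `EXAMPLE` are hypothetical names, and the code assumes the standard 10-column, tab-separated CoNLL-U layout. It keeps the ID, FORM, LEMMA, UPOS, HEAD and DEPREL columns and skips comment lines, multiword-token ranges and empty nodes. `EXAMPLE` is a hand-written fragment, not actual parser output.

```python
from __future__ import annotations

from typing import NamedTuple


class Token(NamedTuple):
    """The CoNLL-U columns this sketch keeps for each syntactic word."""

    id: str
    form: str
    lemma: str
    upos: str
    head: str
    deprel: str


def conllu_tokens(conllu: str) -> list[list[Token]]:
    """Split CoNLL-U text into sentences, each a list of word tokens."""
    sentences: list[list[Token]] = []
    current: list[Token] = []
    for line in conllu.splitlines():
        if not line.strip():
            # A blank line ends the current sentence.
            if current:
                sentences.append(current)
                current = []
            continue
        if line.startswith("#"):
            # Sentence-level comments such as "# text = ...".
            continue
        cols = line.split("\t")
        if len(cols) != 10:
            raise ValueError(f"expected 10 columns, got {len(cols)}: {line!r}")
        if "-" in cols[0] or "." in cols[0]:
            # Skip multiword-token ranges ("3-4") and empty nodes ("8.1").
            continue
        # Columns: ID FORM LEMMA UPOS XPOS FEATS HEAD DEPREL DEPS MISC
        current.append(Token(cols[0], cols[1], cols[2], cols[3], cols[6], cols[7]))
    if current:
        # Tolerate a missing blank line after the last sentence.
        sentences.append(current)
    return sentences


# Hand-written fragment for illustration only (not real parser output).
EXAMPLE = "\n".join([
    "# text = Ele fica na América",
    "1\tEle\tele\tPRON\t_\t_\t2\tnsubj\t_\t_",
    "2\tfica\tficar\tVERB\t_\t_\t0\troot\t_\t_",
    "3-4\tna\t_\t_\t_\t_\t_\t_\t_\t_",
    "3\tem\tem\tADP\t_\t_\t5\tcase\t_\t_",
    "4\ta\to\tDET\t_\t_\t5\tdet\t_\t_",
    "5\tAmérica\tAmérica\tPROPN\t_\t_\t2\tobl\t_\t_",
    "",
])

for tok in conllu_tokens(EXAMPLE)[0]:
    print(tok.id, tok.form, tok.upos, tok.head, tok.deprel)
```

With the real parser, `conllu_tokens(parse("O Brasil é um país tropical."))` would replace `EXAMPLE`. Skipping multiword-token lines means a contraction such as *na* shows up as its syntactic words (*em*, *a*), which are the units that HEAD and DEPREL refer to.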
`parse_text` is a full-featured parsing function that gives more control over the pipeline.

```python
from portparser_v2 import parse_text

# Basic usage
output_file = parse_text("O Brasil é um país tropical.", segment_sentences=True)

# With custom output path
output_file = parse_text(
    "O Brasil é um país tropical.",
    output_path="./output.conllu",
    work_dir="./temp",
    segment_sentences=True,
)
```

Parameters:
- `text`: Input text to parse
- `output_path`: Optional path for the final CoNLL-U output (a temporary file is used if `None`)
- `work_dir`: Optional working directory for temporary files (a temporary directory is created if `None`)
- `model_path`: Optional path to the model weights (downloaded from Hugging Face if `None`)
- `segment_sentences`: If `True`, run sentence segmentation first; if `False`, assume one sentence per line

Returns: Path to the final parsed CoNLL-U file.
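Since `parse_text` returns a file path rather than the parsed content, the output has to be read back from disk. A minimal standard-library sketch for streaming that file sentence by sentence; `iter_conllu_blocks` and `sentence_text` are hypothetical helpers, and `sentence_text` assumes the output carries the standard `# text = ...` comment lines, which this README does not confirm.

```python
from __future__ import annotations

from typing import Iterable, Iterator


def iter_conllu_blocks(lines: Iterable[str]) -> Iterator[list[str]]:
    """Yield each sentence as a list of its non-blank lines.

    `lines` may be an open text file, so large outputs are processed one
    sentence at a time instead of being loaded into memory at once.
    """
    block: list[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.strip():
            block.append(line)
        elif block:
            # A blank line closes the current sentence.
            yield block
            block = []
    if block:
        # Tolerate a missing blank line after the last sentence.
        yield block


def sentence_text(block: list[str]) -> str | None:
    """Return the value of the '# text = ...' comment, or None if absent."""
    prefix = "# text = "
    for line in block:
        if line.startswith(prefix):
            return line[len(prefix):]
    return None
```

Typical use would open the returned path, e.g. `open(output_file, encoding="utf-8")` (UTF-8 is an assumption), and pass the file handle to `iter_conllu_blocks`. Accepting any iterable of lines keeps the helper independent of where the CoNLL-U text comes from.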
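`parse_file(input_path, output_path=None, work_dir=None, model_path=None, segment_sentences=False) -> str`

Parse text from a file.

```python
from portparser_v2 import parse_file

# Parse a text file
output_file = parse_file(
    "input.txt",
    output_path="output.conllu",
    segment_sentences=True,
)
```

Parameters: Same as `parse_text`, but with `input_path` (path to the input text file) instead of `text`. Note that `segment_sentences` defaults to `False` here, so the file is treated as one sentence per line unless `segment_sentences=True` is passed.

Returns: Path to the final parsed CoNLL-U file.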
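With `segment_sentences=False` (the default for `parse_file`), the input must already contain one sentence per line. If sentences come from another source, a helper along these lines could serialize them safely; `to_one_sentence_per_line` is hypothetical and only prepares the text, it does not call the parser. It collapses internal line breaks and repeated whitespace so no sentence is accidentally split across two lines.

```python
from __future__ import annotations

from typing import Iterable


def to_one_sentence_per_line(sentences: Iterable[str]) -> str:
    """Serialize pre-split sentences as text with one sentence per line."""
    # Collapse every run of whitespace (including "\n") to a single space,
    # so each sentence occupies exactly one line.
    cleaned = [" ".join(s.split()) for s in sentences]
    # Drop sentences that were empty or whitespace-only.
    lines = [s for s in cleaned if s]
    return "\n".join(lines) + "\n" if lines else ""


print(to_one_sentence_per_line(["O Brasil é um país\ntropical.", "Ele fica na América do Sul."]), end="")
```

The result could then be written with `pathlib.Path("input.txt").write_text(..., encoding="utf-8")` and passed to `parse_file("input.txt", segment_sentences=False)`.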
The parser runs a 4-step pipeline:
- Sentence Segmentation (optional) - splits raw text into sentences
- Tokenization - splits each sentence into tokens and writes them in CoNLL-U format
- Parsing/Prediction - runs the LatinPipe neural model to predict POS tags, morphological features, lemmas, and dependency relations
- Post-processing - applies cleanup and corrections to the output
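To make the tokenization step more concrete, the sketch below shows roughly what a tokenized but not yet parsed CoNLL-U sentence could look like: only ID and FORM are filled, and the other eight columns hold `_` placeholders. `conllu_skeleton` is a hypothetical illustration of the format, not the parser's actual tokenizer. It takes the tokens as given, whereas UD Portuguese represents contractions such as *na* (*em* + *a*) as multiword tokens, which this sketch ignores.

```python
from __future__ import annotations


def conllu_skeleton(tokens: list[str], text: str | None = None) -> str:
    """Render one tokenized sentence as CoNLL-U with only ID and FORM filled."""
    lines: list[str] = []
    if text is not None:
        # Optional sentence-level comment holding the original text.
        lines.append(f"# text = {text}")
    for i, form in enumerate(tokens, start=1):
        # ID and FORM, then '_' placeholders for the other eight columns
        # (LEMMA, UPOS, XPOS, FEATS, HEAD, DEPREL, DEPS, MISC).
        lines.append("\t".join([str(i), form] + ["_"] * 8))
    # A blank line terminates the sentence in CoNLL-U.
    return "\n".join(lines) + "\n\n"


print(conllu_skeleton(["O", "Brasil", "é", "um", "país", "tropical", "."],
                      text="O Brasil é um país tropical."), end="")
```

In the final output, the prediction step is what supplies lemmas, POS tags, morphological features and the HEAD/DEPREL dependency columns.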
Unless `model_path` is provided, the model weights are automatically downloaded from Hugging Face on first use:
- Repository: `lucelene/Portparser.v2-latinpipe-core`
See LICENSE for details.