tcount

A fast, zero-network token counter for LLM workflows. Count tokens in files and directories using exact OpenAI tokenizers, Claude approximations, SentencePiece vocabularies, and generic estimation — all from a single CLI.

Features

Exact BPE tokenization — offline, no network calls. Supports GPT-5, GPT-4.1, GPT-4o, o-series, and legacy GPT-4/3.5.
Claude approximation calibrated for Anthropic models
SentencePiece exact tokenization for Llama and other open-source models (bring your own .model file)
Context window usage — see what percentage of a model's context you're consuming
Cost estimates with per-1M-token pricing via --cost
Provider filtering — compare models from a specific provider
Directory scanning with .gitignore support and binary file detection
JSON output for scripting and pipelines

Install

npm / pnpm / bun (macOS & Linux)

npm install -g @obedience-corp/tcount
# or
pnpm add -g @obedience-corp/tcount
# or
bun add -g @obedience-corp/tcount

The npm package downloads the official release binary for your platform (with checksum verification) on first install.

Homebrew (macOS & Linux)

brew install lancekrogers/tap/tcount

Go

go install github.com/lancekrogers/tcount/cmd/tcount@latest

From source

git clone https://github.com/lancekrogers/tcount.git
cd tcount
go build -o bin/tcount ./cmd/tcount

Binary releases

Pre-built binaries for macOS, Linux, and Windows are available on the releases page.

Quick Start

# Count tokens in a file
tcount myfile.txt

# Specific model
tcount --model gpt-5 prompt.md

# All methods with cost estimates
tcount --all --cost prompt.md

# Filter by provider
tcount --provider openai prompt.md

# Recursive directory count
tcount -r ./src

# JSON output for scripting
tcount --json document.md

Supported Models

OpenAI

Model	Encoding	Context
`gpt-5`, `gpt-5-mini`, `gpt-5-nano`	o200k_base	400K
`gpt-5.1`, `gpt-5.2`	o200k_base	400K
`gpt-4.1`, `gpt-4.1-mini`, `gpt-4.1-nano`	o200k_base	1M
`gpt-4o`, `gpt-4o-mini`	o200k_base	128K
`o3`, `o3-mini`, `o4-mini`	o200k_base	200K
`gpt-4`, `gpt-4-turbo`	cl100k_base	8K–128K
`gpt-3.5-turbo`	cl100k_base	16K

Anthropic

Model	Method	Context
`claude-opus-4.6`, `claude-opus-4.5`	Approximation	200K
`claude-opus-4.1`, `claude-opus-4`	Approximation	200K
`claude-sonnet-4.6`, `claude-sonnet-4.5`, `claude-sonnet-4`	Approximation	200K
`claude-haiku-4.5`, `claude-haiku-3.5`, `claude-haiku-3`	Approximation	200K
`claude-opus-3` (deprecated)	Approximation	200K

Meta (Llama)

Model	Method	Context
`llama-4-scout`, `llama-4-maverick`	tiktoken approx / SentencePiece	128K
`llama-3.1-8b`, `llama-3.1-70b`, `llama-3.1-405b`	tiktoken approx / SentencePiece	128K

DeepSeek

Model	Method	Context
`deepseek-v2`, `deepseek-v3`, `deepseek-coder-v2`	tiktoken approx	128K

Alibaba (Qwen)

Model	Method	Context
`qwen-2.5-7b`, `qwen-2.5-14b`, `qwen-2.5-72b`	tiktoken approx	32K
`qwen-3-72b`	tiktoken approx	32K

Microsoft (Phi)

Model	Method	Context
`phi-3-mini`, `phi-3-small`, `phi-3-medium`	tiktoken approx	128K

Tokenization Methods

Method	Accuracy	When Used
tiktoken (o200k_base)	Exact	GPT-5.x, GPT-4.1, GPT-4o, o3, o4-mini
tiktoken (cl100k_base)	Exact	GPT-4, GPT-3.5
Claude approximation	Estimated	All Claude models (÷3.8 char ratio)
SentencePiece	Exact	Llama with `--vocab-file`
tiktoken approximation	Approximate	Llama, DeepSeek, Qwen, Phi (no vocab file)
Character-based	Approximate	Any (chars ÷ configurable ratio, default 4.0)
Word-based	Approximate	Any (words × configurable multiplier, default 1.33)
Whitespace split	Approximate	Any (raw word count as lower bound)

Usage

tcount [file|directory] [flags]

Flags

Flag	Short	Description
`--model`		Specific model tokenizer
`--models`	`-m`	Show encoding-to-model lookup table
`--provider`		Filter by provider: `openai`, `anthropic`, `meta`, `deepseek`, `alibaba`, `microsoft`, `all`
`--vocab-file`		Path to SentencePiece `.model` file for exact Llama tokenization
`--all`		Show all counting methods
`--json`		JSON output
`--cost`		Include cost estimates (per 1M tokens)
`--recursive`	`-r`	Recursively count files in a directory
`--directory`	`-d`	Alias for `--recursive`
`--chars-per-token`		Character/token ratio for approximation (default: 4.0)
`--words-per-token`		Words/token ratio for approximation (default: 0.75)
`--verbose`		Show additional details
`--no-color`		Disable color output

Examples

Single model

$ tcount --model gpt-5 document.md

Token Count Report for: document.md
═══════════════════════════════════════════════════════

Basic Statistics:
  Characters:     5451
  Words:          662
  Lines:          222

Token Counts by Method:
  ┌─────────────────────────┬──────────┬────────────┬──────────────────┐
  │ Method                  │ Tokens   │ Accuracy   │ Context Usage    │
  ├─────────────────────────┼──────────┼────────────┼──────────────────┤
  │ GPT (gpt-5)             │ 1445     │ Exact      │ 0.7% of 200K     │
  └─────────────────────────┴──────────┴────────────┴──────────────────┘

All methods with costs

$ tcount --all --cost document.md

Token Count Report for: document.md
═══════════════════════════════════════════════════════

Basic Statistics:
  Characters:     5451
  Words:          662
  Lines:          222

Token Counts by Method:
  ┌─────────────────────────┬──────────┬────────────┬──────────────────┐
  │ Method                  │ Tokens   │ Accuracy   │ Context Usage    │
  ├─────────────────────────┼──────────┼────────────┼──────────────────┤
  │ GPT (gpt-5)             │ 1445     │ Exact      │ 0.7% of 200K     │
  │ GPT (gpt-4o)            │ 1445     │ Exact      │ 1.1% of 128K     │
  │ Claude (approx)         │ 1434     │ Estimated  │ 0.7% of 200K     │
  │ Llama (llama-3.1-8b)    │ 1445     │ Exact      │ 1.1% of 128K     │
  │ Character-based (÷4.0)  │ 1362     │ Approx     │                  │
  │ Word-based (×1.33)      │ 882      │ Approx     │                  │
  │ Whitespace split        │ 662      │ Approx     │                  │
  └─────────────────────────┴──────────┴────────────┴──────────────────┘

Cost Estimates (Input):
  gpt-5:           $0.0018 ($1.25/1M tokens)
  gpt-4o:          $0.0036 ($2.50/1M tokens)
  claude-sonnet-4.6: $0.0043 ($3.00/1M tokens)
  claude-sonnet-4.5: $0.0043 ($3.00/1M tokens)

SentencePiece for exact Llama tokenization

# Download tokenizer.model from HuggingFace (requires auth):
# https://huggingface.co/meta-llama/Llama-3.1-8B/blob/main/original/tokenizer.model

tcount --model llama-3.1-8b --vocab-file /path/to/tokenizer.model document.md

Without --vocab-file, Llama models use a tiktoken-based approximation.

Directory scanning

$ tcount -r --verbose tokenizer/

Found 4 text files (skipped 0 binary, 0 ignored)
Token Count Report for: tokenizer/ (directory)
═══════════════════════════════════════════════════════

Basic Statistics:
  Files:          4
  Characters:     14929
  Words:          1906
  Lines:          612

Token Counts by Method:
  ┌─────────────────────────┬──────────┬────────────┬──────────────────┐
  │ Method                  │ Tokens   │ Accuracy   │ Context Usage    │
  ├─────────────────────────┼──────────┼────────────┼──────────────────┤
  │ GPT (gpt-5)             │ 4206     │ Exact      │ 2.1% of 200K     │
  │ Claude (approx)         │ 3928     │ Estimated  │ 2.0% of 200K     │
  │ Character-based (÷4.0)  │ 3732     │ Approx     │                  │
  │ Word-based (×1.33)      │ 2541     │ Approx     │                  │
  │ Whitespace split        │ 1906     │ Approx     │                  │
  └─────────────────────────┴──────────┴────────────┴──────────────────┘

When scanning directories, tcount respects .gitignore rules, skips binary files and .git directories, and aggregates all text files into a combined count. Use --verbose to see file and skip statistics.

JSON output

$ tcount --json --model gpt-5 document.md
{
  "file_path": "document.md",
  "file_size": 5451,
  "characters": 5451,
  "words": 662,
  "lines": 222,
  "methods": [
    {
      "name": "tiktoken_gpt_5",
      "display_name": "GPT (gpt-5)",
      "tokens": 1445,
      "is_exact": true,
      "context_window": 200000
    }
  ]
}

# Extract a specific count
tcount --json myfile.txt | jq '.methods[] | select(.name == "tiktoken_gpt_5") | .tokens'

# Batch count all markdown files
for f in docs/*.md; do tcount --json "$f"; done | jq -s '.'

Library Usage

tcount can be used as a Go library in your own projects.

Installation

go get github.com/lancekrogers/tcount/tokenizer

Basic Token Counting

package main

import (
    "context"
    "fmt"
    "log"

    "github.com/lancekrogers/tcount/tokenizer"
)

func main() {
    counter, err := tokenizer.NewCounter(tokenizer.CounterOptions{})
    if err != nil {
        log.Fatal(err)
    }

    ctx := context.Background()
    result, err := counter.Count(ctx, "Hello, world!", "gpt-4o", false)
    if err != nil {
        log.Fatal(err)
    }

    for _, m := range result.Methods {
        if m.IsExact {
            fmt.Printf("Tokens: %d (exact, %s)\n", m.Tokens, m.DisplayName)
        }
    }
}

File and Directory Counting

ctx := context.Background()

// Count tokens in a single file
result, err := counter.CountFile(ctx, "document.md", "gpt-4o", false)

// Count tokens across a directory (respects .gitignore, skips binaries)
result, err := counter.CountDirectory(ctx, "./src", "", true)
fmt.Printf("Files: %d, Tokens: %d\n", result.FileCount, result.Methods[0].Tokens)

Direct BPE Tokenizer Access

tok, err := tokenizer.NewBPETokenizer("gpt-4o")
if err != nil {
    log.Fatal(err)
}

count, _ := tok.CountTokens("Hello, world!")
fmt.Printf("Tokens: %d, Exact: %v\n", count, tok.IsExact())

Model Discovery

// Get metadata for a specific model
meta := tokenizer.GetModelMetadata("gpt-4o")
fmt.Printf("Encoding: %s, Context: %d\n", meta.Encoding, meta.ContextWindow)

// List all registered models
models := tokenizer.ListModels()

// List models by provider
openaiModels := tokenizer.ListModelsByProvider(tokenizer.ProviderOpenAI)

Cost Estimation

ctx := context.Background()
result, _ := counter.Count(ctx, text, "gpt-4o", false)
costs := tokenizer.CalculateCosts(result.Methods)
for _, c := range costs {
    fmt.Printf("%s: $%.4f\n", c.Model, c.Cost)
}

Development

Requires just for the build system.

just                       # List all recipes
just build                 # Build (with fmt + vet)
just test all              # Run all tests
just test unit             # Unit tests only
just test integration      # Integration tests only
just test coverage         # Coverage report
just test bench            # Benchmarks
just release all            # Cross-compile for all platforms

License

MIT License. See LICENSE for details.

Name		Name	Last commit message	Last commit date
Latest commit History 61 Commits
.github		.github
.justfiles		.justfiles
cmd/tcount		cmd/tcount
internal		internal
npm		npm
scripts		scripts
tests/integration		tests/integration
tokenizer		tokenizer
.env.example		.env.example
.gitignore		.gitignore
.goreleaser.yml		.goreleaser.yml
Justfile		Justfile
LICENSE		LICENSE
README.md		README.md
go.mod		go.mod
go.sum		go.sum

Folders and files

Latest commit

History

Repository files navigation

tcount

Features

Install

npm / pnpm / bun (macOS & Linux)

Homebrew (macOS & Linux)

Go

From source

Binary releases

Quick Start

Supported Models

OpenAI

Anthropic

Meta (Llama)

DeepSeek

Alibaba (Qwen)

Microsoft (Phi)

Tokenization Methods

Usage

Flags

Examples

Single model

All methods with costs

SentencePiece for exact Llama tokenization

Directory scanning

JSON output

Library Usage

Installation

Basic Token Counting

File and Directory Counting

Direct BPE Tokenizer Access

Model Discovery

Cost Estimation

Development

License

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases 3

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages