toksum Documentation

toksum is a Python library for counting tokens across 200+ Large Language Models from 25+ providers. It returns exact token counts for OpenAI models using tiktoken and calibrated approximations for all other providers.

Features

Comprehensive Model Support: 200+ models from 25+ providers
Precise Counting: Exact token counts for OpenAI models using tiktoken
Intelligent Approximations: Calibrated algorithms for other providers
Cost Estimation: Built-in pricing for cost calculations
Chat Format Support: Token counting for conversation messages
Case-Insensitive: Flexible model name matching
CLI Interface: Command-line tool for quick token counting

Quick Start

Installation

pip install toksum

For OpenAI model support (recommended):

pip install "toksum[openai]"

Basic Usage

from toksum import count_tokens, TokenCounter

# Quick token counting
tokens = count_tokens("Hello, world!", "gpt-4")
print(f"Tokens: {tokens}")

# Using TokenCounter for multiple operations
counter = TokenCounter("gpt-4")
tokens = counter.count("Hello, world!")

# Chat message format
messages = [
    {"role": "user", "content": "Hello!"},
    {"role": "assistant", "content": "Hi there!"}
]
total_tokens = counter.count_messages(messages)

Supported Providers

toksum supports models from major providers:

OpenAI: GPT-4, GPT-3.5, GPT-4o, O1 models (25+ models)
Anthropic: Claude 3/3.5 (Opus, Sonnet, Haiku), Claude 2 (12+ models)
Google: Gemini Pro/Flash, Gemini 1.5/2.0, PaLM (10+ models)
Meta: LLaMA 2/3/3.1/3.2/3.3 variants (15+ models)
Mistral: Mistral 7B, Mixtral, Large variants (10+ models)
Cohere: Command, Command-R, Command-R+ (8+ models)
xAI: Grok models (4+ models)
Chinese Providers: Alibaba Qwen, Baidu ERNIE, Huawei PanGu, Tsinghua ChatGLM
Code Models: DeepSeek Coder, Replit Code, BigCode StarCoder
Open Source: EleutherAI, Stability AI, TII Falcon, RWKV
Enterprise: Databricks DBRX, Microsoft Phi, Amazon Titan, IBM Granite

API Reference

Core Functions

toksum.count_tokens(text: str, model: str) → int

Convenience function to count tokens for a given text and model.

This is a simplified interface that creates a TokenCounter instance and performs token counting in a single function call. Ideal for one-off token counting operations without needing to manage TokenCounter instances.

Parameters:

text (str) – The text to count tokens for. Must be a string.
model (str) – The model name (e.g., ‘gpt-4’, ‘claude-3-opus-20240229’). Model names are case-insensitive.

Returns:

The number of tokens in the text.

Return type:

int

Raises:

UnsupportedModelError – If the specified model is not supported.
TokenizationError – If tokenization fails or input is invalid.

Examples

Basic usage:

from toksum import count_tokens

# OpenAI model
tokens = count_tokens("Hello, world!", "gpt-4")
print(f"GPT-4 tokens: {tokens}")

# Anthropic model
tokens = count_tokens("Hello, world!", "claude-3-opus")
print(f"Claude tokens: {tokens}")

# Case-insensitive model names
tokens = count_tokens("Hello, world!", "GPT-4")  # Same as "gpt-4"

Comparing models:

text = "This is a sample text for comparison."
models = ["gpt-4", "gpt-3.5-turbo", "claude-3-opus", "gemini-pro"]

for model in models:
    tokens = count_tokens(text, model)
    print(f"{model}: {tokens} tokens")

Error handling:

from toksum import count_tokens, TokenizationError, UnsupportedModelError

try:
    tokens = count_tokens("Hello!", "unsupported-model")
except UnsupportedModelError as e:
    print(f"Model not supported: {e}")
except TokenizationError as e:
    print(f"Tokenization failed: {e}")

Performance:

This function creates a new TokenCounter instance for each call. For multiple operations with the same model, consider using TokenCounter directly for better performance:

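from toksum import TokenCounter, count_tokens

texts = ["First document.", "Second document.", "Third document."]

# Less efficient for multiple calls: a new TokenCounter is created per text
for text in texts:
    tokens = count_tokens(text, "gpt-4")

# More efficient for multiple calls: one TokenCounter is reused
counter = TokenCounter("gpt-4")
for text in texts:
    tokens = counter.count(text)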
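
When texts for several models are interleaved, one way to keep that reuse is to memoize one counter per model name. The sketch below assumes construction is the costly step; `FakeCounter`, `_counter_for`, and `cached_count` are hypothetical names, and `FakeCounter` only splits on whitespace so the sketch runs without toksum installed. In real code the cached factory would build a `TokenCounter` instead.

```python
from functools import lru_cache


class FakeCounter:
    """Hypothetical stand-in for toksum.TokenCounter; counts whitespace-separated words."""

    instances = 0  # how many counters have been constructed

    def __init__(self, model: str) -> None:
        FakeCounter.instances += 1
        self.model = model

    def count(self, text: str) -> int:
        return len(text.split())


@lru_cache(maxsize=None)
def _counter_for(model: str) -> FakeCounter:
    # One counter per (already lowercased) model name, built on first use
    return FakeCounter(model)


def cached_count(text: str, model: str) -> int:
    # Lowercase first so "GPT-4" and "gpt-4" share one cache entry
    return _counter_for(model.lower()).count(text)


texts = ["first document", "second document here", "third"]
totals = [cached_count(text, "gpt-4") for text in texts]
totals.append(cached_count("one more", "GPT-4"))
print(totals)                 # [2, 3, 1, 2]
print(FakeCounter.instances)  # 1
```

Lowercasing before the cached call matters because `lru_cache` keys on the exact argument, while toksum treats model names case-insensitively.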

Note

This function is equivalent to:

def count_tokens(text: str, model: str) -> int:
    counter = TokenCounter(model)
    return counter.count(text)

toksum.get_supported_models() → Dict[str, List[str]]

Get a comprehensive dictionary of supported models organized by provider.

Returns all 200+ supported models grouped by their respective providers, making it easy to discover available models and understand the scope of toksum’s capabilities.

Returns:

Dictionary with provider names as keys and lists of model names as values. Providers include:

openai: GPT-4, GPT-3.5, GPT-4o, O1, embeddings (25+ models)
anthropic: Claude 3/3.5, Claude 2, Instant (12+ models)
google: Gemini Pro/Flash, Gemini 1.5/2.0, PaLM (10+ models)
meta: LLaMA 2/3/3.1/3.2/3.3 variants (15+ models)
mistral: Mistral 7B, Mixtral, Large variants (10+ models)
cohere: Command, Command-R, Command-R+ (8+ models)
xai: Grok 1/1.5/2 and beta models (4+ models)
alibaba: Qwen 1.5/2.0/2.5 and vision models (20+ models)
baidu: ERNIE 3.0/3.5/4.0 variants (8+ models)
huawei: PanGu Alpha and Coder models (5+ models)
yandex: YaLM and YaGPT models (4+ models)
deepseek: Coder, VL, and LLM models (8+ models)
tsinghua: ChatGLM and GLM models (5+ models)
databricks: DBRX and Dolly models (6+ models)
voyage: Voyage embedding models (6+ models)
And 10+ more providers

Return type:

Dict[str, List[str]]

Examples

Basic usage:

from toksum import get_supported_models

models = get_supported_models()

# List all providers
print("Supported providers:")
for provider in models.keys():
    print(f"  {provider}")

Explore specific providers:

models = get_supported_models()

# OpenAI models
print("OpenAI models:")
for model in models["openai"]:
    print(f"  {model}")

# Anthropic models
print("\nAnthropic models:")
for model in models["anthropic"]:
    print(f"  {model}")

Count models by provider:

models = get_supported_models()

print("Model counts by provider:")
total_models = 0
for provider, model_list in models.items():
    count = len(model_list)
    total_models += count
    print(f"  {provider}: {count} models")

print(f"\nTotal: {total_models} models")

Find models by pattern:

models = get_supported_models()

# Find all GPT-4 variants
gpt4_models = []
for model in models["openai"]:
    if "gpt-4" in model:
        gpt4_models.append(model)

print("GPT-4 variants:")
for model in gpt4_models:
    print(f"  {model}")

Validate model support:

models = get_supported_models()

def is_model_supported(model_name):
    model_lower = model_name.lower()
    for provider_models in models.values():
        if model_lower in [m.lower() for m in provider_models]:
            return True
    return False

# Check if models are supported
test_models = ["gpt-4", "claude-3-opus", "unknown-model"]
for model in test_models:
    supported = is_model_supported(model)
    print(f"{model}: {'✓' if supported else '✗'}")
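
Because `is_model_supported` rebuilds a lowercase list for every provider on each call, repeated checks can instead use a lowercase index built once, which also reports the provider. A minimal sketch; `SAMPLE_MODELS` is a hypothetical excerpt with the same shape as the `get_supported_models()` result (in real code you would pass the function's return value), and `build_model_index` / `provider_for` are illustrative names, not toksum APIs.

```python
from typing import Dict, List, Optional

# Hypothetical excerpt shaped like the get_supported_models() return value
SAMPLE_MODELS: Dict[str, List[str]] = {
    "openai": ["gpt-4", "gpt-3.5-turbo"],
    "anthropic": ["claude-3-opus-20240229"],
}


def build_model_index(models: Dict[str, List[str]]) -> Dict[str, str]:
    """Map each lowercase model name to its provider key (built once)."""
    return {
        name.lower(): provider
        for provider, names in models.items()
        for name in names
    }


def provider_for(model: str, index: Dict[str, str]) -> Optional[str]:
    # Case-insensitive dictionary lookup; None means the model is not listed
    return index.get(model.lower())


index = build_model_index(SAMPLE_MODELS)
for name in ["GPT-4", "claude-3-opus-20240229", "unknown-model"]:
    print(f"{name}: {provider_for(name, index)}")
```

If the same name appeared under two providers, the provider iterated last would win in this index; the boolean `is_model_supported` check has no such ambiguity.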

Integration with TokenCounter:

from toksum import TokenCounter, get_supported_models

models = get_supported_models()
text = "Test tokenization across providers."

# Test a few models from each major provider
test_models = {
    "openai": models["openai"][0],      # First OpenAI model
    "anthropic": models["anthropic"][0], # First Anthropic model
    "google": models["google"][0],       # First Google model
    "meta": models["meta"][0]            # First Meta model
}

for provider, model in test_models.items():
    counter = TokenCounter(model)
    tokens = counter.count(text)
    print(f"{provider} ({model}): {tokens} tokens")

Provider Categories:

The returned dictionary includes models from these categories:

Major Cloud Providers: OpenAI, Anthropic, Google, Microsoft, Amazon

AI-First Companies: Mistral, Cohere, xAI, Perplexity, AI21

Regional/Language-Specific: Alibaba (Chinese), Baidu (Chinese), Huawei (Chinese), Tsinghua (Chinese), Yandex (Russian)

Open Source/Research: EleutherAI, Stability AI, TII, RWKV, community models

Enterprise/Specialized: Databricks, Voyage, DeepSeek, BigCode, Replit, Nvidia, IBM, Salesforce

Note

The model lists are comprehensive but may not include every variant or the very latest models. The library is regularly updated to include new models as they become available.

See also

TokenCounter: For creating token counters with specific models
count_tokens(): For quick token counting with model validation
UnsupportedModelError: Exception raised for unsupported models

toksum.estimate_cost(token_count: int, model: str, input_tokens: bool = True, currency: str = 'USD') → float

Estimate the cost for a given number of tokens and model.

Calculates estimated costs from the library's built-in pricing table for supported models. Input and output tokens are priced separately, since many models charge different rates for each. Costs can be returned in USD or INR.

Parameters:

token_count (int) – Number of tokens to estimate cost for. Must be non-negative.
model (str) – Model name (e.g., “gpt-4”, “gpt-4o”, “claude-3-opus-20240229”). Model names are case-insensitive.
input_tokens (bool, optional) – True for input token pricing, False for output token pricing. Defaults to True. Many models charge more for output tokens than input tokens.
currency (str, optional) – Currency code (“USD” or “INR”). Defaults to “USD”. INR values are converted from USD prices at the library's built-in exchange rate (see Currency Conversion below).

Returns:

Estimated cost in the specified currency. Returns 0.0 if the model is not in the pricing database or if pricing is not available.

Return type:

float

Pricing Coverage:

The function includes pricing for major models:

OpenAI Models (USD per 1K tokens, input/output):

GPT-4: $0.03/$0.06
GPT-4 Turbo: $0.01/$0.03
GPT-4o: $0.005/$0.015
GPT-4o Mini: $0.00015/$0.0006
GPT-3.5 Turbo: $0.001/$0.002

Anthropic Models (USD per 1K tokens, input/output):

Claude-3 Opus: $0.015/$0.075
Claude-3 Sonnet: $0.003/$0.015
Claude-3 Haiku: $0.00025/$0.00125
Claude-3.5 Sonnet: $0.003/$0.015
Claude-3.5 Haiku: $0.001/$0.005

Databricks Models (USD per 1K tokens, input/output):

DBRX Instruct: $0.001/$0.002
Dolly models: $0.001/$0.002

Voyage AI Models (USD per 1K tokens, input/output):

All Voyage models: $0.0001/$0.0001
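
Because these rates are quoted per 1K tokens, a cost works out as token_count / 1000 × rate. A minimal sketch of that arithmetic, assuming the GPT-4 and Claude-3 Haiku rates listed here and the USD to INR rate of 83.0 stated under Currency Conversion; `PRICES_PER_1K` and `cost_per_1k` are hypothetical names, not toksum's `estimate_cost` implementation.

```python
from typing import Dict, Tuple

# Hypothetical table: (input, output) USD per 1K tokens, values copied from the list above
PRICES_PER_1K: Dict[str, Tuple[float, float]] = {
    "gpt-4": (0.03, 0.06),
    "claude-3-haiku": (0.00025, 0.00125),
}
USD_TO_INR = 83.0  # rate stated under "Currency Conversion"


def cost_per_1k(token_count: int, model: str, input_tokens: bool = True,
                currency: str = "USD") -> float:
    """Illustrative cost arithmetic; not toksum's estimate_cost implementation."""
    if token_count < 0:
        raise ValueError("token_count must be non-negative")
    rates = PRICES_PER_1K.get(model.lower())
    if rates is None:
        return 0.0  # same fallback the docs describe for unpriced models
    rate = rates[0] if input_tokens else rates[1]
    usd = token_count / 1000 * rate
    return usd * USD_TO_INR if currency == "INR" else usd


# 1,500 GPT-4 input tokens: 1.5 * $0.03 = $0.045
print(f"{cost_per_1k(1500, 'gpt-4'):.4f}")                      # 0.0450
print(f"{cost_per_1k(1500, 'gpt-4', input_tokens=False):.4f}")  # 0.0900
print(f"{cost_per_1k(1500, 'gpt-4', currency='INR'):.3f}")      # 3.735
```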

Examples

Basic cost estimation:

from toksum import count_tokens, estimate_cost

text = "This is a sample text for cost estimation."
model = "gpt-4"

# Count tokens and estimate cost
tokens = count_tokens(text, model)
input_cost = estimate_cost(tokens, model, input_tokens=True)
output_cost = estimate_cost(tokens, model, input_tokens=False)

print(f"Text: '{text}'")
print(f"Tokens: {tokens}")
print(f"Input cost: ${input_cost:.4f}")
print(f"Output cost: ${output_cost:.4f}")

Compare costs across models:

text = "Compare costs across different models." * 100  # Longer text
models = ["gpt-4", "gpt-4o", "gpt-3.5-turbo", "claude-3-opus", "claude-3-haiku"]

print(f"Text length: {len(text)} characters")
print("\nCost comparison:")

for model in models:
    try:
        tokens = count_tokens(text, model)
        input_cost = estimate_cost(tokens, model, input_tokens=True)
        output_cost = estimate_cost(tokens, model, input_tokens=False)

        print(f"{model}:")
        print(f"  Tokens: {tokens}")
        print(f"  Input: ${input_cost:.4f}")
        print(f"  Output: ${output_cost:.4f}")
    except Exception as e:
        print(f"{model}: Error - {e}")

Currency conversion:

tokens = 1000
model = "gpt-4"

# USD pricing
cost_usd = estimate_cost(tokens, model, currency="USD")
print(f"Cost in USD: ${cost_usd:.4f}")

# INR pricing
cost_inr = estimate_cost(tokens, model, currency="INR")
print(f"Cost in INR: ₹{cost_inr:.2f}")

Batch cost estimation:

texts = [
    "Short text",
    "Medium length text with more content",
    "Much longer text that will cost more to process" * 10
]

model = "gpt-4o"
total_cost = 0

print("Individual text costs:")
for i, text in enumerate(texts, 1):
    tokens = count_tokens(text, model)
    cost = estimate_cost(tokens, model)
    total_cost += cost
    print(f"Text {i}: {tokens} tokens, ${cost:.4f}")

print(f"\nTotal estimated cost: ${total_cost:.4f}")

Chat conversation costing:

from toksum import TokenCounter, estimate_cost

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Explain quantum computing."},
    {"role": "assistant", "content": "Quantum computing is a revolutionary..."}
]

counter = TokenCounter("gpt-4")
total_tokens = counter.count_messages(messages)

# Estimate costs for the conversation
input_cost = estimate_cost(total_tokens, "gpt-4", input_tokens=True)
output_cost = estimate_cost(total_tokens, "gpt-4", input_tokens=False)

print(f"Conversation tokens: {total_tokens}")
print(f"If all input: ${input_cost:.4f}")
print(f"If all output: ${output_cost:.4f}")
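
The two figures above bracket the cost (for GPT-4, output tokens cost more than input tokens). A tighter estimate bills the prompt at the input rate and only the reply at the output rate. The sketch below assumes the GPT-4 rates listed under Pricing Coverage and treats a trailing assistant message as the reply; `split_conversation_cost` and `word_count` are hypothetical helpers, word counting stands in for `counter.count()` only so the sketch runs without toksum, and per-message formatting overhead is ignored.

```python
from typing import Callable, Dict, List

GPT4_INPUT_PER_1K = 0.03   # USD, from Pricing Coverage
GPT4_OUTPUT_PER_1K = 0.06  # USD, from Pricing Coverage


def split_conversation_cost(messages: List[Dict[str, str]],
                            count_fn: Callable[[str], int]) -> Dict[str, float]:
    """Bill a trailing assistant reply as output and every earlier message as input."""
    if messages and messages[-1]["role"] == "assistant":
        prompt, reply = messages[:-1], messages[-1:]
    else:
        prompt, reply = messages, []
    input_tokens = sum(count_fn(m["content"]) for m in prompt)
    output_tokens = sum(count_fn(m["content"]) for m in reply)
    cost = (input_tokens / 1000 * GPT4_INPUT_PER_1K
            + output_tokens / 1000 * GPT4_OUTPUT_PER_1K)
    return {"input_tokens": input_tokens, "output_tokens": output_tokens, "cost_usd": cost}


def word_count(text: str) -> int:
    # Hypothetical stand-in for counter.count(); real token counts differ
    return len(text.split())


messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Explain quantum computing."},
    {"role": "assistant", "content": "Quantum computing is a revolutionary..."},
]
print(split_conversation_cost(messages, word_count))
```

Only the trailing reply counts as output because, in a multi-turn exchange, earlier assistant messages are sent back to the model as part of the prompt.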

Currency Conversion:

USD to INR rate: 83.0 (as of July 2025)
Rate updates: The conversion rate is periodically updated
Precision: INR costs are calculated from USD base prices

Limitations:

Pricing accuracy: Based on publicly available pricing, may not reflect current rates or enterprise discounts
Model coverage: Only includes models with known pricing
Rate changes: Pricing may change without notice
Approximation: For non-OpenAI models, token counts are approximated

Note

This function provides cost estimates for planning and budgeting purposes. Actual costs may vary based on current pricing, volume discounts, and exact tokenization. Always verify current pricing with the model provider for production applications.

TokenCounter Class

class toksum.TokenCounter(model: str)

Bases: object

A comprehensive token counter for various Large Language Model (LLM) providers.

This class provides functionality to count tokens for 200+ different LLMs from 25+ providers, including OpenAI, Anthropic, Google, Meta, Mistral, and many others. It supports both individual text strings and lists of messages (for chat-like interactions).

Token counts are exact for OpenAI models, which use the official tiktoken library. For other providers, counts are approximations produced by algorithms calibrated to each provider’s tokenization characteristics.

model

The model name (converted to lowercase)

Type: str

provider

The detected provider name

Type: str

tokenizer

The tokenizer instance (tiktoken for OpenAI, None for others)

Type: Optional[Any]

Supported Providers:

OpenAI: GPT-4, GPT-3.5, GPT-4o, O1 models, embeddings (25+ models)
Anthropic: Claude 3/3.5 (Opus, Sonnet, Haiku), Claude 2, Instant (12+ models)
Google: Gemini Pro/Flash, Gemini 1.5/2.0, PaLM (10+ models)
Meta: LLaMA 2/3/3.1/3.2/3.3 in various sizes (15+ models)
Mistral: Mistral 7B, Mixtral, Mistral Large variants (10+ models)
Cohere: Command, Command-R, Command-R+ (8+ models)
xAI: Grok 1/1.5/2 and beta models (4+ models)
Alibaba: Qwen 1.5/2.0/2.5 and vision models (20+ models)
Baidu: ERNIE 3.0/3.5/4.0 and variants (8+ models)
Huawei: PanGu Alpha and Coder models (5+ models)
Yandex: YaLM and YaGPT models (4+ models)
DeepSeek: Coder, VL, and LLM models (8+ models)
Tsinghua: ChatGLM and GLM models (5+ models)
And 15+ more providers with specialized models

Examples

Basic usage:

# Count tokens for a single text string
counter = TokenCounter("gpt-4")
token_count = counter.count("This is a test string.")
print(f"Token count: {token_count}")

Chat message format:

# Count tokens for a list of messages (chat format)
messages = [
    {"role": "user", "content": "Hello!"},
    {"role": "assistant", "content": "How can I help you?"},
]
token_count = counter.count_messages(messages)
print(f"Token count (messages): {token_count}")

Different providers:

# Compare token counts across providers
models = ["gpt-4", "claude-3-opus", "gemini-pro", "llama-3-70b"]
text = "Compare tokenization across different models."

for model in models:
    counter = TokenCounter(model)
    tokens = counter.count(text)
    print(f"{model}: {tokens} tokens")

Cost estimation:

from toksum import TokenCounter, estimate_cost

counter = TokenCounter("gpt-4")
tokens = counter.count("Your text here")
cost = estimate_cost(tokens, "gpt-4", input_tokens=True)
print(f"Estimated cost: ${cost:.4f}")

Tokenization Accuracy:

OpenAI models: Exact token counts using official tiktoken encodings
Other providers: Approximations with typical accuracy of ±10-20%
Approximation factors: Calibrated per provider based on tokenization patterns
Language optimization: Adjusted for Chinese, Russian, and other languages
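
To make "approximation" concrete, a widely used rule of thumb for common English text with OpenAI tokenizers is about four characters per token. The sketch below shows a character-ratio estimate of that kind; `approximate_tokens` and its default ratio are assumptions for illustration, not toksum's provider-calibrated factors, which per the list above are also adjusted for languages such as Chinese and Russian.

```python
import math


def approximate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Estimate a token count from character length (illustrative heuristic only)."""
    if chars_per_token <= 0:
        raise ValueError("chars_per_token must be positive")
    if not text:
        return 0
    # Round up so any non-empty text counts as at least one token
    return math.ceil(len(text) / chars_per_token)


print(approximate_tokens("Hello, world!"))  # 13 characters / 4, rounded up: 4
print(approximate_tokens(""))               # 0
```

A single global ratio like this is exactly what per-provider and per-language calibration is meant to improve on.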

Note

Exact token counts are available only for OpenAI models. For other providers, the approximations are suitable for cost estimation, rate limit planning, and comparative analysis, but not where an exact count is required.

Raises:

UnsupportedModelError – If the specified model is not supported
TokenizationError – If tokenization fails or required dependencies are missing

__init__(model: str)

Initialize the TokenCounter with a specific model.

Sets up the appropriate tokenizer based on the model’s provider. For OpenAI models, initializes the tiktoken tokenizer with the correct encoding. For other providers, sets up approximation-based token counting.

Parameters:

model (str) – The model name (e.g., ‘gpt-4’, ‘claude-3-opus-20240229’, ‘gemini-pro’). Model names are case-insensitive and will be converted to lowercase.

Raises:

UnsupportedModelError – If the model is not supported. The exception includes a list of all supported models for reference.
TokenizationError – If required dependencies are missing (e.g., tiktoken for OpenAI models) or if tokenizer initialization fails.

Examples

# OpenAI model (requires tiktoken)
counter = TokenCounter("gpt-4")

# Anthropic model (uses approximation)
counter = TokenCounter("claude-3-opus-20240229")

# Case-insensitive model names
counter = TokenCounter("GPT-4")  # Same as "gpt-4"

# Google model
counter = TokenCounter("gemini-pro")

# Meta model
counter = TokenCounter("llama-3-70b")

Note

The constructor automatically detects the provider based on the model name and sets up the appropriate tokenization method. OpenAI models use precise tiktoken-based counting, while other providers use calibrated approximations.

tokenizer: Any | None

count(text: str) → int

Count tokens in the given text.

Performs token counting using the appropriate method for the model’s provider. For OpenAI models, uses precise tiktoken-based counting. For other providers, uses intelligent approximation algorithms calibrated for each provider.

Parameters: text (str) – The text to count tokens for. Must be a string.
Returns: The number of tokens in the text. Returns 0 for empty strings.
Return type: int
Raises: TokenizationError – If tokenization fails, input is invalid, or required dependencies are missing. Includes detailed error context with model name and text preview.

Input Validation:

The method performs comprehensive input validation:

None check: Rejects None input with clear error message
Type check: Ensures input is a string, not int/float/list/dict/etc.
Empty string: Returns 0 for empty strings (valid case)
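
The three validation rules above can be sketched as a guard in front of the counting step. `TokenizationError` here is a local stand-in so the sketch runs on its own, and `count_with_validation` and `word_count` are hypothetical names; the real exception, its message format, and the actual counting come from toksum.

```python
from typing import Callable


class TokenizationError(Exception):
    """Local stand-in for toksum's TokenizationError."""


def count_with_validation(text: object, model: str, count_fn: Callable[[str], int]) -> int:
    """Apply the None, type, and empty-string rules before counting."""
    if text is None:
        raise TokenizationError(f"Text cannot be None (model: {model})")
    if not isinstance(text, str):
        raise TokenizationError(
            f"Text must be a string, got {type(text).__name__} (model: {model})"
        )
    if text == "":
        return 0  # empty string is a valid input with zero tokens
    return count_fn(text)


def word_count(text: str) -> int:
    # Hypothetical stand-in for the real tokenizer
    return len(text.split())


print(count_with_validation("Hello, world!", "gpt-4", word_count))  # 2
print(count_with_validation("", "gpt-4", word_count))               # 0
try:
    count_with_validation(42, "gpt-4", word_count)
except TokenizationError as e:
    print(f"Rejected: {e}")
```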

Tokenization Methods:

OpenAI models: Uses tiktoken.encode() for exact token counts
Other providers: Uses _approximate_tokens() with provider-specific calibration

Provider-Specific Accuracy:

OpenAI: 100% accurate (official tokenizer)
Anthropic: ~90-95% accurate (well-calibrated approximation)
Google: ~85-90% accurate (Gemini-optimized approximation)
Meta: ~85-90% accurate (LLaMA-optimized approximation)
Chinese models: ~80-90% accurate (character-optimized for Chinese)
Code models: ~85-95% accurate (code-pattern optimized)
Other providers: ~80-90% accurate (general approximation)

Examples

Basic usage:

counter = TokenCounter("gpt-4")

# Simple text
tokens = counter.count("Hello, world!")
print(f"Tokens: {tokens}")  # Exact count for OpenAI

# Empty string
tokens = counter.count("")
print(f"Tokens: {tokens}")  # Always returns 0

# Longer text
text = "This is a longer text that will be tokenized."
tokens = counter.count(text)
print(f"Tokens: {tokens}")

Comparing providers:

text = "Compare tokenization across different models."
models = ["gpt-4", "claude-3-opus", "gemini-pro"]

for model in models:
    counter = TokenCounter(model)
    tokens = counter.count(text)
    print(f"{model}: {tokens} tokens")

Error handling:

from toksum import TokenCounter, TokenizationError

try:
    counter = TokenCounter("gpt-4")
    tokens = counter.count("Valid text")
except TokenizationError as e:
    print(f"Tokenization failed: {e}")

Performance:

OpenAI models: Fast (native tiktoken performance)
Other providers: Very fast (lightweight approximation algorithms)
Typical speed: 10,000+ texts per second for approximation methods

Note

Exact token counts are available only for OpenAI models. For cost estimation, rate limiting, and comparative analysis, the approximations used for other providers are accurate enough and faster to compute.

count_messages(messages: List[Dict[str, str]]) → int

Count tokens for a list of messages in chat format.

Processes a list of message dictionaries (typical chat/conversation format) and returns the total token count including any formatting overhead. This method is essential for chat-based applications and conversation analysis.

Parameters:

messages (List[Dict[str, str]]) – List of message dictionaries. Each message must contain ‘role’ and ‘content’ keys.

Expected format:

[
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello!"},
    {"role": "assistant", "content": "Hi there!"}
]

Returns:

Total token count for all messages including formatting overhead.

Return type:

int

Raises:

TokenizationError – If messages format is invalid, contains non-string content, or if tokenization of individual messages fails. Includes detailed error context with message index and content preview.

Message Format Validation:

The method performs comprehensive validation:

Input type: Must be a list, not string/dict/int/etc.
Message structure: Each message must be a dictionary
Required keys: Each message must have ‘role’ and ‘content’ keys
Content type: Message content must be a string, not None/int/list/etc.
Role type: Message role must be a string if present
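
The validation rules above can be sketched as a single check run before any counting. `TokenizationError` is a local stand-in and `validate_messages` is a hypothetical name; this sketch treats both ‘role’ and ‘content’ as required, and the real toksum messages may be worded differently.

```python
from typing import Any


class TokenizationError(Exception):
    """Local stand-in for toksum's TokenizationError."""


def validate_messages(messages: Any) -> None:
    """Raise TokenizationError if messages do not match the documented chat format."""
    if not isinstance(messages, list):
        raise TokenizationError(f"messages must be a list, got {type(messages).__name__}")
    for i, msg in enumerate(messages):
        if not isinstance(msg, dict):
            raise TokenizationError(f"message {i} must be a dict, got {type(msg).__name__}")
        for key in ("role", "content"):
            if key not in msg:
                raise TokenizationError(f"message {i} is missing the '{key}' key")
        if not isinstance(msg["content"], str):
            raise TokenizationError(f"message {i} content must be a string")
        if not isinstance(msg["role"], str):
            raise TokenizationError(f"message {i} role must be a string")


validate_messages([{"role": "user", "content": "Hello!"}])  # passes silently
try:
    validate_messages([{"role": "user"}])
except TokenizationError as e:
    print(e)  # message 0 is missing the 'content' key
```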

Formatting Overhead:

Different providers handle message formatting differently:

OpenAI: Minimal overhead (~1 token per role)
Anthropic: No additional formatting overhead
Other providers: No additional overhead assumed
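
One way to read these figures is total = content tokens + per-message overhead. A minimal sketch of that accounting, assuming one extra token per message for OpenAI models (the approximate figure above) and none for other providers; `PER_MESSAGE_OVERHEAD`, `count_messages_sketch`, and `word_count` are hypothetical, word counting stands in for `count()`, and toksum's actual accounting may differ.

```python
from typing import Callable, Dict, List

# Assumed per-message overhead in tokens, taken from the approximate figures above
PER_MESSAGE_OVERHEAD: Dict[str, int] = {"openai": 1, "anthropic": 0}


def count_messages_sketch(messages: List[Dict[str, str]],
                          count_fn: Callable[[str], int],
                          provider: str) -> int:
    """Sum content tokens plus a fixed overhead per message for the given provider."""
    overhead = PER_MESSAGE_OVERHEAD.get(provider, 0)  # other providers: no overhead
    return sum(count_fn(m["content"]) + overhead for m in messages)


def word_count(text: str) -> int:
    # Hypothetical stand-in for counter.count()
    return len(text.split())


messages = [
    {"role": "user", "content": "Hello!"},
    {"role": "assistant", "content": "Hi there! How can I help?"},
]
print(count_messages_sketch(messages, word_count, "openai"))     # 7 content + 2 overhead = 9
print(count_messages_sketch(messages, word_count, "anthropic"))  # 7
```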

Common Message Roles:

system: System instructions or context
user: User input or questions
assistant: AI assistant responses
function: Function call results (some providers)

Examples

Basic chat conversation:

counter = TokenCounter("gpt-4")

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is the capital of France?"},
    {"role": "assistant", "content": "The capital of France is Paris."}
]

total_tokens = counter.count_messages(messages)
print(f"Total conversation tokens: {total_tokens}")

Comparing individual vs. message counting:

counter = TokenCounter("gpt-4")

# Count individual messages
individual_total = 0
for msg in messages:
    tokens = counter.count(msg["content"])
    individual_total += tokens
    print(f"{msg['role']}: {tokens} tokens")

# Count as message format (includes formatting overhead)
message_total = counter.count_messages(messages)

print(f"Individual sum: {individual_total}")
print(f"Message format: {message_total}")
print(f"Formatting overhead: {message_total - individual_total}")

Error handling:

from toksum import TokenCounter, TokenizationError

try:
    counter = TokenCounter("gpt-4")

    # Invalid format - missing content
    invalid_messages = [{"role": "user"}]
    tokens = counter.count_messages(invalid_messages)

except TokenizationError as e:
    print(f"Message format error: {e}")

Multi-provider comparison:

messages = [
    {"role": "user", "content": "Hello!"},
    {"role": "assistant", "content": "Hi there! How can I help?"}
]

models = ["gpt-4", "claude-3-opus", "gemini-pro"]
for model in models:
    counter = TokenCounter(model)
    tokens = counter.count_messages(messages)
    print(f"{model}: {tokens} tokens")

Performance:

Speed: Processes thousands of message lists per second
Memory: Minimal additional memory overhead
Scalability: Handles conversations with hundreds of messages

Use Cases:

Chat applications: Calculate conversation costs
API rate limiting: Plan request sizes for chat endpoints
Conversation analysis: Analyze dialogue token patterns
Cost estimation: Budget for chat-based AI applications
Content moderation: Assess conversation length and complexity

Note

This method is specifically designed for chat/conversation formats. For simple text token counting, use the count() method instead.

Command Line Interface

toksum provides a command-line interface for quick token counting:

# Basic usage
toksum "Hello, world!" gpt-4

# From file
toksum --file document.txt claude-3-opus

# With cost estimation
toksum --cost "Your text" gpt-4

# List supported models
toksum --list-models
