toksum Documentation
toksum is a comprehensive Python library for counting tokens across 200+ Large Language Models from 25+ providers. It provides precise token counting for OpenAI models using tiktoken and intelligent approximations for all other providers.
Features
Comprehensive Model Support: 200+ models from 25+ providers
Precise Counting: Exact token counts for OpenAI models using tiktoken
Intelligent Approximations: Calibrated algorithms for other providers
Cost Estimation: Built-in pricing for cost calculations
Chat Format Support: Token counting for conversation messages
Case-Insensitive: Flexible model name matching
CLI Interface: Command-line tool for quick token counting
Quick Start
Installation
pip install toksum
For OpenAI model support (recommended):
pip install "toksum[openai]"
Basic Usage
from toksum import count_tokens, TokenCounter
# Quick token counting
tokens = count_tokens("Hello, world!", "gpt-4")
print(f"Tokens: {tokens}")
# Using TokenCounter for multiple operations
counter = TokenCounter("gpt-4")
tokens = counter.count("Hello, world!")
# Chat message format
messages = [
    {"role": "user", "content": "Hello!"},
    {"role": "assistant", "content": "Hi there!"}
]
total_tokens = counter.count_messages(messages)
Supported Providers
toksum supports models from major providers:
OpenAI: GPT-4, GPT-3.5, GPT-4o, O1 models (25+ models)
Anthropic: Claude 3/3.5 (Opus, Sonnet, Haiku), Claude 2 (12+ models)
Google: Gemini Pro/Flash, Gemini 1.5/2.0, PaLM (10+ models)
Meta: LLaMA 2/3/3.1/3.2/3.3 variants (15+ models)
Mistral: Mistral 7B, Mixtral, Large variants (10+ models)
Cohere: Command, Command-R, Command-R+ (8+ models)
xAI: Grok models (4+ models)
Chinese Providers: Alibaba Qwen, Baidu ERNIE, Huawei PanGu, Tsinghua ChatGLM
Code Models: DeepSeek Coder, Replit Code, BigCode StarCoder
Open Source: EleutherAI, Stability AI, TII Falcon, RWKV
Enterprise: Databricks DBRX, Microsoft Phi, Amazon Titan, IBM Granite
API Reference
Core Functions
- toksum.count_tokens(text: str, model: str) → int
Convenience function to count tokens for a given text and model.
This is a simplified interface that creates a TokenCounter instance and performs token counting in a single function call. Ideal for one-off token counting operations without needing to manage TokenCounter instances.
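- Parameters:
text (str) – The text to count tokens for.
model (str) – The model name (e.g., “gpt-4”). Model names are case-insensitive.
- Returns:
The number of tokens in the text.
- Return type:
int
- Raises: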
UnsupportedModelError – If the specified model is not supported.
TokenizationError – If tokenization fails or input is invalid.
Examples
Basic usage:
from toksum import count_tokens
# OpenAI model
tokens = count_tokens("Hello, world!", "gpt-4")
print(f"GPT-4 tokens: {tokens}")
# Anthropic model
tokens = count_tokens("Hello, world!", "claude-3-opus")
print(f"Claude tokens: {tokens}")
# Case-insensitive model names
tokens = count_tokens("Hello, world!", "GPT-4")  # Same as "gpt-4"
Comparing models:
text = "This is a sample text for comparison."
models = ["gpt-4", "gpt-3.5-turbo", "claude-3-opus", "gemini-pro"]
for model in models:
    tokens = count_tokens(text, model)
    print(f"{model}: {tokens} tokens")
Error handling:
try:
    tokens = count_tokens("Hello!", "unsupported-model")
except UnsupportedModelError as e:
    print(f"Model not supported: {e}")
except TokenizationError as e:
    print(f"Tokenization failed: {e}")
- Performance:
This function creates a new TokenCounter instance for each call. For multiple operations with the same model, consider using TokenCounter directly for better performance:
# Less efficient for multiple calls
for text in texts:
    tokens = count_tokens(text, "gpt-4")
# More efficient for multiple calls
counter = TokenCounter("gpt-4")
for text in texts:
    tokens = counter.count(text)
Note
This function is equivalent to:
counter = TokenCounter(model)
return counter.count(text)
- toksum.get_supported_models() → Dict[str, List[str]]
Get a comprehensive dictionary of supported models organized by provider.
Returns all 200+ supported models grouped by their respective providers, making it easy to discover available models and understand the scope of toksum’s capabilities.
- Returns:
Dictionary with provider names as keys and lists of model names as values. Providers include:
openai: GPT-4, GPT-3.5, GPT-4o, O1, embeddings (25+ models)
anthropic: Claude 3/3.5, Claude 2, Instant (12+ models)
google: Gemini Pro/Flash, Gemini 1.5/2.0, PaLM (10+ models)
meta: LLaMA 2/3/3.1/3.2/3.3 variants (15+ models)
mistral: Mistral 7B, Mixtral, Large variants (10+ models)
cohere: Command, Command-R, Command-R+ (8+ models)
xai: Grok 1/1.5/2 and beta models (4+ models)
alibaba: Qwen 1.5/2.0/2.5 and vision models (20+ models)
baidu: ERNIE 3.0/3.5/4.0 variants (8+ models)
huawei: PanGu Alpha and Coder models (5+ models)
yandex: YaLM and YaGPT models (4+ models)
deepseek: Coder, VL, and LLM models (8+ models)
tsinghua: ChatGLM and GLM models (5+ models)
databricks: DBRX and Dolly models (6+ models)
voyage: Voyage embedding models (6+ models)
And 10+ more providers
- Return type:
Dict[str, List[str]]
Examples
Basic usage:
from toksum import get_supported_models
models = get_supported_models()
# List all providers
print("Supported providers:")
for provider in models.keys():
    print(f" {provider}")
Explore specific providers:
models = get_supported_models()
# OpenAI models
print("OpenAI models:")
for model in models["openai"]:
    print(f" {model}")
# Anthropic models
print("\nAnthropic models:")
for model in models["anthropic"]:
    print(f" {model}")
Count models by provider:
models = get_supported_models()
print("Model counts by provider:")
total_models = 0
for provider, model_list in models.items():
    count = len(model_list)
    total_models += count
    print(f" {provider}: {count} models")
print(f"\nTotal: {total_models} models")
Find models by pattern:
models = get_supported_models()
# Find all GPT-4 variants
gpt4_models = []
for model in models["openai"]:
    if "gpt-4" in model:
        gpt4_models.append(model)
print("GPT-4 variants:")
for model in gpt4_models:
    print(f" {model}")
Validate model support:
models = get_supported_models()
def is_model_supported(model_name):
    model_lower = model_name.lower()
    for provider_models in models.values():
        if model_lower in [m.lower() for m in provider_models]:
            return True
    return False
# Check if models are supported
test_models = ["gpt-4", "claude-3-opus", "unknown-model"]
for model in test_models:
    supported = is_model_supported(model)
    print(f"{model}: {'✓' if supported else '✗'}")
Integration with TokenCounter:
from toksum import TokenCounter, get_supported_models
models = get_supported_models()
text = "Test tokenization across providers."
# Test a few models from each major provider
test_models = {
    "openai": models["openai"][0],  # First OpenAI model
    "anthropic": models["anthropic"][0],  # First Anthropic model
    "google": models["google"][0],  # First Google model
    "meta": models["meta"][0]  # First Meta model
}
for provider, model in test_models.items():
    counter = TokenCounter(model)
    tokens = counter.count(text)
    print(f"{provider} ({model}): {tokens} tokens")
- Provider Categories:
The returned dictionary includes models from these categories:
Major Cloud Providers: OpenAI, Anthropic, Google, Microsoft, Amazon
AI-First Companies: Mistral, Cohere, xAI, Perplexity, AI21
Regional/Language-Specific: Alibaba (Chinese), Baidu (Chinese), Huawei (Chinese), Yandex (Russian), Tsinghua (Chinese)
Open Source/Research: EleutherAI, Stability AI, TII, RWKV, Community models
Enterprise/Specialized: Databricks, Voyage, DeepSeek, BigCode, Replit, Nvidia, IBM, Salesforce
Note
The model lists are comprehensive but may not include every variant or the very latest models. The library is regularly updated to include new models as they become available.
See also
TokenCounter: For creating token counters with specific models
count_tokens(): For quick token counting with model validation
UnsupportedModelError: Exception raised for unsupported models
- toksum.estimate_cost(token_count: int, model: str, input_tokens: bool = True, currency: str = 'USD') → float
Estimate the cost for a given number of tokens and model.
Calculates estimated costs based on current pricing for supported models. Supports both input and output token pricing, as many models have different rates for input vs. output tokens. Provides costs in USD or INR currency.
- Parameters:
token_count (int) – Number of tokens to estimate cost for. Must be non-negative.
model (str) – Model name (e.g., “gpt-4”, “gpt-4o”, “claude-3-opus-20240229”). Model names are case-insensitive.
input_tokens (bool, optional) – True for input token pricing, False for output token pricing. Defaults to True. Many models charge more for output tokens than input tokens.
currency (str, optional) – Currency code (“USD” or “INR”). Defaults to “USD”. Uses current conversion rate for INR.
- Returns:
Estimated cost in the specified currency. Returns 0.0 if the model is not in the pricing database or if pricing is not available.
- Return type:
float
- Pricing Coverage:
The function includes pricing for major models:
OpenAI Models (input/output, per 1K tokens):
GPT-4: $0.03/$0.06
GPT-4 Turbo: $0.01/$0.03
GPT-4o: $0.005/$0.015
GPT-4o Mini: $0.00015/$0.0006
GPT-3.5 Turbo: $0.001/$0.002
Anthropic Models (input/output, per 1K tokens):
Claude-3 Opus: $0.015/$0.075
Claude-3 Sonnet: $0.003/$0.015
Claude-3 Haiku: $0.00025/$0.00125
Claude-3.5 Sonnet: $0.003/$0.015
Claude-3.5 Haiku: $0.001/$0.005
Databricks Models (input/output, per 1K tokens):
DBRX Instruct: $0.001/$0.002
Dolly models: $0.001/$0.002
Voyage AI Models (input/output, per 1K tokens):
All Voyage models: $0.0001/$0.0001
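The per-1K rates listed above imply a simple formula: cost = tokens / 1000 × rate, with prompt tokens priced at the input rate and completion tokens at the output rate. The sketch below works through that arithmetic with three rates copied from the list. `PRICES_PER_1K` and `request_cost` are hypothetical names for illustration and may not match how toksum stores or applies its pricing.

```python
# Rates copied from the pricing list above: (input, output) in USD per 1K tokens.
PRICES_PER_1K = {
    "gpt-4": (0.03, 0.06),
    "gpt-4o": (0.005, 0.015),
    "claude-3-opus": (0.015, 0.075),
}


def request_cost(model: str, prompt_tokens: int, completion_tokens: int) -> float:
    """Price prompt tokens at the input rate and completion tokens at the output rate."""
    input_rate, output_rate = PRICES_PER_1K[model.lower()]
    return prompt_tokens / 1000 * input_rate + completion_tokens / 1000 * output_rate


# 2,000 prompt tokens ($0.06) plus 500 completion tokens ($0.03) on GPT-4.
cost = request_cost("gpt-4", prompt_tokens=2000, completion_tokens=500)
print(f"${cost:.4f}")  # prints $0.0900
```

Splitting a request into prompt and completion tokens gives a tighter estimate than pricing the whole conversation at one rate.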
Examples
Basic cost estimation:
from toksum import count_tokens, estimate_cost
text = "This is a sample text for cost estimation."
model = "gpt-4"
# Count tokens and estimate cost
tokens = count_tokens(text, model)
input_cost = estimate_cost(tokens, model, input_tokens=True)
output_cost = estimate_cost(tokens, model, input_tokens=False)
print(f"Text: '{text}'")
print(f"Tokens: {tokens}")
print(f"Input cost: ${input_cost:.4f}")
print(f"Output cost: ${output_cost:.4f}")
Compare costs across models:
text = "Compare costs across different models." * 100  # Longer text
models = ["gpt-4", "gpt-4o", "gpt-3.5-turbo", "claude-3-opus", "claude-3-haiku"]
print(f"Text length: {len(text)} characters")
print("\nCost comparison:")
for model in models:
    try:
        tokens = count_tokens(text, model)
        input_cost = estimate_cost(tokens, model, input_tokens=True)
        output_cost = estimate_cost(tokens, model, input_tokens=False)
        print(f"{model}:")
        print(f" Tokens: {tokens}")
        print(f" Input: ${input_cost:.4f}")
        print(f" Output: ${output_cost:.4f}")
    except Exception as e:
        print(f"{model}: Error - {e}")
Currency conversion:
tokens = 1000
model = "gpt-4"
# USD pricing
cost_usd = estimate_cost(tokens, model, currency="USD")
print(f"Cost in USD: ${cost_usd:.4f}")
# INR pricing
cost_inr = estimate_cost(tokens, model, currency="INR")
print(f"Cost in INR: ₹{cost_inr:.2f}")
Batch cost estimation:
texts = [
    "Short text",
    "Medium length text with more content",
    "Much longer text that will cost more to process" * 10
]
model = "gpt-4o"
total_cost = 0
print("Individual text costs:")
for i, text in enumerate(texts, 1):
    tokens = count_tokens(text, model)
    cost = estimate_cost(tokens, model)
    total_cost += cost
    print(f"Text {i}: {tokens} tokens, ${cost:.4f}")
print(f"\nTotal estimated cost: ${total_cost:.4f}")
Chat conversation costing:
from toksum import TokenCounter
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Explain quantum computing."},
    {"role": "assistant", "content": "Quantum computing is a revolutionary..."}
]
counter = TokenCounter("gpt-4")
total_tokens = counter.count_messages(messages)
# Estimate costs for the conversation
input_cost = estimate_cost(total_tokens, "gpt-4", input_tokens=True)
output_cost = estimate_cost(total_tokens, "gpt-4", input_tokens=False)
print(f"Conversation tokens: {total_tokens}")
print(f"If all input: ${input_cost:.4f}")
print(f"If all output: ${output_cost:.4f}")
- Currency Conversion:
USD to INR rate: 83.0 (as of July 2025)
Rate updates: The conversion rate is periodically updated
Precision: INR costs are calculated from USD base prices
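Because INR costs are derived from USD base prices, the conversion is a single multiplication by the quoted rate. Here is a small worked example, assuming the 83.0 rate stated above. `USD_TO_INR` and `usd_to_inr` are hypothetical names, and the rate should be read as a snapshot rather than a live value.

```python
USD_TO_INR = 83.0  # rate quoted above; a snapshot, not a live exchange rate


def usd_to_inr(cost_usd: float, rate: float = USD_TO_INR) -> float:
    """Convert a USD cost estimate to INR by scaling the USD base price."""
    if cost_usd < 0:
        raise ValueError("cost must be non-negative")
    return cost_usd * rate


# 1,000 GPT-4 input tokens cost $0.03 at $0.03 per 1K tokens.
cost_inr = usd_to_inr(0.03)
print(f"₹{cost_inr:.2f}")  # prints ₹2.49
```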
- Limitations:
Pricing accuracy: Based on publicly available pricing, may not reflect current rates or enterprise discounts
Model coverage: Only includes models with known pricing
Rate changes: Pricing may change without notice
Approximation: For non-OpenAI models, token counts are approximated
Note
This function provides cost estimates for planning and budgeting purposes. Actual costs may vary based on current pricing, volume discounts, and exact tokenization. Always verify current pricing with the model provider for production applications.
See also
count_tokens(): For getting token counts to use with this function
TokenCounter: For more complex token counting scenarios
get_supported_models(): For checking which models are available
TokenCounter Class
- class toksum.TokenCounter(model: str)
Bases: object
A comprehensive token counter for various Large Language Model (LLM) providers.
This class provides functionality to count tokens for 200+ different LLMs from 25+ providers, including OpenAI, Anthropic, Google, Meta, Mistral, and many others. It supports both individual text strings and lists of messages (for chat-like interactions).
The token counting is precise for OpenAI models using the official tiktoken library, and provides reasonable approximations for other providers using intelligent algorithms calibrated for each provider’s tokenization characteristics.
- tokenizer
The tokenizer instance (tiktoken for OpenAI, None for others)
- Type:
Optional[Any]
- Supported Providers:
OpenAI: GPT-4, GPT-3.5, GPT-4o, O1 models, embeddings (25+ models)
Anthropic: Claude 3/3.5 (Opus, Sonnet, Haiku), Claude 2, Instant (12+ models)
Google: Gemini Pro/Flash, Gemini 1.5/2.0, PaLM (10+ models)
Meta: LLaMA 2/3/3.1/3.2/3.3 in various sizes (15+ models)
Mistral: Mistral 7B, Mixtral, Mistral Large variants (10+ models)
Cohere: Command, Command-R, Command-R+ (8+ models)
xAI: Grok 1/1.5/2 and beta models (4+ models)
Alibaba: Qwen 1.5/2.0/2.5 and vision models (20+ models)
Baidu: ERNIE 3.0/3.5/4.0 and variants (8+ models)
Huawei: PanGu Alpha and Coder models (5+ models)
Yandex: YaLM and YaGPT models (4+ models)
DeepSeek: Coder, VL, and LLM models (8+ models)
Tsinghua: ChatGLM and GLM models (5+ models)
And 15+ more providers with specialized models
Examples
Basic usage:
# Count tokens for a single text string
counter = TokenCounter("gpt-4")
token_count = counter.count("This is a test string.")
print(f"Token count: {token_count}")
Chat message format:
# Count tokens for a list of messages (chat format)
messages = [
    {"role": "user", "content": "Hello!"},
    {"role": "assistant", "content": "How can I help you?"},
]
token_count = counter.count_messages(messages)
print(f"Token count (messages): {token_count}")
Different providers:
# Compare token counts across providers
models = ["gpt-4", "claude-3-opus", "gemini-pro", "llama-3-70b"]
text = "Compare tokenization across different models."
for model in models:
    counter = TokenCounter(model)
    tokens = counter.count(text)
    print(f"{model}: {tokens} tokens")
Cost estimation:
from toksum.core import estimate_cost
counter = TokenCounter("gpt-4")
tokens = counter.count("Your text here")
cost = estimate_cost(tokens, "gpt-4", input_tokens=True)
print(f"Estimated cost: ${cost:.4f}")
- Tokenization Accuracy:
OpenAI models: Exact token counts using official tiktoken encodings
Other providers: Approximations with typical accuracy of ±10-20%
Approximation factors: Calibrated per provider based on tokenization patterns
Language optimization: Adjusted for Chinese, Russian, and other languages
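To make the idea of a calibrated approximation concrete, here is a generic character-ratio heuristic. It is a sketch of the general technique only, not toksum's algorithm. The ratios in `CHARS_PER_TOKEN`, the one-token-per-CJK-character rule, and the function names are all illustrative assumptions.

```python
import math

# Hypothetical calibration: average characters per token, per provider.
# These ratios are illustrative guesses, not toksum's calibrated values.
CHARS_PER_TOKEN = {"anthropic": 4.0, "google": 3.8, "default": 4.0}


def is_cjk(ch: str) -> bool:
    """True for characters in the main CJK Unified Ideographs block."""
    return "\u4e00" <= ch <= "\u9fff"


def approximate_tokens(text: str, provider: str = "default") -> int:
    """Estimate tokens: one per CJK character, a chars-per-token ratio for the rest."""
    if not text:
        return 0
    cjk = sum(1 for ch in text if is_cjk(ch))
    other = len(text) - cjk
    ratio = CHARS_PER_TOKEN.get(provider, CHARS_PER_TOKEN["default"])
    return cjk + math.ceil(other / ratio)


print(approximate_tokens("Hello, world!"))  # ceil(13 / 4.0) = 4
print(approximate_tokens("你好 world"))      # 2 CJK + ceil(6 / 4.0) = 4
```

A heuristic of this shape is cheap to evaluate, which is why its error should be treated as a range rather than an exact figure.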
Note
For production applications requiring exact token counts, use OpenAI models. For other providers, approximations are suitable for cost estimation, rate limit planning, and comparative analysis.
- Raises:
UnsupportedModelError – If the specified model is not supported
TokenizationError – If tokenization fails or required dependencies are missing
- __init__(model: str)
Initialize the TokenCounter with a specific model.
Sets up the appropriate tokenizer based on the model’s provider. For OpenAI models, initializes the tiktoken tokenizer with the correct encoding. For other providers, sets up approximation-based token counting.
- Parameters:
model (str) – The model name (e.g., ‘gpt-4’, ‘claude-3-opus-20240229’, ‘gemini-pro’). Model names are case-insensitive and will be converted to lowercase.
- Raises:
UnsupportedModelError – If the model is not supported. The exception includes a list of all supported models for reference.
TokenizationError – If required dependencies are missing (e.g., tiktoken for OpenAI models) or if tokenizer initialization fails.
Examples
# OpenAI model (requires tiktoken)
counter = TokenCounter("gpt-4")
# Anthropic model (uses approximation)
counter = TokenCounter("claude-3-opus-20240229")
# Case-insensitive model names
counter = TokenCounter("GPT-4")  # Same as "gpt-4"
# Google model
counter = TokenCounter("gemini-pro")
# Meta model
counter = TokenCounter("llama-3-70b")
Note
The constructor automatically detects the provider based on the model name and sets up the appropriate tokenization method. OpenAI models use precise tiktoken-based counting, while other providers use calibrated approximations.
- count(text: str) → int
Count tokens in the given text.
Performs token counting using the appropriate method for the model’s provider. For OpenAI models, uses precise tiktoken-based counting. For other providers, uses intelligent approximation algorithms calibrated for each provider.
- Parameters:
text (str) – The text to count tokens for. Must be a string.
- Returns:
The number of tokens in the text. Returns 0 for empty strings.
- Return type:
int
- Raises:
TokenizationError – If tokenization fails, input is invalid, or required dependencies are missing. Includes detailed error context with model name and text preview.
- Input Validation:
The method performs comprehensive input validation:
None check: Rejects None input with clear error message
Type check: Ensures input is a string, not int/float/list/dict/etc.
Empty string: Returns 0 for empty strings (valid case)
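The three checks above can be sketched as follows. This is a self-contained illustration under stated assumptions, not toksum's source: the local `TokenizationError` class is a stand-in so the example runs without toksum installed, and `validate_text`, `count_with_validation`, and the whitespace tokenizer default are hypothetical.

```python
class TokenizationError(Exception):
    """Local stand-in so the sketch runs without toksum installed."""


def validate_text(text) -> str:
    """Reject None and non-string input with a descriptive error."""
    if text is None:
        raise TokenizationError("Input text cannot be None")
    if not isinstance(text, str):
        raise TokenizationError(f"Input must be a string, got {type(text).__name__}")
    return text


def count_with_validation(text, tokenize=str.split) -> int:
    """Validate, return 0 for the empty string, then delegate to a tokenizer."""
    text = validate_text(text)
    if text == "":
        return 0
    # The default tokenizer splits on whitespace; it is a placeholder only.
    return len(tokenize(text))


print(count_with_validation(""))       # 0
print(count_with_validation("a b c"))  # 3
```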
- Tokenization Methods:
OpenAI models: Uses tiktoken.encode() for exact token counts
Other providers: Uses _approximate_tokens() with provider-specific calibration
- Provider-Specific Accuracy:
OpenAI: 100% accurate (official tokenizer)
Anthropic: ~90-95% accurate (well-calibrated approximation)
Google: ~85-90% accurate (Gemini-optimized approximation)
Meta: ~85-90% accurate (LLaMA-optimized approximation)
Chinese models: ~80-90% accurate (character-optimized for Chinese)
Code models: ~85-95% accurate (code-pattern optimized)
Other providers: ~80-90% accurate (general approximation)
Examples
Basic usage:
counter = TokenCounter("gpt-4")
# Simple text
tokens = counter.count("Hello, world!")
print(f"Tokens: {tokens}")  # Exact count for OpenAI
# Empty string
tokens = counter.count("")
print(f"Tokens: {tokens}")  # Always returns 0
# Longer text
text = "This is a longer text that will be tokenized."
tokens = counter.count(text)
print(f"Tokens: {tokens}")
Comparing providers:
text = "Compare tokenization across different models."
models = ["gpt-4", "claude-3-opus", "gemini-pro"]
for model in models:
    counter = TokenCounter(model)
    tokens = counter.count(text)
    print(f"{model}: {tokens} tokens")
Error handling:
try:
    counter = TokenCounter("gpt-4")
    tokens = counter.count("Valid text")
except TokenizationError as e:
    print(f"Tokenization failed: {e}")
- Performance:
OpenAI models: Fast (native tiktoken performance)
Other providers: Very fast (lightweight approximation algorithms)
Typical speed: 10,000+ texts per second for approximation methods
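Throughput figures like these depend on hardware and text length, so it can be worth measuring them locally. The sketch below times any counting callable. `measure_throughput` is a hypothetical helper, and the characters-divided-by-four lambda is a stand-in; passing `counter.count` from a real `TokenCounter` would measure toksum itself.

```python
import time


def measure_throughput(count_fn, texts, repeats: int = 5) -> float:
    """Return texts processed per second, using the fastest of several runs."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        for text in texts:
            count_fn(text)
        best = min(best, time.perf_counter() - start)
    return len(texts) / best if best > 0 else float("inf")


# Stand-in counter (characters // 4), not a real tokenizer.
sample = ["Compare tokenization across different models."] * 10_000
rate = measure_throughput(lambda t: len(t) // 4, sample)
print(f"{rate:,.0f} texts/second")
```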
Note
For production applications requiring exact token counts, use OpenAI models. For cost estimation, rate limiting, and comparative analysis, approximations provide sufficient accuracy with much better performance.
- count_messages(messages: List[Dict[str, str]]) → int
Count tokens for a list of messages in chat format.
Processes a list of message dictionaries (typical chat/conversation format) and returns the total token count including any formatting overhead. This method is essential for chat-based applications and conversation analysis.
- Parameters:
messages (List[Dict[str, str]]) –
List of message dictionaries. Each message must contain ‘role’ and ‘content’ keys.
Expected format:
[
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello!"},
    {"role": "assistant", "content": "Hi there!"}
]
- Returns:
Total token count for all messages including formatting overhead.
- Return type:
int
- Raises:
TokenizationError – If messages format is invalid, contains non-string content, or if tokenization of individual messages fails. Includes detailed error context with message index and content preview.
- Message Format Validation:
The method performs comprehensive validation:
Input type: Must be a list, not string/dict/int/etc.
Message structure: Each message must be a dictionary
Required keys: Each message must have ‘role’ and ‘content’ keys
Content type: Message content must be a string, not None/int/list/etc.
Role type: Message role must be a string if present
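The validation rules above can be sketched as a single pass over the message list. This is an illustrative reconstruction, not toksum's code: the local `TokenizationError` is a stand-in, `validate_messages` is a hypothetical name, and the error wording is invented for the example.

```python
class TokenizationError(Exception):
    """Local stand-in so the sketch runs without toksum installed."""


def validate_messages(messages) -> None:
    """Raise TokenizationError on the first message that breaks the rules above."""
    if not isinstance(messages, list):
        raise TokenizationError(f"messages must be a list, got {type(messages).__name__}")
    for i, msg in enumerate(messages):
        if not isinstance(msg, dict):
            raise TokenizationError(f"message {i} must be a dict")
        for key in ("role", "content"):
            if key not in msg:
                raise TokenizationError(f"message {i} is missing '{key}'")
            if not isinstance(msg[key], str):
                raise TokenizationError(f"message {i}: '{key}' must be a string")


validate_messages([{"role": "user", "content": "Hello!"}])  # passes silently
try:
    validate_messages([{"role": "user"}])
except TokenizationError as e:
    print(f"Message format error: {e}")
```

Including the message index in each error makes it easier to locate the offending entry in a long conversation.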
- Formatting Overhead:
Different providers handle message formatting differently:
OpenAI: Minimal overhead (~1 token per role)
Anthropic: No additional formatting overhead
Other providers: No additional overhead assumed
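As a rough model of the overhead rules above, a message total can be computed as the sum of content tokens plus a fixed per-message allowance. The sketch assumes one extra token per message for OpenAI and none elsewhere, following the list. `OVERHEAD_PER_MESSAGE`, `count_messages_sketch`, and the word-count tokenizer are hypothetical stand-ins rather than toksum internals.

```python
# Hypothetical per-message overhead in tokens, following the list above.
OVERHEAD_PER_MESSAGE = {"openai": 1}  # providers not listed here add 0


def word_count(text: str) -> int:
    """Stand-in tokenizer: counts whitespace-separated words, not real tokens."""
    return len(text.split())


def count_messages_sketch(messages, count_fn, provider: str) -> int:
    """Sum content tokens plus a fixed per-message overhead for the provider."""
    overhead = OVERHEAD_PER_MESSAGE.get(provider, 0)
    return sum(count_fn(m["content"]) + overhead for m in messages)


messages = [
    {"role": "user", "content": "Hello there"},
    {"role": "assistant", "content": "Hi how can I help"},
]
print(count_messages_sketch(messages, word_count, "openai"))     # 2 + 5 content + 2 overhead = 9
print(count_messages_sketch(messages, word_count, "anthropic"))  # 2 + 5 content = 7
```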
- Common Message Roles:
system: System instructions or context
user: User input or questions
assistant: AI assistant responses
function: Function call results (some providers)
Examples
Basic chat conversation:
counter = TokenCounter("gpt-4")
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is the capital of France?"},
    {"role": "assistant", "content": "The capital of France is Paris."}
]
total_tokens = counter.count_messages(messages)
print(f"Total conversation tokens: {total_tokens}")
Comparing individual vs. message counting:
counter = TokenCounter("gpt-4")
# Count individual messages
individual_total = 0
for msg in messages:
    tokens = counter.count(msg["content"])
    individual_total += tokens
    print(f"{msg['role']}: {tokens} tokens")
# Count as message format (includes formatting overhead)
message_total = counter.count_messages(messages)
print(f"Individual sum: {individual_total}")
print(f"Message format: {message_total}")
print(f"Formatting overhead: {message_total - individual_total}")
Error handling:
try:
    counter = TokenCounter("gpt-4")
    # Invalid format - missing content
    invalid_messages = [{"role": "user"}]
    tokens = counter.count_messages(invalid_messages)
except TokenizationError as e:
    print(f"Message format error: {e}")
Multi-provider comparison:
messages = [
    {"role": "user", "content": "Hello!"},
    {"role": "assistant", "content": "Hi there! How can I help?"}
]
models = ["gpt-4", "claude-3-opus", "gemini-pro"]
for model in models:
    counter = TokenCounter(model)
    tokens = counter.count_messages(messages)
    print(f"{model}: {tokens} tokens")
- Performance:
Speed: Processes thousands of message lists per second
Memory: Minimal additional memory overhead
Scalability: Handles conversations with hundreds of messages
- Use Cases:
Chat applications: Calculate conversation costs
API rate limiting: Plan request sizes for chat endpoints
Conversation analysis: Analyze dialogue token patterns
Cost estimation: Budget for chat-based AI applications
Content moderation: Assess conversation length and complexity
Note
This method is specifically designed for chat/conversation formats. For simple text token counting, use the count() method instead.
Command Line Interface
toksum provides a command-line interface for quick token counting:
# Basic usage
toksum "Hello, world!" gpt-4
# From file
toksum --file document.txt claude-3-opus
# With cost estimation
toksum --cost "Your text" gpt-4
# List supported models
toksum --list-models