Core Module

The core module contains the main TokenCounter class and supporting functions, which provide token counting for 200+ Large Language Models from 25+ providers.

The module implements:
  • Precise tokenization for OpenAI models using the tiktoken library

  • Heuristic approximation algorithms for all other providers

  • Provider detection with case-insensitive model name matching

  • Message format support for chat-based interactions

  • Comprehensive error handling with detailed error messages

  • Cost estimation for supported models with USD/INR currency support

Key Components:
Provider Support:

The module supports models from major providers including:

  • OpenAI: GPT-4, GPT-3.5, GPT-4o, O1 models, embeddings

  • Anthropic: Claude 3/3.5 (Opus, Sonnet, Haiku), Claude 2, Instant

  • Google: Gemini Pro/Flash, Gemini 1.5/2.0, PaLM models

  • Meta: LLaMA 2/3/3.1/3.2/3.3 in various sizes

  • Mistral: Mistral 7B, Mixtral, Mistral Large variants

  • Cohere: Command, Command-R, Command-R+ models

  • xAI: Grok 1/1.5/2 and beta models

  • Chinese providers: Alibaba Qwen, Baidu ERNIE, Huawei PanGu, Tsinghua ChatGLM

  • Code-specialized: DeepSeek Coder, Replit Code, BigCode StarCoder

  • Open source: EleutherAI, Stability AI, TII Falcon, RWKV

  • Enterprise: Databricks DBRX, Microsoft Phi, Amazon Titan, IBM Granite

Tokenization Approach:
  • OpenAI models: Uses official tiktoken encodings (cl100k_base, p50k_base, r50k_base)

  • Other providers: Intelligent approximation based on:
    • Character count analysis

    • Whitespace and punctuation detection

    • Provider-specific adjustment factors

    • Language-optimized calculations (Chinese, Russian, etc.)
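The heuristics listed above might be combined roughly as follows. This is a minimal sketch, not the library's implementation: the function name `approximate_tokens`, the adjustment factors, the four-characters-per-token ratio, and the one-token-per-CJK-character rule are all assumptions chosen for illustration.

```python
import re

# Hypothetical per-provider adjustment factors (not the library's real values).
ADJUSTMENT = {"anthropic": 1.0, "google": 1.05, "meta": 1.1}

# CJK Unified Ideographs, used here as a stand-in for "Chinese text".
CJK = re.compile(r"[\u4e00-\u9fff]")


def approximate_tokens(text: str, provider: str = "anthropic") -> int:
    """Rough token estimate from character, whitespace, and punctuation counts."""
    if not text:
        return 0
    cjk_chars = len(CJK.findall(text))
    rest = CJK.sub(" ", text)
    words = re.findall(r"\w+", rest)            # runs separated by whitespace/punctuation
    punctuation = re.findall(r"[^\w\s]", rest)  # each mark counted on its own
    # Assumed ratios: ~4 characters per token for words, 1 token per
    # punctuation mark, 1 token per CJK character.
    estimate = sum(max(1.0, len(w) / 4) for w in words) + len(punctuation) + cjk_chars
    return max(1, round(estimate * ADJUSTMENT.get(provider, 1.0)))
```

Separating CJK characters from the rest of the text keeps a per-word ratio tuned for English from badly undercounting languages that are written without spaces.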

The approximation algorithms are calibrated to provide reasonable accuracy for:
  • Cost estimation and budgeting

  • Rate limit planning

  • Content length assessment

  • Comparative analysis across providers

Note

For production applications requiring exact token counts, use OpenAI models with tiktoken. For other providers, the approximations are suitable for planning and estimation purposes.

class toksum.core.TokenCounter(model: str)

Bases: object

A comprehensive token counter for various Large Language Model (LLM) providers.

This class provides functionality to count tokens for 200+ different LLMs from 25+ providers, including OpenAI, Anthropic, Google, Meta, Mistral, and many others. It supports both individual text strings and lists of messages (for chat-like interactions).

Token counts are exact for OpenAI models, which use the official tiktoken library. For other providers, counts are approximations produced by heuristics calibrated to each provider’s tokenization characteristics.

model

The model name (converted to lowercase)

Type:

str

provider

The detected provider name

Type:

str

tokenizer

The tokenizer instance (tiktoken for OpenAI, None for others)

Type:

Optional[Any]

Supported Providers:
  • OpenAI: GPT-4, GPT-3.5, GPT-4o, O1 models, embeddings (25+ models)

  • Anthropic: Claude 3/3.5 (Opus, Sonnet, Haiku), Claude 2, Instant (12+ models)

  • Google: Gemini Pro/Flash, Gemini 1.5/2.0, PaLM (10+ models)

  • Meta: LLaMA 2/3/3.1/3.2/3.3 in various sizes (15+ models)

  • Mistral: Mistral 7B, Mixtral, Mistral Large variants (10+ models)

  • Cohere: Command, Command-R, Command-R+ (8+ models)

  • xAI: Grok 1/1.5/2 and beta models (4+ models)

  • Alibaba: Qwen 1.5/2.0/2.5 and vision models (20+ models)

  • Baidu: ERNIE 3.0/3.5/4.0 and variants (8+ models)

  • Huawei: PanGu Alpha and Coder models (5+ models)

  • Yandex: YaLM and YaGPT models (4+ models)

  • DeepSeek: Coder, VL, and LLM models (8+ models)

  • Tsinghua: ChatGLM and GLM models (5+ models)

  • And 15+ more providers with specialized models

Examples

Basic usage:

from toksum import TokenCounter

# Count tokens for a single text string
counter = TokenCounter("gpt-4")
token_count = counter.count("This is a test string.")
print(f"Token count: {token_count}")

Chat message format:

# Count tokens for a list of messages (chat format)
messages = [
    {"role": "user", "content": "Hello!"},
    {"role": "assistant", "content": "How can I help you?"},
]
token_count = counter.count_messages(messages)
print(f"Token count (messages): {token_count}")

Different providers:

# Compare token counts across providers
models = ["gpt-4", "claude-3-opus", "gemini-pro", "llama-3-70b"]
text = "Compare tokenization across different models."

for model in models:
    counter = TokenCounter(model)
    tokens = counter.count(text)
    print(f"{model}: {tokens} tokens")

Cost estimation:

from toksum.core import estimate_cost

counter = TokenCounter("gpt-4")
tokens = counter.count("Your text here")
cost = estimate_cost(tokens, "gpt-4", input_tokens=True)
print(f"Estimated cost: ${cost:.4f}")

Tokenization Accuracy:
  • OpenAI models: Exact token counts using official tiktoken encodings

  • Other providers: Approximations, typically within ±10-20% of the true token count

  • Approximation factors: Calibrated per provider based on tokenization patterns

  • Language optimization: Adjusted for Chinese, Russian, and other languages

Note

For production applications requiring exact token counts, use OpenAI models. For other providers, approximations are suitable for cost estimation, rate limit planning, and comparative analysis.

__init__(model: str)

Initialize the TokenCounter with a specific model.

Sets up the appropriate tokenizer based on the model’s provider. For OpenAI models, initializes the tiktoken tokenizer with the correct encoding. For other providers, sets up approximation-based token counting.

Parameters:

model (str) – The model name (e.g., ‘gpt-4’, ‘claude-3-opus-20240229’, ‘gemini-pro’). Model names are case-insensitive and will be converted to lowercase.

Raises:
  • UnsupportedModelError – If the model is not supported. The exception includes a list of all supported models for reference.

  • TokenizationError – If required dependencies are missing (e.g., tiktoken for OpenAI models) or if tokenizer initialization fails.

Examples

# OpenAI model (requires tiktoken)
counter = TokenCounter("gpt-4")

# Anthropic model (uses approximation)
counter = TokenCounter("claude-3-opus-20240229")

# Case-insensitive model names
counter = TokenCounter("GPT-4")  # Same as "gpt-4"

# Google model
counter = TokenCounter("gemini-pro")

# Meta model
counter = TokenCounter("llama-3-70b")

Note

The constructor automatically detects the provider based on the model name and sets up the appropriate tokenization method. OpenAI models use precise tiktoken-based counting, while other providers use calibrated approximations.
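Case-insensitive provider detection can be pictured as a lowercase-then-match step. The sketch below is only an illustration: `detect_provider` and the prefix table are hypothetical, and the real constructor raises UnsupportedModelError for unknown models instead of returning None.

```python
from typing import Optional

# Hypothetical prefix table for illustration; the library's own lookup is not shown here.
PROVIDER_PREFIXES = {
    "gpt-": "openai",
    "claude-": "anthropic",
    "gemini-": "google",
    "llama-": "meta",
}


def detect_provider(model: str) -> Optional[str]:
    """Return the provider for a model name, ignoring case, or None if unknown."""
    name = model.lower()  # normalize once so "GPT-4" and "gpt-4" match the same entry
    for prefix, provider in PROVIDER_PREFIXES.items():
        if name.startswith(prefix):
            return provider
    return None
```

Normalizing the name once at the start is what makes "GPT-4" and "gpt-4" resolve to the same provider.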

count(text: str) → int

Count tokens in the given text.

Performs token counting using the appropriate method for the model’s provider. For OpenAI models, uses precise tiktoken-based counting. For other providers, uses intelligent approximation algorithms calibrated for each provider.

Parameters:

text (str) – The text to count tokens for. Must be a string.

Returns:

The number of tokens in the text. Returns 0 for empty strings.

Return type:

int

Raises:

TokenizationError – If tokenization fails, input is invalid, or required dependencies are missing. Includes detailed error context with model name and text preview.

Input Validation:

The method performs comprehensive input validation:

  • None check: Rejects None input with clear error message

  • Type check: Ensures input is a string, not int/float/list/dict/etc.

  • Empty string: Returns 0 for empty strings (valid case)
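The three checks above could be ordered as in the following sketch. It is an assumption-laden illustration, not the library's code: the `TokenizationError` class is a local stand-in, and `validate_text`, `count_with_validation`, and the whitespace-splitting default counter are hypothetical.

```python
from typing import Callable


class TokenizationError(Exception):
    """Local stand-in for the library's exception of the same name."""


def validate_text(text: object) -> str:
    """Reject None and non-string input; return the text unchanged otherwise."""
    if text is None:
        raise TokenizationError("Input text cannot be None")
    if not isinstance(text, str):
        raise TokenizationError(f"Input must be a string, got {type(text).__name__}")
    return text


def count_with_validation(
    text: object,
    counter: Callable[[str], int] = lambda s: len(s.split()),
) -> int:
    """Validate first, short-circuit on the empty string, then count."""
    checked = validate_text(text)
    if checked == "":
        return 0  # empty input is valid and never reaches the tokenizer
    return counter(checked)
```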

Tokenization Methods:
  • OpenAI models: Uses the encode() method of the model’s tiktoken encoding for exact token counts

  • Other providers: Uses _approximate_tokens() with provider-specific calibration

Provider-Specific Accuracy:
  • OpenAI: 100% accurate (official tokenizer)

  • Anthropic: ~90-95% accurate (well-calibrated approximation)

  • Google: ~85-90% accurate (Gemini-optimized approximation)

  • Meta: ~85-90% accurate (LLaMA-optimized approximation)

  • Chinese models: ~80-90% accurate (character-optimized for Chinese)

  • Code models: ~85-95% accurate (code-pattern optimized)

  • Other providers: ~80-90% accurate (general approximation)

Examples

Basic usage:

counter = TokenCounter("gpt-4")

# Simple text
tokens = counter.count("Hello, world!")
print(f"Tokens: {tokens}")  # Exact count for OpenAI

# Empty string
tokens = counter.count("")
print(f"Tokens: {tokens}")  # Always returns 0

# Longer text
text = "This is a longer text that will be tokenized."
tokens = counter.count(text)
print(f"Tokens: {tokens}")

Comparing providers:

text = "Compare tokenization across different models."
models = ["gpt-4", "claude-3-opus", "gemini-pro"]

for model in models:
    counter = TokenCounter(model)
    tokens = counter.count(text)
    print(f"{model}: {tokens} tokens")

Error handling:

try:
    counter = TokenCounter("gpt-4")
    tokens = counter.count("Valid text")
except TokenizationError as e:
    print(f"Tokenization failed: {e}")

Performance:
  • OpenAI models: Fast (native tiktoken performance)

  • Other providers: Very fast (lightweight approximation algorithms)

  • Typical speed: 10,000+ texts per second for approximation methods

Note

For production applications requiring exact token counts, use OpenAI models. For cost estimation, rate limiting, and comparative analysis, approximations provide sufficient accuracy with much better performance.

count_messages(messages: List[Dict[str, str]]) → int

Count tokens for a list of messages in chat format.

Processes a list of message dictionaries (typical chat/conversation format) and returns the total token count including any formatting overhead. This method is essential for chat-based applications and conversation analysis.

Parameters:

messages (List[Dict[str, str]]) – List of message dictionaries. Each message must contain ‘role’ and ‘content’ keys.

Expected format:

[
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello!"},
    {"role": "assistant", "content": "Hi there!"}
]

Returns:

Total token count for all messages including formatting overhead.

Return type:

int

Raises:

TokenizationError – If messages format is invalid, contains non-string content, or if tokenization of individual messages fails. Includes detailed error context with message index and content preview.

Message Format Validation:

The method performs comprehensive validation:

  • Input type: Must be a list, not string/dict/int/etc.

  • Message structure: Each message must be a dictionary

  • Required keys: Each message must have ‘role’ and ‘content’ keys

  • Content type: Message content must be a string, not None/int/list/etc.

  • Role type: Message role must be a string if present
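A validator covering the checks above might look like the sketch below. The `TokenizationError` class is a local stand-in and `validate_messages` is a hypothetical helper; the sketch treats both keys as required, and the library's actual error messages and check order may differ.

```python
class TokenizationError(Exception):
    """Local stand-in for the library's exception of the same name."""


def validate_messages(messages: object) -> None:
    """Raise TokenizationError on the first structural problem found."""
    if not isinstance(messages, list):
        raise TokenizationError(
            f"Messages must be a list, got {type(messages).__name__}"
        )
    for index, message in enumerate(messages):
        if not isinstance(message, dict):
            raise TokenizationError(f"Message {index} must be a dictionary")
        for key in ("role", "content"):
            if key not in message:
                raise TokenizationError(f"Message {index} is missing '{key}'")
            if not isinstance(message[key], str):
                raise TokenizationError(f"Message {index} '{key}' must be a string")
```

Including the message index in each error makes it possible to locate the offending entry in a long conversation.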

Formatting Overhead:

Different providers handle message formatting differently:

  • OpenAI: Minimal overhead (~1 token per role)

  • Anthropic: No additional formatting overhead

  • Other providers: No additional overhead assumed
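The overhead rules above amount to simple arithmetic: sum the content tokens, then add roughly one token per message role for OpenAI only. The sketch below assumes exactly one token per role and takes the counting function as a parameter; `count_chat_tokens` is a hypothetical name, not a library function.

```python
from typing import Callable, Dict, List


def count_chat_tokens(
    messages: List[Dict[str, str]],
    count: Callable[[str], int],
    provider: str,
) -> int:
    """Sum content tokens and add the assumed per-role overhead for OpenAI."""
    total = sum(count(message["content"]) for message in messages)
    if provider == "openai":
        total += len(messages)  # assumption: exactly 1 token per message role
    return total


# Usage with a toy whitespace counter standing in for a real tokenizer
messages = [
    {"role": "user", "content": "Hello!"},
    {"role": "assistant", "content": "Hi there!"},
]
toy_count = lambda text: len(text.split())
openai_total = count_chat_tokens(messages, toy_count, "openai")
anthropic_total = count_chat_tokens(messages, toy_count, "anthropic")
```

With the toy counter the two messages hold 3 content tokens, so the OpenAI total is 5 and the Anthropic total is 3.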

Common Message Roles:
  • system: System instructions or context

  • user: User input or questions

  • assistant: AI assistant responses

  • function: Function call results (some providers)

Examples

Basic chat conversation:

counter = TokenCounter("gpt-4")

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is the capital of France?"},
    {"role": "assistant", "content": "The capital of France is Paris."}
]

total_tokens = counter.count_messages(messages)
print(f"Total conversation tokens: {total_tokens}")

Comparing individual vs. message counting:

counter = TokenCounter("gpt-4")

# Count individual messages
individual_total = 0
for msg in messages:
    tokens = counter.count(msg["content"])
    individual_total += tokens
    print(f"{msg['role']}: {tokens} tokens")

# Count as message format (includes formatting overhead)
message_total = counter.count_messages(messages)

print(f"Individual sum: {individual_total}")
print(f"Message format: {message_total}")
print(f"Formatting overhead: {message_total - individual_total}")

Error handling:

try:
    counter = TokenCounter("gpt-4")

    # Invalid format - missing content
    invalid_messages = [{"role": "user"}]
    tokens = counter.count_messages(invalid_messages)

except TokenizationError as e:
    print(f"Message format error: {e}")

Multi-provider comparison:

messages = [
    {"role": "user", "content": "Hello!"},
    {"role": "assistant", "content": "Hi there! How can I help?"}
]

models = ["gpt-4", "claude-3-opus", "gemini-pro"]
for model in models:
    counter = TokenCounter(model)
    tokens = counter.count_messages(messages)
    print(f"{model}: {tokens} tokens")

Performance:
  • Speed: Processes thousands of message lists per second

  • Memory: Minimal additional memory overhead

  • Scalability: Handles conversations with hundreds of messages

Use Cases:
  • Chat applications: Calculate conversation costs

  • API rate limiting: Plan request sizes for chat endpoints

  • Conversation analysis: Analyze dialogue token patterns

  • Cost estimation: Budget for chat-based AI applications

  • Content moderation: Assess conversation length and complexity

Note

This method is specifically designed for chat/conversation formats. For simple text token counting, use the count() method instead.

toksum.core.count_tokens(text: str, model: str) → int

Convenience function to count tokens for a given text and model.

This is a simplified interface that creates a TokenCounter instance and performs token counting in a single function call. Ideal for one-off token counting operations without needing to manage TokenCounter instances.

Parameters:
  • text (str) – The text to count tokens for. Must be a string.

  • model (str) – The model name (e.g., ‘gpt-4’, ‘claude-3-opus-20240229’). Model names are case-insensitive.

Returns:

The number of tokens in the text.

Return type:

int

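Raises:
  • UnsupportedModelError – If the model is not supported.

  • TokenizationError – If the input is invalid or tokenization fails.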

Examples

Basic usage:

from toksum import count_tokens

# OpenAI model
tokens = count_tokens("Hello, world!", "gpt-4")
print(f"GPT-4 tokens: {tokens}")

# Anthropic model
tokens = count_tokens("Hello, world!", "claude-3-opus")
print(f"Claude tokens: {tokens}")

# Case-insensitive model names
tokens = count_tokens("Hello, world!", "GPT-4")  # Same as "gpt-4"

Comparing models:

text = "This is a sample text for comparison."
models = ["gpt-4", "gpt-3.5-turbo", "claude-3-opus", "gemini-pro"]

for model in models:
    tokens = count_tokens(text, model)
    print(f"{model}: {tokens} tokens")

Error handling:

try:
    tokens = count_tokens("Hello!", "unsupported-model")
except UnsupportedModelError as e:
    print(f"Model not supported: {e}")
except TokenizationError as e:
    print(f"Tokenization failed: {e}")

Performance:

This function creates a new TokenCounter instance for each call. For multiple operations with the same model, consider using TokenCounter directly for better performance:

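from toksum import TokenCounter, count_tokens

texts = ["First text.", "Second text.", "Third text."]

# Less efficient for multiple calls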
for text in texts:
    tokens = count_tokens(text, "gpt-4")

# More efficient for multiple calls
counter = TokenCounter("gpt-4")
for text in texts:
    tokens = counter.count(text)
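If the one-call convenience is still wanted, one option is to cache a counter per model so that repeated calls reuse it. The sketch below illustrates the idea with `functools.lru_cache`; `StubCounter` is a stand-in for TokenCounter so the example runs without toksum installed, and `get_counter` and `count_tokens_cached` are hypothetical helpers, not part of the library.

```python
from functools import lru_cache


class StubCounter:
    """Stand-in for TokenCounter; counts whitespace-separated words."""

    instances = 0  # tracks how many counters were constructed

    def __init__(self, model: str):
        StubCounter.instances += 1
        self.model = model.lower()

    def count(self, text: str) -> int:
        return len(text.split())


@lru_cache(maxsize=None)
def get_counter(model: str) -> StubCounter:
    """Build a counter once per distinct (lowercased) model name."""
    return StubCounter(model)


def count_tokens_cached(text: str, model: str) -> int:
    # Lowercase before the lookup so "GPT-4" and "gpt-4" share one cache entry.
    return get_counter(model.lower()).count(text)
```

In a real application `get_counter` would return `TokenCounter(model)`; the cache then pays the tokenizer setup cost once per model.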

Note

This function is equivalent to:

def count_tokens(text, model):
    counter = TokenCounter(model)
    return counter.count(text)

toksum.core.get_supported_models() → Dict[str, List[str]]

Get a comprehensive dictionary of supported models organized by provider.

Returns all 200+ supported models grouped by provider, so callers can discover which models are available.

Returns:

Dictionary with provider names as keys and lists of model names as values. Providers include:

  • openai: GPT-4, GPT-3.5, GPT-4o, O1, embeddings (25+ models)

  • anthropic: Claude 3/3.5, Claude 2, Instant (12+ models)

  • google: Gemini Pro/Flash, Gemini 1.5/2.0, PaLM (10+ models)

  • meta: LLaMA 2/3/3.1/3.2/3.3 variants (15+ models)

  • mistral: Mistral 7B, Mixtral, Large variants (10+ models)

  • cohere: Command, Command-R, Command-R+ (8+ models)

  • xai: Grok 1/1.5/2 and beta models (4+ models)

  • alibaba: Qwen 1.5/2.0/2.5 and vision models (20+ models)

  • baidu: ERNIE 3.0/3.5/4.0 variants (8+ models)

  • huawei: PanGu Alpha and Coder models (5+ models)

  • yandex: YaLM and YaGPT models (4+ models)

  • deepseek: Coder, VL, and LLM models (8+ models)

  • tsinghua: ChatGLM and GLM models (5+ models)

  • databricks: DBRX and Dolly models (6+ models)

  • voyage: Voyage embedding models (6+ models)

  • And 10+ more providers

Return type:

Dict[str, List[str]]

Examples

Basic usage:

from toksum import get_supported_models

models = get_supported_models()

# List all providers
print("Supported providers:")
for provider in models.keys():
    print(f"  {provider}")

Explore specific providers:

models = get_supported_models()

# OpenAI models
print("OpenAI models:")
for model in models["openai"]:
    print(f"  {model}")

# Anthropic models
print("\nAnthropic models:")
for model in models["anthropic"]:
    print(f"  {model}")

Count models by provider:

models = get_supported_models()

print("Model counts by provider:")
total_models = 0
for provider, model_list in models.items():
    count = len(model_list)
    total_models += count
    print(f"  {provider}: {count} models")

print(f"\nTotal: {total_models} models")

Find models by pattern:

models = get_supported_models()

# Find all GPT-4 variants
gpt4_models = []
for model in models["openai"]:
    if "gpt-4" in model:
        gpt4_models.append(model)

print("GPT-4 variants:")
for model in gpt4_models:
    print(f"  {model}")

Validate model support:

models = get_supported_models()

def is_model_supported(model_name):
    model_lower = model_name.lower()
    for provider_models in models.values():
        if model_lower in [m.lower() for m in provider_models]:
            return True
    return False

# Check if models are supported
test_models = ["gpt-4", "claude-3-opus", "unknown-model"]
for model in test_models:
    supported = is_model_supported(model)
    print(f"{model}: {'✓' if supported else '✗'}")

Integration with TokenCounter:

from toksum import TokenCounter, get_supported_models

models = get_supported_models()
text = "Test tokenization across providers."

# Test a few models from each major provider
test_models = {
    "openai": models["openai"][0],      # First OpenAI model
    "anthropic": models["anthropic"][0], # First Anthropic model
    "google": models["google"][0],       # First Google model
    "meta": models["meta"][0]            # First Meta model
}

for provider, model in test_models.items():
    counter = TokenCounter(model)
    tokens = counter.count(text)
    print(f"{provider} ({model}): {tokens} tokens")

Provider Categories:

The returned dictionary includes models from these categories:

Major Cloud Providers:
  • OpenAI, Anthropic, Google, Microsoft, Amazon

AI-First Companies:
  • Mistral, Cohere, xAI, Perplexity, AI21

Regional/Language-Specific:
  • Alibaba (Chinese), Baidu (Chinese), Huawei (Chinese)

  • Yandex (Russian), Tsinghua (Chinese)

Open Source/Research:
  • EleutherAI, Stability AI, TII, RWKV, Community models

Enterprise/Specialized:
  • Databricks, Voyage, DeepSeek, BigCode, Replit

  • Nvidia, IBM, Salesforce

Note

The model lists are comprehensive but may not include every variant or the very latest models. The library is regularly updated to include new models as they become available.

See also

  • TokenCounter: For creating token counters with specific models

  • count_tokens(): For quick token counting with model validation

  • UnsupportedModelError: Exception raised for unsupported models

toksum.core.estimate_cost(token_count: int, model: str, input_tokens: bool = True, currency: str = 'USD') → float

Estimate the cost for a given number of tokens and model.

Calculates estimated costs based on current pricing for supported models. Supports both input and output token pricing, as many models have different rates for input vs. output tokens. Provides costs in USD or INR currency.

Parameters:
  • token_count (int) – Number of tokens to estimate cost for. Must be non-negative.

  • model (str) – Model name (e.g., “gpt-4”, “gpt-4o”, “claude-3-opus-20240229”). Model names are case-insensitive.

  • input_tokens (bool, optional) – True for input token pricing, False for output token pricing. Defaults to True. Many models charge more for output tokens than input tokens.

  • currency (str, optional) – Currency code (“USD” or “INR”). Defaults to “USD”. INR amounts are converted from the USD price using the library’s stored conversion rate.

Returns:

Estimated cost in the specified currency. Returns 0.0 if the model is not in the pricing database or if pricing is not available.

Return type:

float

Pricing Coverage:

The function includes pricing for major models:

OpenAI Models:
  • GPT-4: $0.03/$0.06 per 1K tokens (input/output)

  • GPT-4 Turbo: $0.01/$0.03 per 1K tokens

  • GPT-4o: $0.005/$0.015 per 1K tokens

  • GPT-4o Mini: $0.00015/$0.0006 per 1K tokens

  • GPT-3.5 Turbo: $0.001/$0.002 per 1K tokens

Anthropic Models:
  • Claude-3 Opus: $0.015/$0.075 per 1K tokens

  • Claude-3 Sonnet: $0.003/$0.015 per 1K tokens

  • Claude-3 Haiku: $0.00025/$0.00125 per 1K tokens

  • Claude-3.5 Sonnet: $0.003/$0.015 per 1K tokens

  • Claude-3.5 Haiku: $0.001/$0.005 per 1K tokens

Databricks Models:
  • DBRX Instruct: $0.001/$0.002 per 1K tokens

  • Dolly models: $0.001/$0.002 per 1K tokens

Voyage AI Models:
  • All Voyage models: $0.0001/$0.0001 per 1K tokens
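The per-1K rates translate into a cost as token_count / 1000 × rate. The sketch below works through that arithmetic with three of the rates listed above; the table layout, the key spellings, and the name `estimate_cost_sketch` are assumptions for illustration, not the library's internals.

```python
# USD per 1K tokens as (input, output), copied from the pricing listed above.
PRICES_PER_1K = {
    "gpt-4": (0.03, 0.06),
    "gpt-4o": (0.005, 0.015),
    "claude-3-opus": (0.015, 0.075),
}


def estimate_cost_sketch(token_count: int, model: str, input_tokens: bool = True) -> float:
    """Return token_count / 1000 * rate, or 0.0 when the model has no known price."""
    rates = PRICES_PER_1K.get(model.lower())
    if rates is None:
        return 0.0
    rate = rates[0] if input_tokens else rates[1]
    return token_count / 1000 * rate


# 1,500 GPT-4 tokens: about $0.045 as input, about $0.09 as output
input_cost = estimate_cost_sketch(1500, "gpt-4", input_tokens=True)
output_cost = estimate_cost_sketch(1500, "gpt-4", input_tokens=False)
```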

Examples

Basic cost estimation:

from toksum import count_tokens, estimate_cost

text = "This is a sample text for cost estimation."
model = "gpt-4"

# Count tokens and estimate cost
tokens = count_tokens(text, model)
input_cost = estimate_cost(tokens, model, input_tokens=True)
output_cost = estimate_cost(tokens, model, input_tokens=False)

print(f"Text: '{text}'")
print(f"Tokens: {tokens}")
print(f"Input cost: ${input_cost:.4f}")
print(f"Output cost: ${output_cost:.4f}")

Compare costs across models:

text = "Compare costs across different models." * 100  # Longer text
models = ["gpt-4", "gpt-4o", "gpt-3.5-turbo", "claude-3-opus", "claude-3-haiku"]

print(f"Text length: {len(text)} characters")
print("\nCost comparison:")

for model in models:
    try:
        tokens = count_tokens(text, model)
        input_cost = estimate_cost(tokens, model, input_tokens=True)
        output_cost = estimate_cost(tokens, model, input_tokens=False)

        print(f"{model}:")
        print(f"  Tokens: {tokens}")
        print(f"  Input: ${input_cost:.4f}")
        print(f"  Output: ${output_cost:.4f}")
    except Exception as e:
        print(f"{model}: Error - {e}")

Currency conversion:

tokens = 1000
model = "gpt-4"

# USD pricing
cost_usd = estimate_cost(tokens, model, currency="USD")
print(f"Cost in USD: ${cost_usd:.4f}")

# INR pricing
cost_inr = estimate_cost(tokens, model, currency="INR")
print(f"Cost in INR: ₹{cost_inr:.2f}")

Batch cost estimation:

texts = [
    "Short text",
    "Medium length text with more content",
    "Much longer text that will cost more to process" * 10
]

model = "gpt-4o"
total_cost = 0

print("Individual text costs:")
for i, text in enumerate(texts, 1):
    tokens = count_tokens(text, model)
    cost = estimate_cost(tokens, model)
    total_cost += cost
    print(f"Text {i}: {tokens} tokens, ${cost:.4f}")

print(f"\nTotal estimated cost: ${total_cost:.4f}")

Chat conversation costing:

from toksum import TokenCounter

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Explain quantum computing."},
    {"role": "assistant", "content": "Quantum computing is a revolutionary..."}
]

counter = TokenCounter("gpt-4")
total_tokens = counter.count_messages(messages)

# Estimate costs for the conversation
input_cost = estimate_cost(total_tokens, "gpt-4", input_tokens=True)
output_cost = estimate_cost(total_tokens, "gpt-4", input_tokens=False)

print(f"Conversation tokens: {total_tokens}")
print(f"If all input: ${input_cost:.4f}")
print(f"If all output: ${output_cost:.4f}")

Currency Conversion:
  • USD to INR rate: 83.0 (as of July 2025)

  • Rate updates: The conversion rate is periodically updated

  • Precision: INR costs are calculated from USD base prices
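Because INR costs are derived from the USD base price, the conversion is a single multiplication. The sketch below uses the 83.0 rate stated above; `to_currency` is a hypothetical helper, and raising ValueError for an unknown currency code is an assumption, since the library's behavior in that case is not described here.

```python
USD_TO_INR = 83.0  # rate stated above


def to_currency(cost_usd: float, currency: str = "USD") -> float:
    """Convert a USD cost to the requested currency ("USD" or "INR")."""
    code = currency.upper()
    if code == "USD":
        return cost_usd
    if code == "INR":
        return cost_usd * USD_TO_INR
    # Assumption: reject anything other than the two documented currencies.
    raise ValueError(f"Unsupported currency: {currency}")


# 1,000 GPT-4 input tokens cost $0.03, which is about ₹2.49 at this rate
cost_inr = to_currency(0.03, "INR")
```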

Limitations:
  • Pricing accuracy: Based on publicly available pricing, may not reflect current rates or enterprise discounts

  • Model coverage: Only includes models with known pricing

  • Rate changes: Pricing may change without notice

  • Approximation: For non-OpenAI models, token counts are approximated

Note

This function provides cost estimates for planning and budgeting purposes. Actual costs may vary based on current pricing, volume discounts, and exact tokenization. Always verify current pricing with the model provider for production applications.

See also

TokenCounter Class

class toksum.core.TokenCounter(model: str)[source]

Bases: object

A comprehensive token counter for various Large Language Model (LLM) providers.

This class provides functionality to count tokens for 200+ different LLMs from 25+ providers, including OpenAI, Anthropic, Google, Meta, Mistral, and many others. It supports both individual text strings and lists of messages (for chat-like interactions).

The token counting is precise for OpenAI models using the official tiktoken library, and provides reasonable approximations for other providers using intelligent algorithms calibrated for each provider’s tokenization characteristics.

model

The model name (converted to lowercase)

Type:

str

provider

The detected provider name

Type:

str

tokenizer

The tokenizer instance (tiktoken for OpenAI, None for others)

Type:

Optional[Any]

Supported Providers:
  • OpenAI: GPT-4, GPT-3.5, GPT-4o, O1 models, embeddings (25+ models)

  • Anthropic: Claude 3/3.5 (Opus, Sonnet, Haiku), Claude 2, Instant (12+ models)

  • Google: Gemini Pro/Flash, Gemini 1.5/2.0, PaLM (10+ models)

  • Meta: LLaMA 2/3/3.1/3.2/3.3 in various sizes (15+ models)

  • Mistral: Mistral 7B, Mixtral, Mistral Large variants (10+ models)

  • Cohere: Command, Command-R, Command-R+ (8+ models)

  • xAI: Grok 1/1.5/2 and beta models (4+ models)

  • Alibaba: Qwen 1.5/2.0/2.5 and vision models (20+ models)

  • Baidu: ERNIE 3.0/3.5/4.0 and variants (8+ models)

  • Huawei: PanGu Alpha and Coder models (5+ models)

  • Yandex: YaLM and YaGPT models (4+ models)

  • DeepSeek: Coder, VL, and LLM models (8+ models)

  • Tsinghua: ChatGLM and GLM models (5+ models)

  • And 15+ more providers with specialized models

Examples

Basic usage:

# Count tokens for a single text string
counter = TokenCounter("gpt-4")
token_count = counter.count("This is a test string.")
print(f"Token count: {token_count}")

Chat message format:

# Count tokens for a list of messages (chat format)
messages = [
    {"role": "user", "content": "Hello!"},
    {"role": "assistant", "content": "How can I help you?"},
]
token_count = counter.count_messages(messages)
print(f"Token count (messages): {token_count}")

Different providers:

# Compare token counts across providers
models = ["gpt-4", "claude-3-opus", "gemini-pro", "llama-3-70b"]
text = "Compare tokenization across different models."

for model in models:
    counter = TokenCounter(model)
    tokens = counter.count(text)
    print(f"{model}: {tokens} tokens")

Cost estimation:

from toksum.core import estimate_cost

counter = TokenCounter("gpt-4")
tokens = counter.count("Your text here")
cost = estimate_cost(tokens, "gpt-4", input_tokens=True)
print(f"Estimated cost: ${cost:.4f}")
Tokenization Accuracy:
  • OpenAI models: Exact token counts using official tiktoken encodings

  • Other providers: Approximations with typical accuracy of ±10-20%

  • Approximation factors: Calibrated per provider based on tokenization patterns

  • Language optimization: Adjusted for Chinese, Russian, and other languages

Note

For production applications requiring exact token counts, use OpenAI models. For other providers, approximations are suitable for cost estimation, rate limit planning, and comparative analysis.

Raises:
__init__(model: str)[source]

Initialize the TokenCounter with a specific model.

Sets up the appropriate tokenizer based on the model’s provider. For OpenAI models, initializes the tiktoken tokenizer with the correct encoding. For other providers, sets up approximation-based token counting.

Parameters:

model (str) – The model name (e.g., ‘gpt-4’, ‘claude-3-opus-20240229’, ‘gemini-pro’). Model names are case-insensitive and will be converted to lowercase.

Raises:
  • UnsupportedModelError – If the model is not supported. The exception includes a list of all supported models for reference.

  • TokenizationError – If required dependencies are missing (e.g., tiktoken for OpenAI models) or if tokenizer initialization fails.

Examples

# OpenAI model (requires tiktoken)
counter = TokenCounter("gpt-4")

# Anthropic model (uses approximation)
counter = TokenCounter("claude-3-opus-20240229")

# Case-insensitive model names
counter = TokenCounter("GPT-4")  # Same as "gpt-4"

# Google model
counter = TokenCounter("gemini-pro")

# Meta model
counter = TokenCounter("llama-3-70b")

Note

The constructor automatically detects the provider based on the model name and sets up the appropriate tokenization method. OpenAI models use precise tiktoken-based counting, while other providers use calibrated approximations.

count(text: str) int[source]

Count tokens in the given text.

Performs token counting using the appropriate method for the model’s provider. For OpenAI models, uses precise tiktoken-based counting. For other providers, uses intelligent approximation algorithms calibrated for each provider.

Parameters:

text (str) – The text to count tokens for. Must be a string.

Returns:

The number of tokens in the text. Returns 0 for empty strings.

Return type:

int

Raises:

TokenizationError – If tokenization fails, input is invalid, or required dependencies are missing. Includes detailed error context with model name and text preview.

Input Validation:

The method performs comprehensive input validation:

  • None check: Rejects None input with clear error message

  • Type check: Ensures input is a string, not int/float/list/dict/etc.

  • Empty string: Returns 0 for empty strings (valid case)

Tokenization Methods:
  • OpenAI models: Uses tiktoken.encode() for exact token counts

  • Other providers: Uses _approximate_tokens() with provider-specific calibration

Provider-Specific Accuracy:
  • OpenAI: 100% accurate (official tokenizer)

  • Anthropic: ~90-95% accurate (well-calibrated approximation)

  • Google: ~85-90% accurate (Gemini-optimized approximation)

  • Meta: ~85-90% accurate (LLaMA-optimized approximation)

  • Chinese models: ~80-90% accurate (character-optimized for Chinese)

  • Code models: ~85-95% accurate (code-pattern optimized)

  • Other providers: ~80-90% accurate (general approximation)

Examples

Basic usage:

counter = TokenCounter("gpt-4")

# Simple text
tokens = counter.count("Hello, world!")
print(f"Tokens: {tokens}")  # Exact count for OpenAI

# Empty string
tokens = counter.count("")
print(f"Tokens: {tokens}")  # Always returns 0

# Longer text
text = "This is a longer text that will be tokenized."
tokens = counter.count(text)
print(f"Tokens: {tokens}")

Comparing providers:

text = "Compare tokenization across different models."
models = ["gpt-4", "claude-3-opus", "gemini-pro"]

for model in models:
    counter = TokenCounter(model)
    tokens = counter.count(text)
    print(f"{model}: {tokens} tokens")

Error handling:

try:
    counter = TokenCounter("gpt-4")
    # None fails input validation and raises TokenizationError
    tokens = counter.count(None)
except TokenizationError as e:
    print(f"Tokenization failed: {e}")

Performance:
  • OpenAI models: Fast (native tiktoken performance)

  • Other providers: Very fast (lightweight approximation algorithms)

  • Typical speed: 10,000+ texts per second for approximation methods

Note

Exact token counts are available only for OpenAI models, which use the official tiktoken tokenizer. For cost estimation, rate limiting, and comparative analysis, the approximations used for other providers provide sufficient accuracy and are faster to compute.

count_messages(messages: List[Dict[str, str]]) → int

Count tokens for a list of messages in chat format.

Processes a list of message dictionaries (typical chat/conversation format) and returns the total token count including any formatting overhead. This method is essential for chat-based applications and conversation analysis.

Parameters:

messages (List[Dict[str, str]]) –

List of message dictionaries. Each message must contain ‘role’ and ‘content’ keys.

Expected format:

[
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello!"},
    {"role": "assistant", "content": "Hi there!"}
]

Returns:

Total token count for all messages including formatting overhead.

Return type:

int

Raises:

TokenizationError – If messages format is invalid, contains non-string content, or if tokenization of individual messages fails. Includes detailed error context with message index and content preview.

Message Format Validation:

The method performs comprehensive validation:

  • Input type: Must be a list, not string/dict/int/etc.

  • Message structure: Each message must be a dictionary

  • Required keys: Each message must have ‘role’ and ‘content’ keys

  • Content type: Message content must be a string, not None/int/list/etc.

  • Role type: Message role must be a string if present
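The validation steps above might look roughly like this. It is a sketch under assumptions: `validate_messages` is a hypothetical helper, the error messages are invented, and `TokenizationError` is redefined locally so the code runs without toksum installed.

```python
class TokenizationError(Exception):
    """Local stand-in for toksum's exception so the sketch runs on its own."""


def validate_messages(messages):
    """Hypothetical helper that mirrors the checks listed above."""
    if not isinstance(messages, list):
        raise TokenizationError("Messages must be a list")
    for index, message in enumerate(messages):
        if not isinstance(message, dict):
            raise TokenizationError(f"Message {index} must be a dictionary")
        for key in ("role", "content"):
            if key not in message:
                raise TokenizationError(f"Message {index} is missing '{key}'")
        if not isinstance(message["content"], str):
            raise TokenizationError(f"Message {index} content must be a string")
        if not isinstance(message["role"], str):
            raise TokenizationError(f"Message {index} role must be a string")
```

Including the message index in each error makes it easier to locate the offending entry in a long conversation.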

Formatting Overhead:

Different providers handle message formatting differently:

  • OpenAI: Minimal overhead (~1 token per role)

  • Anthropic: No additional formatting overhead

  • Other providers: No additional overhead assumed
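As a rough illustration of the overhead rule above, the following sketch adds one token per role for OpenAI and nothing for other providers. The function name and the whitespace-based stand-in counter are assumptions made so the example runs without toksum. Real counts come from the tokenizer or approximation in use.

```python
from typing import Callable, Dict, List


def count_messages_sketch(
    messages: List[Dict[str, str]],
    count_text: Callable[[str], int],
    provider: str,
) -> int:
    """Sum content tokens and add roughly one token per role for OpenAI."""
    total = 0
    for message in messages:
        total += count_text(message["content"])
        if provider == "openai" and "role" in message:
            total += 1  # Assumed ~1 token of role overhead, per the list above.
    return total


def word_count(text: str) -> int:
    """Stand-in counter (whitespace split), used only to keep the sketch runnable."""
    return len(text.split())


sample = [
    {"role": "user", "content": "Hello there"},
    {"role": "assistant", "content": "Hi, how can I help?"},
]
print(count_messages_sketch(sample, word_count, "openai"))     # prints 9
print(count_messages_sketch(sample, word_count, "anthropic"))  # prints 7
```

The difference between the two results (2) equals the number of messages, which is the formatting overhead the comparison example below measures.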

Common Message Roles:
  • system: System instructions or context

  • user: User input or questions

  • assistant: AI assistant responses

  • function: Function call results (some providers)

Examples

Basic chat conversation:

counter = TokenCounter("gpt-4")

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is the capital of France?"},
    {"role": "assistant", "content": "The capital of France is Paris."}
]

total_tokens = counter.count_messages(messages)
print(f"Total conversation tokens: {total_tokens}")

Comparing individual vs. message counting:

counter = TokenCounter("gpt-4")

# Count individual messages
individual_total = 0
for msg in messages:
    tokens = counter.count(msg["content"])
    individual_total += tokens
    print(f"{msg['role']}: {tokens} tokens")

# Count as message format (includes formatting overhead)
message_total = counter.count_messages(messages)

print(f"Individual sum: {individual_total}")
print(f"Message format: {message_total}")
print(f"Formatting overhead: {message_total - individual_total}")

Error handling:

try:
    counter = TokenCounter("gpt-4")

    # Invalid format - missing content
    invalid_messages = [{"role": "user"}]
    tokens = counter.count_messages(invalid_messages)

except TokenizationError as e:
    print(f"Message format error: {e}")

Multi-provider comparison:

messages = [
    {"role": "user", "content": "Hello!"},
    {"role": "assistant", "content": "Hi there! How can I help?"}
]

models = ["gpt-4", "claude-3-opus", "gemini-pro"]
for model in models:
    counter = TokenCounter(model)
    tokens = counter.count_messages(messages)
    print(f"{model}: {tokens} tokens")

Performance:
  • Speed: Processes thousands of message lists per second

  • Memory: Minimal additional memory overhead

  • Scalability: Handles conversations with hundreds of messages

Use Cases:
  • Chat applications: Calculate conversation costs

  • API rate limiting: Plan request sizes for chat endpoints

  • Conversation analysis: Analyze dialogue token patterns

  • Cost estimation: Budget for chat-based AI applications

  • Content moderation: Assess conversation length and complexity

Note

This method is specifically designed for chat/conversation formats. For simple text token counting, use the count() method instead.

Convenience Functions

toksum.core.count_tokens(text: str, model: str) → int

Convenience function to count tokens for a given text and model.

This is a simplified interface that creates a TokenCounter instance and performs token counting in a single function call. Ideal for one-off token counting operations without needing to manage TokenCounter instances.

Parameters:
  • text (str) – The text to count tokens for. Must be a string.

  • model (str) – The model name (e.g., ‘gpt-4’, ‘claude-3-opus-20240229’). Model names are case-insensitive.

Returns:

The number of tokens in the text.

Return type:

int

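Raises:
  • UnsupportedModelError – If the model is not supported.

  • TokenizationError – If tokenization fails or the input is invalid.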

Examples

Basic usage:

from toksum import count_tokens

# OpenAI model
tokens = count_tokens("Hello, world!", "gpt-4")
print(f"GPT-4 tokens: {tokens}")

# Anthropic model
tokens = count_tokens("Hello, world!", "claude-3-opus")
print(f"Claude tokens: {tokens}")

# Case-insensitive model names
tokens = count_tokens("Hello, world!", "GPT-4")  # Same as "gpt-4"

Comparing models:

text = "This is a sample text for comparison."
models = ["gpt-4", "gpt-3.5-turbo", "claude-3-opus", "gemini-pro"]

for model in models:
    tokens = count_tokens(text, model)
    print(f"{model}: {tokens} tokens")

Error handling:

try:
    tokens = count_tokens("Hello!", "unsupported-model")
except UnsupportedModelError as e:
    print(f"Model not supported: {e}")
except TokenizationError as e:
    print(f"Tokenization failed: {e}")

Performance:

This function creates a new TokenCounter instance for each call. For multiple operations with the same model, consider using TokenCounter directly for better performance:

# Less efficient for multiple calls
for text in texts:
    tokens = count_tokens(text, "gpt-4")

# More efficient for multiple calls
counter = TokenCounter("gpt-4")
for text in texts:
    tokens = counter.count(text)

Note

This function is equivalent to:

counter = TokenCounter(model)
return counter.count(text)

toksum.core.get_supported_models() → Dict[str, List[str]]

Get a comprehensive dictionary of supported models organized by provider.

Returns all 200+ supported models grouped by their respective providers, making it easy to discover available models and understand the scope of toksum’s capabilities.

Returns:

Dictionary with provider names as keys and lists of model names as values. Providers include:

  • openai: GPT-4, GPT-3.5, GPT-4o, O1, embeddings (25+ models)

  • anthropic: Claude 3/3.5, Claude 2, Instant (12+ models)

  • google: Gemini Pro/Flash, Gemini 1.5/2.0, PaLM (10+ models)

  • meta: LLaMA 2/3/3.1/3.2/3.3 variants (15+ models)

  • mistral: Mistral 7B, Mixtral, Large variants (10+ models)

  • cohere: Command, Command-R, Command-R+ (8+ models)

  • xai: Grok 1/1.5/2 and beta models (4+ models)

  • alibaba: Qwen 1.5/2.0/2.5 and vision models (20+ models)

  • baidu: ERNIE 3.0/3.5/4.0 variants (8+ models)

  • huawei: PanGu Alpha and Coder models (5+ models)

  • yandex: YaLM and YaGPT models (4+ models)

  • deepseek: Coder, VL, and LLM models (8+ models)

  • tsinghua: ChatGLM and GLM models (5+ models)

  • databricks: DBRX and Dolly models (6+ models)

  • voyage: Voyage embedding models (6+ models)

  • And 10+ more providers

Return type:

Dict[str, List[str]]

Examples

Basic usage:

from toksum import get_supported_models

models = get_supported_models()

# List all providers
print("Supported providers:")
for provider in models.keys():
    print(f"  {provider}")

Explore specific providers:

models = get_supported_models()

# OpenAI models
print("OpenAI models:")
for model in models["openai"]:
    print(f"  {model}")

# Anthropic models
print("\nAnthropic models:")
for model in models["anthropic"]:
    print(f"  {model}")

Count models by provider:

models = get_supported_models()

print("Model counts by provider:")
total_models = 0
for provider, model_list in models.items():
    count = len(model_list)
    total_models += count
    print(f"  {provider}: {count} models")

print(f"\nTotal: {total_models} models")

Find models by pattern:

models = get_supported_models()

# Find all GPT-4 variants
gpt4_models = []
for model in models["openai"]:
    if "gpt-4" in model:
        gpt4_models.append(model)

print("GPT-4 variants:")
for model in gpt4_models:
    print(f"  {model}")

Validate model support:

models = get_supported_models()

def is_model_supported(model_name):
    model_lower = model_name.lower()
    for provider_models in models.values():
        if model_lower in [m.lower() for m in provider_models]:
            return True
    return False

# Check if models are supported
test_models = ["gpt-4", "claude-3-opus", "unknown-model"]
for model in test_models:
    supported = is_model_supported(model)
    print(f"{model}: {'✓' if supported else '✗'}")

Integration with TokenCounter:

from toksum import TokenCounter, get_supported_models

models = get_supported_models()
text = "Test tokenization across providers."

# Test a few models from each major provider
test_models = {
    "openai": models["openai"][0],      # First OpenAI model
    "anthropic": models["anthropic"][0], # First Anthropic model
    "google": models["google"][0],       # First Google model
    "meta": models["meta"][0]            # First Meta model
}

for provider, model in test_models.items():
    counter = TokenCounter(model)
    tokens = counter.count(text)
    print(f"{provider} ({model}): {tokens} tokens")

Provider Categories:

The returned dictionary includes models from these categories:

Major Cloud Providers:
  • OpenAI, Anthropic, Google, Microsoft, Amazon

AI-First Companies:
  • Mistral, Cohere, xAI, Perplexity, AI21

Regional/Language-Specific:
  • Alibaba (Chinese), Baidu (Chinese), Huawei (Chinese)

  • Yandex (Russian), Tsinghua (Chinese)

Open Source/Research:
  • EleutherAI, Stability AI, TII, RWKV, Community models

Enterprise/Specialized:
  • Databricks, Voyage, DeepSeek, BigCode, Replit

  • Nvidia, IBM, Salesforce

Note

The model lists are comprehensive but may not include every variant or the very latest models. The library is regularly updated to include new models as they become available.

See also

  • TokenCounter: For creating token counters with specific models

  • count_tokens(): For quick token counting with model validation

  • UnsupportedModelError: Exception raised for unsupported models

toksum.core.estimate_cost(token_count: int, model: str, input_tokens: bool = True, currency: str = 'USD') → float

Estimate the cost for a given number of tokens and model.

Calculates estimated costs based on current pricing for supported models. Supports both input and output token pricing, as many models have different rates for input vs. output tokens. Provides costs in USD or INR currency.

Parameters:
  • token_count (int) – Number of tokens to estimate cost for. Must be non-negative.

  • model (str) – Model name (e.g., “gpt-4”, “gpt-4o”, “claude-3-opus-20240229”). Model names are case-insensitive.

  • input_tokens (bool, optional) – True for input token pricing, False for output token pricing. Defaults to True. Many models charge more for output tokens than input tokens.

  • currency (str, optional) – Currency code (“USD” or “INR”). Defaults to “USD”. INR amounts are converted from the USD base price at the library's built-in conversion rate.

Returns:

Estimated cost in the specified currency. Returns 0.0 if the model is not in the pricing database or if pricing is not available.

Return type:

float

Pricing Coverage:

The function includes pricing for major models:

OpenAI Models:
  • GPT-4: $0.03/$0.06 per 1K tokens (input/output)

  • GPT-4 Turbo: $0.01/$0.03 per 1K tokens

  • GPT-4o: $0.005/$0.015 per 1K tokens

  • GPT-4o Mini: $0.00015/$0.0006 per 1K tokens

  • GPT-3.5 Turbo: $0.001/$0.002 per 1K tokens

Anthropic Models:
  • Claude-3 Opus: $0.015/$0.075 per 1K tokens

  • Claude-3 Sonnet: $0.003/$0.015 per 1K tokens

  • Claude-3 Haiku: $0.00025/$0.00125 per 1K tokens

  • Claude-3.5 Sonnet: $0.003/$0.015 per 1K tokens

  • Claude-3.5 Haiku: $0.001/$0.005 per 1K tokens

Databricks Models:
  • DBRX Instruct: $0.001/$0.002 per 1K tokens

  • Dolly models: $0.001/$0.002 per 1K tokens

Voyage AI Models:
  • All Voyage models: $0.0001/$0.0001 per 1K tokens
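The per-1K rates above translate into a cost by simple proportion: cost = token_count / 1000 × rate. The sketch below works through that arithmetic with a few of the listed rates. The table layout and the function name are assumptions for illustration, not the library's internal pricing structure.

```python
# Per-1K-token rates (input, output) in USD, copied from the list above.
RATES_PER_1K = {
    "gpt-4": (0.03, 0.06),
    "gpt-4o": (0.005, 0.015),
    "claude-3-opus": (0.015, 0.075),
}


def estimate_cost_sketch(token_count: int, model: str, input_tokens: bool = True) -> float:
    """Return token_count / 1000 * rate, or 0.0 for models without pricing."""
    rates = RATES_PER_1K.get(model.lower())
    if rates is None:
        return 0.0
    rate = rates[0] if input_tokens else rates[1]
    return token_count / 1000 * rate


# 2,500 GPT-4 tokens: 2.5 * $0.03 as input, 2.5 * $0.06 as output.
print(f"{estimate_cost_sketch(2500, 'gpt-4', input_tokens=True):.4f}")   # prints 0.0750
print(f"{estimate_cost_sketch(2500, 'gpt-4', input_tokens=False):.4f}")  # prints 0.1500
```

Returning 0.0 for an unknown model matches the documented return value, so callers that need to tell "free" from "no pricing data" have to check model coverage separately.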

Examples

Basic cost estimation:

from toksum import count_tokens, estimate_cost

text = "This is a sample text for cost estimation."
model = "gpt-4"

# Count tokens and estimate cost
tokens = count_tokens(text, model)
input_cost = estimate_cost(tokens, model, input_tokens=True)
output_cost = estimate_cost(tokens, model, input_tokens=False)

print(f"Text: '{text}'")
print(f"Tokens: {tokens}")
print(f"Input cost: ${input_cost:.4f}")
print(f"Output cost: ${output_cost:.4f}")

Compare costs across models:

text = "Compare costs across different models. " * 100  # Longer text
models = ["gpt-4", "gpt-4o", "gpt-3.5-turbo", "claude-3-opus", "claude-3-haiku"]

print(f"Text length: {len(text)} characters")
print("\nCost comparison:")

for model in models:
    try:
        tokens = count_tokens(text, model)
        input_cost = estimate_cost(tokens, model, input_tokens=True)
        output_cost = estimate_cost(tokens, model, input_tokens=False)

        print(f"{model}:")
        print(f"  Tokens: {tokens}")
        print(f"  Input: ${input_cost:.4f}")
        print(f"  Output: ${output_cost:.4f}")
    except Exception as e:
        print(f"{model}: Error - {e}")

Currency conversion:

tokens = 1000
model = "gpt-4"

# USD pricing
cost_usd = estimate_cost(tokens, model, currency="USD")
print(f"Cost in USD: ${cost_usd:.4f}")

# INR pricing
cost_inr = estimate_cost(tokens, model, currency="INR")
print(f"Cost in INR: ₹{cost_inr:.2f}")

Batch cost estimation:

texts = [
    "Short text",
    "Medium length text with more content",
    "Much longer text that will cost more to process. " * 10
]

model = "gpt-4o"
total_cost = 0

print("Individual text costs:")
for i, text in enumerate(texts, 1):
    tokens = count_tokens(text, model)
    cost = estimate_cost(tokens, model)
    total_cost += cost
    print(f"Text {i}: {tokens} tokens, ${cost:.4f}")

print(f"\nTotal estimated cost: ${total_cost:.4f}")

Chat conversation costing:

from toksum import TokenCounter, estimate_cost

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Explain quantum computing."},
    {"role": "assistant", "content": "Quantum computing is a revolutionary..."}
]

counter = TokenCounter("gpt-4")
total_tokens = counter.count_messages(messages)

# Estimate costs for the conversation
input_cost = estimate_cost(total_tokens, "gpt-4", input_tokens=True)
output_cost = estimate_cost(total_tokens, "gpt-4", input_tokens=False)

print(f"Conversation tokens: {total_tokens}")
print(f"If all input: ${input_cost:.4f}")
print(f"If all output: ${output_cost:.4f}")

Currency Conversion:
  • USD to INR rate: 83.0 (as of July 2025)

  • Rate updates: The conversion rate is periodically updated

  • Precision: INR costs are calculated from USD base prices
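Because INR figures are derived from the USD base price, the conversion is a single multiplication. The following sketch assumes the fixed rate quoted above. The helper name and its handling of unknown currency codes are hypothetical, since the library's behavior for other codes is not documented here.

```python
USD_TO_INR = 83.0  # Rate quoted above; treated as a fixed constant in this sketch.


def convert_cost(cost_usd: float, currency: str = "USD") -> float:
    """Convert a USD cost to the requested currency ("USD" or "INR")."""
    if currency == "USD":
        return cost_usd
    if currency == "INR":
        return cost_usd * USD_TO_INR
    # Assumption: reject anything else rather than guess a rate.
    raise ValueError(f"Unsupported currency: {currency}")


# 1,000 GPT-4 input tokens cost $0.03, which is 0.03 * 83.0 rupees.
print(f"{convert_cost(0.03, 'INR'):.2f}")  # prints 2.49
```

Since the rate is a constant baked into the calculation, INR estimates drift from real prices whenever the exchange rate moves.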

Limitations:
  • Pricing accuracy: Based on publicly available pricing, may not reflect current rates or enterprise discounts

  • Model coverage: Only includes models with known pricing

  • Rate changes: Pricing may change without notice

  • Approximation: For non-OpenAI models, token counts are approximated

Note

This function provides cost estimates for planning and budgeting purposes. Actual costs may vary based on current pricing, volume discounts, and exact tokenization. Always verify current pricing with the model provider for production applications.

Model Dictionaries

The core module defines comprehensive model dictionaries for all supported providers:

OpenAI Models

toksum.core.OPENAI_MODELS = {...}

toksum.core.OPENAI_LEGACY_MODELS = {...}

toksum.core.OPENAI_O1_MODELS = {...}

Anthropic Models

toksum.core.ANTHROPIC_MODELS = {...}

toksum.core.ANTHROPIC_LEGACY_MODELS = {...}

Google Models

toksum.core.GOOGLE_MODELS = {...}

Meta Models

toksum.core.META_MODELS = {...}

Other Provider Models

The module includes model dictionaries for 20+ additional providers including:

  • Mistral (MISTRAL_MODELS)

  • Cohere (COHERE_MODELS)

  • xAI (XAI_MODELS)

  • Alibaba (ALIBABA_MODELS)

  • Baidu (BAIDU_MODELS)

  • Huawei (HUAWEI_MODELS)

  • DeepSeek (DEEPSEEK_MODELS)

  • And many more…

Each dictionary maps model names to their tokenization identifiers for approximation algorithms.
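A case-insensitive lookup over such dictionaries could be sketched as follows. The table contents below are small hypothetical excerpts (the real dictionaries are larger and their exact keys and values are not reproduced in this document), and `detect_provider` is an illustrative name rather than a documented function.

```python
from typing import Dict, Optional

# Hypothetical excerpts for illustration only. The "claude-3" identifier
# in particular is a placeholder, not a value taken from the library.
EXAMPLE_TABLES: Dict[str, Dict[str, str]] = {
    "openai": {"gpt-4": "cl100k_base", "gpt-3.5-turbo": "cl100k_base"},
    "anthropic": {"claude-3-opus": "claude-3"},
}


def detect_provider(model: str) -> Optional[str]:
    """Return the provider whose table contains the model, ignoring case."""
    wanted = model.lower()
    for provider, table in EXAMPLE_TABLES.items():
        if wanted in (name.lower() for name in table):
            return provider
    return None


print(detect_provider("GPT-4"))          # prints openai
print(detect_provider("unknown-model"))  # prints None
```

Lowercasing both sides of the comparison is what makes "GPT-4" and "gpt-4" resolve to the same entry.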