Examples

This section collects examples of using toksum, from quick token counts to cost analysis, error handling, and integration with Flask and pandas.

Basic Usage Examples

The examples/basic_usage.py file demonstrates core toksum functionality:

"""
Basic usage examples for the toksum library.

This module demonstrates comprehensive usage patterns for the toksum library,
showcasing various features and capabilities across different providers and
use cases. The examples progress from simple token counting to advanced
scenarios like cost estimation and multi-provider comparisons.

The examples cover:
    - Quick token counting with the convenience function
    - TokenCounter class usage for multiple operations
    - Chat message format token counting
    - Cost estimation with different currencies
    - Listing and exploring supported models
    - Comparing tokenization across different text types
    - Performance considerations and best practices

Run this file directly to see all examples in action:
    
.. code-block:: bash

    python examples/basic_usage.py

The examples are designed to be educational and can be adapted for
real-world applications involving token counting, cost estimation,
and LLM usage planning.
"""

from toksum import TokenCounter, count_tokens, get_supported_models, estimate_cost

def main():
    """
    Demonstrate comprehensive toksum library usage with practical examples.
    
    This function showcases the main features of toksum through six detailed
    examples that progress from basic usage to advanced scenarios. Each example
    includes explanatory output and demonstrates best practices.

    Examples Covered:
        1. **Quick token counting**: Using the count_tokens convenience function
        2. **TokenCounter class**: Efficient multiple operations with same model
        3. **Chat message counting**: Token counting for conversation formats
        4. **Cost estimation**: Calculating costs with different models and currencies
        5. **Model exploration**: Discovering and listing supported models
        6. **Text type comparison**: Analyzing tokenization across different content types

    The examples demonstrate:
        - Basic API usage patterns
        - Performance considerations
        - Error handling approaches
        - Multi-provider comparisons
        - Cost analysis workflows
        - Content type optimization

    Output Format:
        Each example section includes:
        - Clear section headers with example numbers
        - Descriptive text explaining what's being demonstrated
        - Code execution with formatted output
        - Comparative analysis where relevant
        - Performance and usage insights

    Note:
        This function is designed to be run interactively to see toksum
        capabilities. The examples use real models and will show actual
        token counts and cost estimates based on current pricing.
    """
    print("=== toksum Library Examples ===\n")
    
    # Example 1: Quick token counting
    print("1. Quick token counting:")
    text = "Hello, world! This is a sample text for token counting."
    
    gpt4_tokens = count_tokens(text, "gpt-4")
    claude_tokens = count_tokens(text, "claude-3-opus-20240229")
    
    print(f"Text: '{text}'")
    print(f"GPT-4 tokens: {gpt4_tokens}")
    print(f"Claude-3 Opus tokens: {claude_tokens}")
    print()
    
    # Example 2: Using TokenCounter class
    print("2. Using TokenCounter class:")
    counter = TokenCounter("gpt-3.5-turbo")
    
    texts = [
        "Short text",
        "This is a medium-length text with some more words.",
        "This is a much longer text that contains multiple sentences. It should demonstrate how token counts scale with text length. The tokenizer will break this down into individual tokens based on the model's vocabulary."
    ]
    
    for i, text in enumerate(texts, 1):
        tokens = counter.count(text)
        print(f"Text {i} ({len(text)} chars): {tokens} tokens")
    print()
    
    # Example 3: Counting tokens in chat messages
    print("3. Chat message token counting:")
    messages = [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is the capital of France?"},
        {"role": "assistant", "content": "The capital of France is Paris."},
        {"role": "user", "content": "Tell me more about it."}
    ]
    
    gpt_counter = TokenCounter("gpt-4")
    claude_counter = TokenCounter("claude-3-sonnet-20240229")
    
    gpt_total = gpt_counter.count_messages(messages)
    claude_total = claude_counter.count_messages(messages)
    
    print("Chat conversation:")
    for msg in messages:
        print(f"  {msg['role']}: {msg['content']}")
    
    print(f"\nTotal tokens (GPT-4): {gpt_total}")
    print(f"Total tokens (Claude-3 Sonnet): {claude_total}")
    print()
    
    # Example 4: Cost estimation
    print("4. Cost estimation:")
    sample_text = "This is a sample text for cost estimation. " * 100  # Repeat to get more tokens
    
    models_to_test = ["gpt-4", "gpt-3.5-turbo", "claude-3-opus-20240229", "claude-3-haiku-20240307"]
    
    print(f"Sample text length: {len(sample_text)} characters")
    print("\nToken counts and estimated costs:")
    
    for model in models_to_test:
        try:
            tokens = count_tokens(sample_text, model)
            input_cost = estimate_cost(tokens, model, input_tokens=True)
            output_cost = estimate_cost(tokens, model, input_tokens=False)
            
            print(f"{model}:")
            print(f"  Tokens: {tokens}")
            print(f"  Input cost: ${input_cost:.4f}")
            print(f"  Output cost: ${output_cost:.4f}")
        except Exception as e:
            print(f"{model}: Error - {e}")
    print()
    
    # Example 5: List supported models
    print("5. Supported models:")
    models = get_supported_models()
    
    for provider, model_list in models.items():
        print(f"{provider.upper()} models:")
        for model in model_list[:5]:  # Show first 5 models
            print(f"  - {model}")
        if len(model_list) > 5:
            print(f"  ... and {len(model_list) - 5} more")
        print()
    
    # Example 6: Comparing different text types
    print("6. Token counting for different text types:")
    
    text_samples = {
        "Simple English": "The quick brown fox jumps over the lazy dog.",
        "Technical": "import numpy as np\narray = np.zeros((10, 10))\nprint(array.shape)",
        "With Numbers": "The year 2024 has 365 days, and the temperature is 23.5°C.",
        "Punctuation Heavy": "Hello!!! How are you??? I'm fine... Really, really fine!!!",
        "Mixed Case": "CamelCaseVariable = SomeFunction(parameterOne, parameterTwo)"
    }
    
    counter = TokenCounter("gpt-4")
    
    for text_type, text in text_samples.items():
        tokens = counter.count(text)
        chars = len(text)
        ratio = chars / tokens if tokens > 0 else 0
        print(f"{text_type}:")
        print(f"  Text: '{text}'")
        print(f"  Tokens: {tokens}, Characters: {chars}, Chars/Token: {ratio:.2f}")
        print()


if __name__ == "__main__":
    main()
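
Example 4 calls `estimate_cost` to turn a token count into a price. To illustrate that arithmetic, the sketch below assumes a flat rate per 1,000 tokens, with separate input and output rates. The `PRICES_PER_1K` table, its model names, and `sketch_cost` are hypothetical. They do not reflect toksum's internal pricing data or any provider's current prices.

```python
# Hypothetical per-1K-token prices in USD: placeholders, not real or current rates.
PRICES_PER_1K = {
    "model-a": {"input": 0.03, "output": 0.06},
    "model-b": {"input": 0.0005, "output": 0.0015},
}


def sketch_cost(tokens: int, model: str, input_tokens: bool = True) -> float:
    """Estimate cost assuming a flat per-1K-token rate (an assumption, not toksum's code)."""
    if tokens < 0:
        raise ValueError("tokens must be non-negative")
    rates = PRICES_PER_1K[model]  # raises KeyError for unknown models
    rate = rates["input"] if input_tokens else rates["output"]
    return tokens / 1000 * rate


# 1,500 input tokens at $0.03 per 1K tokens
print(f"${sketch_cost(1500, 'model-a'):.4f}")  # prints $0.0450
```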

Batch Processing

For counting tokens across a list of texts:

Batch Token Counting

"""
Batch Token Counting Example

This script demonstrates efficient batch processing of multiple texts for token counting.
This approach is useful for processing documents, datasets, or collections of text
where you need token counts for multiple items.

Key Features Demonstrated:
    - List comprehension for concise batch processing
    - Consistent model usage across multiple texts
    - Simple output formatting for batch results

Use Cases:
    - Document analysis and preprocessing
    - Dataset token count analysis
    - Batch cost estimation for multiple texts
    - Content length assessment for collections

Performance Notes:
    - Creates a new TokenCounter for each text (less efficient)
    - For better performance with many texts, create one TokenCounter instance
    - Consider using TokenCounter class directly for large batches

Example Usage:
    python batch_token_counting.py

Improved Batch Processing:
    For better performance with large batches:
    
    counter = toksum.TokenCounter("gpt-3.5-turbo")
    text_counts = [counter.count(text) for text in texts]
"""

# Batch token counting: count tokens for several texts at once,
# which is useful for documents, datasets, and similar collections.
import toksum

texts = ["Hello", "This is a test", "count the words"]

# count_tokens creates a new TokenCounter per call (see Performance Notes above)
text_counts = [toksum.count_tokens(text, model="gpt-3.5-turbo") for text in texts]
print("Batch token counts:", text_counts)
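
The improved pattern from the docstring, with one counter reused across the batch, can be wrapped in a small helper. The sketch below does not depend on any particular library: `count_batch` accepts any counting function, and `whitespace_count` is a hypothetical stand-in that exists only so the example runs without toksum.

```python
from typing import Callable, Dict, List


def count_batch(texts: List[str], count_fn: Callable[[str], int]) -> Dict[str, object]:
    """Count tokens for every text with one counting function and summarize."""
    counts = [count_fn(text) for text in texts]
    return {
        "counts": counts,
        "total": sum(counts),
        "max": max(counts, default=0),
    }


def whitespace_count(text: str) -> int:
    # Hypothetical stand-in for TokenCounter(...).count: it splits on whitespace
    # and does NOT match any real model's tokenizer.
    return len(text.split())


summary = count_batch(["Hello", "This is a test", "count the words"], whitespace_count)
print(summary)  # {'counts': [1, 4, 3], 'total': 8, 'max': 4}
```

With toksum installed, passing `toksum.TokenCounter("gpt-3.5-turbo").count` as `count_fn` applies a single counter instance to every text.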

Advanced Usage Patterns

Model Comparison

Compare token counts across different providers:

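from toksum import TokenCounter
from toksum.exceptions import UnsupportedModelError

text = "Compare tokenization across different models."
models = ["gpt-4", "claude-3-opus", "gemini-pro", "llama-3-70b"]

print(f"Text: '{text}'")
print("Token counts by model:")

for model in models:
    try:
        counter = TokenCounter(model)
    except UnsupportedModelError as e:
        # Report unrecognized model names instead of stopping the comparison
        print(f"  {model}: not supported ({e})")
        continue
    print(f"  {model}: {counter.count(text)} tokens")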
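
Not every model name is recognized by every toksum version, so a comparison helper can record a missing result instead of stopping at the first failure. In this sketch, `compare_models` and `demo_make_counter` are hypothetical names. With toksum, `make_counter` could be `lambda m: TokenCounter(m).count` and `errors` could be `(UnsupportedModelError,)`.

```python
from typing import Callable, Dict, Iterable, Optional, Tuple, Type


def compare_models(
    text: str,
    models: Iterable[str],
    make_counter: Callable[[str], Callable[[str], int]],
    errors: Tuple[Type[BaseException], ...] = (ValueError,),
) -> Dict[str, Optional[int]]:
    """Count `text` with each model; record None for models that fail to load."""
    results: Dict[str, Optional[int]] = {}
    for model in models:
        try:
            count_fn = make_counter(model)
        except errors:
            results[model] = None  # unsupported model: note it and move on
            continue
        results[model] = count_fn(text)
    return results


def demo_make_counter(model: str) -> Callable[[str], int]:
    # Hypothetical stand-in for `lambda m: TokenCounter(m).count`.
    if model == "words":
        return lambda s: len(s.split())
    if model == "chars":
        return len
    raise ValueError(f"Unsupported model: {model}")


text = "Compare tokenization across different models."
print(compare_models(text, ["words", "chars", "missing"], demo_make_counter))
```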

Cost Analysis

Analyze costs across different models and scenarios:

from toksum import count_tokens, estimate_cost

# Sample conversation
conversation = """
User: What is machine learning?
Assistant: Machine learning is a subset of artificial intelligence...
User: Can you give me some examples?
Assistant: Sure! Here are some common examples of machine learning...
"""

models = ["gpt-4", "gpt-4o", "gpt-3.5-turbo", "claude-3-opus", "claude-3-haiku"]

print("Cost comparison for conversation:")
print(f"Text length: {len(conversation)} characters")
print()

for model in models:
    try:
        tokens = count_tokens(conversation, model)
        input_cost = estimate_cost(tokens, model, input_tokens=True)
        output_cost = estimate_cost(tokens, model, input_tokens=False)

        print(f"{model}:")
        print(f"  Tokens: {tokens}")
        print(f"  Input cost: ${input_cost:.4f}")
        print(f"  Output cost: ${output_cost:.4f}")
        print()
    except Exception as e:
        print(f"{model}: Error - {e}")
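
The loop above prices the same token count as either input or output. A single request usually pays the input rate on the prompt and the output rate on the response, so ranking models by that combined figure is often more informative. The model names and prices below are hypothetical placeholders, and `round_trip_cost` and `rank_by_cost` are illustrative helpers, not toksum functions.

```python
from typing import List, Tuple

# Hypothetical USD prices per 1K tokens as (input, output); not real rates.
PRICES = {
    "model-a": (0.03, 0.06),
    "model-b": (0.0005, 0.0015),
    "model-c": (0.015, 0.075),
}


def round_trip_cost(input_tokens: int, output_tokens: int, model: str) -> float:
    """Prompt tokens at the input rate plus response tokens at the output rate."""
    in_rate, out_rate = PRICES[model]
    return (input_tokens * in_rate + output_tokens * out_rate) / 1000


def rank_by_cost(input_tokens: int, output_tokens: int) -> List[Tuple[str, float]]:
    """Return (model, cost) pairs sorted from cheapest to most expensive."""
    costs = [(m, round_trip_cost(input_tokens, output_tokens, m)) for m in PRICES]
    return sorted(costs, key=lambda pair: pair[1])


for model, cost in rank_by_cost(input_tokens=1000, output_tokens=500):
    print(f"{model}: ${cost:.4f}")
```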

Chat Message Processing

Process chat conversations with proper message formatting:

from toksum import TokenCounter

def analyze_conversation(messages, model="gpt-4"):
    counter = TokenCounter(model)

    # Count individual messages
    individual_tokens = []
    for msg in messages:
        tokens = counter.count(msg["content"])
        individual_tokens.append(tokens)
        # Show a 50-character preview; add "..." only when the content is truncated
        preview = msg["content"][:50] + ("..." if len(msg["content"]) > 50 else "")
        print(f"{msg['role']}: {tokens} tokens - '{preview}'")

    # Count as conversation format
    total_tokens = counter.count_messages(messages)
    individual_sum = sum(individual_tokens)

    print("\nSummary:")
    print(f"Individual message sum: {individual_sum} tokens")
    print(f"Conversation format: {total_tokens} tokens")
    print(f"Formatting overhead: {total_tokens - individual_sum} tokens")

    return total_tokens

# Example conversation
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is the capital of France?"},
    {"role": "assistant", "content": "The capital of France is Paris."},
    {"role": "user", "content": "Tell me more about it."},
    {"role": "assistant", "content": "Paris is known for its art, fashion, gastronomy, and culture..."}
]

analyze_conversation(messages)
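
The formatting overhead reported above comes from tokens that a chat format adds around each message. OpenAI's cookbook describes one such scheme for gpt-3.5-turbo and gpt-4: 3 tokens per message, 1 more when a `name` field is present, and 3 tokens that prime the assistant's reply. The sketch applies those constants to any per-text counter. Whether toksum's `count_messages` uses the same constants is an assumption, and other providers format chats differently.

```python
from typing import Callable, Dict, List

# Constants from OpenAI's cookbook recipe for gpt-3.5-turbo / gpt-4 chat formatting.
# Treat them as an assumption; they may not match toksum or other providers.
TOKENS_PER_MESSAGE = 3
TOKENS_PER_NAME = 1
REPLY_PRIMING = 3


def count_chat_tokens(messages: List[Dict[str, str]], count_fn: Callable[[str], int]) -> int:
    """Sum the token counts of every field plus fixed per-message framing."""
    total = 0
    for message in messages:
        total += TOKENS_PER_MESSAGE
        for key, value in message.items():
            total += count_fn(value)  # role, content, and name values all count
            if key == "name":
                total += TOKENS_PER_NAME
    return total + REPLY_PRIMING  # tokens that prime the assistant's reply


# Whitespace splitting stands in for a real tokenizer here (hypothetical).
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hi there"},
]
print(count_chat_tokens(messages, lambda s: len(s.split())))  # 18
```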

Error Handling Patterns

Robust error handling for production applications:

from toksum import TokenCounter
from toksum.exceptions import UnsupportedModelError, TokenizationError

def safe_token_count(text, model, fallback_models=None):
    """
    Safely count tokens with fallback options.
    """
    fallback_models = fallback_models or ["gpt-3.5-turbo", "claude-3-haiku"]

    # Try primary model
    try:
        counter = TokenCounter(model)
        return counter.count(text), model
    except UnsupportedModelError:
        print(f"Model '{model}' not supported, trying fallbacks...")
    except TokenizationError as e:
        print(f"Tokenization failed for '{model}': {e}")

    # Try fallback models
    for fallback in fallback_models:
        try:
            counter = TokenCounter(fallback)
            tokens = counter.count(text)
            print(f"Using fallback model: {fallback}")
            return tokens, fallback
        except Exception as e:
            print(f"Fallback '{fallback}' also failed: {e}")
            continue

    # All models failed
    raise RuntimeError("All models failed for token counting")

# Usage
text = "This is a test string for token counting."
try:
    tokens, used_model = safe_token_count(text, "unknown-model")
    print(f"Successfully counted {tokens} tokens using {used_model}")
except RuntimeError as e:
    print(f"Failed to count tokens: {e}")
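
When every tokenizer fails, a last-resort option is a character-based estimate instead of an error. OpenAI's documentation gives roughly 4 characters per token for common English text as a rule of thumb, and the sketch below uses that ratio. It is only an approximation, and the ratio is an assumption for other languages or for source code. `count_with_heuristic_fallback` is a hypothetical helper, not part of toksum.

```python
import math
from typing import Callable, Sequence, Tuple


def heuristic_token_estimate(text: str, chars_per_token: float = 4.0) -> int:
    """Rough estimate: about 4 characters per token for typical English text."""
    if not text:
        return 0
    return math.ceil(len(text) / chars_per_token)


def count_with_heuristic_fallback(
    text: str,
    counters: Sequence[Tuple[str, Callable[[str], int]]],
) -> Tuple[int, str]:
    """Try each (name, count_fn) pair in order; use the heuristic if all fail."""
    for name, count_fn in counters:
        try:
            return count_fn(text), name
        except Exception:  # in real code, narrow this to toksum's exceptions
            continue
    return heuristic_token_estimate(text), "heuristic"


def unavailable(text: str) -> int:
    # Hypothetical counter that always fails, e.g. for an unsupported model
    raise RuntimeError("tokenizer unavailable")


text = "This is a test string for token counting."
print(count_with_heuristic_fallback(text, [("unavailable", unavailable)]))  # (11, 'heuristic')
```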

Performance Optimization

Optimize performance for large-scale processing:

from toksum import TokenCounter
import time

def benchmark_approaches(texts, model="gpt-4"):
    """
    Compare different approaches for batch processing.
    """
    print(f"Benchmarking with {len(texts)} texts using {model}")

    # Approach 1: Create new counter each time (inefficient)
    start_time = time.perf_counter()  # high-resolution clock suited to benchmarks
    results1 = []
    for text in texts:
        counter = TokenCounter(model)
        results1.append(counter.count(text))
    time1 = time.perf_counter() - start_time

    # Approach 2: Reuse counter (efficient)
    start_time = time.perf_counter()
    counter = TokenCounter(model)
    results2 = [counter.count(text) for text in texts]
    time2 = time.perf_counter() - start_time

    print(f"Approach 1 (new counter each time): {time1:.4f}s")
    print(f"Approach 2 (reuse counter): {time2:.4f}s")
    print(f"Speedup: {time1/time2:.2f}x")

    # Verify results are identical
    assert results1 == results2, "Results should be identical"
    return results2

# Test with sample texts
sample_texts = [
    "Short text",
    "Medium length text with more content to tokenize",
    "Much longer text that contains multiple sentences and should demonstrate the performance difference between approaches when processing many texts in batch operations."
] * 100  # Repeat to make timing differences visible

benchmark_approaches(sample_texts)
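
A related way to reuse counters across separate calls, for example inside a web request handler, is to cache one instance per model name. The sketch uses `functools.lru_cache`. `SlowCounter` and `get_counter` are hypothetical: the constructor simulates loading tokenizer data, where real code would return `TokenCounter(model)`. Sharing one cached instance across threads assumes the counter is safe to use concurrently.

```python
import functools
import time


class SlowCounter:
    """Hypothetical stand-in for TokenCounter: construction is the expensive step."""

    instances_created = 0

    def __init__(self, model: str) -> None:
        SlowCounter.instances_created += 1
        self.model = model
        time.sleep(0.01)  # simulate loading tokenizer data

    def count(self, text: str) -> int:
        return len(text.split())  # whitespace split, not a real tokenizer


@functools.lru_cache(maxsize=32)
def get_counter(model: str) -> SlowCounter:
    """Build each model's counter once and return the cached instance afterwards."""
    return SlowCounter(model)


for text in ["Short text", "Medium length text"] * 50:
    get_counter("gpt-4").count(text)

print(SlowCounter.instances_created)  # 1
```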

Integration Examples

Web Application Integration

Example Flask application with toksum integration:

from flask import Flask, request, jsonify
from toksum import TokenCounter, get_supported_models
from toksum.exceptions import ToksumError

app = Flask(__name__)

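@app.route('/count', methods=['POST'])
def count_tokens_endpoint():  # named so it does not shadow toksum.count_tokens
    try:
        # get_json(silent=True) returns None for a missing or invalid JSON body
        data = request.get_json(silent=True) or {}
        text = data.get('text', '')
        model = data.get('model', 'gpt-3.5-turbo')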

        counter = TokenCounter(model)
        tokens = counter.count(text)

        return jsonify({
            'tokens': tokens,
            'model': model,
            'text_length': len(text)
        })
    except ToksumError as e:
        return jsonify({'error': str(e)}), 400
    except Exception as e:
        return jsonify({'error': f'Unexpected error: {e}'}), 500

@app.route('/models', methods=['GET'])
def list_models():
    try:
        models = get_supported_models()
        return jsonify(models)
    except Exception as e:
        return jsonify({'error': str(e)}), 500

if __name__ == '__main__':
    # debug=True enables Flask's interactive debugger; use it only for local development
    app.run(debug=True)
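
The `/count` handler accepts whatever the client sends. The sketch below shows one way to validate the payload before building a counter. It is plain Python, so it can be tested without Flask. `validate_count_payload` is a hypothetical helper, and `MAX_TEXT_CHARS` is a hypothetical limit.

```python
from typing import Any, Dict, Optional, Tuple

MAX_TEXT_CHARS = 100_000  # hypothetical cap on request size
DEFAULT_MODEL = "gpt-3.5-turbo"


def validate_count_payload(
    payload: Optional[Any],
) -> Tuple[Optional[Dict[str, str]], Optional[str]]:
    """Return (clean_payload, None) on success or (None, error_message) on failure."""
    if not isinstance(payload, dict):
        return None, "Request body must be a JSON object"
    text = payload.get("text", "")
    model = payload.get("model", DEFAULT_MODEL)
    if not isinstance(text, str):
        return None, "'text' must be a string"
    if not isinstance(model, str) or not model:
        return None, "'model' must be a non-empty string"
    if len(text) > MAX_TEXT_CHARS:
        return None, f"'text' exceeds {MAX_TEXT_CHARS} characters"
    return {"text": text, "model": model}, None
```

Inside the Flask handler, `payload, error = validate_count_payload(request.get_json(silent=True))`, followed by `return jsonify({'error': error}), 400` when `error` is set, would reject malformed requests before a `TokenCounter` is created.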

Data Processing Pipeline

Example data processing pipeline with toksum:

import pandas as pd
from toksum import TokenCounter
from toksum.exceptions import TokenizationError

def process_dataset(df, text_column, model="gpt-3.5-turbo"):
    """
    Add token counts to a pandas DataFrame.
    """
    counter = TokenCounter(model)

    def safe_count(text):
        try:
            if pd.isna(text) or text == '':
                return 0
            return counter.count(str(text))
        except TokenizationError:
            return -1  # Mark as error

    # Add token count column
    df[f'{text_column}_tokens'] = df[text_column].apply(safe_count)

    # Add statistics
    valid_counts = df[df[f'{text_column}_tokens'] >= 0]
    stats = {
        'total_rows': len(df),
        'valid_counts': len(valid_counts),
        'errors': len(df) - len(valid_counts),
        'avg_tokens': valid_counts[f'{text_column}_tokens'].mean(),
        'max_tokens': valid_counts[f'{text_column}_tokens'].max(),
        'total_tokens': valid_counts[f'{text_column}_tokens'].sum()
    }

    return df, stats

# Example usage
# df = pd.read_csv('your_dataset.csv')
# processed_df, statistics = process_dataset(df, 'content_column')
# print(f"Processing statistics: {statistics}")
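
Once each row has a token count, a common next step is to find rows that would not fit within a model's token limit. The pure-Python sketch below separates the `-1` error markers written by `safe_count` from rows above and below a limit. `partition_by_limit` is a hypothetical helper, and the 8,192 limit in the demo is only illustrative.

```python
from typing import Dict, List, Sequence


def partition_by_limit(token_counts: Sequence[int], limit: int) -> Dict[str, List[int]]:
    """Group row positions into errors (-1 markers), over-limit rows, and rows that fit."""
    groups: Dict[str, List[int]] = {"errors": [], "over_limit": [], "fits": []}
    for index, count in enumerate(token_counts):
        if count < 0:
            groups["errors"].append(index)
        elif count > limit:
            groups["over_limit"].append(index)
        else:
            groups["fits"].append(index)
    return groups


counts = [120, 9000, -1, 8192, 15000]
print(partition_by_limit(counts, limit=8192))
# {'errors': [2], 'over_limit': [1, 4], 'fits': [0, 3]}
```

With the DataFrame from `process_dataset`, the input would be `processed_df['content_column_tokens'].tolist()`, and positions map back to rows through `processed_df.index`.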

These examples demonstrate various ways to integrate toksum into different types of applications and workflows.