# AllEndpoints - Universal LLM Inference Tool

AllEndpoints is a Python module for running inference against multiple LLM providers through a unified interface. It supports Ollama (local), HuggingFace, Together, Google Gemini, AIQL, Groq, NVIDIA, and the GitHub Copilot API.

## Table of Contents

- [Installation](#installation)
- [Environment Variables](#environment-variables)
  - [Setting Up Environment Variables](#setting-up-environment-variables)
    - [Linux/macOS](#linuxmacos)
    - [Windows](#windows)
- [Usage](#usage)
  - [Command-Line Arguments](#command-line-arguments)
  - [Examples](#examples)
- [Using as a Python Module](#using-as-a-python-module)
- [Supported Providers](#supported-providers)
- [Adding New Models](#adding-new-models)
- [Troubleshooting](#troubleshooting)

## Installation

1. Clone the repository:

   ```bash
   git clone https://github.com/yourusername/allendpoints.git
   cd allendpoints
   ```

2. Install the required dependencies:

   ```bash
   pip install ollama requests google-generativeai huggingface_hub together groq openai colorama
   ```

3. Install Ollama (optional, for local inference):

   - [Ollama Installation Guide](https://github.com/ollama/ollama)

## Environment Variables

The script reads API keys for the hosted providers from environment variables. Each provider requires its own key:

| Provider      | Environment Variable | Description                         |
|---------------|----------------------|-------------------------------------|
| HuggingFace   | `HF_API_KEY`         | HuggingFace API key                 |
| Together      | `TOGETHER_API_KEY`   | Together AI API key                 |
| Google Gemini | `GEMINI_API_KEY`     | Google AI Studio API key            |
| AIQL          | `AIQL_API_KEY`       | AIQL API key                        |
| Groq          | `GROQ_API_KEY`       | Groq API key                        |
| NVIDIA        | `NVIDIA_API_KEY`     | NVIDIA API key                      |
| GitHub        | `GITHUB_TOKEN`       | GitHub token for Copilot API access |

### Setting Up Environment Variables

#### Linux/macOS

**Temporary (Current Session Only)**

```bash
export HF_API_KEY="your_huggingface_api_key"
export TOGETHER_API_KEY="your_together_api_key"
export GEMINI_API_KEY="your_gemini_api_key"
export AIQL_API_KEY="your_aiql_api_key"
export GROQ_API_KEY="your_groq_api_key"
export NVIDIA_API_KEY="your_nvidia_api_key"
export GITHUB_TOKEN="your_github_token"
```

**Permanent (Add to Shell Profile)**

Add the above export commands to your `~/.bashrc`, `~/.zshrc`, or `~/.profile` file:

```bash
echo 'export HF_API_KEY="your_huggingface_api_key"' >> ~/.bashrc
echo 'export TOGETHER_API_KEY="your_together_api_key"' >> ~/.bashrc
# Add other API keys similarly
```

Then reload your shell configuration:

```bash
source ~/.bashrc  # or ~/.zshrc or ~/.profile
```

#### Windows

**Command Prompt (Temporary)**

```cmd
set HF_API_KEY=your_huggingface_api_key
set TOGETHER_API_KEY=your_together_api_key
set GEMINI_API_KEY=your_gemini_api_key
set AIQL_API_KEY=your_aiql_api_key
set GROQ_API_KEY=your_groq_api_key
set NVIDIA_API_KEY=your_nvidia_api_key
set GITHUB_TOKEN=your_github_token
```

**PowerShell (Temporary)**

```powershell
$env:HF_API_KEY = "your_huggingface_api_key"
$env:TOGETHER_API_KEY = "your_together_api_key"
$env:GEMINI_API_KEY = "your_gemini_api_key"
$env:AIQL_API_KEY = "your_aiql_api_key"
$env:GROQ_API_KEY = "your_groq_api_key"
$env:NVIDIA_API_KEY = "your_nvidia_api_key"
$env:GITHUB_TOKEN = "your_github_token"
```

**Permanent (System Environment Variables)**

1. Right-click on "This PC" or "My Computer" and select "Properties"
2. Click on "Advanced system settings"
3. Click on "Environment Variables"
4. Under "User variables" or "System variables", click "New"
5. Enter the variable name (e.g., `HF_API_KEY`) and its value
6. Click "OK" to save

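
To confirm which keys are visible to your Python environment before running the script, you can use a quick standard-library check. This is only a convenience sketch; the variable names come from the table above, and the script's `--list` option shows which providers are actually available.

```python
import os

# API key variables used by the hosted providers listed above
KEYS = [
    "HF_API_KEY", "TOGETHER_API_KEY", "GEMINI_API_KEY",
    "AIQL_API_KEY", "GROQ_API_KEY", "NVIDIA_API_KEY", "GITHUB_TOKEN",
]

for name in KEYS:
    print(f"{name}: {'set' if os.environ.get(name) else 'missing'}")
```
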
## Usage

### Command-Line Arguments

```
usage: allendpoints.py [-h] [--provider PROVIDER] [--model MODEL] [--system SYSTEM] [--list] [--debug] [-a] [prompt]

LLM Inference Module

positional arguments:
  prompt               The prompt to send to the model (default: "Why is the sky blue?")

options:
  -h, --help           show this help message and exit
  --provider PROVIDER  The provider to use (ollama, hf, together, gemini, aiql, groq, nvidia, github)
  --model MODEL        The specific model to use
  --system SYSTEM      System content for chat models (default: "You are a helpful assistant.")
  --list               List available providers and models
  --debug              Enable debug output
  -a, --all            Run inference on all available providers and models
```

### Examples

**List all available providers and models:**

```bash
python allendpoints.py --list
```

**Run inference with a specific provider and model:**

```bash
python allendpoints.py "What is the capital of France?" --provider ollama --model llama3.2:3b
```

**Run inference with a specific provider and its default model:**

```bash
python allendpoints.py "Explain quantum computing" --provider gemini
```

**Run inference with a custom system prompt:**

```bash
python allendpoints.py "Write a poem about AI" --provider ollama --model llama3.2:3b --system "You are a poetic assistant."
```

**Run inference on all available providers and models:**

```bash
python allendpoints.py "What is the meaning of life?" -a
```

**Run with debug output:**

```bash
python allendpoints.py "How does a nuclear reactor work?" --provider nvidia --model qwen2.5-coder-32b --debug
```

## Using as a Python Module

AllEndpoints can be imported and used as a Python module in your own projects. Here's how to use it programmatically:

### Basic Usage

```python
# Import the necessary functions from allendpoints
from allendpoints import run_inference, check_available_apis, CONFIG

# Run inference with a specific provider and model
# Always specify the model parameter explicitly
response = run_inference(
    prompt="What is the capital of France?",
    provider="ollama",
    model="llama3.2:3b",
    system_content="You are a helpful assistant."
)

print(response)

# If you want to use the default model for a provider
default_model = CONFIG["defaults"]["ollama"]
response = run_inference(
    prompt="What is quantum computing?",
    provider="ollama",
    model=default_model
)

print(response)
```

### Advanced Usage

```python
# Import more functions for advanced usage
from allendpoints import (
    run_inference,
    check_available_apis,
    get_ollama_models,
    InferenceHandler,
    CONFIG
)

# Get all available providers
available_providers = check_available_apis()
print(f"Available providers: {available_providers}")

# Get all available Ollama models
ollama_models = get_ollama_models()
print(f"Available Ollama models: {ollama_models}")

# Use a specific provider's handler directly
if "nvidia" in available_providers:
    nvidia_response = InferenceHandler.nvidia(
        prompt="Explain quantum computing",
        model="qwen/qwen2.5-coder-32b-instruct"
    )
    print(f"NVIDIA response: {nvidia_response}")

# Access the configuration
default_models = CONFIG["defaults"]
print(f"Default models: {default_models}")
```

### Batch Processing Example

```python
from allendpoints import run_inference, CONFIG

# Process multiple prompts with different providers
prompts = [
    "What is machine learning?",
    "Explain the theory of relativity",
    "How does a neural network work?"
]

providers = ["ollama", "gemini", "github"]

# Process each prompt with each provider
for prompt in prompts:
    for provider in providers:
        try:
            # Always specify the model parameter explicitly
            default_model = CONFIG["defaults"][provider]
            response = run_inference(prompt, provider, model=default_model)
            print(f"\nPrompt: {prompt}")
            print(f"Provider: {provider}")
            print(f"Response: {response[:100]}...")
        except Exception as e:
            print(f"Error with {provider}: {str(e)}")
```

### Integration with main.py

The allendpoints module is integrated with main.py for benchmarking LLM performance on coding tasks:

```python
# In main.py
from allendpoints import check_available_apis, run_inference

# Get available providers
available_apis = check_available_apis()

# Run inference with a specific model
# (question, provider, model_id, and system_content are supplied by the benchmark loop)
response = run_inference(
    question,        # The coding problem to solve
    provider,        # The provider to use
    model_id,        # The specific model to use
    system_content   # Optional system prompt
)
```

This integration allows main.py to benchmark various LLM providers and models on coding tasks using a unified interface.

## Supported Providers

### Ollama (Local)

- Runs locally on your machine
- Supports various open-source models
- No API key required, but needs Ollama installed

### HuggingFace

- Provides access to HuggingFace's Inference API
- Requires `HF_API_KEY` environment variable

### Together

- Provides access to Together AI's models
- Requires `TOGETHER_API_KEY` environment variable

### Google Gemini

- Provides access to Google's Gemini models
- Requires `GEMINI_API_KEY` environment variable

### AIQL

- Provides access to AIQL's models
- Requires `AIQL_API_KEY` environment variable

### Groq

- Provides access to Groq's models
- Requires `GROQ_API_KEY` environment variable

### NVIDIA

- Provides access to NVIDIA's models
- Requires `NVIDIA_API_KEY` environment variable

### GitHub

- Provides access to GitHub Copilot models
- Requires `GITHUB_TOKEN` environment variable

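
Which providers are actually available at runtime depends on which keys are set (and whether Ollama is running). A minimal sketch for picking the first available provider, assuming `check_available_apis()` returns provider names as shown in the Advanced Usage section:

```python
from allendpoints import run_inference, check_available_apis, CONFIG

available = check_available_apis()

# Prefer local Ollama if present, otherwise fall back to any configured provider
provider = "ollama" if "ollama" in available else next(iter(available), None)

if provider is None:
    raise SystemExit("No providers available - set an API key or start Ollama.")

response = run_inference(
    prompt="Summarize the benefits of unit testing.",
    provider=provider,
    model=CONFIG["defaults"][provider],
)
print(response)
```
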
## Adding New Models

To add a new model to an existing provider, edit the `CONFIG` dictionary in the script:

```python
CONFIG = {
    "models": {
        "provider_name": {
            "model_display_name": "actual_model_id",
            # Add your new model here
            "new_model_name": "new_model_id"
        }
    }
}
```

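
If the new model should also become the provider's default, update the defaults map as well (the same map read elsewhere in this README as `CONFIG["defaults"]`). A hypothetical example for Ollama:

```python
# Hypothetical example: register a new Ollama model and make it the provider default
CONFIG["models"]["ollama"]["llama3.2:3b"] = "llama3.2:3b"
CONFIG["defaults"]["ollama"] = "llama3.2:3b"
```
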
## Troubleshooting

### API Key Issues

- Ensure your API keys are correctly set in your environment variables
- Check that the API keys have not expired
- Verify that you have the necessary permissions for the models you're trying to access

### Ollama Issues

- Ensure Ollama is installed and running
- Check that the model you're trying to use is downloaded (`ollama list`)
- If a model is not available, pull it with `ollama pull model_name`

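
When scripting against the module, you can also check for a local model before attempting inference. A small sketch assuming `get_ollama_models()` returns an iterable of model names, as used in the Advanced Usage section:

```python
from allendpoints import get_ollama_models, run_inference

models = get_ollama_models()
if "llama3.2:3b" in models:
    print(run_inference("Why is the sky blue?", "ollama", model="llama3.2:3b"))
else:
    # Download the model first, e.g.: ollama pull llama3.2:3b
    print("Model not found locally.")
```
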

### Connection Issues

- Check your internet connection
- Ensure that the API endpoints are not blocked by your network or firewall
- Some providers may have rate limits or usage quotas

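
If you hit transient rate limits, a simple retry wrapper around `run_inference` can help. This is a generic sketch, not part of the module; the delay values are arbitrary:

```python
import time

from allendpoints import run_inference

def run_with_retries(prompt, provider, model, attempts=3, base_delay=2.0):
    """Retry run_inference with exponential backoff on any exception."""
    for attempt in range(attempts):
        try:
            return run_inference(prompt, provider, model=model)
        except Exception:
            if attempt == attempts - 1:
                raise
            time.sleep(base_delay * (2 ** attempt))  # 2s, 4s, 8s, ...
```
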

### Model Loading

- Large models may take time to load, especially on the first run
- The script preloads Ollama models to ensure fair timing measurements
- If a model consistently fails to load, try a smaller model or a different provider

### Colored Error Messages

- Install the `colorama` package for colored error messages: `pip install colorama`