# AllEndpoints - Universal LLM Inference Tool

AllEndpoints is a Python module for running inference against multiple LLM providers through a unified interface. It supports Ollama (local), HuggingFace, Together, Google Gemini, AIQL, Groq, NVIDIA, and the GitHub Copilot API.

## Table of Contents

- [Installation](#installation)
- [Environment Variables](#environment-variables)
  - [Setting Up Environment Variables](#setting-up-environment-variables)
    - [Linux/macOS](#linuxmacos)
    - [Windows](#windows)
- [Usage](#usage)
  - [Command-Line Arguments](#command-line-arguments)
  - [Examples](#examples)
- [Using as a Python Module](#using-as-a-python-module)
- [Supported Providers](#supported-providers)
- [Adding New Models](#adding-new-models)
- [Troubleshooting](#troubleshooting)

## Installation

1. Clone the repository:

   ```bash
   git clone https://github.com/yourusername/allendpoints.git
   cd allendpoints
   ```

2. Install the required dependencies:

   ```bash
   pip install ollama requests google-generativeai huggingface_hub together groq openai colorama
   ```

3. Install Ollama (optional, for local inference):

   - [Ollama Installation Guide](https://github.com/ollama/ollama)

## Environment Variables

The script reads API keys for the hosted providers from environment variables. Each provider requires its own key:

| Provider      | Environment Variable | Description                         |
|---------------|----------------------|-------------------------------------|
| HuggingFace   | `HF_API_KEY`         | HuggingFace API key                 |
| Together      | `TOGETHER_API_KEY`   | Together AI API key                 |
| Google Gemini | `GEMINI_API_KEY`     | Google AI Studio API key            |
| AIQL          | `AIQL_API_KEY`       | AIQL API key                        |
| Groq          | `GROQ_API_KEY`       | Groq API key                        |
| NVIDIA        | `NVIDIA_API_KEY`     | NVIDIA API key                      |
| GitHub        | `GITHUB_TOKEN`       | GitHub token for Copilot API access |

### Setting Up Environment Variables

#### Linux/macOS

**Temporary (Current Session Only)**

```bash
export HF_API_KEY="your_huggingface_api_key"
export TOGETHER_API_KEY="your_together_api_key"
export GEMINI_API_KEY="your_gemini_api_key"
export AIQL_API_KEY="your_aiql_api_key"
export GROQ_API_KEY="your_groq_api_key"
export NVIDIA_API_KEY="your_nvidia_api_key"
export GITHUB_TOKEN="your_github_token"
```

**Permanent (Add to Shell Profile)**

Add the above export commands to your `~/.bashrc`, `~/.zshrc`, or `~/.profile` file:

```bash
echo 'export HF_API_KEY="your_huggingface_api_key"' >> ~/.bashrc
echo 'export TOGETHER_API_KEY="your_together_api_key"' >> ~/.bashrc
# Add other API keys similarly
```

Then reload your shell configuration:

```bash
source ~/.bashrc  # or ~/.zshrc or ~/.profile
```

#### Windows

**Command Prompt (Temporary)**

```cmd
set HF_API_KEY=your_huggingface_api_key
set TOGETHER_API_KEY=your_together_api_key
set GEMINI_API_KEY=your_gemini_api_key
set AIQL_API_KEY=your_aiql_api_key
set GROQ_API_KEY=your_groq_api_key
set NVIDIA_API_KEY=your_nvidia_api_key
set GITHUB_TOKEN=your_github_token
```

**PowerShell (Temporary)**

```powershell
$env:HF_API_KEY = "your_huggingface_api_key"
$env:TOGETHER_API_KEY = "your_together_api_key"
$env:GEMINI_API_KEY = "your_gemini_api_key"
$env:AIQL_API_KEY = "your_aiql_api_key"
$env:GROQ_API_KEY = "your_groq_api_key"
$env:NVIDIA_API_KEY = "your_nvidia_api_key"
$env:GITHUB_TOKEN = "your_github_token"
```

**Permanent (System Environment Variables)**

1. Right-click on "This PC" or "My Computer" and select "Properties"
2. Click on "Advanced system settings"
3. Click on "Environment Variables"
4. Under "User variables" or "System variables", click "New"
5. Enter the variable name (e.g., `HF_API_KEY`) and its value
6. Click "OK" to save

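
To confirm which keys are visible to your Python environment before running the script, you can use a quick standard-library check. This is only a convenience sketch; the variable names come from the table above, and the script's `--list` option shows which providers are actually available.

```python
import os

# API key variables used by the hosted providers listed above
KEYS = [
    "HF_API_KEY", "TOGETHER_API_KEY", "GEMINI_API_KEY",
    "AIQL_API_KEY", "GROQ_API_KEY", "NVIDIA_API_KEY", "GITHUB_TOKEN",
]

for name in KEYS:
    print(f"{name}: {'set' if os.environ.get(name) else 'missing'}")
```
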
## Usage

### Command-Line Arguments

```
usage: allendpoints.py [-h] [--provider PROVIDER] [--model MODEL] [--system SYSTEM] [--list] [--debug] [-a] [prompt]

LLM Inference Module

positional arguments:
  prompt               The prompt to send to the model (default: "Why is the sky blue?")

options:
  -h, --help           show this help message and exit
  --provider PROVIDER  The provider to use (ollama, hf, together, gemini, aiql, groq, nvidia, github)
  --model MODEL        The specific model to use
  --system SYSTEM      System content for chat models (default: "You are a helpful assistant.")
  --list               List available providers and models
  --debug              Enable debug output
  -a, --all            Run inference on all available providers and models
```

### Examples

**List all available providers and models:**

```bash
python allendpoints.py --list
```

**Run inference with a specific provider and model:**

```bash
python allendpoints.py "What is the capital of France?" --provider ollama --model llama3.2:3b
```

**Run inference with a specific provider and its default model:**

```bash
python allendpoints.py "Explain quantum computing" --provider gemini
```

**Run inference with a custom system prompt:**

```bash
python allendpoints.py "Write a poem about AI" --provider ollama --model llama3.2:3b --system "You are a poetic assistant."
```

**Run inference on all available providers and models:**

```bash
python allendpoints.py "What is the meaning of life?" -a
```

**Run with debug output:**

```bash
python allendpoints.py "How does a nuclear reactor work?" --provider nvidia --model qwen2.5-coder-32b --debug
```

## Using as a Python Module

AllEndpoints can be imported and used as a Python module in your own projects. Here's how to use it programmatically:

### Basic Usage

```python
# Import the necessary functions from allendpoints
from allendpoints import run_inference, check_available_apis, CONFIG

# Run inference with a specific provider and model
# Always specify the model parameter explicitly
response = run_inference(
    prompt="What is the capital of France?",
    provider="ollama",
    model="llama3.2:3b",
    system_content="You are a helpful assistant."
)

print(response)

# If you want to use the default model for a provider
default_model = CONFIG["defaults"]["ollama"]
response = run_inference(
    prompt="What is quantum computing?",
    provider="ollama",
    model=default_model
)

print(response)
```

### Advanced Usage

```python
# Import more functions for advanced usage
from allendpoints import (
    run_inference,
    check_available_apis,
    get_ollama_models,
    InferenceHandler,
    CONFIG
)

# Get all available providers
available_providers = check_available_apis()
print(f"Available providers: {available_providers}")

# Get all available Ollama models
ollama_models = get_ollama_models()
print(f"Available Ollama models: {ollama_models}")

# Use a specific provider's handler directly
if "nvidia" in available_providers:
    nvidia_response = InferenceHandler.nvidia(
        prompt="Explain quantum computing",
        model="qwen/qwen2.5-coder-32b-instruct"
    )
    print(f"NVIDIA response: {nvidia_response}")

# Access the configuration
default_models = CONFIG["defaults"]
print(f"Default models: {default_models}")
```

### Batch Processing Example

```python
from allendpoints import run_inference, CONFIG

# Process multiple prompts with different providers
prompts = [
    "What is machine learning?",
    "Explain the theory of relativity",
    "How does a neural network work?"
]

providers = ["ollama", "gemini", "github"]

# Process each prompt with each provider
for prompt in prompts:
    for provider in providers:
        try:
            # Always specify the model parameter explicitly
            default_model = CONFIG["defaults"][provider]
            response = run_inference(prompt, provider, model=default_model)
            print(f"\nPrompt: {prompt}")
            print(f"Provider: {provider}")
            print(f"Response: {response[:100]}...")
        except Exception as e:
            print(f"Error with {provider}: {str(e)}")
```

### Integration with main.py

The allendpoints module is integrated with main.py for benchmarking LLM performance on coding tasks:

```python
# In main.py
from allendpoints import check_available_apis, run_inference

# Get available providers
available_apis = check_available_apis()

# Run inference with a specific model
# (question, provider, model_id, and system_content are supplied by the benchmark loop)
response = run_inference(
    question,        # The coding problem to solve
    provider,        # The provider to use
    model_id,        # The specific model to use
    system_content   # Optional system prompt
)
```

This integration allows main.py to benchmark various LLM providers and models on coding tasks using a unified interface.

## Supported Providers

### Ollama (Local)

- Runs locally on your machine
- Supports various open-source models
- No API key required, but needs Ollama installed

### HuggingFace

- Provides access to HuggingFace's Inference API
- Requires `HF_API_KEY` environment variable

### Together

- Provides access to Together AI's models
- Requires `TOGETHER_API_KEY` environment variable

### Google Gemini

- Provides access to Google's Gemini models
- Requires `GEMINI_API_KEY` environment variable

### AIQL

- Provides access to AIQL's models
- Requires `AIQL_API_KEY` environment variable

### Groq

- Provides access to Groq's models
- Requires `GROQ_API_KEY` environment variable

### NVIDIA

- Provides access to NVIDIA's models
- Requires `NVIDIA_API_KEY` environment variable

### GitHub

- Provides access to GitHub Copilot models
- Requires `GITHUB_TOKEN` environment variable

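
Which providers are actually available at runtime depends on which keys are set (and whether Ollama is running). A minimal sketch for picking the first available provider, assuming `check_available_apis()` returns provider names as shown in the Advanced Usage section:

```python
from allendpoints import run_inference, check_available_apis, CONFIG

available = check_available_apis()

# Prefer local Ollama if present, otherwise fall back to any configured provider
provider = "ollama" if "ollama" in available else next(iter(available), None)

if provider is None:
    raise SystemExit("No providers available - set an API key or start Ollama.")

response = run_inference(
    prompt="Summarize the benefits of unit testing.",
    provider=provider,
    model=CONFIG["defaults"][provider],
)
print(response)
```
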
## Adding New Models

To add a new model to an existing provider, edit the `CONFIG` dictionary in the script:

```python
CONFIG = {
    "models": {
        "provider_name": {
            "model_display_name": "actual_model_id",
            # Add your new model here
            "new_model_name": "new_model_id"
        }
    }
}
```

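
If the new model should also become the provider's default, update the defaults map as well (the same map read elsewhere in this README as `CONFIG["defaults"]`). A hypothetical example for Ollama:

```python
# Hypothetical example: register a new Ollama model and make it the provider default
CONFIG["models"]["ollama"]["llama3.2:3b"] = "llama3.2:3b"
CONFIG["defaults"]["ollama"] = "llama3.2:3b"
```
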
## Troubleshooting

### API Key Issues

- Ensure your API keys are correctly set in your environment variables
- Check that the API keys have not expired
- Verify that you have the necessary permissions for the models you're trying to access

### Ollama Issues

- Ensure Ollama is installed and running
- Check that the model you're trying to use is downloaded (`ollama list`)
- If a model is not available, pull it with `ollama pull model_name`

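
When scripting against the module, you can also check for a local model before attempting inference. A small sketch assuming `get_ollama_models()` returns an iterable of model names, as used in the Advanced Usage section:

```python
from allendpoints import get_ollama_models, run_inference

models = get_ollama_models()
if "llama3.2:3b" in models:
    print(run_inference("Why is the sky blue?", "ollama", model="llama3.2:3b"))
else:
    # Download the model first, e.g.: ollama pull llama3.2:3b
    print("Model not found locally.")
```
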

### Connection Issues

- Check your internet connection
- Ensure that the API endpoints are not blocked by your network or firewall
- Some providers may have rate limits or usage quotas

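
If you hit transient rate limits, a simple retry wrapper around `run_inference` can help. This is a generic sketch, not part of the module; the delay values are arbitrary:

```python
import time

from allendpoints import run_inference

def run_with_retries(prompt, provider, model, attempts=3, base_delay=2.0):
    """Retry run_inference with exponential backoff on any exception."""
    for attempt in range(attempts):
        try:
            return run_inference(prompt, provider, model=model)
        except Exception:
            if attempt == attempts - 1:
                raise
            time.sleep(base_delay * (2 ** attempt))  # 2s, 4s, 8s, ...
```
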

### Model Loading

- Large models may take time to load, especially on the first run
- The script preloads Ollama models to ensure fair timing measurements
- If a model consistently fails to load, try a smaller model or a different provider

### Colored Error Messages

- Install the `colorama` package for colored error messages: `pip install colorama`