# AllEndpoints - Universal LLM Inference Tool
AllEndpoints is a Python module for running inference against multiple LLM providers through a unified interface. Supported providers include Ollama (local), HuggingFace, Together, Google Gemini, AIQL, Groq, NVIDIA, and the GitHub Copilot API.
## Table of Contents

- Installation
- Environment Variables
  - Setting Up Environment Variables
    - Linux/macOS
    - Windows
- Usage
  - Command-Line Arguments
  - Examples
- Using as a Python Module
- Supported Providers
- Adding New Models
- Troubleshooting
## Installation
1. Clone the repository:

   ```bash
   git clone https://github.com/yourusername/allendpoints.git
   cd allendpoints
   ```

2. Install the required dependencies:

   ```bash
   pip install ollama requests google-generativeai huggingface_hub together groq openai colorama
   ```

3. Install Ollama (optional, for local inference) from https://ollama.com.
## Environment Variables
The script uses environment variables to store API keys for different providers. Here are the required environment variables for each provider:
| Provider | Environment Variable | Description |
|---|---|---|
| HuggingFace | `HF_API_KEY` | HuggingFace API key |
| Together | `TOGETHER_API_KEY` | Together AI API key |
| Google Gemini | `GEMINI_API_KEY` | Google AI Studio API key |
| AIQL | `AIQL_API_KEY` | AIQL API key |
| Groq | `GROQ_API_KEY` | Groq API key |
| NVIDIA | `NVIDIA_API_KEY` | NVIDIA API key |
| GitHub | `GITHUB_TOKEN` | GitHub token for Copilot API access |
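Before running the script, it can help to confirm which of these variables are actually visible to Python. The short check below is only an illustration (it is not part of allendpoints) and uses nothing beyond the standard library:

```python
import os

# Provider -> environment variable, as listed in the table above.
REQUIRED_KEYS = {
    "HuggingFace": "HF_API_KEY",
    "Together": "TOGETHER_API_KEY",
    "Google Gemini": "GEMINI_API_KEY",
    "AIQL": "AIQL_API_KEY",
    "Groq": "GROQ_API_KEY",
    "NVIDIA": "NVIDIA_API_KEY",
    "GitHub": "GITHUB_TOKEN",
}

for provider, var in REQUIRED_KEYS.items():
    status = "set" if os.environ.get(var) else "MISSING"
    print(f"{provider:<15} {var:<20} {status}")
```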
### Setting Up Environment Variables

#### Linux/macOS

**Temporary (Current Session Only)**

```bash
export HF_API_KEY="your_huggingface_api_key"
export TOGETHER_API_KEY="your_together_api_key"
export GEMINI_API_KEY="your_gemini_api_key"
export AIQL_API_KEY="your_aiql_api_key"
export GROQ_API_KEY="your_groq_api_key"
export NVIDIA_API_KEY="your_nvidia_api_key"
export GITHUB_TOKEN="your_github_token"
```

**Permanent (Add to Shell Profile)**

Add the above export commands to your `~/.bashrc`, `~/.zshrc`, or `~/.profile` file:

```bash
echo 'export HF_API_KEY="your_huggingface_api_key"' >> ~/.bashrc
echo 'export TOGETHER_API_KEY="your_together_api_key"' >> ~/.bashrc
# Add other API keys similarly
```

Then reload your shell configuration:

```bash
source ~/.bashrc  # or ~/.zshrc or ~/.profile
```
#### Windows

**Command Prompt (Temporary)**

```cmd
set HF_API_KEY=your_huggingface_api_key
set TOGETHER_API_KEY=your_together_api_key
set GEMINI_API_KEY=your_gemini_api_key
set AIQL_API_KEY=your_aiql_api_key
set GROQ_API_KEY=your_groq_api_key
set NVIDIA_API_KEY=your_nvidia_api_key
set GITHUB_TOKEN=your_github_token
```

**PowerShell (Temporary)**

```powershell
$env:HF_API_KEY = "your_huggingface_api_key"
$env:TOGETHER_API_KEY = "your_together_api_key"
$env:GEMINI_API_KEY = "your_gemini_api_key"
$env:AIQL_API_KEY = "your_aiql_api_key"
$env:GROQ_API_KEY = "your_groq_api_key"
$env:NVIDIA_API_KEY = "your_nvidia_api_key"
$env:GITHUB_TOKEN = "your_github_token"
```
**Permanent (System Environment Variables)**

1. Right-click on "This PC" or "My Computer" and select "Properties"
2. Click on "Advanced system settings"
3. Click on "Environment Variables"
4. Under "User variables" or "System variables", click "New"
5. Enter the variable name (e.g., `HF_API_KEY`) and its value
6. Click "OK" to save
## Usage

### Command-Line Arguments

```text
usage: allendpoints.py [-h] [--provider PROVIDER] [--model MODEL] [--system SYSTEM] [--list] [--debug] [-a] [prompt]

LLM Inference Module

positional arguments:
  prompt               The prompt to send to the model (default: "Why is the sky blue?")

options:
  -h, --help           show this help message and exit
  --provider PROVIDER  The provider to use (ollama, hf, together, gemini, aiql, groq, nvidia, github)
  --model MODEL        The specific model to use
  --system SYSTEM      System content for chat models (default: "You are a helpful assistant.")
  --list               List available providers and models
  --debug              Enable debug output
  -a, --all            Run inference on all available providers and models
```
### Examples

List all available providers and models:

```bash
python allendpoints.py --list
```

Run inference with a specific provider and model:

```bash
python allendpoints.py "What is the capital of France?" --provider ollama --model llama3.2:3b
```

Run inference with a specific provider and its default model:

```bash
python allendpoints.py "Explain quantum computing" --provider gemini
```

Run inference with a custom system prompt:

```bash
python allendpoints.py "Write a poem about AI" --provider ollama --model llama3.2:3b --system "You are a poetic assistant."
```

Run inference on all available providers and models:

```bash
python allendpoints.py "What is the meaning of life?" -a
```

Run with debug output:

```bash
python allendpoints.py "How does a nuclear reactor work?" --provider nvidia --model qwen2.5-coder-32b --debug
```
## Using as a Python Module
AllEndpoints can be imported and used as a Python module in your own projects. Here's how to use it programmatically:
### Basic Usage

```python
# Import the necessary functions from allendpoints
from allendpoints import run_inference, check_available_apis, CONFIG

# Run inference with a specific provider and model
# Always specify the model parameter explicitly
response = run_inference(
    prompt="What is the capital of France?",
    provider="ollama",
    model="llama3.2:3b",
    system_content="You are a helpful assistant."
)
print(response)

# If you want to use the default model for a provider
default_model = CONFIG["defaults"]["ollama"]
response = run_inference(
    prompt="What is quantum computing?",
    provider="ollama",
    model=default_model
)
print(response)
```
### Advanced Usage

```python
# Import more functions for advanced usage
from allendpoints import (
    run_inference,
    check_available_apis,
    get_ollama_models,
    InferenceHandler,
    CONFIG
)

# Get all available providers
available_providers = check_available_apis()
print(f"Available providers: {available_providers}")

# Get all available Ollama models
ollama_models = get_ollama_models()
print(f"Available Ollama models: {ollama_models}")

# Use a specific provider's handler directly
if "nvidia" in available_providers:
    nvidia_response = InferenceHandler.nvidia(
        prompt="Explain quantum computing",
        model="qwen/qwen2.5-coder-32b-instruct"
    )
    print(f"NVIDIA response: {nvidia_response}")

# Access the configuration
default_models = CONFIG["defaults"]
print(f"Default models: {default_models}")
```
### Batch Processing Example

```python
# Process multiple prompts with different providers
prompts = [
    "What is machine learning?",
    "Explain the theory of relativity",
    "How does a neural network work?"
]

providers = ["ollama", "gemini", "github"]

# Process each prompt with each provider
for prompt in prompts:
    for provider in providers:
        try:
            # Always specify the model parameter explicitly
            default_model = CONFIG["defaults"][provider]
            response = run_inference(prompt, provider, model=default_model)
            print(f"\nPrompt: {prompt}")
            print(f"Provider: {provider}")
            print(f"Response: {response[:100]}...")
        except Exception as e:
            print(f"Error with {provider}: {str(e)}")
```
### Integration with main.py
The allendpoints module is integrated with main.py for benchmarking LLM performance on coding tasks:
```python
# In main.py
from allendpoints import check_available_apis, run_inference

# Get available providers
available_apis = check_available_apis()

# Run inference with a specific model
response = run_inference(
    question,        # The coding problem to solve
    provider,        # The provider to use
    model_id,        # The specific model to use
    system_content   # Optional system prompt
)
```
This integration allows main.py to benchmark various LLM providers and models on coding tasks using a unified interface.
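As a rough illustration of that pattern, the sketch below times each available provider's default model on a single coding question. It relies only on `run_inference`, `check_available_apis`, and `CONFIG` as documented above; the timing loop and the sample question are illustrative and are not code taken from main.py:

```python
import time
from allendpoints import check_available_apis, run_inference, CONFIG

question = "Write a Python function that reverses a string."  # sample coding task

for provider in check_available_apis():
    model = CONFIG["defaults"][provider]  # each provider's default model
    start = time.perf_counter()
    try:
        answer = run_inference(question, provider, model=model)
        elapsed = time.perf_counter() - start
        print(f"{provider} ({model}): {elapsed:.1f}s, {len(answer)} chars")
    except Exception as exc:
        print(f"{provider} ({model}): failed with {exc}")
```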
## Supported Providers

### Ollama (Local)

- Runs locally on your machine
- Supports various open-source models
- No API key required, but needs Ollama installed

### HuggingFace

- Provides access to HuggingFace's Inference API
- Requires the `HF_API_KEY` environment variable

### Together

- Provides access to Together AI's models
- Requires the `TOGETHER_API_KEY` environment variable

### Google Gemini

- Provides access to Google's Gemini models
- Requires the `GEMINI_API_KEY` environment variable

### AIQL

- Provides access to AIQL's models
- Requires the `AIQL_API_KEY` environment variable

### Groq

- Provides access to Groq's models
- Requires the `GROQ_API_KEY` environment variable

### NVIDIA

- Provides access to NVIDIA's models
- Requires the `NVIDIA_API_KEY` environment variable

### GitHub

- Provides access to GitHub Copilot models
- Requires the `GITHUB_TOKEN` environment variable
## Adding New Models

To add a new model to an existing provider, edit the `CONFIG` dictionary in the script:
```python
CONFIG = {
    "models": {
        "provider_name": {
            "model_display_name": "actual_model_id",
            # Add your new model here
            "new_model_name": "new_model_id"
        }
    }
}
```
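For instance, assuming the Groq section already exists in `CONFIG`, a new entry can also be registered at runtime and used immediately. The model name and ID below are placeholders; substitute an ID your account can actually access:

```python
from allendpoints import CONFIG, run_inference

# Placeholder ID -- replace with a real model offered by the provider.
CONFIG["models"]["groq"]["my-new-model"] = "provider-specific-model-id"

# The new entry can then be used like any built-in model.
response = run_inference(
    prompt="Summarise the plot of Hamlet in two sentences.",
    provider="groq",
    model=CONFIG["models"]["groq"]["my-new-model"],
)
print(response)
```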
## Troubleshooting

### API Key Issues
- Ensure your API keys are correctly set in your environment variables
- Check that the API keys have not expired
- Verify that you have the necessary permissions for the models you're trying to access
### Ollama Issues
- Ensure Ollama is installed and running
- Check that the model you're trying to use is downloaded (`ollama list`)
- If a model is not available, pull it with `ollama pull model_name`
### Connection Issues
- Check your internet connection
- Ensure that the API endpoints are not blocked by your network or firewall
- Some providers may have rate limits or usage quotas
### Model Loading
- Large models may take time to load, especially on the first run
- The script preloads Ollama models to ensure fair timing measurements (see the sketch after this list)
- If a model consistently fails to load, try a smaller model or a different provider
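The same warm-up can be reproduced by hand. The sketch below is an illustration of the idea, not the module's actual preloading code; it assumes the `ollama` Python package's `generate()` call, which asks the Ollama server to load a model when given an empty prompt:

```python
import time
import ollama
from allendpoints import run_inference

MODEL = "llama3.2:3b"

# An empty prompt loads the model into memory without generating tokens
# (assumption based on the Ollama API's model-load convention).
ollama.generate(model=MODEL, prompt="")

# Subsequent timing now measures inference only, not model loading.
start = time.perf_counter()
answer = run_inference("Why is the sky blue?", "ollama", model=MODEL)
print(f"{time.perf_counter() - start:.2f}s: {answer[:80]}...")
```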
### Colored Error Messages

- Install the `colorama` package for colored error messages: `pip install colorama`