
LLM Setup Guide

Choose your LLM backend: Local models (complete control, privacy) or AWS Bedrock (enterprise Claude with Zero Data Retention).

Choose Your Backend

  • Local (LM Studio, Ollama, llama.cpp, vLLM) - Your code never leaves your infrastructure. Full privacy and control.
  • AWS Bedrock - Enterprise Claude 4.5 with Zero Data Retention. Your data stays in your AWS account, is never used for training, and meets compliance requirements.

Option 1: Local LLM Backends

All local backends provide complete data privacy - your code never leaves your infrastructure. Choose based on your preferences:

LM Studio Setup

Prerequisites

System Requirements

  • LM Studio installed (lmstudio.ai)
  • Compatible model (Qwen3-30B-A3B or similar)
  • RAM: At least 16GB (32GB recommended for 30B models)

Setup Steps

1. Install LM Studio

Download from lmstudio.ai and install for your platform (macOS, Windows, Linux).

2. Download Model

  1. Open LM Studio
  2. Navigate to "Discover" tab
  3. Search for "qwen3-30b-a3b"
  4. Download Q4_K_M quantization (recommended for performance/quality balance)

Quantization Explained

Q4_K_M is a 4-bit quantization that reduces model size and RAM usage while maintaining excellent quality. Perfect for 30B models on consumer hardware.
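As a rough sanity check on RAM needs, you can estimate weight size from bits per parameter. This is an approximation: Q4_K_M averages roughly 4.5 bits per weight (it mixes 4-bit and higher-precision blocks), and actual file sizes vary by architecture.

```python
# Back-of-envelope model memory estimate.
# Assumption: Q4_K_M averages ~4.5 bits per weight; real GGUF files vary.
def model_size_gb(params_billions: float, bits_per_weight: float) -> float:
    bytes_total = params_billions * 1e9 * bits_per_weight / 8
    return bytes_total / 1e9  # decimal GB

fp16 = model_size_gb(30, 16)   # unquantized half precision
q4km = model_size_gb(30, 4.5)  # Q4_K_M, approximate

print(f"FP16: ~{fp16:.0f} GB, Q4_K_M: ~{q4km:.1f} GB")
```

The quantized weights alone fit comfortably in 32GB; the extra headroom covers the KV cache for the context window and the OS.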

3. Start Local Server

  1. Click "Local Server" tab
  2. Select your downloaded model
  3. Configure server settings:
    • Port: 1234 (default)
    • Context Length: 20000 tokens
    • Max Tokens: 8000
  4. Click "Start Server"

You should see "Server running on http://localhost:1234" when successful.
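Before pointing drep at the server, you can confirm the OpenAI-compatible API actually responds by querying the standard /v1/models route. A minimal sketch (assumes the default port 1234; this is generic OpenAI-compatible client code, not part of drep):

```python
import json
import urllib.request

def models_url(endpoint: str) -> str:
    # Normalize the endpoint and append the standard OpenAI-compatible route.
    return endpoint.rstrip("/") + "/models"

def list_models(endpoint: str = "http://localhost:1234/v1") -> list:
    # Query GET /v1/models and return the advertised model IDs.
    with urllib.request.urlopen(models_url(endpoint), timeout=5) as resp:
        data = json.load(resp)
    return [m["id"] for m in data.get("data", [])]

if __name__ == "__main__":
    print(list_models())  # should include your loaded model, e.g. qwen3-30b-a3b
```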

4. Configure drep

Update your config.yaml:

yaml
llm:
  enabled: true
  endpoint: http://localhost:1234/v1
  model: qwen3-30b-a3b
  temperature: 0.2
  max_tokens: 8000

  # Rate limiting
  max_concurrent_global: 5
  requests_per_minute: 60
  max_tokens_per_minute: 100000

  # Cache configuration
  cache:
    enabled: true
    directory: ~/.cache/drep/llm
    ttl_days: 30
    max_size_gb: 10.0

  # Circuit breaker (optional)
  circuit_breaker_threshold: 5
  circuit_breaker_timeout: 60
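A note on how these example rate limits interact: with the values above, the token budget rather than the request count is usually the binding limit. The arithmetic:

```python
# Sketch: how the example rate limits above interact (values from config.yaml).
requests_per_minute = 60
max_tokens_per_minute = 100_000
max_tokens_per_request = 8_000

# Average token budget if you actually issue 60 requests per minute.
avg_token_budget = max_tokens_per_minute / requests_per_minute
print(f"Average budget per request: {avg_token_budget:.0f} tokens")

# If every request used its full max_tokens, the token limit (not the
# request limit) would be the effective throttle:
token_bound_rpm = max_tokens_per_minute / max_tokens_per_request
print(f"Requests/min sustainable at max_tokens: {token_bound_rpm:.1f}")
```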

5. Verify Setup

Test your configuration:

bash
drep scan owner/repo --show-metrics


Setup Complete!

If you see LLM analysis results and token usage metrics, your setup is working correctly.

Remote LM Studio

For remote LM Studio instances (e.g., a dedicated server):

yaml
llm:
  endpoint: https://lmstudio.example.com/v1
  api_key: ${LM_STUDIO_KEY}  # If authentication enabled

Set the API key as an environment variable:

bash
export LM_STUDIO_KEY=your-api-key-here
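The ${LM_STUDIO_KEY} placeholder follows common shell-style substitution. Many config loaders implement it with something like os.path.expandvars; this is a generic illustration of the pattern, not drep's actual loader:

```python
import os

# Illustration only: shell-style ${VAR} expansion as many YAML config
# loaders implement it. drep's actual loading logic may differ.
os.environ["LM_STUDIO_KEY"] = "sk-example-123"  # normally set via `export`

raw = "api_key: ${LM_STUDIO_KEY}"
expanded = os.path.expandvars(raw)
print(expanded)  # -> api_key: sk-example-123
```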

Model Recommendations

Choose a model based on your available RAM and performance requirements:

  • Qwen3-30B-A3B: 30B parameters, 32GB RAM required, medium speed, excellent quality
  • Llama-3-70B: 70B parameters, 64GB RAM required, slow, best quality
  • Mistral-7B: 7B parameters, 8GB RAM required, fast, good quality

Recommended: Qwen3-30B-A3B provides the best balance of quality and performance for code review tasks.

Ollama Setup

Ollama provides simple CLI-based model management with Docker-friendly deployment:

  1. Install Ollama from ollama.ai
  2. Pull a model: ollama pull qwen2.5-coder:32b
  3. Update config.yaml:
yaml
llm:
  endpoint: http://localhost:11434/v1  # Ollama OpenAI-compatible API
  model: qwen2.5-coder:32b

llama.cpp Setup

llama.cpp provides lightweight, low-level control with minimal dependencies:

  1. Build llama.cpp with server support
  2. Start the server: ./llama-server -m model.gguf --port 8080 (the binary was named ./server in older builds)
  3. Update config.yaml:
yaml
llm:
  endpoint: http://localhost:8080/v1
  model: your-model-name
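All three local backends expose the same OpenAI-compatible chat endpoint, so a single client works against any of them. A minimal request sketch (endpoint and model values here match the Ollama example above; adjust for your backend):

```python
import json
import urllib.request

def chat_request(endpoint: str, model: str, prompt: str) -> urllib.request.Request:
    # Build a standard OpenAI-compatible POST /chat/completions request.
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.2,
    }
    return urllib.request.Request(
        endpoint.rstrip("/") + "/chat/completions",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )

req = chat_request("http://localhost:11434/v1", "qwen2.5-coder:32b", "Say hi")
print(req.full_url)
# Send with: urllib.request.urlopen(req) when a server is running.
```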

Option 2: AWS Bedrock (Enterprise Claude with ZDR)

AWS Bedrock provides enterprise-grade Claude 4.5 models with Zero Data Retention (ZDR). Your data stays in your AWS infrastructure, is never used for model training, and meets strict compliance requirements.

Enterprise Benefits

  • Zero Data Retention - Your data is never stored or used for training
  • Data Sovereignty - Data stays in your AWS region and account
  • Compliance - Meets HIPAA, GDPR, SOC 2, and other regulatory requirements
  • Latest Claude Models - Claude Sonnet 4.5 and Haiku 4.5 available now
  • AWS Integration - Works with IAM, CloudWatch, VPC endpoints

Prerequisites

Setup Steps

1. Enable Model Access

  1. Go to AWS Bedrock Console (region: us-east-1 or your preferred region)
  2. Navigate to Model Access in the left sidebar
  3. Click Modify model access
  4. Select Anthropic Claude models:
    • Claude Sonnet 4.5
    • Claude Haiku 4.5
  5. Click Save changes
  6. Wait for access to be granted (usually instant)

2. Configure AWS Credentials

Bedrock uses the standard AWS credentials chain. Choose one method:

Method A: AWS CLI Configuration

bash
aws configure
# Enter your AWS Access Key ID
# Enter your AWS Secret Access Key
# Default region: us-east-1
# Default output format: json

Method B: Environment Variables

bash
export AWS_ACCESS_KEY_ID=your_access_key_id
export AWS_SECRET_ACCESS_KEY=your_secret_access_key
export AWS_DEFAULT_REGION=us-east-1
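When using environment-based credentials, a quick pre-flight check can catch a missing variable before a scan fails mid-run. A generic illustration (not part of drep):

```python
import os

def missing_aws_vars(env) -> list:
    # The minimum set needed for environment-based Bedrock credentials.
    required = ["AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY", "AWS_DEFAULT_REGION"]
    return [name for name in required if not env.get(name)]

if __name__ == "__main__":
    missing = missing_aws_vars(os.environ)
    print("Missing AWS variables:", ", ".join(missing) if missing else "none")
```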

Method C: Credentials File

Create ~/.aws/credentials:

ini
[default]
aws_access_key_id = your_access_key_id
aws_secret_access_key = your_secret_access_key

Set the default region in ~/.aws/config (the standard location for the region setting):

ini
[default]
region = us-east-1

3. Configure drep

Update config.yaml to use Bedrock:

yaml
llm:
  enabled: true
  provider: bedrock  # Required for Bedrock

  bedrock:
    region: us-east-1
    model: anthropic.claude-sonnet-4-5-20250929-v1:0

  # General LLM settings
  temperature: 0.2
  max_tokens: 4000

  # Rate limiting (lower for Bedrock)
  max_concurrent_global: 3
  requests_per_minute: 30

  # Cache configuration
  cache:
    enabled: true
    ttl_days: 30

4. Verify Setup

Test your Bedrock configuration:

bash
drep scan owner/repo --show-metrics


Supported Bedrock Models

  • Claude Sonnet 4.5 (model ID: anthropic.claude-sonnet-4-5-20250929-v1:0): best balance of speed, cost, and quality; recommended for most use cases. Availability: global, us, eu, jp
  • Claude Haiku 4.5 (model ID: anthropic.claude-haiku-4-5-20251001-v1:0): fastest response times, lower cost; good for simple checks. Availability: global, us, eu
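Bedrock model IDs pack several fields into one string: provider, model name, snapshot date, and version. A small illustrative parser can help when sanity-checking a config value (not a drep feature, and the field layout is an observed convention rather than a documented guarantee):

```python
def parse_bedrock_model_id(model_id: str) -> dict:
    # Assumed format: <provider>.<model>-<YYYYMMDD>-v<major>:<minor>
    provider, rest = model_id.split(".", 1)
    base, version = rest.rsplit("-v", 1)
    name, snapshot = base.rsplit("-", 1)
    return {"provider": provider, "model": name,
            "snapshot": snapshot, "version": "v" + version}

info = parse_bedrock_model_id("anthropic.claude-sonnet-4-5-20250929-v1:0")
print(info)
```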

Bedrock Regions

  • US East: us-east-1
  • US West: us-west-2
  • EU (Frankfurt): eu-central-1
  • Asia Pacific: ap-southeast-1

Region Tip

Use us-east-1 for maximum model availability. Check model availability in your region before configuring.

Bedrock Troubleshooting

AccessDeniedException

ThrottlingException

Invalid model ID

Credentials not found

Troubleshooting (Local Backends)

Connection Refused

If drep cannot connect to the LLM server:

Circuit Breaker is OPEN

If you see "Circuit breaker is OPEN" errors:

Cache Not Working

If cache hit rate stays at 0%:

Slow Performance

To improve performance:

Need More Help?

Check the main README or create an issue on GitHub.

Next Steps