The osmosis-ai CLI provides two main commands: preview for inspecting configurations and eval for running evaluations.
Installation
The CLI is accessible via three aliases:
osmosis
osmosis-ai
osmosis_ai
Global Usage
osmosis [command] [options]
Commands
preview
Inspect and validate rubric configurations or dataset files.
Usage
osmosis preview --path <file_path>
Options
| Option | Type | Required | Description |
|---|---|---|---|
| --path | string | Yes | Path to the file to preview (YAML or JSONL) |
Examples
Preview a rubric configuration:
osmosis preview --path rubric_configs.yaml
Preview a dataset:
osmosis preview --path sample_data.jsonl
Output
The command will:
Validate the file structure
Display parsed contents in a readable format
Show count summary (number of rubrics or records)
Report any validation errors
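For JSONL datasets, these checks amount to parsing each line and confirming the required field. A rough Python sketch of equivalent checks (not the CLI's actual implementation; the helper name preview_jsonl is illustrative, and solution_str is the required field described in the dataset format below):

import json
import sys

def preview_jsonl(path):
    """Approximate the preview checks for a JSONL dataset."""
    records, errors = [], []
    with open(path, encoding="utf-8") as fh:
        for lineno, line in enumerate(fh, start=1):
            if not line.strip():
                continue  # ignore blank lines
            try:
                record = json.loads(line)
            except json.JSONDecodeError as exc:
                errors.append(f"line {lineno}: invalid JSON ({exc})")
                continue
            if not isinstance(record, dict) or not record.get("solution_str"):
                errors.append(f"line {lineno}: missing or empty 'solution_str'")
                continue
            records.append(record)
    print(f"{len(records)} valid record(s)")
    for err in errors:
        print("error:", err, file=sys.stderr)

preview_jsonl("sample_data.jsonl")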
eval
Evaluate a dataset against a rubric configuration.
Usage
osmosis eval --rubric <rubric_id> --data <data_path> [options]
Required Options
| Option | Short | Type | Description |
|---|---|---|---|
| --rubric | -r | string | Rubric ID from your configuration file |
| --data | -d | string | Path to JSONL dataset file |
Optional Parameters
| Option | Short | Type | Default | Description |
|---|---|---|---|---|
| --config | -c | string | Auto-discovered | Path to rubric configuration YAML |
| --number | -n | integer | 1 | Number of evaluation runs per record |
| --output | -o | string | ~/.cache/osmosis/... | Output path for results JSON |
| --baseline | -b | string | None | Path to baseline evaluation for comparison |
Examples
Basic evaluation:
osmosis eval --rubric helpfulness --data responses.jsonl
Multiple runs for variance analysis:
osmosis eval --rubric helpfulness --data responses.jsonl --number 5
Custom output location:
osmosis eval --rubric helpfulness --data responses.jsonl --output ./results/eval_001.json
Compare against baseline:
osmosis eval --rubric helpfulness --data new_responses.jsonl --baseline ./results/baseline.json
Custom configuration file:
osmosis eval --rubric helpfulness --data responses.jsonl --config ./configs/custom_rubrics.yaml
Configuration Files
Rubric Configuration (YAML)
The rubric configuration file defines evaluation criteria and model settings.
Structure
version: 1
default_score_min: 0.0
default_score_max: 1.0

rubrics:
  - id: rubric_identifier
    title: Human-Readable Title
    rubric: |
      Your evaluation criteria here.
      Can be multiple lines.
    model_info:
      provider: openai
      model: gpt-5
      api_key_env: OPENAI_API_KEY
      timeout: 30
    score_min: 0.0  # Optional override
    score_max: 1.0  # Optional override
Required Fields
version: Configuration schema version (currently 1)
rubrics: List of rubric definitions
Rubric Definition Fields
| Field | Type | Required | Description |
|---|---|---|---|
| id | string | Yes | Unique identifier for the rubric |
| title | string | Yes | Human-readable title |
| rubric | string | Yes | Evaluation criteria in natural language |
| model_info | object | Yes | LLM provider configuration |
| score_min | float | No | Minimum score (overrides default) |
| score_max | float | No | Maximum score (overrides default) |
Model Info Fields
| Field | Type | Required | Description |
|---|---|---|---|
| provider | string | Yes | Provider name (see Supported Providers) |
| model | string | Yes | Model identifier |
| api_key_env | string | No | Environment variable name for API key |
| timeout | integer | No | Request timeout in seconds (default: 30) |
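Before running an evaluation, the schema above can be checked with a short script. A hedged sketch (requires PyYAML; it mirrors the field tables on this page, not the CLI's internal validator):

import yaml  # PyYAML

REQUIRED_RUBRIC_FIELDS = ("id", "title", "rubric", "model_info")
REQUIRED_MODEL_FIELDS = ("provider", "model")

def check_config(path):
    """Check a rubric configuration file against the documented required fields."""
    with open(path, encoding="utf-8") as fh:
        config = yaml.safe_load(fh)
    assert config.get("version") == 1, "expected schema version 1"
    rubrics = config.get("rubrics", [])
    for rubric in rubrics:
        for field in REQUIRED_RUBRIC_FIELDS:
            assert field in rubric, f"rubric missing '{field}'"
        for field in REQUIRED_MODEL_FIELDS:
            assert field in rubric["model_info"], f"model_info missing '{field}'"
    print(f"{len(rubrics)} rubric(s) look structurally valid")

check_config("rubric_configs.yaml")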
Auto-Discovery
If you don’t specify --config, the CLI searches for rubric_configs.yaml in:
Same directory as the data file
Current working directory
./examples/ subdirectory
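That search order can be expressed roughly as follows (an illustrative sketch of the documented lookup, not the CLI's source; discover_config is a made-up helper name):

from pathlib import Path

def discover_config(data_path):
    """Return the first rubric_configs.yaml found in the documented search order."""
    candidates = [
        Path(data_path).resolve().parent / "rubric_configs.yaml",  # next to the data file
        Path.cwd() / "rubric_configs.yaml",                        # current working directory
        Path.cwd() / "examples" / "rubric_configs.yaml",           # ./examples/ subdirectory
    ]
    for candidate in candidates:
        if candidate.is_file():
            return candidate
    return None

print(discover_config("responses.jsonl"))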
Dataset Format (JSONL)
Each line in the JSONL file represents one evaluation record.
Minimal Example
{ "solution_str" : "The AI's response text to evaluate" }
Complete Example
{
  "conversation_id": "ticket-12345",
  "rubric_id": "helpfulness",
  "original_input": "How do I reset my password?",
  "solution_str": "Click 'Forgot Password' on the login page and follow the email instructions.",
  "ground_truth": "Users should use the password reset link sent to their registered email.",
  "metadata": {
    "customer_tier": "premium",
    "category": "account_management"
  },
  "score_min": 0.0,
  "score_max": 10.0
}
Field Reference
| Field | Type | Required | Description |
|---|---|---|---|
| solution_str | string | Yes | The text to be evaluated (must be non-empty) |
| conversation_id | string | No | Unique identifier for this record |
| rubric_id | string | No | Links to a specific rubric in config |
| original_input | string | No | Original user query/prompt for context |
| ground_truth | string | No | Reference answer for comparison |
| metadata | object | No | Additional context passed to evaluator |
| extra_info | object | No | Runtime configuration options |
| score_min | float | No | Override minimum score for this record |
| score_max | float | No | Override maximum score for this record |
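Datasets can also be generated programmatically by writing one JSON object per line. A minimal sketch using only the documented fields (the record contents are placeholders):

import json

records = [
    {
        "conversation_id": "ticket-12345",
        "original_input": "How do I reset my password?",
        "solution_str": "Click 'Forgot Password' on the login page.",  # required
        "metadata": {"category": "account_management"},
    },
    {"solution_str": "Contact support to unlock your account."},  # minimal record
]

with open("responses.jsonl", "w", encoding="utf-8") as fh:
    for record in records:
        fh.write(json.dumps(record, ensure_ascii=False) + "\n")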
Console Output
During evaluation, you’ll see:
Evaluating records: 100%|████████████████| 50/50 [00:45<00:00, 1.1record/s]
Overall Statistics:
Average Score: 0.847
Min Score: 0.200
Max Score: 1.000
Variance: 0.034
Std Deviation: 0.185
Success Rate: 100.0% (50/50)
Evaluation Results:
...
Results saved to:
~/.cache/osmosis/eval_result/helpfulness/rubric_eval_result_20250114_143022.json
JSON Output File
The output JSON file contains detailed results:
{
  "rubric_id": "helpfulness",
  "timestamp": "2025-01-14T14:30:22.123456",
  "duration_seconds": 45.2,
  "total_records": 50,
  "successful_evaluations": 50,
  "failed_evaluations": 0,
  "statistics": {
    "average": 0.847,
    "min": 0.200,
    "max": 1.000,
    "variance": 0.034,
    "std_dev": 0.185
  },
  "results": [
    {
      "conversation_id": "ticket-12345",
      "runs": [
        {
          "score": 0.85,
          "explanation": "The response directly addresses the question...",
          "raw_payload": { ... },
          "duration_ms": 890
        }
      ],
      "aggregate_stats": {
        "average": 0.85,
        "variance": 0.0
      }
    }
  ],
  "model_info": {
    "provider": "openai",
    "model": "gpt-5"
  }
}
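The results file is plain JSON, so it can be post-processed directly. A sketch that reprints the summary and lists the lowest-scoring records (field names follow the schema above; the output path is the one from the earlier example):

import json

with open("results/eval_001.json", encoding="utf-8") as fh:
    results = json.load(fh)

stats = results["statistics"]
print(f"{results['rubric_id']}: avg={stats['average']:.3f} "
      f"min={stats['min']:.3f} max={stats['max']:.3f} std={stats['std_dev']:.3f}")

# Lowest-scoring records first, to surface weak responses.
worst = sorted(results["results"], key=lambda r: r["aggregate_stats"]["average"])
for record in worst[:5]:
    print(record.get("conversation_id"), record["aggregate_stats"]["average"])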
Supported Providers
| Provider | Value | API Key Env | Example Models |
|---|---|---|---|
| OpenAI | openai | OPENAI_API_KEY | gpt-5 |
| Anthropic | anthropic | ANTHROPIC_API_KEY | claude-sonnet-4-5 |
| Google Gemini | gemini | GOOGLE_API_KEY | gemini-2.5-flash |
| xAI | xai | XAI_API_KEY | grok-4 |
| OpenRouter | openrouter | OPENROUTER_API_KEY | 100+ models |
| Cerebras | cerebras | CEREBRAS_API_KEY | llama3.1-405b |
Provider Configuration Example
model_info:
  provider: openai             # Or: anthropic, gemini, xai, openrouter, cerebras
  model: gpt-5                 # Provider-specific model identifier
  api_key_env: OPENAI_API_KEY  # Environment variable name
  timeout: 30                  # Optional timeout in seconds
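Since the key is read from the environment variable named by api_key_env, it can help to confirm that the variable is actually set before starting a run. A small sketch (requires PyYAML; purely illustrative):

import os
import yaml

with open("rubric_configs.yaml", encoding="utf-8") as fh:
    config = yaml.safe_load(fh)

for rubric in config.get("rubrics", []):
    env_name = rubric["model_info"].get("api_key_env")
    if env_name and not os.environ.get(env_name):
        print(f"warning: {env_name} is not set (rubric '{rubric['id']}')")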
Advanced Usage
Baseline Comparison
Compare new evaluations against a baseline to detect regressions:
# Create baseline
osmosis eval --rubric helpfulness --data baseline.jsonl --output baseline.json
# Compare new data against baseline
osmosis eval --rubric helpfulness --data new_data.jsonl --baseline baseline.json
The output will include delta statistics showing improvements or regressions.
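For custom reporting, the two result files can also be compared by hand. A sketch assuming the output schema shown above (the regression threshold is arbitrary):

import json

def load_average(path):
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)["statistics"]["average"]

baseline = load_average("baseline.json")
current = load_average("results/eval_001.json")
delta = current - baseline

print(f"baseline={baseline:.3f}  current={current:.3f}  delta={delta:+.3f}")
if delta < -0.05:  # tune this threshold for your rubric's score range
    print("possible regression against the baseline")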
Variance Analysis
Run multiple evaluations per record to measure score consistency:
osmosis eval --rubric helpfulness --data responses.jsonl --number 10
Useful for:
Understanding rubric stability
Detecting ambiguous criteria
A/B testing different prompts
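After a multi-run evaluation, the per-record spread is available in the results file. A sketch that flags records whose scores vary the most across runs (assumes the output schema shown earlier; the 0.1 threshold is arbitrary):

import json
import statistics

with open("results/eval_001.json", encoding="utf-8") as fh:
    results = json.load(fh)

for record in results["results"]:
    scores = [run["score"] for run in record["runs"]]
    if len(scores) > 1 and statistics.pstdev(scores) > 0.1:
        print(record.get("conversation_id"), "scores:", scores)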
Batch Processing
Process multiple datasets:
for file in data/*.jsonl ; do
  osmosis eval --rubric helpfulness --data "$file" --output "results/$(basename "$file" .jsonl).json"
done
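The same loop can be driven from Python with subprocess, which makes it easier to check exit codes or add retries later. A sketch equivalent to the shell loop above:

import subprocess
from pathlib import Path

out_dir = Path("results")
out_dir.mkdir(exist_ok=True)

for data_file in sorted(Path("data").glob("*.jsonl")):
    output = out_dir / f"{data_file.stem}.json"
    subprocess.run(
        ["osmosis", "eval",
         "--rubric", "helpfulness",
         "--data", str(data_file),
         "--output", str(output)],
        check=True,  # stop on the first failed run
    )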
Custom Cache Location
Override the default cache directory:
export OSMOSIS_CACHE_DIR=/path/to/custom/cache
osmosis eval --rubric helpfulness --data responses.jsonl
Error Handling
Common Errors
API Key Not Found
Error: API key not found for provider 'openai'
Solution: Set the environment variable:
export OPENAI_API_KEY="your-key-here"
Rubric Not Found
Error: Rubric 'helpfulness' not found in configuration
Solution: Check your rubric_configs.yaml and ensure the rubric ID matches exactly.
Invalid JSONL
Error: Invalid JSON on line 5
Solution: Validate your JSONL file. Each line must be valid JSON.
Model Not Found
Error: Model 'gpt-5' not available for provider 'openai'
Solution: Use a valid model identifier for your chosen provider.
Timeout Error
Error: Request timed out after 30 seconds
Solution: Increase the timeout in your model configuration:
model_info:
  timeout: 60  # for example, double the 30-second default
Best Practices
Writing Effective Rubrics:
Be specific and measurable
Include clear criteria and examples
Test with sample data before large-scale evaluation
Dataset Preparation:
Include diverse examples with relevant metadata
Validate JSONL syntax before evaluation
Keep solution_str concise but complete
Performance Optimization:
Process datasets in batches for cost efficiency
Cost Management:
Start with small samples to test rubrics (see the sampling sketch below)
Monitor API usage through provider dashboards
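One way to start small is to carve a random subset out of a larger dataset and evaluate that first. A sketch (the sample size and the responses_sample.jsonl filename are arbitrary):

import random

with open("responses.jsonl", encoding="utf-8") as fh:
    lines = [line for line in fh if line.strip()]

random.seed(0)  # reproducible sample
sample = random.sample(lines, k=min(10, len(lines)))

with open("responses_sample.jsonl", "w", encoding="utf-8") as fh:
    fh.writelines(sample)

Then evaluate the sample: osmosis eval --rubric helpfulness --data responses_sample.jsonl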
Troubleshooting
Debug Mode:
export OSMOSIS_DEBUG=1
osmosis eval --rubric helpfulness --data responses.jsonl
Verify Installation: confirm that one of the CLI aliases is on your PATH:
which osmosis
Test Setup:
osmosis preview --path rubric_configs.yaml
Check Results:
ls -lh ~/.cache/osmosis/eval_result/
Next Steps
Quick Start: new to the CLI? Start with the quick start guide.
Decorators & API: learn about programmatic usage with Python decorators.