What You Can Do
The osmosis-ai CLI enables you to:
- Test rubric stability - run the same rubric against the same data multiple times to verify scoring consistency (see Multiple evaluation runs below)
- Compare rubrics - evaluate different rubrics side by side to choose the best one for your use case (see Compare rubrics below)
Ready-to-use examples are available in the SDK’s examples/ folder. Let’s get started!
Installation
The `osmosis` command is installed together with the SDK. A minimal setup sketch follows; the PyPI package name `osmosis-ai` is an assumption here:
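```bash
# Assumes the SDK is published on PyPI as `osmosis-ai`; adjust if your
# distribution name differs.
pip install osmosis-ai

# Verify the CLI is on your PATH.
osmosis --help
```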
5-Minute Workflow
Step 1: Create a Rubric Configuration
Create `rubric_configs.yaml`:

```yaml
version: 1
default_score_min: 0.0
default_score_max: 1.0
rubrics:
  - id: helpfulness
    title: Response Helpfulness
    rubric: |
      Evaluate how helpful and actionable the response is.
      Consider accuracy, completeness, and practicality.
    model_info:
      provider: openai
      model: gpt-5
      api_key_env: OPENAI_API_KEY
```
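Before handing the file to the CLI, you can sanity-check it yourself. This is a generic PyYAML script, not part of the osmosis-ai API; it only checks the fields used above:

```python
# pip install pyyaml
import yaml

with open("rubric_configs.yaml") as f:
    config = yaml.safe_load(f)

# Basic structural checks that mirror the fields in the example config.
assert config["version"] == 1
for rubric in config["rubrics"]:
    for key in ("id", "title", "rubric", "model_info"):
        assert key in rubric, f"rubric entry missing {key!r}"
    print(f"OK: {rubric['id']} -> {rubric['model_info']['model']}")
```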
Step 2: Prepare Your Dataset
Create `sample_data.jsonl`:

```json
{"solution_str": "Click 'Forgot Password' on the login page.", "rubric_id": "helpfulness"}
{"solution_str": "Please contact support for assistance.", "rubric_id": "helpfulness"}
```
Step 3: Set API Key
```bash
export OPENAI_API_KEY="your-key-here"
```
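To confirm the key is visible to the process that will run the CLI (a generic check, nothing osmosis-specific):

```python
import os

# The variable name matches the api_key_env entry in rubric_configs.yaml.
if os.environ.get("OPENAI_API_KEY"):
    print("OPENAI_API_KEY is set")
else:
    print("OPENAI_API_KEY is missing")
```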
Step 4: Preview Configuration
Validate your setup:
```bash
osmosis preview --path rubric_configs.yaml
osmosis preview --path sample_data.jsonl
```
Step 5: Run Evaluation
```bash
osmosis eval --rubric helpfulness --data sample_data.jsonl
```
Understanding Output
Console:

```text
Evaluating: 100%|████████| 2/2 [00:03<00:00, 1.5s/record]

Results Summary:
  Average Score: 0.85
  Min Score: 0.70
  Max Score: 1.00
```
JSON File:
Results are saved to `~/.cache/osmosis/eval_result/helpfulness/`.
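To inspect a saved result, pretty-print the newest file in that directory. This sketch assumes nothing about the file naming or JSON schema beyond the directory path above:

```python
import json
from pathlib import Path

result_dir = Path.home() / ".cache" / "osmosis" / "eval_result" / "helpfulness"

# File naming is not documented here, so take the most recently written file.
candidates = sorted(result_dir.glob("*.json"), key=lambda p: p.stat().st_mtime)
if candidates:
    latest = candidates[-1]
    print(f"Reading {latest}")
    print(json.dumps(json.loads(latest.read_text()), indent=2))
else:
    print(f"No result files found in {result_dir}")
```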
Advanced Usage
Multiple evaluation runs
Test rubric stability by running the same evaluation several times:

```bash
osmosis eval --rubric helpfulness --data sample_data.jsonl --number 3
```
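A stable rubric should produce similar summary scores across runs. One way to eyeball this is to aggregate the saved result files; note that `average_score` below is a placeholder key, since the exact result schema is not shown here, so substitute the real field name from your files:

```python
import json
import statistics
from pathlib import Path

result_dir = Path.home() / ".cache" / "osmosis" / "eval_result" / "helpfulness"

# NOTE: "average_score" is a placeholder, not a documented field name.
# Open one of your result files and substitute the actual key.
scores = []
for path in sorted(result_dir.glob("*.json")):
    data = json.loads(path.read_text())
    if isinstance(data, dict) and "average_score" in data:
        scores.append(data["average_score"])

if len(scores) >= 2:
    print(f"runs={len(scores)}  mean={statistics.mean(scores):.3f}  "
          f"stdev={statistics.stdev(scores):.3f}")
else:
    print("Need at least two runs with the expected key to measure stability.")
```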
Compare rubrics
Compare two different rubrics to find the best one:
```bash
# First, run evaluation with the baseline rubric
osmosis eval --rubric helpfulness --data sample_data.jsonl --output baseline_results.json

# Then compare with a new rubric
osmosis eval --rubric helpfulness_v2 --data sample_data.jsonl --baseline baseline_results.json
```
Custom output location:
```bash
osmosis eval --rubric helpfulness --data sample_data.jsonl --output ./results.json
```
Tips
- The CLI auto-discovers `rubric_configs.yaml` in your current directory, your data directory, or `./examples/`.
- Never commit API keys; always use environment variables.
Next Steps
- CLI Reference: complete CLI documentation
- Quick Start: learn the Python SDK API