This guide covers best practices for maintaining your Osmosis-synced repository and troubleshooting common issues.
Documentation
Write Clear Docstrings
All functions should have comprehensive docstrings:
@mcp.tool()
def fetch_user_data(user_id: str, include_history: bool = False) -> dict:
    """
    Fetch user profile data from the database.

    This tool retrieves comprehensive user information including
    profile details and optionally their activity history.

    Args:
        user_id: Unique identifier for the user (UUID format)
        include_history: Whether to include activity logs (default: False)

    Returns:
        Dictionary with keys:
        - id: User identifier
        - name: Full name
        - email: Email address
        - history: Activity logs (if include_history=True)

    Raises:
        ValueError: If user_id format is invalid
        LookupError: If user_id not found in database

    Example:
        >>> fetch_user_data("123e4567-e89b-12d3-a456-426614174000")
        {'id': '123e...', 'name': 'John Doe', 'email': 'john@example.com'}
    """
    # Implementation
    pass
Include Type Hints
Type hints improve IDE support and validation:
from typing import Any, Dict, Optional

@osmosis_reward
def evaluate_response(
    solution_str: str,
    ground_truth: str,
    extra_info: Optional[Dict[str, Any]] = None,
    **kwargs  # Required - do not omit!
) -> float:
    """Type hints make the function signature clear"""
    pass
Document Input/Output Formats
Clearly specify input/output formats:
@osmosis_reward
def json_match_reward(
    solution_str: str,
    ground_truth: str,
    extra_info: dict = None,
    **kwargs
) -> float:
    """
    Compare JSON outputs for structural matching.

    Expected format for solution_str and ground_truth:
        {
            "answer": "the answer text",
            "confidence": 0.95,
            "sources": ["source1", "source2"]
        }

    Returns 1.0 for perfect match, 0.0 for no match.
    Partial credit given for matching some fields.
    """
    pass
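One way to realize the partial-credit behavior described in that docstring is to score the fraction of ground-truth fields the solution reproduces. A minimal sketch, assuming both inputs are JSON strings in the documented format; exact per-field comparison and equal weighting are illustrative choices, not a prescribed scoring rule:
import json

def json_field_overlap(solution_str: str, ground_truth: str) -> float:
    """Fraction of ground-truth fields reproduced exactly in the solution."""
    try:
        solution = json.loads(solution_str)
        truth = json.loads(ground_truth)
    except (json.JSONDecodeError, TypeError):
        return 0.0
    if not isinstance(solution, dict) or not isinstance(truth, dict) or not truth:
        return 0.0
    matched = sum(1 for key, value in truth.items() if solution.get(key) == value)
    return matched / len(truth)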
Testing
Write Unit Tests
Create comprehensive tests for your functions:
# tests/test_reward_functions.py
import pytest
from reward_fn.compute_reward import numbers_match_reward

def test_exact_match():
    """Test exact numerical match"""
    score = numbers_match_reward("#### 42", "42")
    assert score == 1.0

def test_close_match():
    """Test near-match within epsilon"""
    score = numbers_match_reward("#### 42.0000001", "42")
    assert score == 1.0

def test_mismatch():
    """Test completely different values"""
    score = numbers_match_reward("#### 100", "42")
    assert score == 0.0

def test_invalid_format():
    """Test handling of invalid input format"""
    score = numbers_match_reward("no number here", "42")
    assert score == 0.0

def test_missing_solution():
    """Test handling of empty solution"""
    score = numbers_match_reward("", "42")
    assert score == 0.0

@pytest.mark.parametrize("solution,ground_truth,expected", [
    ("#### 1", "1", 1.0),
    ("#### 0", "0", 1.0),
    ("#### -5", "-5", 1.0),
    ("#### 3.14159", "3.14159", 1.0),
])
def test_various_numbers(solution, ground_truth, expected):
    """Test various number formats"""
    score = numbers_match_reward(solution, ground_truth)
    assert score == expected
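These tests assume an implementation roughly like the following sketch; the real numbers_match_reward lives in reward_fn/compute_reward.py and may differ in details such as the epsilon:
import re

@osmosis_reward
def numbers_match_reward(solution_str: str, ground_truth: str, extra_info: dict = None, **kwargs) -> float:
    """Score 1.0 when the number after '####' matches ground_truth within epsilon."""
    match = re.search(r'####\s*([-+]?\d+(?:\.\d+)?)', solution_str)
    if match is None:
        return 0.0
    try:
        return 1.0 if abs(float(match.group(1)) - float(ground_truth)) < 1e-6 else 0.0
    except ValueError:
        return 0.0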
Test Your MCP Server
Before pushing, test your MCP server:
# mcp/test/test.py
import requests

def test_health_endpoint():
    """Test that server is running"""
    response = requests.get("http://localhost:8080/health")
    assert response.status_code == 200
    assert response.json()["status"] == "healthy"

def test_multiply_tool():
    """Test the multiply tool"""
    # Test with FastMCP's tool calling interface
    payload = {
        "tool": "multiply",
        "arguments": {
            "first_val": 2.5,
            "second_val": 4.0
        }
    }
    response = requests.post("http://localhost:8080/call_tool", json=payload)
    assert response.status_code == 200
    result = response.json()
    assert result["result"] == 10.0

if __name__ == "__main__":
    test_health_endpoint()
    test_multiply_tool()
    print("All tests passed!")
Run tests:
# Start server in background
python mcp/main.py &
SERVER_PID=$!

# Run tests
python mcp/test/test.py

# Stop server
kill $SERVER_PID
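Because the functions are test_-prefixed, the same file also runs under pytest once the server is up:
pytest mcp/test/test.py -v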
Use Test Fixtures
Create reusable test data:
# tests/conftest.py
import pytest

@pytest.fixture
def sample_solution():
    return "The answer is 42. #### 42"

@pytest.fixture
def sample_ground_truth():
    return "42"

@pytest.fixture
def sample_extra_info():
    return {
        "metadata": {
            "difficulty": "easy",
            "category": "arithmetic"
        }
    }

# tests/test_with_fixtures.py
from reward_fn.compute_reward import numbers_match_reward

def test_with_fixtures(sample_solution, sample_ground_truth):
    score = numbers_match_reward(sample_solution, sample_ground_truth)
    assert score == 1.0
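The sample_extra_info fixture is consumed the same way; a sketch, assuming numbers_match_reward accepts the extra_info keyword shown earlier:
def test_with_extra_info(sample_solution, sample_ground_truth, sample_extra_info):
    score = numbers_match_reward(sample_solution, sample_ground_truth, extra_info=sample_extra_info)
    assert score == 1.0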
CI/CD Integration
GitHub Actions Workflow
Create .github/workflows/test.yml:
name: Test and Validate

on:
  push:
    branches: [main, develop]
  pull_request:
    branches: [main]

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout code
        uses: actions/checkout@v4

      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: '3.12'

      - name: Install dependencies
        run: |
          python -m pip install --upgrade pip
          pip install -e .
          pip install pytest pytest-cov

      - name: Run tests
        run: |
          pytest tests/ -v --cov=. --cov-report=term-missing

      - name: Lint code
        run: |
          pip install ruff
          ruff check .

      - name: Type check
        run: |
          pip install mypy
          mypy mcp/ reward_fn/ reward_rubric/

      - name: Test MCP server
        run: |
          python mcp/main.py &
          sleep 5
          python mcp/test/test.py
          pkill -f "python mcp/main.py"
Pre-commit Hooks
Create .pre-commit-config.yaml:
repos:
  - repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v4.5.0
    hooks:
      - id: trailing-whitespace
      - id: end-of-file-fixer
      - id: check-yaml
      - id: check-added-large-files
      - id: check-json

  - repo: https://github.com/astral-sh/ruff-pre-commit
    rev: v0.1.9
    hooks:
      - id: ruff
        args: [--fix, --exit-non-zero-on-fix]

  - repo: https://github.com/psf/black
    rev: 23.12.1
    hooks:
      - id: black
Install pre-commit:
pip install pre-commit
pre-commit install
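To run the hooks across the whole repository once (useful right after installing them):
pre-commit run --all-files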
Security
Never Commit Secrets
Use environment variables for sensitive data:
# Good
import os
API_KEY = os.getenv("OPENAI_API_KEY")

# Bad - NEVER do this
API_KEY = "sk-proj-1234567890abcdef"
Use .gitignore
Ensure .gitignore includes:
# Environment variables
.env
.env.local
.env.*.local
# API keys and secrets
secrets.json
credentials.json
*.key
*.pem
# Python
__pycache__/
*.pyc
venv/
*.egg-info/
Review Permissions Carefully
When connecting private repos:
Grant minimal required permissions
Review which repositories Osmosis can access
Use deploy keys for specific repo access
Regularly audit connected integrations
Validate Inputs
Always validate and sanitize inputs:
@mcp.tool()
def execute_query(query: str) -> dict:
    """
    Execute a database query (with validation)
    """
    # Validate input
    if not query or not isinstance(query, str):
        raise ValueError("Query must be a non-empty string")

    # Block destructive keywords (a blocklist alone does not prevent SQL injection)
    if any(keyword in query.upper() for keyword in ['DROP', 'DELETE', 'TRUNCATE']):
        raise ValueError("Destructive operations not allowed")

    # Execute via a safe helper (assumed to exist elsewhere in your codebase)
    return safe_execute(query)
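A keyword blocklist is easy to bypass, so prefer parameterized queries whenever the driver supports them. A sketch using the standard-library sqlite3 module; the database path and schema here are hypothetical:
import sqlite3

def fetch_user_by_id(db_path: str, user_id: str) -> list:
    """Look up a user with a parameterized query - the driver escapes user_id."""
    with sqlite3.connect(db_path) as conn:
        cursor = conn.execute(
            "SELECT id, name, email FROM users WHERE id = ?",  # placeholder, never string formatting
            (user_id,),
        )
        return cursor.fetchall()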
Code Organization
Keep Functions Focused
Each function should have a single, clear purpose:
# Good - focused functions
@mcp.tool()
def calculate_average(numbers: list[float]) -> float:
    """Calculate arithmetic mean"""
    return sum(numbers) / len(numbers)

@mcp.tool()
def calculate_median(numbers: list[float]) -> float:
    """Calculate median value"""
    sorted_nums = sorted(numbers)
    n = len(sorted_nums)
    if n % 2 == 0:
        return (sorted_nums[n // 2 - 1] + sorted_nums[n // 2]) / 2
    return sorted_nums[n // 2]

# Avoid - doing too much
@mcp.tool()
def analyze_numbers(numbers: list[float]) -> dict:
    """Calculate mean, median, mode, stddev, plot histogram..."""
    # Too many responsibilities
    pass
Use Helper Functions
Break complex logic into smaller pieces:
import re
from typing import Optional

# Helper functions (not decorated - not exposed as tools)
def extract_number(text: str) -> Optional[float]:
    """Extract numeric value from text"""
    match = re.search(r'[-+]?\d*\.?\d+', text)
    return float(match.group()) if match else None

def normalize_score(raw_score: float, min_val: float, max_val: float) -> float:
    """Normalize score to [0, 1] range"""
    return (raw_score - min_val) / (max_val - min_val)

# Main function using helpers
@osmosis_reward
def text_numeric_reward(
    solution_str: str,
    ground_truth: str,
    extra_info: dict = None,
    **kwargs
) -> float:
    """Reward based on numeric extraction and comparison"""
    solution_num = extract_number(solution_str)
    truth_num = extract_number(ground_truth)
    if solution_num is None or truth_num is None:
        return 0.0
    difference = abs(solution_num - truth_num)
    raw_score = 1.0 / (1.0 + difference)
    return normalize_score(raw_score, 0.0, 1.0)
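A few quick checks of the helpers; the expected values follow directly from the regex and formula above:
assert extract_number("The answer is 42") == 42.0
assert extract_number("no digits here") is None
assert normalize_score(0.5, 0.0, 1.0) == 0.5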
Organize by Feature
Structure your code logically:
mcp/
├── tools/
│ ├── __init__.py
│ ├── math/ # Math-related tools
│ │ ├── __init__.py
│ │ ├── arithmetic.py
│ │ └── statistics.py
│ ├── data/ # Data processing tools
│ │ ├── __init__.py
│ │ ├── fetch.py
│ │ └── transform.py
│ └── utils/ # Utility functions
│ ├── __init__.py
│ └── validation.py
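With this layout, mcp/tools/__init__.py can re-export the public tools so import paths stay stable as modules move. A sketch; the module and tool names here just follow the tree above:
# mcp/tools/__init__.py
from .math.arithmetic import multiply
from .math.statistics import calculate_average, calculate_median
from .data.fetch import fetch_user_data

__all__ = ['multiply', 'calculate_average', 'calculate_median', 'fetch_user_data']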
Cache Expensive Operations
from functools import lru_cache

@lru_cache(maxsize=1000)
def expensive_computation(input_data: str) -> float:
    """Cached expensive operation"""
    # Complex calculation (placeholder so the example runs)
    result = float(len(input_data))
    return result

@osmosis_reward
def cached_reward(solution_str, ground_truth, extra_info=None, **kwargs):
    """Uses cached helper function"""
    return expensive_computation(solution_str)
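Note that lru_cache requires hashable arguments (strings are fine), and the wrapper exposes statistics that help when tuning maxsize:
print(expensive_computation.cache_info())  # CacheInfo(hits=..., misses=..., maxsize=1000, currsize=...)
expensive_computation.cache_clear()        # reset between experiments if needed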
Choose Appropriate Models
For rubrics:
# Example with OpenAI
MODEL = "gpt-5"
# Example with Anthropic
MODEL = "claude-sonnet-4-5"
Batch Operations When Possible
@mcp.tool()
def batch_calculate(numbers_list: list[list[float]]) -> list[float]:
    """Process multiple calculations in one call"""
    return [sum(numbers) / len(numbers) for numbers in numbers_list]
Monitoring and Debugging
Add Logging
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

@osmosis_reward
def logged_reward(solution_str: str, ground_truth: str, extra_info: dict = None, **kwargs) -> float:
    """Reward function with logging"""
    logger.info(f"Evaluating solution: {solution_str[:50]}...")
    try:
        score = compute_score(solution_str, ground_truth)
        logger.info(f"Computed score: {score}")
        return score
    except Exception as e:
        logger.error(f"Error computing score: {e}")
        return 0.0
Track Metrics
from collections import defaultdict

metrics = defaultdict(int)

@osmosis_reward
def instrumented_reward(solution_str: str, ground_truth: str, extra_info: dict = None, **kwargs) -> float:
    """Track function calls and errors"""
    metrics['calls'] += 1
    try:
        score = compute_score(solution_str, ground_truth)
        metrics['successes'] += 1
        return score
    except Exception as e:
        metrics['errors'] += 1
        logger.error(f"Error: {e}")
        return 0.0
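To make the counters visible, surface them at the end of a local run; a small helper, assuming the logger from the previous example:
def report_metrics():
    """Log a one-line summary of the collected counters."""
    logger.info(f"calls={metrics['calls']} successes={metrics['successes']} errors={metrics['errors']}")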
Troubleshooting
Sync Issues
Problem : Repository not syncing to Osmosis
Solutions :
Verify folder structure matches exactly (case-sensitive)
Check webhook settings in GitHub repository settings
Review Osmosis sync logs for specific errors
Ensure pyproject.toml includes all dependencies
Validate decorators are spelled correctly
Problem : MCP tools not appearing in Osmosis
Solutions :
Confirm @mcp.tool() decorator is present
Check tools are exported in mcp/tools/__init__.py:
from .math import multiply
__all__ = ['multiply']
Verify type hints exist for all parameters and return values
Ensure no syntax errors in tool files
Check Osmosis platform logs for import errors
Reward Function Issues
Problem : Reward functions returning unexpected scores
Solutions :
Test locally with sample inputs
Add print statements or logging
Verify input format matches expectations
Check error handling catches all edge cases
Ensure return type is float
Rubric Evaluation Issues
Problem : Rubric scores inconsistent or errors
Solutions :
Verify API key is set correctly
Check API key has sufficient credits/quota
Test with simpler rubric first
Add error handling around evaluate_rubric call
Use return_details=True to see evaluation reasoning (see the sketch after this list)
Verify model name is correct for provider
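A hypothetical debugging call; the exact signature of evaluate_rubric depends on your SDK version, so treat the parameter names here as placeholders and check the Python SDK docs:
details = evaluate_rubric(
    rubric=my_rubric,        # placeholder name
    response=model_output,   # placeholder name
    return_details=True,     # from the tip above
)
print(details)  # inspect per-criterion reasoning before trusting aggregate scores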
Import Errors
Problem : ModuleNotFoundError or import failures
Solutions :
Ensure all directories have __init__.py files
Verify imports use correct paths
Check dependencies are installed: pip install -e .
Use absolute imports from package root
Verify virtual environment is activated
Next Steps
Example Repository: study the complete reference implementation
Python SDK: learn more about the Python SDK
Contact Support: get help from the Osmosis team
Setup Guide: review the setup process