Opening the Dashboard
To access the Evaluation Dashboard:
- Navigate to your project in Pylar
- Click the “Eval” button in the top-right corner of the screen
- The Evaluation Dashboard opens
Dashboard Overview
The Evaluation Dashboard is organized into several sections:
- Filters: Select which MCP tool to review
- Summary Metrics: High-level performance indicators
- Visual Insights: Time-series graphs showing trends
- Error Analysis: Detailed error breakdown
- Raw Logs: Complete records of all tool calls
Filters
Selecting a Tool
At the top of the dashboard, you’ll find filters to select which MCP tool you want to review.
How to use:
- Click the filter dropdown
- Select the MCP tool you want to analyze
- The dashboard updates to show metrics for that tool only
Use filters to focus on specific tools. This is especially useful when you have multiple tools and want to analyze them individually.
Evaluation Metrics
The dashboard displays key metrics that summarize tool performance:
Total Count
What it is: The total number of times the selected MCP tool was invoked.
What it tells you: Overall usage volume, i.e. how frequently agents are using this tool.
Total Count includes both successful and failed invocations. It’s the baseline for all other metrics.
Success Count
What it is: How many invocations returned a valid result.
What it tells you: The absolute number of successful tool calls. Higher is better.
Error Count
What it is: How many invocations failed to return a result.
What it tells you: The absolute number of failed tool calls. Lower is better.
Success Rate
Calculation: Success Rate = (Success Count ÷ Total Count) × 100
- High success rate (90%+) = Tool is working well
- Medium success rate (70-90%) = Some issues, needs attention
- Low success rate (less than 70%) = Significant problems, needs immediate attention
Aim for success rates above 90%. If your success rate is below this threshold, investigate errors to understand what’s going wrong.
Error Rate
Calculation: Error Rate = (Error Count ÷ Total Count) × 100 (both calculations are shown in the sketch below)
- Low error rate (less than 10%) = Tool is reliable
- Medium error rate (10-30%) = Some reliability issues
- High error rate (greater than 30%) = Major problems, needs fixing
High error rates indicate problems that are affecting agent performance. Address these issues promptly to improve agent experience.
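If you want to reproduce these figures outside the dashboard, for example from exported raw logs, the arithmetic is straightforward. Below is a minimal Python sketch; the record layout and the `success` field are hypothetical placeholders, not a Pylar export format or API.

```python
# Hypothetical exported tool-call records; the field names are placeholders,
# not an actual Pylar export format.
calls = [
    {"tool": "lookup_customer", "success": True},
    {"tool": "lookup_customer", "success": True},
    {"tool": "lookup_customer", "success": False},
]

total_count = len(calls)                               # Total Count
success_count = sum(1 for c in calls if c["success"])  # Success Count
error_count = total_count - success_count              # Error Count

success_rate = 100 * success_count / total_count       # Success Rate (%)
error_rate = 100 * error_count / total_count           # Error Rate (%)

print(f"total={total_count}, success_rate={success_rate:.1f}%, error_rate={error_rate:.1f}%")
```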
Visual Insights
The dashboard includes time-series graphs that show how metrics change over time.
Calls/Success/Errors Graph
What it shows: A time-series plot displaying:
- Total Calls: How many times the tool was invoked over time
- Successes: Successful invocations over time
- Errors: Failed invocations over time
What it tells you:
- Usage trends (increasing/decreasing usage)
- Performance trends (improving/declining success rates)
- Error patterns (when errors occur most frequently)
How to use it:
- Look for trends: Are errors increasing or decreasing?
- Identify patterns: Do errors spike at certain times?
- Compare periods: How does current performance compare to past performance?
Success/Error Rate (%) Graph
What it shows: Success and error percentages plotted as a time-series trend.
What it tells you:
- Performance stability over time
- Whether your tool is improving or degrading
- Correlation between changes and performance
How to use it:
- Monitor trends: Is success rate improving?
- Spot anomalies: Are there sudden drops in performance?
- Track improvements: Did your changes improve performance?
Use these graphs to understand not just current performance, but also trends and patterns. This helps you identify issues before they become critical.
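Pylar renders these graphs for you, but if you export raw logs and want to rebuild the same view, the underlying idea is simply bucketing calls by time period. A rough sketch, assuming each exported record carries a timestamp and a success flag (field names and values are illustrative):

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical exported log records; timestamps and field names are illustrative.
logs = [
    {"ts": "2024-05-01T09:15:00", "success": True},
    {"ts": "2024-05-01T14:02:00", "success": False},
    {"ts": "2024-05-02T10:30:00", "success": True},
]

# Bucket calls by day to build the Calls/Successes/Errors series.
buckets = defaultdict(lambda: {"calls": 0, "successes": 0, "errors": 0})
for rec in logs:
    day = datetime.fromisoformat(rec["ts"]).date()
    buckets[day]["calls"] += 1
    buckets[day]["successes" if rec["success"] else "errors"] += 1

# Derive the Success/Error Rate (%) series from the same buckets.
for day in sorted(buckets):
    b = buckets[day]
    success_rate = 100 * b["successes"] / b["calls"]
    print(day, b, f"success_rate={success_rate:.0f}%")
```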
Interpreting the Metrics
Healthy Tool Performance
A healthy tool shows:
- ✅ High success rate (90%+)
- ✅ Low error rate (less than 10%)
- ✅ Stable or improving trends over time
- ✅ Consistent performance across time periods
Tool Needs Attention
A tool that needs attention shows:
- ⚠️ Success rate below 90%
- ⚠️ Error rate above 10%
- ⚠️ Declining trends in graphs
- ⚠️ Inconsistent performance
Tool Needs Immediate Fix
A tool that needs immediate attention shows:
- ❌ Success rate below 70%
- ❌ Error rate above 30%
- ❌ Sharp drops in performance graphs
- ❌ Frequent errors in logs
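To make the thresholds above concrete, here is a small sketch that maps a tool’s rates onto the three categories. The thresholds mirror the ones listed in this section; the function itself is only an illustration, not part of Pylar.

```python
def classify_tool_health(success_rate: float, error_rate: float) -> str:
    """Map success/error rates (in %) onto the categories described above."""
    if success_rate < 70 or error_rate > 30:
        return "needs immediate fix"
    if success_rate < 90 or error_rate > 10:
        return "needs attention"
    return "healthy"

print(classify_tool_health(success_rate=96.0, error_rate=4.0))   # healthy
print(classify_tool_health(success_rate=85.0, error_rate=15.0))  # needs attention
print(classify_tool_health(success_rate=60.0, error_rate=40.0))  # needs immediate fix
```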
Using Filters Effectively
Analyzing Individual Tools
- Select a specific tool from the filter
- Review its metrics
- Check if it meets performance thresholds
- Investigate if it needs improvement
Comparing Tools
- Select different tools one at a time
- Compare their metrics
- Identify which tools perform best
- Learn from high-performing tools to improve others (a comparison sketch follows this list)
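Outside the dashboard, the same comparison boils down to grouping calls by tool and ranking by success rate. A sketch with made-up records (the tool names and fields are placeholders):

```python
from collections import defaultdict

# Hypothetical call records, one per tool invocation; names are placeholders.
calls = [
    {"tool": "search_orders", "success": True},
    {"tool": "search_orders", "success": True},
    {"tool": "refund_lookup", "success": True},
    {"tool": "refund_lookup", "success": False},
]

# Group by tool, then rank tools by success rate.
per_tool = defaultdict(lambda: {"total": 0, "success": 0})
for c in calls:
    per_tool[c["tool"]]["total"] += 1
    per_tool[c["tool"]]["success"] += int(c["success"])

ranking = sorted(
    per_tool.items(),
    key=lambda item: item[1]["success"] / item[1]["total"],
    reverse=True,
)
for tool, stats in ranking:
    rate = 100 * stats["success"] / stats["total"]
    print(f"{tool}: {rate:.0f}% success over {stats['total']} calls")
```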
Time Period Analysis
Use time filters (if available) to:
- Compare different time periods
- See impact of tool changes
- Identify seasonal patterns
- Track improvement over time (a period-comparison sketch follows this list)
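Comparing two periods is the same arithmetic applied to two time windows. A minimal sketch, again assuming hypothetical exported records with a timestamp and success flag:

```python
from datetime import date, datetime

# Hypothetical exported records; field names and dates are illustrative.
logs = [
    {"ts": "2024-04-28T11:00:00", "success": False},
    {"ts": "2024-04-29T16:20:00", "success": True},
    {"ts": "2024-05-02T09:45:00", "success": True},
    {"ts": "2024-05-03T13:10:00", "success": True},
]

def success_rate_between(records, start: date, end: date) -> float:
    """Success rate (%) for calls whose date falls in [start, end]."""
    window = [r for r in records if start <= datetime.fromisoformat(r["ts"]).date() <= end]
    if not window:
        return 0.0
    return 100 * sum(r["success"] for r in window) / len(window)

# Compare the week before a tool change against the week after it.
before = success_rate_between(logs, date(2024, 4, 24), date(2024, 4, 30))
after = success_rate_between(logs, date(2024, 5, 1), date(2024, 5, 7))
print(f"before={before:.0f}%  after={after:.0f}%  delta={after - before:+.0f}%")
```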
Next Steps
Now that you understand the dashboard:
- Analyzing Errors - Dive deeper into error patterns
- Understanding Query Shapes - Learn about query patterns
- Raw Logs - Explore detailed execution logs