Grand Diomande Research · Full HTML Reader
CognitiveTwin V3 Evaluation Report
Generated: 2025-12-31 18:48:27 UTC
Model: mock
Summary
| Metric | Value |
|---|---|
| Total Tests | 14 |
| Passed | 13 |
| Failed | 1 |
| Pass Rate | **92.9%** |
Scores
| Score Type | Average |
|---|---|
| Policy Compliance | 1.00 |
| Format Adherence | 0.93 |
| Content Quality | 0.65 |
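The Format Adherence average can be sanity-checked against the single failure reported further down. The sketch below is only a consistency check, not the evaluator's actual scoring code: the one known value is the 0.00 format score for `fc_002_json_format`, and the 1.00 assigned to each of the 13 passing tests is an assumption, since the report does not list per-test scores for passing tests.

```python
from statistics import mean

# Known from the report: fc_002_json_format scored 0.00 on format.
# ASSUMPTION: each of the 13 passing tests scored 1.00 on format;
# the report does not give per-test scores for passing tests.
format_scores = [1.00] * 13 + [0.00]

format_average = round(mean(format_scores), 2)
print(format_average)
```

Under that assumption the mean is 13/14, which rounds to the 0.93 shown in the table. The Content Quality average cannot be checked the same way because only one per-test content score (0.50) appears in the report.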
Priority Breakdown
| Priority | Pass Rate |
|---|---|
| Critical | 100.0% |
| High | 87.5% |
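The overall and per-priority pass rates can be recomputed from the test IDs and priorities listed in the Failed Tests and Passed Tests sections of this report. The following is a minimal sketch of that arithmetic; the `RESULTS` list is transcribed from the report, while the helper names `pass_rate` and `pass_rate_by_priority` are hypothetical and not taken from the evaluation harness.

```python
from collections import defaultdict

# (test_id, priority, passed), transcribed from this report.
RESULTS = [
    ("qp_001_clear_directive", "critical", True),
    ("qp_002_implementation", "critical", True),
    ("qp_003_no_option_dump", "high", True),
    ("qp_005_no_let_me_know", "high", True),
    ("fc_001_no_bullets", "high", True),
    ("fc_002_json_format", "high", False),
    ("fc_003_no_omit", "critical", True),
    ("om_001_preserve_all", "critical", True),
    ("om_002_no_placeholders", "high", True),
    ("ha_001_stop_asking", "critical", True),
    ("ha_002_full_content", "critical", True),
    ("ha_003_just_do_it", "high", True),
    ("ec_001_multi_requirement", "high", True),
    ("ec_004_long_code", "high", True),
]


def pass_rate(rows):
    """Return the percentage of passing rows, rounded to one decimal place."""
    rows = list(rows)
    passed = sum(1 for _, _, ok in rows if ok)
    return round(100.0 * passed / len(rows), 1)


def pass_rate_by_priority(rows):
    """Group rows by priority and compute the pass rate of each group."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[1]].append(row)
    return {priority: pass_rate(group) for priority, group in groups.items()}


overall = pass_rate(RESULTS)
by_priority = pass_rate_by_priority(RESULTS)
print(overall, by_priority)
```

With 6 of 6 critical tests and 7 of 8 high-priority tests passing, this reproduces the 100.0% and 87.5% figures in the table, and 13 of 14 overall gives the 92.9% pass rate in the Summary.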
Performance
- Average Latency: 0ms
Failures by Category
- format_adherence: 1 failure
Failed Tests
fc_002_json_format
Category: format_adherence
Priority: high
Failures:
Scores:
- Policy: 1.00
- Format: 0.00
- Content: 0.50
Response (truncated):
The requested task has been completed. The implementation follows best practices and includes proper error handling.
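The report does not record the failure reason for this test, but the test name suggests the response was expected to be JSON. As a hedged illustration only, one plausible format check is to attempt to parse the response with the standard-library `json` module; the function name `is_valid_json` is hypothetical, and the real check used by the evaluator may differ.

```python
import json


def is_valid_json(text: str) -> bool:
    """Return True if `text` parses as a JSON document, False otherwise."""
    try:
        json.loads(text)
    except json.JSONDecodeError:
        return False
    return True


# The (truncated) response recorded in this report is plain prose.
response = (
    "The requested task has been completed. The implementation follows "
    "best practices and includes proper error handling."
)

prose_ok = is_valid_json(response)
object_ok = is_valid_json('{"status": "completed"}')
print(prose_ok, object_ok)
```

A prose sentence like the recorded response fails such a parse, which would be consistent with the 0.00 format score, while a well-formed JSON object passes.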
Passed Tests
- ✓ qp_001_clear_directive (critical) - 0ms
- ✓ qp_002_implementation (critical) - 0ms
- ✓ qp_003_no_option_dump (high) - 0ms
- ✓ qp_005_no_let_me_know (high) - 0ms
- ✓ fc_001_no_bullets (high) - 0ms
- ✓ fc_003_no_omit (critical) - 0ms
- ✓ om_001_preserve_all (critical) - 0ms
- ✓ om_002_no_placeholders (high) - 0ms
- ✓ ha_001_stop_asking (critical) - 0ms
- ✓ ha_002_full_content (critical) - 0ms
- ✓ ha_003_just_do_it (high) - 0ms
- ✓ ec_001_multi_requirement (high) - 0ms
- ✓ ec_004_long_code (high) - 0ms
Promotion Decision
Attach run IDs, datasets, metrics, and reproduction commands.
Source Anchor
Comp-Core/core/_recovered/retrieval/cc-rag-plus-plus/eval_results_test/evaluation_report.md