proposal · experiment writeup candidate · score 24

Engine Module Refactoring Summary

**New Structure**:

```
core/
├── __init__.py         # Exports
├── similarity.py       # Unified similarity calculations
├── validators.py       # Unified validation logic
├── dataframe_ops.py    # Unified DataFrame operations
├── embedding_utils.py  # Unified embedding utilities
└── filters.py          # Unified filter system
```


Extracted abstract or opening context

## Overview

Comprehensive refactoring of the `dlm/engine` module to eliminate redundancy, improve modularity, and enhance maintainability. Created unified core components that all engine implementations can use.

**Benefits**:
- Single source of truth for common operations
- Reduced code duplication by ~50%
- Consistent behavior across all engine components
- Easier to optimize and maintain

**Before**: Similarity calculation code duplicated in:
- `engine.py` - `calculate_similarity()`, `calculate_cross_entropy_loss()`
- `embedder.py` - `calculate_scores()`
- `retriever.py` - Similar patterns

**After**: Single `SimilarityUtils` class with:
- `calculate_similarity()` - Unified cosine similarity
- `cosine_similarity_batch()` - Vectorized batch operations
- `calculate_scores()` - Query-corpus similarity
- `compute_cross_entropy_loss()` - Cross-entropy loss

**Before**: Validation code scattered across:
- `retriever.py` - `_validate_data()`, `_validate_columns()`, `_validate_pair_type()`
- `loader.py` - Data source validation
- Multiple files - Index range validation
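To make the unified similarity interface more concrete, here is a minimal sketch of what a `SimilarityUtils` class with the four listed methods might look like. Only the class and method names come from the summary above. The signatures, the use of NumPy, the `eps` guard against zero-length vectors, and the softmax-based form of the cross-entropy loss are all assumptions made for illustration, not the actual `dlm/engine` implementation.

```python
import numpy as np


class SimilarityUtils:
    """Illustrative sketch only: signatures and internals are assumed."""

    @staticmethod
    def calculate_similarity(a, b, eps=1e-12):
        """Cosine similarity between two 1-D vectors."""
        a = np.asarray(a, dtype=float)
        b = np.asarray(b, dtype=float)
        # eps (assumed parameter) avoids division by zero for zero vectors.
        denom = max(np.linalg.norm(a) * np.linalg.norm(b), eps)
        return float(np.dot(a, b) / denom)

    @staticmethod
    def cosine_similarity_batch(queries, corpus, eps=1e-12):
        """Pairwise cosine similarity, shape (n_queries, n_corpus)."""
        q = np.atleast_2d(np.asarray(queries, dtype=float))
        c = np.atleast_2d(np.asarray(corpus, dtype=float))
        # Normalize each row once, then a single matrix product
        # yields every query-document similarity.
        q = q / np.maximum(np.linalg.norm(q, axis=1, keepdims=True), eps)
        c = c / np.maximum(np.linalg.norm(c, axis=1, keepdims=True), eps)
        return q @ c.T

    @staticmethod
    def calculate_scores(query, corpus):
        """Similarity of one query vector against every corpus row."""
        return SimilarityUtils.cosine_similarity_batch(query, corpus)[0]

    @staticmethod
    def compute_cross_entropy_loss(scores, target_index):
        """Cross-entropy of softmax(scores) against one target index (assumed form)."""
        scores = np.asarray(scores, dtype=float)
        # Subtract the max before exponentiating for numerical stability.
        shifted = scores - scores.max()
        log_probs = shifted - np.log(np.exp(shifted).sum())
        return float(-log_probs[target_index])
```

Under this sketch, `engine.py`, `embedder.py`, and `retriever.py` would each call into the same class instead of carrying their own copies, for example `SimilarityUtils.calculate_scores(query_embedding, corpus_embeddings)`, where both argument names are hypothetical.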

Promotion decision

What has to happen next

Attach run IDs, datasets, metrics, and reproduction commands.

Why this is not always a full paper yet

Corpus pages are public-safe readers for discovered workspace artifacts. They are not automatically final papers. A corpus item becomes a polished paper only after the editable source, evidence checkpoints, references, figures, render path, and release status are attached through the paper schema.