Grand Diomande Research · Full HTML Reader

Deployment Guide

This guide covers deploying TrajectoryOS to production. We'll use **Fly.io** as the primary example, but principles apply to other platforms (AWS, Railway, etc.).

Agents That Account for Themselves research note experiment writeup candidate score 28 .md

Full Public Reader

Deployment Guide

Overview

This guide covers deploying TrajectoryOS to production. We'll use Fly.io as the primary example, but principles apply to other platforms (AWS, Railway, etc.).

Architecture

Production Stack:
- Web Dashboard (Next.js) → Fly.io
- API Gateway → Fly.io
- Trajectory Core → Fly.io
- Python Models → Fly.io (separate apps)
- PostgreSQL → Fly.io Postgres
- Redis → Upstash
- Vector Store → Pinecone

---

Prerequisites

1. Fly.io Account: Sign up at https://fly.io
2. flyctl CLI: Install via `curl -L https://fly.io/install.sh | sh`
3. Docker: For building containers
4. Environment Variables: Prepare production secrets

---

Step 1: Prepare Dockerfiles

Trajectory Core

File: `services/trajectory-core/Dockerfile`

dockerfile
FROM node:18-alpine AS builder

WORKDIR /app

# Copy package files
COPY package.json pnpm-lock.yaml ./
COPY services/trajectory-core/package.json ./services/trajectory-core/

# Install dependencies
RUN npm install -g pnpm
RUN pnpm install --frozen-lockfile

# Copy source
COPY services/trajectory-core ./services/trajectory-core
COPY prisma ./prisma

# Generate Prisma client
RUN cd services/trajectory-core && pnpm prisma generate

# Build
RUN cd services/trajectory-core && pnpm build

# Production image
FROM node:18-alpine

WORKDIR /app

RUN npm install -g pnpm

COPY --from=builder /app/node_modules ./node_modules
COPY --from=builder /app/services/trajectory-core/dist ./dist
COPY --from=builder /app/services/trajectory-core/package.json ./
COPY --from=builder /app/prisma ./prisma

# Run migrations on start
COPY services/trajectory-core/docker-entrypoint.sh ./
RUN chmod +x docker-entrypoint.sh

EXPOSE 3003

CMD ["./docker-entrypoint.sh"]

Entrypoint: `services/trajectory-core/docker-entrypoint.sh`

bash
#!/bin/sh
set -e

# Run migrations
pnpm prisma migrate deploy

# Start server
node dist/index.js

Web Dashboard

File: `apps/web-dashboard/Dockerfile`

dockerfile
FROM node:18-alpine AS builder

WORKDIR /app

COPY apps/web-dashboard/package.json apps/web-dashboard/package-lock.json ./
RUN npm install

COPY apps/web-dashboard ./
RUN npm run build

FROM node:18-alpine

WORKDIR /app

COPY --from=builder /app/.next ./.next
COPY --from=builder /app/public ./public
COPY --from=builder /app/package.json ./
COPY --from=builder /app/node_modules ./node_modules

EXPOSE 3000

CMD ["npm", "start"]

Python Model (Example: Skill Graph)

File: `models/skill_graph/Dockerfile`

dockerfile
FROM python:3.10-slim

WORKDIR /app

COPY models/skill_graph/requirements.txt ./
RUN pip install --no-cache-dir -r requirements.txt

COPY models/skill_graph ./

EXPOSE 5001

CMD ["uvicorn", "main:app", "--host", "[ip]", "--port", "5001"]

---

Step 2: Create Fly.io Apps

Initialize Trajectory Core

bash
cd services/trajectory-core
fly launch --name trajectoryos-core --region sjc --no-deploy

# Edit fly.toml as needed

fly.toml:

toml
app = "trajectoryos-core"
primary_region = "sjc"

[build]
  dockerfile = "Dockerfile"

[env]
  PORT = "3003"
  NODE_ENV = "production"

[[services]]
  internal_port = 3003
  protocol = "tcp"

  [[services.ports]]
    handlers = ["http"]
    port = 80

  [[services.ports]]
    handlers = ["tls", "http"]
    port = 443

[[ services.http_checks]]
  interval = "10s"
  timeout = "2s"
  grace_period = "5s"
  method = "GET"
  path = "/health"

Initialize Web Dashboard

bash
cd apps/web-dashboard
fly launch --name trajectoryos-web --region sjc --no-deploy

Initialize Python Models

bash
cd models/skill_graph
fly launch --name trajectoryos-skill-model --region sjc --no-deploy

# Repeat for other models

---

Step 3: Set Up PostgreSQL

bash
# Create Postgres cluster
fly postgres create --name trajectoryos-db --region sjc

# Attach to trajectory-core
fly postgres attach trajectoryos-db --app trajectoryos-core

# This sets DATABASE_URL automatically

---

Step 4: Set Up Redis

Using Upstash (serverless Redis):

1. Go to https://upstash.com
2. Create Redis database
3. Copy connection URL

bash
# Set secret
fly secrets set REDIS_URL="redis://..." --app trajectoryos-core

---

Step 5: Configure Secrets

bash
# Set environment variables
fly secrets set \
  OPENAI_API_KEY="sk-..." \
  JWT_SECRET="your-secret-key" \
  --app trajectoryos-core

# For dashboard
fly secrets set \
  NEXT_PUBLIC_API_URL="https://trajectoryos-core.fly.dev" \
  --app trajectoryos-web

---

Step 6: Deploy

bash
# Deploy trajectory-core
cd services/trajectory-core
fly deploy

# Deploy web dashboard
cd apps/web-dashboard
fly deploy

# Deploy Python models
cd models/skill_graph
fly deploy --app trajectoryos-skill-model

---

Step 7: Run Migrations

Migrations run automatically via `docker-entrypoint.sh`, but you can also run manually:

bash
fly ssh console --app trajectoryos-core
> cd /app
> pnpm prisma migrate deploy

---

Step 8: Verify Deployment

bash
# Check health
curl https://trajectoryos-core.fly.dev/health

# Check logs
fly logs --app trajectoryos-core

---

Monitoring

Logs

bash
# Real-time logs
fly logs --app trajectoryos-core

# Historical logs
fly logs --app trajectoryos-core --since 1h

Metrics

Fly.io provides built-in metrics:

bash
fly dashboard --app trajectoryos-core

Custom Metrics (Prometheus)

Add Prometheus exporter to your apps:

typescript
// services/trajectory-core/src/metrics.ts
import promClient from 'prom-client';

export const register = new promClient.Registry();

export const httpRequestDuration = new promClient.Histogram({
  name: 'http_request_duration_seconds',
  help: 'Duration of HTTP requests in seconds',
  labelNames: ['method', '  route', 'status_code'],
  registers: [register]
});

// Expose /metrics endpoint
app.get('/metrics', async (req, res) => {
  res.set('Content-Type', register.contentType);
  res.end(await register.metrics());
});

---

Scaling

Horizontal Scaling

bash
# Scale to 3 instances
fly scale count 3 --app trajectoryos-core

# Auto-scaling
fly autoscale set min=2 max=10 --app trajectoryos-core

Vertical Scaling

bash
# Increase CPU/RAM
fly scale vm shared-cpu-2x --app trajectoryos-core
fly scale memory 1024 --app trajectoryos-core

---

CI/CD with GitHub Actions

.github/workflows/deploy.yml:

yaml
name: Deploy to Fly.io

on:
  push:
    branches: [main]

jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3

      - name: Setup Fly
        uses: superfly/flyctl-actions/setup-flyctl@master

      - name: Deploy Trajectory Core
        run: |
          cd services/trajectory-core
          flyctl deploy --remote-only
        env:
          FLY_API_TOKEN: ${{ secrets.FLY_API_TOKEN }}

      - name: Deploy Web Dashboard
        run: |
          cd apps/web-dashboard
          flyctl deploy --remote-only
        env:
          FLY_API_TOKEN: ${{ secrets.FLY_API_TOKEN }}

---

Backup & Recovery

Database Backups

Fly.io Postgres automatically backs up daily. To manually backup:

bash
# Create snapshot
fly postgres db backup --app trajectoryos-db

# Restore from snapshot
fly postgres db restore --app trajectoryos-db <snapshot-id>

Application State

For critical data:

bash
# Export user data
fly ssh console --app trajectoryos-core
> pnpm prisma db execute --sql "COPY users TO '/tmp/users.csv' CSV HEADER"
> scp /tmp/users.csv local-backup/

---

Rollback

Quick Rollback

bash
# List recent releases
fly releases --app trajectoryos-core

# Rollback to previous version
fly releases rollback v42 --app trajectoryos-core

Database Rollback

If a migration fails:

bash
# SSH into app
fly ssh console --app trajectoryos-core

# Rollback migration
pnpm prisma migrate resolve --rolled-back 20251126_migration_name

---

Custom Domain

bash
# Add domain
fly certs create trajectoryos.com --app trajectoryos-web

# Verify DNS
fly certs show trajectoryos.com --app trajectoryos-web

Add DNS records:

A     @      66.241.124.x
AAAA  @      2a09:8280:1::x:x

---

Alternative Platforms

Railway

Similar workflow:
1. Connect GitHub repo
2. Railway auto-detects Dockerfiles
3. Set environment variables
4. Deploy

AWS (ECS + RDS)

More complex but highly scalable:
1. Push Docker images to ECR
2. Create ECS service definitions
3. Set up Application Load Balancer
4. Configure auto-scaling

DigitalOcean App Platform

Simpler alternative:
1. Connect repo
2. Select services to deploy
3. Configure build/run commands
4. Deploy

---

Troubleshooting

App Won't Start

bash
# Check logs
fly logs --app trajectoryos-core

# SSH into container
fly ssh console --app trajectoryos-core

# Inspect environment
> env | grep DATABASE_URL

Database Connection Errors

bash
# Verify DATABASE_URL secret
fly secrets list --app trajectoryos-core

# Test connection
fly ssh console --app trajectoryos-core
> pnpm prisma db execute --sql "SELECT 1"

High Memory Usage

bash
# Check current usage
fly status --app trajectoryos-core

# Scale up if needed
fly scale memory 2048 --app trajectoryos-core

---

Cost Optimization

Fly.io Pricing (estimate):
- Shared CPU (256MB): ~$2/month per instance
- PostgreSQL (1GB): ~$5/month
- Bandwidth: Usually free within limits

Tips:
- Use shared CPU for non-critical services
- Scale down dev/staging environments when not in use
- Enable auto-scaling to match demand

---

Next Steps: Monitoring and observability documentation coming soon. For now, see [Architecture Overview](../architecture/overview.md) for system design.

Promotion Decision

Attach run IDs, datasets, metrics, and reproduction commands.

Source Anchor

Comp-Core/backend/cc-trajectory/docs/ops/deployment.md

Detected Structure

Method · Evaluation · Figures · Code Anchors · Architecture