Deployment

Zero-downtime deploys

Understand how Odysseus achieves zero-downtime deployments.


Overview

Zero-downtime deployment means your application remains available throughout the deployment process. Users never see errors or interruptions.

Odysseus achieves this through:

  1. Rolling updates: New containers start before old ones stop
  2. Health checks: Traffic only routes to healthy containers
  3. Graceful draining: Existing connections complete before shutdown
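These three mechanisms combine into one ordered sequence. A minimal sketch of that order (the Container and Balancer classes here are illustrative stand-ins, not Odysseus internals):

```ruby
# Illustrative stand-ins only; EVENTS records the order of operations
# that keeps traffic flowing throughout the swap.
EVENTS = []

class Container
  attr_reader :name
  def initialize(name, healthy: true)
    @name = name
    @healthy = healthy
  end

  def start    ; EVENTS << "start #{name}"   ; end
  def healthy? ; @healthy                    ; end
  def drain    ; EVENTS << "drain #{name}"   ; end
  def stop     ; EVENTS << "stop #{name}"    ; end
end

class Balancer
  def add(c)    ; EVENTS << "route #{c.name}"   ; end
  def remove(c) ; EVENTS << "unroute #{c.name}" ; end
end

def rolling_update(old, new_c, lb)
  new_c.start                                   # old keeps serving
  raise 'health check failed' unless new_c.healthy?
  lb.add(new_c)                                 # both serve briefly
  lb.remove(old)                                # old gets no new traffic
  old.drain                                     # in-flight requests finish
  old.stop
end

rolling_update(Container.new('app-v1.0.0'),
               Container.new('app-v1.1.0'),
               Balancer.new)
```

Note that traffic routing only changes after the health check passes; a failed check aborts before the old container is touched.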

Deployment sequence

Step 1: Start new container

The new container starts alongside the existing one:

[app-v1.0.0] ← Serving traffic
[app-v1.1.0] ← Starting up

Both containers run simultaneously during the transition.

Step 2: Health check

Odysseus waits for the new container to be healthy:

[app-v1.0.0] ← Serving traffic
[app-v1.1.0] ← Health check: waiting...

Health is verified via:

  • Docker's built-in health check (if configured)
  • Timeout-based readiness (default: 60 seconds)
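Timeout-based readiness amounts to polling until a deadline. A sketch (the 60-second value mirrors the default above; the block stands in for whatever status source is available, such as Docker's reported health status):

```ruby
# Poll a health predicate until it passes or the deadline expires.
# Returns true if the check ever passed, false on timeout.
def wait_healthy(timeout: 60, interval: 1)
  deadline = Time.now + timeout
  until yield
    return false if Time.now >= deadline
    sleep interval
  end
  true
end

attempts = 0
ready = wait_healthy(timeout: 5, interval: 0) { (attempts += 1) >= 3 }
```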

Step 3: Add to load balancer

Once healthy, the new container is added to Caddy's upstream:

[app-v1.0.0] ← Serving traffic
[app-v1.1.0] ← Serving traffic (added to upstream)

Traffic now flows to both containers.
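Caddy exposes a JSON admin API (by default on port 2019), where POSTing to an array path appends to it. A hypothetical sketch of building the append-upstream request; the config path below is an assumption about the route layout, not something Odysseus guarantees:

```ruby
require 'net/http'
require 'json'

# Builds (but does not send) the request. Inspect GET /config/ on your
# own Caddy instance to find the real path to the upstreams array.
UPSTREAMS_PATH = '/config/apps/http/servers/srv0/routes/0/handle/0/upstreams'

def add_upstream_request(dial)
  req = Net::HTTP::Post.new(UPSTREAMS_PATH, 'Content-Type' => 'application/json')
  req.body = JSON.generate('dial' => dial)  # POST to an array path appends
  req
end

req = add_upstream_request('172.17.0.3:3000')
# Net::HTTP.start('localhost', 2019) { |http| http.request(req) } would send it.
```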

Step 4: Remove old from load balancer

The old container is removed from Caddy's upstream:

[app-v1.0.0] ← No new traffic, draining existing
[app-v1.1.0] ← Serving all new traffic

Step 5: Connection draining

Existing connections to the old container complete:

[app-v1.0.0] ← Finishing 3 connections...
[app-v1.1.0] ← Serving all traffic

Odysseus waits for connections to close gracefully.
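Draining can be sketched as waiting for the in-flight connection count to reach zero, with a hard deadline so one stuck connection cannot block the deploy forever (the block is a stand-in for however the proxy reports active connections):

```ruby
# Wait for the active-connection count to hit zero. Returns 0 on a
# clean drain, or the remaining count if the deadline passes (the
# caller may then force-stop the container).
def drain(timeout: 30, interval: 0.1)
  deadline = Time.now + timeout
  while (remaining = yield) > 0
    return remaining if Time.now >= deadline
    sleep interval
  end
  0
end

counts = [3, 2, 1, 0]                       # mimics "Finishing 3 connections..."
left = drain(interval: 0) { counts.shift }
```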

Step 6: Stop old container

Once drained, the old container is stopped:

[app-v1.1.0] ← Serving all traffic

Step 7: Cleanup stale upstreams

Odysseus removes any Caddy routes pointing to stopped containers:

Cleaning up stale Caddy routes...
Removed 0 stale upstream(s)

This ensures Caddy's configuration stays clean, even if previous deployments were interrupted or containers were stopped manually.
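Conceptually, the cleanup is a filter: keep upstreams whose dial address still belongs to a running container, remove the rest. A data-only sketch (in practice the inputs would come from Caddy's config API and the Docker daemon):

```ruby
# Partition upstreams into live and stale based on a list of addresses
# for currently running containers.
def prune_upstreams(upstreams, running)
  upstreams.partition { |u| running.include?(u['dial']) }
end

live, stale = prune_upstreams(
  [{ 'dial' => '172.17.0.2:3000' }, { 'dial' => '172.17.0.9:3000' }],
  ['172.17.0.2:3000']
)
puts "Removed #{stale.length} stale upstream(s)"
```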

Deployment complete with zero interruption.


Health checks

Health checks are critical for zero-downtime. Configure them properly.

Docker health check

Add to your Dockerfile:

HEALTHCHECK --interval=10s --timeout=5s --start-period=30s \
  CMD curl -f http://localhost:3000/health || exit 1

Parameters:

  • --interval: Time between checks
  • --timeout: Time to wait for response
  • --start-period: Grace period for startup

Application health endpoint

Your health endpoint should verify the app is ready:

# Rails (app/controllers/health_controller.rb)
# Route with: get '/health', to: 'health#show'
class HealthController < ApplicationController
  def show
    # Check database connection
    ActiveRecord::Base.connection.execute('SELECT 1')

    # Check Redis (if used)
    Redis.current.ping

    render plain: 'OK', status: 200
  rescue => e
    render plain: e.message, status: 503
  end
end

// Express
app.get('/health', async (req, res) => {
  try {
    await db.query('SELECT 1')
    res.send('OK')
  } catch (e) {
    res.status(503).send(e.message)
  }
})

Caddy health check

Configure HTTP health checks in your proxy settings:

proxy:
  hosts:
    - myapp.example.com
  app_port: 3000
  healthcheck:
    path: /health
    interval: 10
    timeout: 5
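For reference, a sketch of how settings like these could map onto Caddy's active health-check JSON. The Caddy-side field names (`health_checks.active` with `uri`, `interval`, and `timeout` as duration strings) are real Caddy config; the mapping itself is an illustrative assumption:

```ruby
require 'json'

# Translate proxy healthcheck settings into Caddy active health-check
# JSON. Caddy expresses durations as strings like "10s".
def caddy_active_health(path:, interval:, timeout:)
  { 'health_checks' => { 'active' => {
      'uri'      => path,
      'interval' => "#{interval}s",
      'timeout'  => "#{timeout}s"
  } } }
end

cfg = caddy_active_health(path: '/health', interval: 10, timeout: 5)
```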

Graceful shutdown

Your application must handle shutdown signals properly.

SIGTERM handling

When Odysseus stops a container, Docker sends SIGTERM, followed by SIGKILL if the container hasn't exited within the stop timeout. Your app should:

  1. Stop accepting new connections
  2. Finish processing current requests
  3. Close database connections
  4. Exit cleanly

Rails example

Rails handles this automatically with Puma:

# config/puma.rb
on_worker_shutdown do
  # Cleanup code here
end

Node.js example

process.on('SIGTERM', async () => {
  console.log('SIGTERM received, shutting down gracefully')

  // Stop accepting new connections
  server.close(() => {
    console.log('HTTP server closed')

    // Close database connections
    db.end(() => {
      console.log('Database connections closed')
      process.exit(0)
    })
  })

  // Force exit after timeout
  setTimeout(() => {
    console.error('Forced shutdown after timeout')
    process.exit(1)
  }, 30000)
})

Multi-server deployments

With multiple servers, Odysseus deploys one at a time:

servers:
  web:
    hosts:
      - app1.example.com
      - app2.example.com
      - app3.example.com

Deployment order:

1. app1: deploy → health check → add to LB → drain old → stop old
2. app2: deploy → health check → add to LB → drain old → stop old
3. app3: deploy → health check → add to LB → drain old → stop old

At any point during the rollout, at least two of the three servers are fully operational.
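The one-at-a-time ordering can be sketched as a plain loop; each host finishes its full swap before the next begins, so capacity never drops by more than one host:

```ruby
# Flatten the per-host swap into a single ordered log of steps.
STEPS = %w[deploy health-check add-to-lb drain-old stop-old]

def deploy_sequentially(hosts)
  hosts.flat_map { |host| STEPS.map { |step| "#{host}: #{step}" } }
end

log = deploy_sequentially(%w[app1.example.com app2.example.com app3.example.com])
```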


Rollback

If a deployment fails health checks, the old container remains active:

[app-v1.0.0] ← Still serving traffic
[app-v1.1.0] ← Health check failed, not added to LB

To manually rollback:

odysseus deploy --image v1.0.0

Best practices

1. Fast startup

Keep container startup time minimal:

  • Precompile assets in the Docker build
  • Use multi-stage builds
  • Lazy-load expensive resources

2. Reliable health checks

Health endpoints should:

  • Respond quickly (< 1 second)
  • Check critical dependencies
  • Return 200 when truly ready

3. Graceful shutdown

Handle SIGTERM properly:

  • Stop accepting connections
  • Finish in-flight requests
  • Clean up resources

4. Database migrations

Run migrations before deployment:

odysseus app exec your-server --command "rails db:migrate"
odysseus deploy --image v1.1.0

Or use a migration container that runs before web containers start.

5. Backward-compatible changes

Deploy changes that work with both old and new versions:

  • Add columns before using them
  • Support old API formats during transition
  • Remove old code after full rollout
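The "add columns before using them" rule works out to two migrations shipped in separate releases. A self-contained sketch (MiniMigration is a stub standing in for ActiveRecord::Migration so the example runs here; the method names `add_column` and `change_column_null` match Rails' real API, and the class names are illustrative):

```ruby
# Stub base class that records schema operations instead of running SQL.
class MiniMigration
  def self.run
    m = new
    m.change
    m.ops
  end

  def ops
    @ops ||= []
  end

  def add_column(table, column, type)
    ops << [:add_column, table, column, type]
  end

  def change_column_null(table, column, allow_null)
    ops << [:change_column_null, table, column, allow_null]
  end
end

# Release 1: ship the column nullable, so old containers safely ignore it.
class AddNicknameToUsers < MiniMigration
  def change
    add_column :users, :nickname, :string
  end
end

# A later release, once every running container writes :nickname:
class EnforceNicknamePresence < MiniMigration
  def change
    change_column_null :users, :nickname, false
  end
end
```

Running the NOT NULL step in the same deploy as the column would break the old containers still serving traffic during the rollout, which is exactly what this split avoids.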