Kubernetes Mode
Kubernetes mode provides a controller-based deployment for running bots at scale. Each bot becomes its own Pod, leveraging Kubernetes for scheduling, health checks, and automatic restarts.
Use Kubernetes mode when running more than ~1000 bots per host, when high availability is required, or when you need K8s-native observability and management.
Architecture
The controller manager runs two reconciliation controllers that watch MongoDB for desired state and manage Kubernetes resources accordingly. The Bot Controller manages Pods for live trading bots, while the Schedule Controller manages CronJobs for scheduled bots.
Pod Structure
Kubernetes mode uses different pod structures for realtime and scheduled bots.
Realtime Bot Pods
Realtime bots use sidecars for sync and query serving, providing better observability:
- bot: Runs runtime execute --skip-sync --skip-query-server, which downloads code/state from MinIO and runs the bot
- sync: Runs daemon sync continuously, uploading state changes and logs to MinIO
- query (if query entrypoint configured): Runs runtime execute --query-server-only, serving queries on port 9476
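The container layout above can be sketched as a plain Pod manifest built in Python. This is only an illustration, not the controller's actual generator: the function name, the pod name, the `bot-id` label, and the use of one shared image are assumptions, and the exact argv for each container may differ. Volumes and MinIO credentials are omitted.

```python
from typing import Any

QUERY_PORT = 9476  # query sidecar port, as stated in the list above


def realtime_pod_manifest(bot_id: str, image: str, has_query_entrypoint: bool) -> dict[str, Any]:
    """Build an illustrative Pod manifest for a realtime bot (hypothetical helper)."""
    containers = [
        {
            "name": "bot",
            "image": image,
            # Sync and query serving are left to the sidecars.
            "command": ["runtime", "execute", "--skip-sync", "--skip-query-server"],
        },
        {
            "name": "sync",
            "image": image,
            "command": ["daemon", "sync"],
        },
    ]
    # The query sidecar exists only when a query entrypoint is configured.
    if has_query_entrypoint:
        containers.append({
            "name": "query",
            "image": image,
            "command": ["runtime", "execute", "--query-server-only"],
            "ports": [{"containerPort": QUERY_PORT}],
        })
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        # Name and label are assumptions made for this sketch.
        "metadata": {"name": f"bot-{bot_id}", "labels": {"bot-id": bot_id}},
        "spec": {"containers": containers},
    }
```

Because each role is its own container, its status and logs can be inspected separately from the other two.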
Scheduled Bot Pods
Scheduled bots run as a single container with no sidecars, the same as in Docker mode.
The container runs runtime execute --skip-query-server, which downloads code/state, starts a sync subprocess that runs alongside the bot, and then executes the bot. When the bot completes, the sync subprocess performs a final upload.
How It Works
Realtime Bots
The Bot Controller creates Pods with sync and query sidecars. If the bot crashes, Kubernetes automatically restarts the bot container. The sidecars keep running throughout, so state and logs continue to sync across restarts.
The controller watches for changes via NATS events and also runs periodic reconciliation. When a bot is disabled, the controller deletes the Pod. When configuration changes, it deletes and recreates the Pod with the new settings.
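The reconciliation rules described here (create missing Pods, delete Pods for disabled bots, recreate Pods whose configuration changed) can be sketched as a pure planning function. This is a minimal sketch under assumptions: `BotSpec`, `config_hash`, and `plan_actions` are hypothetical names, and the real controller may detect configuration changes differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BotSpec:
    bot_id: str
    enabled: bool
    config_hash: str  # hypothetical fingerprint of the bot's configuration


def plan_actions(desired: list[BotSpec], running: dict[str, str]) -> list[tuple[str, str]]:
    """Compare desired bots with running pods (bot_id -> config_hash).

    Returns (action, bot_id) pairs; action is "create", "recreate" or "delete".
    """
    actions: list[tuple[str, str]] = []
    wanted = {b.bot_id: b for b in desired if b.enabled}
    for bot_id, spec in wanted.items():
        if bot_id not in running:
            actions.append(("create", bot_id))
        elif running[bot_id] != spec.config_hash:
            # Configuration changed: delete and recreate with new settings.
            actions.append(("recreate", bot_id))
    for bot_id in running:
        if bot_id not in wanted:
            # Bot was disabled or removed from the desired state.
            actions.append(("delete", bot_id))
    return actions
```

The same function could be called from both the event handler and the periodic reconciliation tick, so a missed event is corrected on the next interval.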
Scheduled Bots
The Schedule Controller creates CronJobs rather than Pods directly. Kubernetes handles the cron scheduling, creating a new Job (and thus a new Pod) at each scheduled time.
These Pods use RestartPolicy: Never since completion is expected. The single container runs runtime execute --skip-query-server, which downloads code/state, starts a sync subprocess alongside the bot, and executes the bot. When the bot completes, the sync subprocess performs a final upload. The Job then completes, and Kubernetes cleans up according to the CronJob's history limits.
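A CronJob for a scheduled bot might look roughly like the manifest below, again written as a Python dict. The helper name, resource name, and history limit values are assumptions (3 and 1 are the Kubernetes defaults, not values confirmed by this project); the restart policy and the command come from the description above.

```python
from typing import Any


def scheduled_cronjob_manifest(bot_id: str, image: str, schedule: str) -> dict[str, Any]:
    """Build an illustrative CronJob manifest for a scheduled bot (hypothetical helper)."""
    pod_spec = {
        # Completion is expected, so the Pod is never restarted in place.
        "restartPolicy": "Never",
        "containers": [{
            "name": "bot",
            "image": image,
            "command": ["runtime", "execute", "--skip-query-server"],
        }],
    }
    return {
        "apiVersion": "batch/v1",
        "kind": "CronJob",
        "metadata": {"name": f"bot-{bot_id}"},  # naming scheme is an assumption
        "spec": {
            "schedule": schedule,  # standard cron expression, e.g. "*/5 * * * *"
            # Assumed values; these control how many finished Jobs are kept.
            "successfulJobsHistoryLimit": 3,
            "failedJobsHistoryLimit": 1,
            "jobTemplate": {"spec": {"template": {"spec": pod_spec}}},
        },
    }
```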
Comparison with Docker Mode
Both modes use the same underlying commands (runtime execute and daemon sync) with the same logic. The difference is in how they're orchestrated:
| Aspect | Docker Mode | K8s Realtime | K8s Scheduled |
|---|---|---|---|
| Orchestrator | BotService/ScheduleService | Bot Controller | Schedule Controller |
| Sync | Subprocess | Sidecar | Subprocess |
| Query Server | Subprocess | Sidecar | N/A (ephemeral) |
| Restart | Service reconciliation | K8s restart policy | Job recreation |
| Observability | Container logs | Per-container logs | Job logs |
K8s realtime bots provide better observability: you can see the status of each container independently and view their logs separately.
Query Execution
Realtime Bot Queries
For running pods with a query entrypoint, queries are proxied to the query sidecar on port 9476:
- QueryServer (port 9477) receives HTTP request
- Resolves pod IP for the bot
- HTTP proxy to pod IP:9476 (the query sidecar)
- Response returned (~10-50ms latency)
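The proxy step can be sketched with the Python standard library. This is a sketch under assumptions: `sidecar_url` and `proxy_query` are hypothetical helpers, the pod IP is assumed to be resolved already, and plain HTTP GET with query parameters is assumed because the source does not specify the request shape.

```python
from typing import Optional
from urllib.parse import urlencode, urlunsplit
from urllib.request import urlopen

QUERY_SIDECAR_PORT = 9476  # port served by the query sidecar


def sidecar_url(pod_ip: str, path: str, params: Optional[dict] = None) -> str:
    """Build the URL of the query sidecar for an already resolved pod IP."""
    query = urlencode(params or {})
    return urlunsplit(("http", f"{pod_ip}:{QUERY_SIDECAR_PORT}", path, query, ""))


def proxy_query(pod_ip: str, path: str, params: Optional[dict] = None,
                timeout: float = 5.0) -> bytes:
    """Forward a query to the sidecar and return the raw response body."""
    with urlopen(sidecar_url(pod_ip, path, params), timeout=timeout) as resp:
        return resp.read()
```

For example, `sidecar_url("10.0.0.5", "/status", {"limit": 10})` yields `http://10.0.0.5:9476/status?limit=10`.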
Scheduled Bot Queries
For scheduled bots, queries create ephemeral K8s Jobs:
- K8sQueryHandler receives request
- PodGenerator creates query pod spec with QUERY_PATH env var
- Job is created with TTL cleanup (5 minutes)
- Job pod runs, writes result to MinIO
- Handler downloads result from MinIO
- Job auto-cleans up (~1-3s latency)
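An ephemeral query Job along these lines could be described by the manifest below. The helper name, the Job naming scheme, the `request_id` parameter, and `backoffLimit: 0` are assumptions; the source only states the `QUERY_PATH` env var and the 5-minute TTL. The container command is omitted because the source does not name it.

```python
from typing import Any

QUERY_JOB_TTL_SECONDS = 5 * 60  # TTL cleanup of 5 minutes, per the steps above


def query_job_manifest(bot_id: str, image: str, query_path: str, request_id: str) -> dict[str, Any]:
    """Build an illustrative one-shot Job that answers a single query (hypothetical helper)."""
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {"name": f"query-{bot_id}-{request_id}"},  # naming is an assumption
        "spec": {
            # Kubernetes deletes the finished Job after this many seconds.
            "ttlSecondsAfterFinished": QUERY_JOB_TTL_SECONDS,
            "backoffLimit": 0,  # assumption: do not retry a failed query
            "template": {"spec": {
                "restartPolicy": "Never",
                "containers": [{
                    "name": "query",
                    "image": image,
                    # The query to run is passed through the environment.
                    "env": [{"name": "QUERY_PATH", "value": query_path}],
                }],
            }},
        },
    }
```

The handler would then wait for the Job to finish and download the result from MinIO, which is why this path is slower than the sidecar proxy.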
Quick Start
For local development with Minikube:
cd k8s
make minikube-up
This starts Minikube with the controller and all dependencies.
Environment Variables
Required
NAMESPACE=the0 # K8s namespace
MONGO_URI=mongodb://localhost:27017 # MongoDB connection
MINIO_ENDPOINT=minio:9000 # MinIO endpoint (in-cluster)
MINIO_ACCESS_KEY=the0admin
MINIO_SECRET_KEY=the0password
MINIO_BUCKET=the0-custom-bots
Optional
NATS_URL=nats://localhost:4222 # NATS for real-time events
RECONCILE_INTERVAL=30s # Controller reconciliation interval
CLI Commands
# Start Kubernetes controller
./runtime controller --namespace the0 --reconcile-interval 30s
When to Use Kubernetes Mode
Choose Kubernetes mode when you need:
- Scale: More than 1000 bots
- High Availability: Automatic pod rescheduling on node failure
- Observability: K8s-native monitoring, logging, and metrics
- Resource Management: Fine-grained CPU/memory limits per bot
- Multi-tenancy: Namespace isolation
- Rolling Updates: Zero-downtime runtime updates