feat(supervisor): project-based scheduling affinity for image cache locality #2995
Adds optional pod affinity so pods from the same project prefer scheduling on the same node. This can improve image cache hit rates: subsequent pods benefit from already-pulled image layers, which reduces startup time.

It complements the built-in ImageLocality scheduler plugin by helping during burst scheduling. Pod affinity sees scheduled pods immediately, while ImageLocality only sees images after they have been fully pulled.

Configuration:
- `KUBERNETES_PROJECT_AFFINITY_ENABLED` - Enable/disable (default: false)
- `KUBERNETES_PROJECT_AFFINITY_WEIGHT` - Scheduler weight 1-100 (default: 50)
- `KUBERNETES_PROJECT_AFFINITY_TOPOLOGY_KEY` - Topology key (default: kubernetes.io/hostname)

Uses soft (preferred) affinity, so pods always schedule even if the preferred node is full.
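A minimal sketch of what a soft, project-scoped pod affinity term could look like. It uses local types that mirror the Kubernetes `PodAffinity` fields (`preferredDuringSchedulingIgnoredDuringExecution`, `weight`, `podAffinityTerm`, `labelSelector`, `topologyKey`). The label key `example.com/project-id`, the `ProjectAffinityConfig` type and the `buildProjectPodAffinity` function are hypothetical illustrations, not the names used in `kubernetes.ts`.

```typescript
// Minimal local types mirroring the Kubernetes PodAffinity fields used here.
type LabelSelector = { matchLabels: Record<string, string> };

type PodAffinityTerm = {
  labelSelector: LabelSelector;
  topologyKey: string; // e.g. "kubernetes.io/hostname" means "same node"
};

type WeightedPodAffinityTerm = {
  weight: number; // 1-100; higher means a stronger preference
  podAffinityTerm: PodAffinityTerm;
};

type PodAffinity = {
  preferredDuringSchedulingIgnoredDuringExecution: WeightedPodAffinityTerm[];
};

// Hypothetical config shape matching the three env vars.
type ProjectAffinityConfig = {
  enabled: boolean;
  weight: number;
  topologyKey: string;
};

// Hypothetical label key; every project pod must carry it for the selector to match.
const PROJECT_LABEL_KEY = "example.com/project-id";

// Returns a soft pod affinity that prefers topology domains already running
// pods of the same project, or undefined when the feature is disabled.
function buildProjectPodAffinity(
  projectId: string,
  config: ProjectAffinityConfig
): PodAffinity | undefined {
  if (!config.enabled) {
    return undefined;
  }
  return {
    preferredDuringSchedulingIgnoredDuringExecution: [
      {
        weight: config.weight,
        podAffinityTerm: {
          labelSelector: { matchLabels: { [PROJECT_LABEL_KEY]: projectId } },
          topologyKey: config.topologyKey,
        },
      },
    ],
  };
}
```

Because the selector matches other pods by label, this only has an effect if each project pod is also created with that same label. Using the `preferred...` list rather than `requiredDuringSchedulingIgnoredDuringExecution` is what keeps the rule soft: the scheduler favors nodes whose topology domain already runs a matching pod, scaled by `weight`, but can still place the pod elsewhere when that node cannot fit it.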
Caution: Review failed. The pull request is closed.

Walkthrough
The pull request adds three new environment variables to the supervisor environment schema to configure Kubernetes project-based pod affinity: an enabled flag, a weight (1-100), and a topology key. It also refactors the Kubernetes workload manager's affinity construction, replacing a single node-affinity getter with a composable approach that provides node affinity rules and optional project-based pod affinity, and combines them in a new

Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks: ✅ 2 passed | ❌ 1 failed (1 warning)
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@apps/supervisor/src/env.ts`:
- Around lines 115-118: `KUBERNETES_PROJECT_AFFINITY_TOPOLOGY_KEY` currently allows empty or whitespace-only values. Update its Zod schema to enforce a non-empty, trimmed string (e.g., `z.string().trim().min(1)`, or `.trim()` combined with `.nonempty()`) and keep the same default `"kubernetes.io/hostname"`, so invalid inputs fail fast at startup rather than producing invalid Kubernetes pod specs.
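One way the suggested validation could look, assuming `zod` as used in `env.ts`. The schema name `ProjectAffinityEnv` is hypothetical, and the boolean handling is a simplified stand-in for the repo's `BoolEnv` helper, whose exact parsing rules are not shown here.

```typescript
import { z } from "zod";

// Hypothetical standalone schema for the three project-affinity variables.
const ProjectAffinityEnv = z.object({
  // Simplified stand-in for BoolEnv: only "true"/"false" are accepted.
  KUBERNETES_PROJECT_AFFINITY_ENABLED: z
    .enum(["true", "false"])
    .default("false")
    .transform((value) => value === "true"),
  // Env vars arrive as strings, so coerce, then enforce the 1-100 range Kubernetes accepts.
  KUBERNETES_PROJECT_AFFINITY_WEIGHT: z.coerce.number().int().min(1).max(100).default(50),
  // Trim first, then require at least one character, so "" and "   " are rejected.
  KUBERNETES_PROJECT_AFFINITY_TOPOLOGY_KEY: z
    .string()
    .trim()
    .min(1)
    .default("kubernetes.io/hostname"),
});

type ProjectAffinityEnv = z.infer<typeof ProjectAffinityEnv>;
```

Parsing the process environment with `ProjectAffinityEnv.parse(process.env)` at startup throws on a blank topology key or an out-of-range weight, instead of letting the value reach a pod spec.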
📜 Review details
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
apps/supervisor/src/env.ts
apps/supervisor/src/workloadManager/kubernetes.ts
🧰 Additional context used
📓 Path-based instructions (4)
**/*.{ts,tsx}
📄 CodeRabbit inference engine (.github/copilot-instructions.md)
**/*.{ts,tsx}: Use types over interfaces for TypeScript
Avoid using enums; prefer string unions or const objects instead
**/*.{ts,tsx}: Always import tasks from `@trigger.dev/sdk`, never use `@trigger.dev/sdk/v3` or the deprecated `client.defineJob` pattern
Every Trigger.dev task must be exported and have a unique `id` property with no timeouts in the run function
Files:
apps/supervisor/src/workloadManager/kubernetes.ts
apps/supervisor/src/env.ts
**/*.{ts,tsx,js,jsx}
📄 CodeRabbit inference engine (.github/copilot-instructions.md)
Use function declarations instead of default exports
Import from `@trigger.dev/core` using subpaths only, never import from root
Files:
apps/supervisor/src/workloadManager/kubernetes.ts
apps/supervisor/src/env.ts
**/*.ts
📄 CodeRabbit inference engine (.cursor/rules/otel-metrics.mdc)
**/*.ts: When creating or editing OTEL metrics (counters, histograms, gauges), ensure metric attributes have low cardinality by using only enums, booleans, bounded error codes, or bounded shard IDs
Do not use high-cardinality attributes in OTEL metrics such as UUIDs/IDs (envId, userId, runId, projectId, organizationId), unbounded integers (itemCount, batchSize, retryCount), timestamps (createdAt, startTime), or free-form strings (errorMessage, taskName, queueName)
When exporting OTEL metrics via OTLP to Prometheus, be aware that the exporter automatically adds unit suffixes to metric names (e.g., 'my_duration_ms' becomes 'my_duration_ms_milliseconds', 'my_counter' becomes 'my_counter_total'). Account for these transformations when writing Grafana dashboards or Prometheus queries
Files:
apps/supervisor/src/workloadManager/kubernetes.ts
apps/supervisor/src/env.ts
**/*.{js,ts,jsx,tsx,json,md,yaml,yml}
📄 CodeRabbit inference engine (AGENTS.md)
Format code using Prettier before committing
Files:
apps/supervisor/src/workloadManager/kubernetes.ts
apps/supervisor/src/env.ts
🧬 Code graph analysis (1)
apps/supervisor/src/env.ts (2)
apps/webapp/app/utils/boolEnv.ts (1)
BoolEnv(12-14)
apps/supervisor/src/envUtil.ts (1)
BoolEnv(15-17)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (26)
- GitHub Check: units / internal / 🧪 Unit Tests: Internal (3, 8)
- GitHub Check: units / internal / 🧪 Unit Tests: Internal (2, 8)
- GitHub Check: units / internal / 🧪 Unit Tests: Internal (1, 8)
- GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (3, 8)
- GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (7, 8)
- GitHub Check: units / internal / 🧪 Unit Tests: Internal (5, 8)
- GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (2, 8)
- GitHub Check: units / internal / 🧪 Unit Tests: Internal (4, 8)
- GitHub Check: units / internal / 🧪 Unit Tests: Internal (7, 8)
- GitHub Check: units / internal / 🧪 Unit Tests: Internal (8, 8)
- GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (1, 8)
- GitHub Check: units / internal / 🧪 Unit Tests: Internal (6, 8)
- GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (8, 8)
- GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (6, 8)
- GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (4, 8)
- GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (5, 8)
- GitHub Check: units / packages / 🧪 Unit Tests: Packages (1, 1)
- GitHub Check: e2e / 🧪 CLI v3 tests (windows-latest - pnpm)
- GitHub Check: sdk-compat / Bun Runtime
- GitHub Check: sdk-compat / Cloudflare Workers
- GitHub Check: e2e / 🧪 CLI v3 tests (ubuntu-latest - npm)
- GitHub Check: typecheck / typecheck
- GitHub Check: sdk-compat / Node.js 22.12 (ubuntu-latest)
- GitHub Check: sdk-compat / Deno Runtime
- GitHub Check: sdk-compat / Node.js 20.20 (ubuntu-latest)
- GitHub Check: e2e / 🧪 CLI v3 tests (windows-latest - npm)
🔇 Additional comments (2)
apps/supervisor/src/workloadManager/kubernetes.ts (2)
122-124: LGTM: affinity is cleanly wired into the pod spec. Nice and minimal integration; optional affinity stays absent when undefined.
393-474: LGTM: affinity composition is well-factored. The separation of node vs. project pod affinity keeps the logic readable and makes the optionality explicit.
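A sketch of the composition pattern these comments describe: node affinity and optional project pod affinity are built independently and merged, and the pod spec gets an `affinity` field only when at least one part exists. The types are trimmed-down local mirrors of the Kubernetes `Affinity` object; `nodeAffinityRules`, `combineAffinity`, `buildPodSpec` and the `example.com/worker-pool` label are hypothetical names, not the ones in `kubernetes.ts`.

```typescript
// Trimmed-down local mirrors of the Kubernetes Affinity API objects.
type NodeSelectorRequirement = {
  key: string;
  operator: "In" | "NotIn" | "Exists" | "DoesNotExist" | "Gt" | "Lt";
  values?: string[];
};
type NodeAffinity = {
  requiredDuringSchedulingIgnoredDuringExecution: {
    nodeSelectorTerms: { matchExpressions: NodeSelectorRequirement[] }[];
  };
};
type PodAffinity = {
  preferredDuringSchedulingIgnoredDuringExecution: {
    weight: number;
    podAffinityTerm: {
      labelSelector: { matchLabels: Record<string, string> };
      topologyKey: string;
    };
  }[];
};
type Affinity = { nodeAffinity?: NodeAffinity; podAffinity?: PodAffinity };
type PodSpecSketch = { restartPolicy: string; affinity?: Affinity };

// Hypothetical node rule: pin pods to a worker pool when one is configured.
function nodeAffinityRules(workerPool?: string): NodeAffinity | undefined {
  if (!workerPool) return undefined;
  return {
    requiredDuringSchedulingIgnoredDuringExecution: {
      nodeSelectorTerms: [
        { matchExpressions: [{ key: "example.com/worker-pool", operator: "In", values: [workerPool] }] },
      ],
    },
  };
}

// Merge the independent parts; undefined means "no affinity at all".
function combineAffinity(nodeAffinity?: NodeAffinity, podAffinity?: PodAffinity): Affinity | undefined {
  if (!nodeAffinity && !podAffinity) return undefined;
  return {
    ...(nodeAffinity ? { nodeAffinity } : {}),
    ...(podAffinity ? { podAffinity } : {}),
  };
}

function buildPodSpec(workerPool?: string, podAffinity?: PodAffinity): PodSpecSketch {
  const affinity = combineAffinity(nodeAffinityRules(workerPool), podAffinity);
  return {
    restartPolicy: "Never",
    // Spread so the key is absent (not `affinity: undefined`) when nothing applies.
    ...(affinity ? { affinity } : {}),
  };
}
```

Keeping each part as a function that may return `undefined` lets new affinity sources be added without touching the pod spec builder.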