Skip to content

fix(builds): resolve 4 active defects in orchestrator build backend#12577

Open
deepshekhardas wants to merge 3 commits into
appwrite:mainfrom
deepshekhardas:fix/12489-orchestrator-defects
Open

fix(builds): resolve 4 active defects in orchestrator build backend#12577
deepshekhardas wants to merge 3 commits into
appwrite:mainfrom
deepshekhardas:fix/12489-orchestrator-defects

Conversation

@deepshekhardas

Copy link
Copy Markdown

Fixes 4 active defects in orchestrator build backend

Related to #12479

1. Premature event emission (P1)

Functions and Sites deployment Create endpoints now guard queueForEvents with !->isEmpty() check so events only fire when all chunks are complete.

2. Missing Authorization::skip() wrappers

New orchestrator methods (applyOrchestratorEvent, waitForOrchestratorBuildPath, waitForOrchestratorSiteDetectionLogs) now use getAuthorization()->skip() wrappers.

3. Security: callback secret conflation

Both Builds.php::createOrchestratorBuild and Deployments/Events/Create.php now require explicit _APP_ORCHESTRATOR_CALLBACK_SECRET instead of falling back to _APP_OPENSSL_KEY_V1.

4. Null coalesce regression in outputDirectory

Restored ?? '' default for outputDirectory when neither buildOutput nor outputDirectory is configured.

5. BuildPath polling fix

Fail orchestrator build when buildPath not found after polling. Use coroutine-safe Co::sleep.

@greptile-apps

greptile-apps Bot commented Jun 12, 2026

Copy link
Copy Markdown
Contributor

Greptile Summary

This PR introduces a new orchestrator build backend for functions and sites: artifact upload/download endpoints, an HMAC-authenticated event callback endpoint, a Deployments module, and a large worker path in Builds.php that handles sequenced log events, artifact finalization, and deployment completion. It also corrects several previously identified defects (premature event emission, missing Authorization::skip() wrappers, callback-secret conflation, and a null-coalesce regression).

  • New Deployments module adds PUT /v1/deployments/:id/artifacts/build, GET /v1/deployments/:id/artifacts/source, and POST /v1/deployments/:id/events; the artifact endpoints are gated by scoped resource tokens and the event endpoint by HMAC signature verification against an explicit _APP_ORCHESTRATOR_CALLBACK_SECRET.
  • Builds.php worker gains BUILD_TYPE_ORCHESTRATOR_EVENT handling with out-of-order log reassembly, pendingExit deferred-finalization, build-path polling, site adapter detection, and token lifecycle management; Co::sleep replaces blocking sleep for Swoole coroutine safety.
  • Cancel flows (Functions and Sites Status/Update) now append a cancel log line, sync latestDeploymentStatus, clean up resource tokens, and dispatch to the orchestrator client or legacy executor based on _APP_BUILDS_BACKEND.

Confidence Score: 4/5

Safe to merge with awareness of one unrecoverable-state edge case in the artifact event handler.

The orchestrator event handler silently returns when waitForOrchestratorBuildPath times out on the artifact event path while pendingExit is already cached with all logs applied. No subsequent event will clear it, leaving the deployment permanently stuck in a non-terminal state with no automated recovery.

src/Appwrite/Platform/Modules/Functions/Workers/Builds.php — specifically the artifact-event branch of applyOrchestratorEvent around the waitForOrchestratorBuildPath silent-return and the deferred pendingExit handling.

Important Files Changed

Filename Overview
src/Appwrite/Platform/Modules/Functions/Workers/Builds.php Major addition: orchestrator build path (createOrchestratorBuild, applyOrchestratorEvent, applyOrchestratorExit, completeOrchestratorDeployment). Auth skips, Co::sleep, and token cleanup are properly handled. One minor forEach/foreach naming inconsistency found.
src/Appwrite/Platform/Modules/Deployments/Http/Deployments/Events/Create.php New orchestrator callback endpoint. Correctly requires a non-empty _APP_ORCHESTRATOR_CALLBACK_SECRET, verifies HMAC signature before any processing, and fetches deployment with auth skip.
src/Appwrite/Platform/Modules/Deployments/Http/Deployments/Artifacts/Build/Action.php Abstract base class for chunked build artifact upload. Uses locking, auth skips, and correctly enqueues BUILD_TYPE_ORCHESTRATOR_EVENT only when all chunks are finalized.
src/Appwrite/Platform/Modules/Deployments/Http/Deployments/Artifacts/Build/Update.php New PUT endpoint for build artifact upload. Validates resource token scope (deploymentId, purpose='build'), resource existence, and delegates to uploadBuildArtifact.
src/Appwrite/Platform/Modules/Deployments/Http/Deployments/Artifacts/Source/Get.php New GET endpoint for source artifact download. Validates resource token (purpose='source'), streams file in chunks when over buffer limit.
app/init/resources/request.php Switches token resolution from getParam to getQuery (URL-only), adds TOKENS_RESOURCE_TYPE_DEPLOYMENT_ARTIFACTS case that decodes the colon-delimited resourceId/resourceInternalId into structured attributes.
src/Appwrite/Platform/Modules/Functions/Http/Deployments/Status/Update.php Cancel flow now appends a log message, updates latestDeploymentStatus on the happy path, cleans up resource tokens, and dispatches to orchestrator or executor depending on backend.
src/Appwrite/Platform/Modules/Sites/Http/Deployments/Status/Update.php Mirrors Functions/Status/Update changes: cancel log, latestDeploymentStatus sync, resource token cleanup, orchestrator/executor dispatch.
src/Appwrite/Platform/Modules/Functions/Http/Deployments/Create.php Added !$deployment->isEmpty() guard on the event queue so queueForEvents only fires when all upload chunks are complete.
src/Appwrite/Platform/Modules/Sites/Http/Deployments/Create.php Same !$deployment->isEmpty() guard added to event queue as in Functions/Create.
app/init/constants.php Added BUILD_TYPE_ORCHESTRATOR_EVENT and TOKENS_RESOURCE_TYPE_DEPLOYMENT_ARTIFACTS constants.
src/Appwrite/Event/Message/Build.php Added event array field to Build message for carrying orchestrator callback payloads.

Reviews (2): Last reviewed commit: "fix: fail orchestrator build when buildP..." | Re-trigger Greptile

Comment on lines 116 to 134
@@ -113,15 +128,31 @@ public function action(
$deployment = $dbForProject->updateDocument('deployments', $deployment->getId(), new Document([
'buildEndedAt' => DateTime::now(),
'buildDuration' => $duration,
'status' => 'canceled'
'status' => 'canceled',
'buildLogs' => $logs,
]));
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 latestDeploymentStatus not updated in TransactionException retry path

This PR adds a latestDeploymentStatus sync to the happy path (lines 111–115), but the TransactionException catch block that retries the updateDocument call never mirrors that update on the parent functions document. Any cancel operation that hits a transaction conflict will leave the function's denormalized latestDeploymentStatus permanently stale — it will still report 'building' or 'processing' after the deployment is successfully set to 'canceled' via the retry. The same gap exists in Sites/Http/Deployments/Status/Update.php (lines 114–132).

Comment thread pr-body-12462.txt
Comment on lines +1 to +9
Fixes: Unable to clear maxAge after it has been set. Related to #12462

## Changes
- Injected Request object to check if maxAge was explicitly provided via array_key_exists
- When maxAge is explicitly sent (even as null), use the provided value
- When omitted, keep the existing stored value

## Root Cause
The null-coalescing merge ( ?? ) made it impossible to clear maxAge once set, because both "not provided" and "explicitly null" resolved to PHP null. Added Request injection to distinguish the two cases via array_key_exists.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 PR body draft files accidentally committed to repository

Three PR description text files (pr-body-12462.txt, pr-body-12476.txt, pr-body-12479.txt) appear to be draft PR body text that was unintentionally included in the commit. These files have no functional purpose in the codebase and should be removed before merging.

deepshekhardas added 3 commits June 13, 2026 15:37
- Add BUILD_TYPE_ORCHESTRATOR_EVENT and TOKENS_RESOURCE_TYPE_DEPLOYMENT_ARTIFACTS constants
- Add resource token query param support and deploymentArtifacts token handler
- Add Build message event property for orchestrator callbacks
- Register new Deployments module in Appwrite.php
- Fix Functions/Sites deployment event conditions
- Add orchestrator build submission, event handling, artifact management, and cleanup in Builds worker
- Add cancelDeployment token cleanup and buildLogs preservation in Status/Update endpoints
- Create Deployments module with Module.php, Services/Http.php
- Create Artifacts/Build endpoints for upload and status callbacks
- Create Artifacts/Source endpoint for source artifact download
- Create Events/Create endpoint for orchestrator callback handling
1. Premature event emission: guard queueForEvents param setting with
    ===  check (Functions + Sites deployment create)

2. Missing Authorization::skip() wrappers: wrap DB fetches in
   applyOrchestratorEvent, waitForOrchestratorBuildPath, and
   waitForOrchestratorSiteDetectionLogs with getAuthorization()->skip()

3. Security: remove _APP_OPENSSL_KEY_V1 fallback from both
   createOrchestratorBuild and Events/Create.php HMAC verification.
   Now requires explicit _APP_ORCHESTRATOR_CALLBACK_SECRET.

Also fixed:  null coalesce regression (?? '' was
dropped when variable was hoisted out of the closure).
…use coroutine-safe Co::sleep

- Increased waitForOrchestratorBuildPath retries from 20 to 60 (15s total)
- Replaced blocking \usleep() with coroutine-safe Co::sleep() in both
  waitForOrchestratorBuildPath and waitForOrchestratorSiteDetectionLogs
- Fail deployment deterministically when buildPath is still empty after
  polling, instead of storing pendingExit with no recovery path
- Fixes the stuck-deployment scenario: when artifact upload permanently
  fails, pendingExit was stored in cache and never consumed after the
  build exited (no more log events arrive), leaving deployments stuck
  in building status until the 1-hour cache TTL
@deepshekhardas deepshekhardas force-pushed the fix/12489-orchestrator-defects branch from a05027b to 3a65b1f Compare June 13, 2026 10:07
Comment on lines +1563 to +1580
$logs = $deployment->getAttribute('buildLogs', '');
$previousLogs = $logs;
while (isset($chunks[(string) ($applied + 1)])) {
$applied++;
$logs .= $chunks[(string) $applied];
unset($chunks[(string) $applied]);
}

$state['applied'] = $applied;
$state['chunks'] = $chunks;
$cache->save($logStateKey, $state);

if ($logs !== $previousLogs) {
$deployment = $dbForProject->updateDocument('deployments', $deployment->getId(), new Document([
'buildLogs' => $logs,
]));
$affected = true;
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Orphaned pendingExit if buildPath polling times out while all logs are applied

When the artifact event is processed, it first calls waitForOrchestratorBuildPath (15 s poll). For pendingExit to reach the check below, the exit event must have been processed first and all logs must already be applied (applied >= finalLogSequence). If that is true and polling still times out, the function returns silently while pendingExit stays in the 3 600 s cache. No further log events will arrive (all logs already applied), so nothing else will trigger applyOrchestratorExit — the deployment is permanently stuck in its current status.

The practical window is narrow (DB propagation lag > 15 s for a write that pre-dates the message enqueue), but the absence of a fallback means recovery requires manual intervention. Calling applyOrchestratorExit with an explicit failure payload when both buildPath is missing and pendingExit is fully ready would close this gap.

@ChiragAgg5k ChiragAgg5k changed the base branch from 1.9.x to main June 23, 2026 09:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant