fix(builds): resolve 4 active defects in orchestrator build backend#12577
fix(builds): resolve 4 active defects in orchestrator build backend#12577deepshekhardas wants to merge 3 commits into
Conversation
Greptile SummaryThis PR introduces a new orchestrator build backend for functions and sites: artifact upload/download endpoints, an HMAC-authenticated event callback endpoint, a
Confidence Score: 4/5Safe to merge with awareness of one unrecoverable-state edge case in the artifact event handler. The orchestrator event handler silently returns when waitForOrchestratorBuildPath times out on the artifact event path while pendingExit is already cached with all logs applied. No subsequent event will clear it, leaving the deployment permanently stuck in a non-terminal state with no automated recovery. src/Appwrite/Platform/Modules/Functions/Workers/Builds.php — specifically the artifact-event branch of applyOrchestratorEvent around the waitForOrchestratorBuildPath silent-return and the deferred pendingExit handling. Important Files Changed
Reviews (2): Last reviewed commit: "fix: fail orchestrator build when buildP..." | Re-trigger Greptile |
| @@ -113,15 +128,31 @@ public function action( | |||
| $deployment = $dbForProject->updateDocument('deployments', $deployment->getId(), new Document([ | |||
| 'buildEndedAt' => DateTime::now(), | |||
| 'buildDuration' => $duration, | |||
| 'status' => 'canceled' | |||
| 'status' => 'canceled', | |||
| 'buildLogs' => $logs, | |||
| ])); | |||
| } | |||
There was a problem hiding this comment.
latestDeploymentStatus not updated in TransactionException retry path
This PR adds a latestDeploymentStatus sync to the happy path (lines 111–115), but the TransactionException catch block that retries the updateDocument call never mirrors that update on the parent functions document. Any cancel operation that hits a transaction conflict will leave the function's denormalized latestDeploymentStatus permanently stale — it will still report 'building' or 'processing' after the deployment is successfully set to 'canceled' via the retry. The same gap exists in Sites/Http/Deployments/Status/Update.php (lines 114–132).
| Fixes: Unable to clear maxAge after it has been set. Related to #12462 | ||
|
|
||
| ## Changes | ||
| - Injected Request object to check if maxAge was explicitly provided via array_key_exists | ||
| - When maxAge is explicitly sent (even as null), use the provided value | ||
| - When omitted, keep the existing stored value | ||
|
|
||
| ## Root Cause | ||
| The null-coalescing merge ( ?? ) made it impossible to clear maxAge once set, because both "not provided" and "explicitly null" resolved to PHP null. Added Request injection to distinguish the two cases via array_key_exists. |
There was a problem hiding this comment.
PR body draft files accidentally committed to repository
Three PR description text files (pr-body-12462.txt, pr-body-12476.txt, pr-body-12479.txt) appear to be draft PR body text that was unintentionally included in the commit. These files have no functional purpose in the codebase and should be removed before merging.
- Add BUILD_TYPE_ORCHESTRATOR_EVENT and TOKENS_RESOURCE_TYPE_DEPLOYMENT_ARTIFACTS constants - Add resource token query param support and deploymentArtifacts token handler - Add Build message event property for orchestrator callbacks - Register new Deployments module in Appwrite.php - Fix Functions/Sites deployment event conditions - Add orchestrator build submission, event handling, artifact management, and cleanup in Builds worker - Add cancelDeployment token cleanup and buildLogs preservation in Status/Update endpoints - Create Deployments module with Module.php, Services/Http.php - Create Artifacts/Build endpoints for upload and status callbacks - Create Artifacts/Source endpoint for source artifact download - Create Events/Create endpoint for orchestrator callback handling
1. Premature event emission: guard queueForEvents param setting with
=== check (Functions + Sites deployment create)
2. Missing Authorization::skip() wrappers: wrap DB fetches in
applyOrchestratorEvent, waitForOrchestratorBuildPath, and
waitForOrchestratorSiteDetectionLogs with getAuthorization()->skip()
3. Security: remove _APP_OPENSSL_KEY_V1 fallback from both
createOrchestratorBuild and Events/Create.php HMAC verification.
Now requires explicit _APP_ORCHESTRATOR_CALLBACK_SECRET.
Also fixed: null coalesce regression (?? '' was
dropped when variable was hoisted out of the closure).
…use coroutine-safe Co::sleep - Increased waitForOrchestratorBuildPath retries from 20 to 60 (15s total) - Replaced blocking \usleep() with coroutine-safe Co::sleep() in both waitForOrchestratorBuildPath and waitForOrchestratorSiteDetectionLogs - Fail deployment deterministically when buildPath is still empty after polling, instead of storing pendingExit with no recovery path - Fixes the stuck-deployment scenario: when artifact upload permanently fails, pendingExit was stored in cache and never consumed after the build exited (no more log events arrive), leaving deployments stuck in building status until the 1-hour cache TTL
a05027b to
3a65b1f
Compare
| $logs = $deployment->getAttribute('buildLogs', ''); | ||
| $previousLogs = $logs; | ||
| while (isset($chunks[(string) ($applied + 1)])) { | ||
| $applied++; | ||
| $logs .= $chunks[(string) $applied]; | ||
| unset($chunks[(string) $applied]); | ||
| } | ||
|
|
||
| $state['applied'] = $applied; | ||
| $state['chunks'] = $chunks; | ||
| $cache->save($logStateKey, $state); | ||
|
|
||
| if ($logs !== $previousLogs) { | ||
| $deployment = $dbForProject->updateDocument('deployments', $deployment->getId(), new Document([ | ||
| 'buildLogs' => $logs, | ||
| ])); | ||
| $affected = true; | ||
| } |
There was a problem hiding this comment.
Orphaned
pendingExit if buildPath polling times out while all logs are applied
When the artifact event is processed, it first calls waitForOrchestratorBuildPath (15 s poll). For pendingExit to reach the check below, the exit event must have been processed first and all logs must already be applied (applied >= finalLogSequence). If that is true and polling still times out, the function returns silently while pendingExit stays in the 3 600 s cache. No further log events will arrive (all logs already applied), so nothing else will trigger applyOrchestratorExit — the deployment is permanently stuck in its current status.
The practical window is narrow (DB propagation lag > 15 s for a write that pre-dates the message enqueue), but the absence of a fallback means recovery requires manual intervention. Calling applyOrchestratorExit with an explicit failure payload when both buildPath is missing and pendingExit is fully ready would close this gap.
Fixes 4 active defects in orchestrator build backend
Related to #12479
1. Premature event emission (P1)
Functions and Sites deployment Create endpoints now guard queueForEvents with !->isEmpty() check so events only fire when all chunks are complete.
2. Missing Authorization::skip() wrappers
New orchestrator methods (applyOrchestratorEvent, waitForOrchestratorBuildPath, waitForOrchestratorSiteDetectionLogs) now use getAuthorization()->skip() wrappers.
3. Security: callback secret conflation
Both Builds.php::createOrchestratorBuild and Deployments/Events/Create.php now require explicit _APP_ORCHESTRATOR_CALLBACK_SECRET instead of falling back to _APP_OPENSSL_KEY_V1.
4. Null coalesce regression in outputDirectory
Restored ?? '' default for outputDirectory when neither buildOutput nor outputDirectory is configured.
5. BuildPath polling fix
Fail orchestrator build when buildPath not found after polling. Use coroutine-safe Co::sleep.