chore: language codes data migration #8367
Walkthrough
Adds a migration that canonicalizes language codes in language, survey-language, survey content, response, and contact attribute data. It also adds shared utility functions and type definitions for planning relabel and merge operations, plus Vitest coverage for canonicalization and move-planning behavior.
🚥 Pre-merge checks | ✅ 4 | ❌ 1
❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In
`@packages/database/migration/20260625104904_canonicalize_language_codes/migration.ts`:
- Around line 157-163: The survey-content migration loop in the code that calls
rewriteI18nKeys() is skipping unresolved codes whenever result.changed is false,
which underreports invalid values. Update the SURVEY_CONTENT_FIELDS processing
so unresolved codes from rewriteI18nKeys() are always added to
stats.unresolvedCodes before any early exit, while keeping
stats.i18nKeysRewritten incremented only when changes were made.
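A minimal sketch of the ordering this comment asks for, under stated assumptions: the `RewriteResult` and `MigrationStats` shapes and the `recordRewriteResult` helper are hypothetical stand-ins, and the `unresolved` field name is a guess at how `rewriteI18nKeys()` reports unresolved codes. Only `result.changed`, `stats.unresolvedCodes`, and `stats.i18nKeysRewritten` come from the comment itself.

```typescript
// Hypothetical shapes; the real types live in the migration's types.ts.
interface RewriteResult {
  changed: boolean;
  unresolved: string[]; // assumed field name for codes that could not be canonicalized
}

interface MigrationStats {
  i18nKeysRewritten: number;
  unresolvedCodes: Set<string>;
}

// Records one rewrite result and returns true when the caller should
// write the rewritten field back to the database.
function recordRewriteResult(result: RewriteResult, stats: MigrationStats): boolean {
  // Always collect unresolved codes first, even when nothing was rewritten.
  result.unresolved.forEach((code) => stats.unresolvedCodes.add(code));

  // Early exit only after the unresolved codes have been recorded.
  if (!result.changed) return false;

  // Count a rewrite only when the content actually changed.
  stats.i18nKeysRewritten += 1;
  return true;
}
```

With this ordering, a field that contains only an invalid code still shows up in the unresolved report while the rewrite counter stays untouched.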
In
`@packages/database/migration/20260625104904_canonicalize_language_codes/utils.test.ts`:
- Around line 193-201: The current test in planSurveyLanguageMoves only checks
the repoint/delete counts for duplicate absorbed links and misses the
flag-promotion branch. Extend this regression in utils.test.ts to assert the
behavior of planSurveyLanguageMoves for the duplicate absorbed-link case where
one link is repointed and the second is deduped, and verify that the surviving
moved row inherits the promoted default and enabled flags from the deleted
absorbed row.
In
`@packages/database/migration/20260625104904_canonicalize_language_codes/utils.ts`:
- Around line 204-229: `planSurveyLanguageMoves()` is not preserving the
survivor `default`/`enabled` state when the first absorbed link for a survey is
repointed, so later deduped rows can compute flags from the wrong baseline.
Update the repoint branch in `planSurveyLanguageMoves()` to record the first
repointed link in the survivor-state map (or otherwise carry its
`default`/`enabled` values forward) before any later delete-based merge logic
runs. Then keep the final `flagUpdatesBySurvey` comparison aligned with that
survivor state so `migration.ts` only emits flag updates after the survivor row
is known.
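The fix described above can be sketched as follows. This is not the actual `planSurveyLanguageMoves()` implementation: `planMovesSketch`, its parameters, and the `MovePlan` shape are hypothetical, and merging flags with a logical OR is an assumption about what "promoted" means here.

```typescript
interface SurveyLanguageLink {
  languageId: string;
  surveyId: string;
  default: boolean;
  enabled: boolean;
}

interface Flags {
  default: boolean;
  enabled: boolean;
}

interface MovePlan {
  repoint: { surveyId: string; fromLanguageId: string }[];
  remove: { surveyId: string; languageId: string }[];
  flagUpdates: { surveyId: string; flags: Flags }[];
}

function planMovesSketch(
  links: SurveyLanguageLink[],
  survivorId: string,
  absorbedIds: Set<string>
): MovePlan {
  const plan: MovePlan = { repoint: [], remove: [], flagUpdates: [] };
  const baseline = new Map<string, Flags>(); // flags the survivor row starts with
  const merged = new Map<string, Flags>(); // flags after absorbing duplicates

  // Surveys already linked to the survivor language start from that link's flags.
  for (const link of links) {
    if (link.languageId !== survivorId) continue;
    baseline.set(link.surveyId, { default: link.default, enabled: link.enabled });
    merged.set(link.surveyId, { default: link.default, enabled: link.enabled });
  }

  for (const link of links) {
    if (!absorbedIds.has(link.languageId)) continue;
    const current = merged.get(link.surveyId);
    if (current === undefined) {
      // First absorbed link for this survey: repoint it and record it as the
      // survivor state, so later duplicates merge against the right baseline.
      plan.repoint.push({ surveyId: link.surveyId, fromLanguageId: link.languageId });
      baseline.set(link.surveyId, { default: link.default, enabled: link.enabled });
      merged.set(link.surveyId, { default: link.default, enabled: link.enabled });
      continue;
    }
    // A survivor row already exists: delete the duplicate and promote its flags.
    plan.remove.push({ surveyId: link.surveyId, languageId: link.languageId });
    merged.set(link.surveyId, {
      default: current.default || link.default,
      enabled: current.enabled || link.enabled,
    });
  }

  // Emit a flag update only where the merged state differs from the survivor's baseline.
  merged.forEach((flags, surveyId) => {
    const base = baseline.get(surveyId);
    if (base && (base.default !== flags.default || base.enabled !== flags.enabled)) {
      plan.flagUpdates.push({ surveyId, flags });
    }
  });
  return plan;
}
```

In this sketch, two absorbed links on the same survey produce one repoint, one delete, and one flag update carrying the deleted row's promoted flags, which is the case the `utils.test.ts` comment asks to cover.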
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 0f1b446f-a960-4bcd-86de-6a3311d58782
⛔ Files ignored due to path filters (1)
- `pnpm-lock.yaml` is excluded by `!**/pnpm-lock.yaml`
📒 Files selected for processing (5)
- `packages/database/migration/20260625104904_canonicalize_language_codes/migration.ts`
- `packages/database/migration/20260625104904_canonicalize_language_codes/types.ts`
- `packages/database/migration/20260625104904_canonicalize_language_codes/utils.test.ts`
- `packages/database/migration/20260625104904_canonicalize_language_codes/utils.ts`
- `packages/database/package.json`



What does this PR do?
Part of ENG-1067 (standardize survey, app, and translation language tags on BCP-47).
This is the data migration that canonicalizes the language codes already stored in the database to their canonical BCP-47 form (e.g. `de` → `de-DE`, `pt` → `pt-BR`, `zh-Hans` → `zh-Hans-CN`), using the shared `normalizeLanguageCode` / `LANGUAGE_CANONICAL_MAP` foundation from #8349. It runs through the existing data-migration framework (one transaction, tracked in `DataMigration`, run-once + idempotent).
What it migrates (in order, all in one transaction)
1. `Language.code` rows: relabel each code to canonical. When a bare + region row collide on the same canonical code within a workspace (`@@unique([workspaceId, code])`), one row survives (preferring a row already at the canonical code, else the oldest) and the rest are absorbed: `SurveyLanguage` links are repointed/deduped (composite PK `@@id([languageId, surveyId])`), `default`/`enabled` flags carried over, aliases preserved, absorbed rows deleted.
2. `i18nString`'s non-`default` language keys across `welcomeCard`, `blocks`, `endings`, `metadata`, `surveyClosedMessage`, `questions`. Keys that collapse to the same canonical code (e.g. `de` + `de-DE`) are merged (non-empty / canonical-key value wins).
3. `Response.language`: remap distinct codes (skips `NULL`, the `"default"` sentinel, and already-canonical values).
4. `Response.contactAttributes.language` (snapshot JSON): same remap via `jsonb_set`, so the snapshot stays consistent with `Response.language` and the contact attribute.
5. `language` attribute (`ContactAttribute` rows whose key is `language`): index-friendly batched remap (resolves the `language` attribute keys up front, then `attributeKeyId = ANY(...) AND value = ...` to hit the `[attributeKeyId, value]` index).
Design notes
- `run` executes in one transaction; any error rolls back with no partial state.
- `"default"` i18n sentinel preserved everywhere.
- `Language` dump: 8 collision groups resolve as expected with 0 alias-loss cases.
- Planning logic lives in `utils.ts` and is unit-tested; `migration.ts` only orchestrates the SQL.
Dependencies / sequencing
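To illustrate the survey-content step under "What it migrates", here is a minimal sketch of the key merge for a single i18n object. It is not the migration's `rewriteI18nKeys()`: the `CANONICAL` map is a hypothetical three-entry stand-in for `LANGUAGE_CANONICAL_MAP`, the recursive walk over survey content is omitted, and "non-empty / canonical-key value wins" is read here as "prefer a non-empty value, and when both are non-empty prefer the one already stored under the canonical key".

```typescript
// Hypothetical subset of the canonical map, taken from the examples in this description.
const CANONICAL = new Map<string, string>([
  ["de", "de-DE"],
  ["pt", "pt-BR"],
  ["zh-Hans", "zh-Hans-CN"],
]);

type I18nString = Record<string, string>;

function rewriteKeysSketch(value: I18nString): I18nString {
  const out: I18nString = {};
  for (const [key, text] of Object.entries(value)) {
    // The "default" sentinel is never relabelled.
    if (key === "default") {
      out[key] = text;
      continue;
    }
    // Unknown codes are left under their original key in this sketch.
    const canonical = CANONICAL.get(key) ?? key;
    const existing = out[canonical];
    const isCanonicalKey = key === canonical;
    // Take the new text when the slot is empty, or when this key is already
    // canonical and carries a non-empty value.
    if (existing === undefined || existing === "" || (isCanonicalKey && text !== "")) {
      out[canonical] = text;
    }
  }
  return out;
}

// Usage: "de" and "de-DE" collapse into one "de-DE" entry.
const mergedExample = rewriteKeysSketch({ default: "Hi", de: "Hallo", "de-DE": "Guten Tag" });
```

The merge rule is independent of key order in this sketch, so the outcome does not depend on how the JSON happened to be stored.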
How should this be tested?
- `pnpm --filter @formbricks/database exec vitest run migration/20260625104904_canonicalize_language_codes/utils.test.ts`: covers `toCanonical`, the recursive i18n key rewrite (incl. the `de` + `de-DE` merge fixture), language-row merge planning (collision survivor selection, alias copy, per-workspace scoping), and `SurveyLanguage` repoint/dedupe.
- `pnpm --filter @formbricks/database build`: confirms the migration bundles correctly with `@formbricks/i18n-utils` inlined (so it resolves at runtime in both `tsx` and built `node`).
- `Language` dump and assert the expected merges resolve with 0 orphans.
Note
Step 5 (contact `language` attribute) is the heaviest part on instances with large `ContactAttribute` tables. It is index-friendly and logs per-code progress, but because the framework wraps the whole migration in a single transaction, it becomes a long-running transaction on very large tables, so it is worth scheduling for an off-peak window. Splitting the high-volume backfill out into a batched out-of-transaction script is a possible follow-up if needed.
Checklist
Required
- `pnpm build`
- `console.logs`
Appreciated