CS Knowledge Base

pandeymangg · 2026-06-26T06:55:14Z

What does this PR do?

Part of ENG-1067 (standardize survey, app, and translation language tags on BCP-47).

This is the data migration that rewrites the language codes already stored in the database into their canonical BCP-47 form (e.g. de → de-DE, pt → pt-BR, zh-Hans → zh-Hans-CN), using the shared normalizeLanguageCode / LANGUAGE_CANONICAL_MAP foundation from #8349. It runs through the existing data-migration framework: one transaction, tracked in DataMigration, run-once and idempotent.
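For orientation, the canonicalization step can be sketched roughly as below. This is a minimal sketch, not the normalizeLanguageCode implementation from #8349. The `CANONICAL` map is a hypothetical subset built only from the examples above. Using `Intl.getCanonicalLocales` first to normalize casing (so `DE-de` resolves like `de-DE`) is an assumption about how such inputs could be handled.

```typescript
// Hypothetical subset of LANGUAGE_CANONICAL_MAP: every accepted tag maps to its canonical BCP-47 form.
const CANONICAL = new Map<string, string>([
  ["de", "de-DE"],
  ["de-DE", "de-DE"],
  ["pt", "pt-BR"],
  ["pt-BR", "pt-BR"],
  ["zh-Hans", "zh-Hans-CN"],
  ["zh-Hans-CN", "zh-Hans-CN"],
]);

const DEFAULT_SENTINEL = "default";

// Normalizes tag casing/structure (e.g. "DE-de" -> "de-DE"); returns null for malformed tags.
function parseTag(code: string): string | null {
  try {
    return Intl.getCanonicalLocales(code)[0] ?? null;
  } catch {
    return null; // Intl.getCanonicalLocales throws a RangeError on malformed tags
  }
}

// Returns the canonical code, the "default" sentinel unchanged, or null when the code cannot be
// resolved (callers leave such values untouched and report them instead of dropping them).
function toCanonical(code: string): string | null {
  if (code === DEFAULT_SENTINEL) return code;
  const tag = parseTag(code);
  return tag === null ? null : (CANONICAL.get(tag) ?? null);
}
```

With this sketch, `toCanonical("de")` gives `"de-DE"`, `toCanonical("zh-hans")` gives `"zh-Hans-CN"`, and a code outside the map gives `null`.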

What it migrates (in order, all in one transaction)

1. Language.code rows: relabel each code to its canonical form. When a bare-language row and a region-qualified row (e.g. de and de-DE) collide on the same canonical code within a workspace (@@unique([workspaceId, code])), one row survives (preferring a row already at the canonical code, else the oldest) and the rest are absorbed: SurveyLanguage links are repointed or deduped (composite PK @@id([languageId, surveyId])), default/enabled flags are carried over, aliases are preserved, and the absorbed rows are deleted.
2. Survey content i18n keys (multi-language surveys only): recursively rewrite the non-default language keys of every i18nString across welcomeCard, blocks, endings, metadata, surveyClosedMessage, and questions. Keys that collapse to the same canonical code (e.g. de + de-DE) are merged (the non-empty / canonical-key value wins).
3. Response.language: remap distinct codes, skipping NULL, the "default" sentinel, and already-canonical values.
4. Response.contactAttributes.language (snapshot JSON): the same remap via jsonb_set, so the snapshot stays consistent with Response.language and the contact attribute.
5. Contact language attribute (ContactAttribute rows whose key is language): an index-friendly batched remap. The language attribute keys are resolved up front, then attributeKeyId = ANY(...) AND value = ... is used so the query hits the [attributeKeyId, value] index.
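The collision handling for Language rows can be pictured as a small planner. This is a hypothetical sketch, not the utils.ts code. The `LanguageRow` and `MergeGroup` shapes, the stubbed `toCanonical`, and the rule that the survivor inherits the first non-null alias of an absorbed row are all assumptions. It does follow the survivor rule stated above (a row already at the canonical code, else the oldest) and the per-workspace scoping.

```typescript
interface LanguageRow {
  id: string;
  workspaceId: string;
  code: string;
  alias: string | null;
  createdAt: Date;
}

interface MergeGroup {
  canonical: string;
  survivor: LanguageRow; // row kept and relabelled to `canonical`
  absorbed: LanguageRow[]; // rows whose links move to the survivor before deletion (empty = plain relabel)
  alias: string | null; // alias the survivor ends up with
}

// Hypothetical stand-in for the shared normalizer.
const CANONICAL = new Map<string, string>([
  ["de", "de-DE"],
  ["de-DE", "de-DE"],
  ["pt", "pt-BR"],
  ["pt-BR", "pt-BR"],
]);
const toCanonical = (code: string): string | null => CANONICAL.get(code) ?? null;

function planLanguageMerges(rows: readonly LanguageRow[]): { groups: MergeGroup[]; unresolved: string[] } {
  const buckets = new Map<string, { canonical: string; members: LanguageRow[] }>();
  const unresolved: string[] = [];

  for (const row of rows) {
    const canonical = toCanonical(row.code);
    if (canonical === null) {
      unresolved.push(row.code); // junk codes are left untouched and only reported
      continue;
    }
    // Collisions only matter within a workspace, mirroring @@unique([workspaceId, code]).
    const key = JSON.stringify([row.workspaceId, canonical]);
    const bucket = buckets.get(key) ?? { canonical, members: [] };
    bucket.members.push(row);
    buckets.set(key, bucket);
  }

  const groups: MergeGroup[] = [];
  for (const { canonical, members } of buckets.values()) {
    // Survivor: a row already at the canonical code if one exists, otherwise the oldest row.
    const ordered = [...members].sort(
      (a, b) =>
        Number(a.code !== canonical) - Number(b.code !== canonical) ||
        a.createdAt.getTime() - b.createdAt.getTime(),
    );
    const [survivor, ...absorbed] = ordered;
    // Keep the survivor's alias; otherwise inherit the first alias found on an absorbed row.
    const alias = survivor.alias ?? absorbed.find((row) => row.alias !== null)?.alias ?? null;
    groups.push({ canonical, survivor, absorbed, alias });
  }
  return { groups, unresolved };
}
```

Grouping by `(workspaceId, canonical)` is what keeps a de row in one workspace from ever absorbing a de-DE row in another.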

Design notes

Idempotent & safe to re-run — already-canonical values are skipped; unparseable/junk codes are left untouched (never dropped) and logged at the end.
Atomic — the whole run executes in one transaction; any error rolls back with no partial state.
"default" i18n sentinel preserved everywhere.
No migration-specific backup step — disaster recovery relies on the existing DB backup policy.
The merge planner was dry-run against a production Language dump: 8 collision groups resolve as expected with 0 alias-loss cases.
Pure logic (canonicalization, recursive key rewrite, merge planning) lives in utils.ts and is unit-tested; migration.ts only orchestrates the SQL.
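A pure recursive key rewrite of the kind described for utils.ts could look like the sketch below. It is illustrative only. The `Json` type, the `rewriteI18nKeys` return shape, and the stubbed `toCanonical` are assumptions. Treating any object with a string `default` key and only string values as an i18nString is also an assumption. The sketch shows two properties claimed above: the "default" sentinel survives, and colliding keys such as de + de-DE are merged instead of one silently overwriting the other.

```typescript
type Json = string | number | boolean | null | Json[] | { [key: string]: Json };

// Hypothetical stand-in for the shared normalizer; "default" is passed through unchanged.
const CANONICAL = new Map<string, string>([
  ["de", "de-DE"],
  ["de-DE", "de-DE"],
  ["pt", "pt-BR"],
  ["pt-BR", "pt-BR"],
]);
const toCanonical = (code: string): string | null =>
  code === "default" ? code : (CANONICAL.get(code) ?? null);

interface RewriteResult {
  value: Json;
  changed: boolean;
}

// Assumption: an i18nString is a plain object with a string "default" key and only string values.
function isI18nString(node: { [key: string]: Json }): node is { [key: string]: string } {
  return typeof node["default"] === "string" && Object.values(node).every((v) => typeof v === "string");
}

function rewriteI18nKeys(node: Json, unresolved: Set<string>): RewriteResult {
  if (Array.isArray(node)) {
    let changed = false;
    const value = node.map((item) => {
      const result = rewriteI18nKeys(item, unresolved);
      if (result.changed) changed = true;
      return result.value;
    });
    return { value, changed };
  }
  if (node === null || typeof node !== "object") return { value: node, changed: false };

  if (isI18nString(node)) {
    const out: { [key: string]: string } = {};
    let changed = false;
    for (const [key, text] of Object.entries(node)) {
      const canonical = toCanonical(key);
      if (canonical === null) {
        unresolved.add(key); // junk keys are kept verbatim and reported
        out[key] = text;
        continue;
      }
      if (canonical !== key) changed = true;
      const existing = out[canonical];
      // Merge rule: non-empty beats empty; the canonical key's own non-empty value beats an alias key's.
      if (existing === undefined || (existing === "" && text !== "") || (key === canonical && text !== "")) {
        out[canonical] = text;
      }
    }
    return { value: out, changed };
  }

  // Any other object: recurse into every property.
  let changed = false;
  const out: { [key: string]: Json } = {};
  for (const [key, child] of Object.entries(node)) {
    const result = rewriteI18nKeys(child, unresolved);
    if (result.changed) changed = true;
    out[key] = result.value;
  }
  return { value: out, changed };
}
```

Because unresolved keys are collected into the shared set on every call, they are reported whether or not the surrounding field changed.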

Dependencies / sequencing

Builds on the canonical foundation from feat: i18n - add canonical BCP-47 language map + normalizer (ENG-1067) #8349.
The code is independent, but the migration should run only after the picker/v3/runtime PRs (feat: canonical language picker + write validation #8352, fix: api/v3 - canonicalize survey language codes + legacy inbound back-compat (ENG-1067) #8355, chore: align runtime locale bundles to canonical BCP-47 + render back-compat (ENG-1067) #8357) have landed in the epic, so that post-migration reads and writes are all canonical. Legacy inbound codes are still accepted via the back-compat in those PRs.

How should this be tested?

Unit tests: pnpm --filter @formbricks/database exec vitest run migration/20260625104904_canonicalize_language_codes/utils.test.ts — covers toCanonical, the recursive i18n key rewrite (incl. the de + de-DE merge fixture), language-row merge planning (collision survivor selection, alias copy, per-workspace scoping), and SurveyLanguage repoint/dedupe.
Build: pnpm --filter @formbricks/database build — confirms the migration bundles correctly with @formbricks/i18n-utils inlined (so it resolves at runtime in both tsx and built node).
Dry-run: run the merge planner against a prod Language dump and assert the expected merges resolve with 0 orphans.
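The "0 orphans" assertion from the dry run amounts to checking that no SurveyLanguage link would be left pointing at a deleted Language row. A minimal sketch, assuming the planner's output can be flattened into an absorbed-id → survivor-id map; the `findOrphanedLinks` helper and its input shapes are hypothetical:

```typescript
interface SurveyLanguageLink {
  languageId: string;
  surveyId: string;
}

// Returns the links that would reference a missing Language row once absorbed rows are deleted.
function findOrphanedLinks(
  languageIds: readonly string[],
  absorbedToSurvivor: ReadonlyMap<string, string>,
  links: readonly SurveyLanguageLink[],
): SurveyLanguageLink[] {
  // Rows that still exist after the merge: everything that was not absorbed.
  const surviving = new Set(languageIds.filter((id) => !absorbedToSurvivor.has(id)));
  return links.filter((link) => {
    // Links on absorbed rows are repointed to their survivor; all other links stay put.
    const target = absorbedToSurvivor.get(link.languageId) ?? link.languageId;
    return !surviving.has(target);
  });
}
```

An empty result corresponds to the "0 orphans" outcome. A non-empty one also catches a plan that chains merges, i.e. a survivor that is itself absorbed.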

Note

Step 5 (contact language attribute) is the heaviest part on instances with large ContactAttribute tables. It is index-friendly and logs per-code progress, but because the framework wraps the whole migration in a single transaction, on very large tables this is a long-running transaction — worth scheduling for an off-peak window. Splitting the high-volume backfill out into a batched out-of-transaction script is a possible follow-up if needed.
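The Response.language, contactAttributes snapshot, and contact-attribute remaps share one pattern: collect the distinct stored codes, then issue one targeted update per code that actually changes. A hedged sketch of the planning half follows. `planValueRemaps`, the `Remap` shape, and the stubbed `toCanonical` are illustrative assumptions. The SQL in the trailing comment only paraphrases the predicate described above and does not quote migration.ts. Keeping the plan as a list of small per-code updates is also what would make a later out-of-transaction, batched backfill straightforward.

```typescript
interface Remap {
  from: string;
  to: string;
}

// Hypothetical stand-in for the shared normalizer.
const CANONICAL = new Map<string, string>([
  ["de", "de-DE"],
  ["de-DE", "de-DE"],
  ["pt", "pt-BR"],
  ["pt-BR", "pt-BR"],
]);
const toCanonical = (code: string): string | null => CANONICAL.get(code) ?? null;

// Turns the distinct stored values of a column into per-code updates.
function planValueRemaps(distinctCodes: readonly (string | null)[]): { remaps: Remap[]; unresolved: string[] } {
  const remaps: Remap[] = [];
  const unresolved: string[] = [];
  for (const code of distinctCodes) {
    if (code === null || code === "default") continue; // NULL and the "default" sentinel are never touched
    const canonical = toCanonical(code);
    if (canonical === null) {
      unresolved.push(code); // left as-is and logged at the end
    } else if (canonical !== code) {
      remaps.push({ from: code, to: canonical }); // already-canonical values are skipped
    }
  }
  return { remaps, unresolved };
}

// Each remap would then become one index-friendly statement, roughly (illustrative only):
//   UPDATE "ContactAttribute" SET value = <to>
//   WHERE "attributeKeyId" = ANY(<language attribute key ids>) AND value = <from>
```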

Checklist

Required

Filled out the "How to test" section in this PR
Read How we Code at Formbricks
Self-reviewed my own code
Commented on my code in hard-to-understand bits
Ran pnpm build
Checked for warnings, there are none
Removed all console.logs
Merged the latest changes from the epic onto my branch
My changes don't cause any responsiveness issues

Appreciated

If a UI change was made: Added a screen recording or screenshots to this PR
Updated the Formbricks Docs if changes were necessary

coderabbitai · 2026-06-26T07:01:42Z

Walkthrough

Adds a migration that canonicalizes language codes in language, survey-language, survey content, response, and contact attribute data. It also adds shared utility functions and type definitions for planning relabel and merge operations, plus Vitest coverage for canonicalization and move-planning behavior.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Description check	⚠️ Warning	The description is mostly placeholders and does not provide a real summary, issue reference, or testing details.	Replace the template placeholders with a real PR summary, link the fixed issue, and add concrete test steps and checklist details.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title is conventional, concise, and clearly describes the language-code data migration.
Docstring Coverage	✅ Passed	No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

coderabbitai

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@packages/database/migration/20260625104904_canonicalize_language_codes/migration.ts`:
- Around line 157-163: The survey-content migration loop in the code that calls
rewriteI18nKeys() is skipping unresolved codes whenever result.changed is false,
which underreports invalid values. Update the SURVEY_CONTENT_FIELDS processing
so unresolved codes from rewriteI18nKeys() are always added to
stats.unresolvedCodes before any early exit, while keeping
stats.i18nKeysRewritten incremented only when changes were made.
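The first fix boils down to recording unresolved codes before the "nothing changed" early exit. A minimal sketch follows. The `Stats` and `FieldRewrite` shapes are assumptions, not the actual migration.ts types. That includes `unresolvedCodes` being a `Set` and `i18nKeysRewritten` counting rewritten fields.

```typescript
interface Stats {
  i18nKeysRewritten: number;
  unresolvedCodes: Set<string>;
}

interface FieldRewrite {
  changed: boolean;
  unresolved: readonly string[];
}

// Accounts for one rewritten survey-content field; returns whether the field must be written back.
function recordFieldRewrite(stats: Stats, result: FieldRewrite): boolean {
  // Record unresolved codes unconditionally, so junk in otherwise-unchanged fields is still reported.
  for (const code of result.unresolved) stats.unresolvedCodes.add(code);
  if (!result.changed) return false; // the early exit now only skips the write
  stats.i18nKeysRewritten += 1; // counted only when something actually changed
  return true;
}
```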

In `@packages/database/migration/20260625104904_canonicalize_language_codes/utils.test.ts`:
- Around line 193-201: The current test for planSurveyLanguageMoves only checks
the repoint/delete counts for duplicate absorbed links and misses the
flag-promotion branch. Extend this regression test in utils.test.ts to cover
the duplicate absorbed-link case, where one link is repointed and the second
is deduped, and verify that the surviving moved row inherits the promoted
default and enabled flags from the deleted absorbed row.

In `@packages/database/migration/20260625104904_canonicalize_language_codes/utils.ts`:
- Around line 204-229: `planSurveyLanguageMoves()` is not preserving the
survivor `default`/`enabled` state when the first absorbed link for a survey is
repointed, so later deduped rows can compute flags from the wrong baseline.
Update the repoint branch in `planSurveyLanguageMoves()` to record the first
repointed link in the survivor-state map (or otherwise carry its
`default`/`enabled` values forward) before any later delete-based merge logic
runs. Then keep the final `flagUpdatesBySurvey` comparison aligned with that
survivor state so `migration.ts` only emits flag updates after the survivor row
is known.
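Taken together, the second and third comments describe one invariant: once the first absorbed link for a survey is repointed, it becomes the survivor, and any later duplicate must merge its flags into it. A hypothetical reconstruction of `planSurveyLanguageMoves()` with that fix might look like this. The link shape, the return shape, and OR-based flag promotion are assumptions rather than the PR's actual code.

```typescript
interface SurveyLanguageLink {
  languageId: string;
  surveyId: string;
  default: boolean;
  enabled: boolean;
}

interface Flags {
  default: boolean;
  enabled: boolean;
}

interface SurvivorState {
  languageId: string;
  surveyId: string;
  base: Flags; // flags the surviving row holds before any flag update
  merged: Flags; // flags after promoting from deduped absorbed links
}

function planSurveyLanguageMoves(links: readonly SurveyLanguageLink[], absorbedToSurvivor: ReadonlyMap<string, string>) {
  const state = new Map<string, SurvivorState>();
  const keyOf = (languageId: string, surveyId: string) => JSON.stringify([languageId, surveyId]);
  const track = (languageId: string, link: SurveyLanguageLink) =>
    state.set(keyOf(languageId, link.surveyId), {
      languageId,
      surveyId: link.surveyId,
      base: { default: link.default, enabled: link.enabled },
      merged: { default: link.default, enabled: link.enabled },
    });

  // Seed with links that already point at a surviving (non-absorbed) language.
  for (const link of links) {
    if (!absorbedToSurvivor.has(link.languageId)) track(link.languageId, link);
  }

  const repoints: { surveyId: string; fromLanguageId: string; toLanguageId: string }[] = [];
  const deletes: { surveyId: string; languageId: string }[] = [];
  for (const link of links) {
    const survivorId = absorbedToSurvivor.get(link.languageId);
    if (survivorId === undefined) continue;
    const existing = state.get(keyOf(survivorId, link.surveyId));
    if (existing === undefined) {
      repoints.push({ surveyId: link.surveyId, fromLanguageId: link.languageId, toLanguageId: survivorId });
      // The fix: the repointed link becomes the survivor baseline, so later duplicates merge into it.
      track(survivorId, link);
    } else {
      // Duplicate for this survey: drop the absorbed link but promote its flags onto the survivor.
      deletes.push({ surveyId: link.surveyId, languageId: link.languageId });
      existing.merged.default = existing.merged.default || link.default;
      existing.merged.enabled = existing.merged.enabled || link.enabled;
    }
  }

  // Emit flag updates only once the surviving row is known, and only where its flags changed.
  const flagUpdates = Array.from(state.values())
    .filter((s) => s.merged.default !== s.base.default || s.merged.enabled !== s.base.enabled)
    .map((s) => ({ surveyId: s.surveyId, languageId: s.languageId, ...s.merged }));

  return { repoints, deletes, flagUpdates };
}
```

Consider two absorbed links A1 and A2 for the same survey, with A2 as the default. The sketch repoints A1, deletes A2, and emits one flag update that sets `default` and `enabled` on the repointed row. That is the behavior the requested regression test would assert.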

ℹ️ Review info

⚙️ Run configuration

Configuration used: Repository UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 0f1b446f-a960-4bcd-86de-6a3311d58782

📥 Commits

Reviewing files that changed from the base of the PR and between 8d63873 and a966e7c.

⛔ Files ignored due to path filters (1)

pnpm-lock.yaml is excluded by !**/pnpm-lock.yaml

📒 Files selected for processing (5)

packages/database/migration/20260625104904_canonicalize_language_codes/migration.ts
packages/database/migration/20260625104904_canonicalize_language_codes/types.ts
packages/database/migration/20260625104904_canonicalize_language_codes/utils.test.ts
packages/database/migration/20260625104904_canonicalize_language_codes/utils.ts
packages/database/package.json

sonarqubecloud · 2026-06-26T16:12:41Z

Quality Gate passed

Issues
0 New issues
0 Accepted issues

Measures
0 Security Hotspots
100.0% Coverage on New Code
0.0% Duplication on New Code

xernobyl

LGTM

pandeymangg added 2 commits June 26, 2026 12:24

migration

347a352

removes comment

a966e7c

pandeymangg requested a review from xernobyl June 26, 2026 06:55

coderabbitai Bot reviewed Jun 26, 2026

fixes feedback

89bf833

xernobyl reviewed Jun 26, 2026

Comment thread on packages/database/migration/20260625104904_canonicalize_language_codes/migration.ts (outdated)

pandeymangg added 2 commits June 26, 2026 18:13

api changes for back compat

8c82823

fixes feedback

d5ea2cd

xernobyl approved these changes Jun 26, 2026

chore: language codes data migration #8367

pandeymangg wants to merge 5 commits into epic/language-codes-stabilization from chore/language-codes-data-migration

2 participants