chore: language codes data migration #8367

Open
pandeymangg wants to merge 5 commits into epic/language-codes-stabilization from chore/language-codes-data-migration

Conversation

@pandeymangg pandeymangg commented Jun 26, 2026

Contributor

What does this PR do?

Part of ENG-1067 (standardize survey, app, and translation language tags on BCP-47).

This is the data migration that rewrites the language codes already stored in the database to their canonical BCP-47 form (e.g. de → de-DE, pt → pt-BR, zh-Hans → zh-Hans-CN), using the shared normalizeLanguageCode / LANGUAGE_CANONICAL_MAP foundation from #8349. It runs through the existing data-migration framework (single transaction, tracked in DataMigration, run once and idempotent).
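As a rough illustration of the per-value decision described above (skip, remap, or leave untouched), here is a minimal sketch. It does not reproduce normalizeLanguageCode or LANGUAGE_CANONICAL_MAP from #8349: the three-entry map, `normalizeTag`, `classifyCode`, and the `Outcome` type are hypothetical names introduced only for this example.

```typescript
// Hypothetical excerpt; the real table is LANGUAGE_CANONICAL_MAP from #8349.
const CANONICAL_BY_ALIAS = new Map<string, string>([
  ["de", "de-DE"],
  ["pt", "pt-BR"],
  ["zh-Hans", "zh-Hans-CN"],
]);
const CANONICAL_CODES = new Set<string>(CANONICAL_BY_ALIAS.values());

type Outcome =
  | { kind: "skip" }
  | { kind: "remap"; to: string }
  | { kind: "unresolved" };

// Normalizes separators and casing only: "DE_de" -> "de-DE", "zh-hans" -> "zh-Hans".
// Returns null when the first subtag does not look like a language subtag.
function normalizeTag(raw: string): string | null {
  const parts = raw.trim().split(/[-_]/);
  const language = parts[0] ?? "";
  if (!/^[A-Za-z]{2,3}$/.test(language)) return null;
  return parts
    .map((part, index) => {
      if (index === 0) return part.toLowerCase();
      if (/^[A-Za-z]{4}$/.test(part)) {
        // Script subtag: title case.
        return part.charAt(0).toUpperCase() + part.slice(1).toLowerCase();
      }
      if (/^[A-Za-z]{2}$/.test(part)) return part.toUpperCase(); // region subtag
      return part;
    })
    .join("-");
}

// Decides what a migration step would do with one stored value.
function classifyCode(code: string | null): Outcome {
  // NULL and the "default" sentinel are never rewritten.
  if (code === null || code === "default") return { kind: "skip" };
  const normalized = normalizeTag(code);
  if (normalized === null) return { kind: "unresolved" };
  const target =
    CANONICAL_BY_ALIAS.get(normalized) ??
    (CANONICAL_CODES.has(normalized) ? normalized : undefined);
  // Unknown codes are reported, not dropped.
  if (target === undefined) return { kind: "unresolved" };
  // Already-canonical values are skipped, which keeps a re-run a no-op.
  return target === code ? { kind: "skip" } : { kind: "remap", to: target };
}
```

Under these assumptions, running the classification a second time over migrated data yields only `skip` and `unresolved` outcomes, which is the property the run-once + idempotent claim relies on.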

What it migrates (in order, all in one transaction)

  1. Language.code rows — relabel each code to its canonical form. When a bare row and a region row collide on the same canonical code within a workspace (@@unique([workspaceId, code])), one row survives (preferring a row already at the canonical code, else the oldest) and the rest are absorbed: SurveyLanguage links are repointed/deduped (composite PK @@id([languageId, surveyId])), default/enabled flags carried over, aliases preserved, absorbed rows deleted.
  2. Survey content i18n keys (multi-language surveys only) — recursively rewrite every i18nString's non-default language keys across welcomeCard, blocks, endings, metadata, surveyClosedMessage, questions. Keys that collapse to the same canonical (e.g. de + de-DE) are merged (non-empty / canonical-key value wins).
  3. Response.language — remap distinct codes (skips NULL, the "default" sentinel, and already-canonical values).
  4. Response.contactAttributes.language (snapshot JSON) — same remap via jsonb_set, so the snapshot stays consistent with Response.language and the contact attribute.
  5. Contact language attribute (ContactAttribute rows whose key is language) — index-friendly batched remap (resolves the language attribute keys up front, then filters on attributeKeyId = ANY(...) AND value = ... to hit the [attributeKeyId, value] index).
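The survivor rule from step 1 (prefer a row already at the canonical code, else the oldest, scoped per workspace) can be sketched roughly as follows. This is not the planner in utils.ts: the `LanguageRow` shape, the small `CANONICAL_BY_CODE` map, and the function names are hypothetical, and alias handling is left out.

```typescript
// Hypothetical row shape; the real Language model has more fields (e.g. alias).
interface LanguageRow {
  id: string;
  workspaceId: string;
  code: string;
  createdAt: Date;
}

interface MergeGroup {
  workspaceId: string;
  canonical: string;
  rows: LanguageRow[];
}

interface MergePlan {
  workspaceId: string;
  canonical: string;
  survivor: LanguageRow;
  absorbed: LanguageRow[];
}

// Hypothetical stand-in for the shared canonical map.
const CANONICAL_BY_CODE = new Map<string, string>([
  ["de", "de-DE"],
  ["de-de", "de-DE"],
  ["de-DE", "de-DE"],
  ["pt", "pt-BR"],
  ["pt-BR", "pt-BR"],
]);

// A row already at the canonical code beats one that is not; ties go to the older row.
function isBetterSurvivor(candidate: LanguageRow, current: LanguageRow, canonical: string): boolean {
  const candidateIsCanonical = candidate.code === canonical;
  const currentIsCanonical = current.code === canonical;
  if (candidateIsCanonical !== currentIsCanonical) return candidateIsCanonical;
  return candidate.createdAt.getTime() < current.createdAt.getTime();
}

function planLanguageMerges(rows: LanguageRow[]): MergePlan[] {
  const groups = new Map<string, MergeGroup>();
  for (const row of rows) {
    const canonical = CANONICAL_BY_CODE.get(row.code);
    if (canonical === undefined) continue; // unresolved codes are left untouched
    // Group per workspace, mirroring @@unique([workspaceId, code]).
    const key = `${row.workspaceId}|${canonical}`;
    const group: MergeGroup =
      groups.get(key) ?? { workspaceId: row.workspaceId, canonical, rows: [] };
    group.rows.push(row);
    groups.set(key, group);
  }
  return [...groups.values()].map((group) => {
    // Groups are never empty, so reduce without an initial value is safe here.
    const survivor = group.rows.reduce((best, row) =>
      isBetterSurvivor(row, best, group.canonical) ? row : best
    );
    return {
      workspaceId: group.workspaceId,
      canonical: group.canonical,
      survivor,
      absorbed: group.rows.filter((row) => row !== survivor),
    };
  });
}
```

A group with an empty `absorbed` list is a plain relabel; only groups with absorbed rows need the SurveyLanguage repoint/dedupe work.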

Design notes

  • Idempotent & safe to re-run — already-canonical values are skipped; unparseable/junk codes are left untouched (never dropped) and logged at the end.
  • Atomic — the whole run executes in one transaction; any error rolls back with no partial state.
  • "default" i18n sentinel preserved everywhere.
  • No migration-specific backup step — disaster recovery relies on the existing DB backup policy.
  • The merge planner was dry-run against a production Language dump: 8 collision groups resolve as expected with 0 alias-loss cases.
  • Pure logic (canonicalization, recursive key rewrite, merge planning) lives in utils.ts and is unit-tested; migration.ts only orchestrates the SQL.
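To make the recursive key rewrite mentioned above more concrete, here is a minimal sketch under stated assumptions. It treats any object with a string `default` key and only string values as an i18nString, and it reads "non-empty / canonical-key value wins" as: a non-empty value beats an empty one, and when both are non-empty the value stored under the canonical key is kept. Both readings are assumptions, and `canonicalizeI18nKeys`, `Json`, and the small map are hypothetical names, not the utils.ts API.

```typescript
type Json = string | number | boolean | null | Json[] | { [key: string]: Json };

// Hypothetical stand-in for the shared canonical map.
const I18N_CANONICAL = new Map<string, string>([
  ["de", "de-DE"],
  ["de-DE", "de-DE"],
  ["pt", "pt-BR"],
  ["pt-BR", "pt-BR"],
]);

// Assumption: an i18nString is an object with a string "default" and only string values.
function isI18nString(value: Json): value is { [key: string]: string } {
  return (
    typeof value === "object" &&
    value !== null &&
    !Array.isArray(value) &&
    typeof value["default"] === "string" &&
    Object.values(value).every((entry) => typeof entry === "string")
  );
}

function rewriteI18nString(
  input: { [key: string]: string },
  unresolved: Set<string>
): { [key: string]: string } {
  const out: { [key: string]: string } = {};
  for (const [key, text] of Object.entries(input)) {
    if (key === "default") {
      out[key] = text; // the sentinel is preserved as-is
      continue;
    }
    const canonical = I18N_CANONICAL.get(key);
    if (canonical === undefined) {
      unresolved.add(key); // reported, never dropped
      out[key] = text;
      continue;
    }
    const existing = out[canonical];
    const fromCanonicalKey = key === canonical;
    // Fill an empty slot, or let a non-empty canonical-key value win the merge.
    if (existing === undefined || existing === "" || (fromCanonicalKey && text !== "")) {
      out[canonical] = text;
    }
  }
  return out;
}

// Walks arbitrary survey JSON and rewrites every i18nString it finds.
function canonicalizeI18nKeys(value: Json, unresolved: Set<string>): Json {
  if (Array.isArray(value)) {
    return value.map((entry) => canonicalizeI18nKeys(entry, unresolved));
  }
  if (typeof value !== "object" || value === null) return value;
  if (isI18nString(value)) return rewriteI18nString(value, unresolved);
  const out: { [key: string]: Json } = {};
  for (const [key, entry] of Object.entries(value)) {
    out[key] = canonicalizeI18nKeys(entry, unresolved);
  }
  return out;
}
```

Collecting unresolved keys in a caller-owned set keeps the walk pure, so the orchestration layer can decide independently whether a field changed and which codes to log.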

Dependencies / sequencing

How should this be tested?

  • Unit tests: pnpm --filter @formbricks/database exec vitest run migration/20260625104904_canonicalize_language_codes/utils.test.ts — covers toCanonical, the recursive i18n key rewrite (incl. the de + de-DE merge fixture), language-row merge planning (collision survivor selection, alias copy, per-workspace scoping), and SurveyLanguage repoint/dedupe.
  • Build: pnpm --filter @formbricks/database build — confirms the migration bundles correctly with @formbricks/i18n-utils inlined (so it resolves at runtime in both tsx and built node).
  • Dry-run: run the merge planner against a prod Language dump and assert the expected merges resolve with 0 orphans.

Note

Step 5 (contact language attribute) is the heaviest part on instances with large ContactAttribute tables. It is index-friendly and logs per-code progress, but because the framework wraps the whole migration in a single transaction, on very large tables this is a long-running transaction — worth scheduling for an off-peak window. Splitting the high-volume backfill out into a batched out-of-transaction script is a possible follow-up if needed.
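If the follow-up mentioned above were pursued, the batched out-of-transaction loop might look roughly like this. It is a sketch only: `remapInBatches` and the `RemapBatch` callback are hypothetical, and the callback stands in for whatever bounded UPDATE the script would issue per batch, each in its own short transaction.

```typescript
// Hypothetical callback: remaps at most `limit` rows that still hold `from`
// and resolves to the number of rows it changed.
type RemapBatch = (from: string, to: string, limit: number) => Promise<number>;

async function remapInBatches(
  remaps: ReadonlyMap<string, string>,
  remapBatch: RemapBatch,
  batchSize: number
): Promise<Map<string, number>> {
  if (!Number.isInteger(batchSize) || batchSize <= 0) {
    throw new Error("batchSize must be a positive integer");
  }
  const totals = new Map<string, number>();
  for (const [from, to] of remaps) {
    if (from === to) continue; // already canonical, nothing to do
    let total = 0;
    for (;;) {
      const changed = await remapBatch(from, to, batchSize);
      total += changed;
      // A short batch means no rows with the old value remain.
      if (changed < batchSize) break;
    }
    totals.set(from, total);
  }
  return totals;
}
```

Because remapped rows no longer match the old value, an interrupted run can simply be restarted. The trade-off against the current design is that atomicity across the whole migration is lost.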

Checklist

Required

  • Filled out the "How to test" section in this PR
  • Read How we Code at Formbricks
  • Self-reviewed my own code
  • Commented on my code in hard-to-understand bits
  • Ran pnpm build
  • Checked for warnings, there are none
  • Removed all console.logs
  • Merged the latest changes from the epic onto my branch
  • My changes don't cause any responsiveness issues

Appreciated

  • If a UI change was made: Added a screen recording or screenshots to this PR
  • Updated the Formbricks Docs if changes were necessary

@pandeymangg pandeymangg requested a review from xernobyl June 26, 2026 06:55
@coderabbitai

coderabbitai Bot commented Jun 26, 2026

Contributor

Walkthrough

Adds a migration that canonicalizes language codes in language, survey-language, survey content, response, and contact attribute data. It also adds shared utility functions and type definitions for planning relabel and merge operations, plus Vitest coverage for canonicalization and move-planning behavior.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name | Status | Explanation | Resolution
Description check | ⚠️ Warning | The description is mostly placeholders and does not provide a real summary, issue reference, or testing details. | Replace the template placeholders with a real PR summary, link the fixed issue, and add concrete test steps and checklist details.
✅ Passed checks (4 passed)
Check name | Status | Explanation
Title check | ✅ Passed | The title is conventional, concise, and clearly describes the language-code data migration.
Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request.

@coderabbitai coderabbitai Bot left a comment

Contributor

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In
`@packages/database/migration/20260625104904_canonicalize_language_codes/migration.ts`:
- Around line 157-163: The survey-content migration loop in the code that calls
rewriteI18nKeys() is skipping unresolved codes whenever result.changed is false,
which underreports invalid values. Update the SURVEY_CONTENT_FIELDS processing
so unresolved codes from rewriteI18nKeys() are always added to
stats.unresolvedCodes before any early exit, while keeping
stats.i18nKeysRewritten incremented only when changes were made.

In
`@packages/database/migration/20260625104904_canonicalize_language_codes/utils.test.ts`:
- Around line 193-201: The current test in planSurveyLanguageMoves only checks
the repoint/delete counts for duplicate absorbed links and misses the
flag-promotion branch. Extend this regression in utils.test.ts to assert the
behavior of planSurveyLanguageMoves for the duplicate absorbed-link case where
one link is repointed and the second is deduped, and verify that the surviving
moved row inherits the promoted default and enabled flags from the deleted
absorbed row.

In
`@packages/database/migration/20260625104904_canonicalize_language_codes/utils.ts`:
- Around line 204-229: `planSurveyLanguageMoves()` is not preserving the
survivor `default`/`enabled` state when the first absorbed link for a survey is
repointed, so later deduped rows can compute flags from the wrong baseline.
Update the repoint branch in `planSurveyLanguageMoves()` to record the first
repointed link in the survivor-state map (or otherwise carry its
`default`/`enabled` values forward) before any later delete-based merge logic
runs. Then keep the final `flagUpdatesBySurvey` comparison aligned with that
survivor state so `migration.ts` only emits flag updates after the survivor row
is known.
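
One way to read the last finding is sketched below: once the first absorbed link for a survey has been repointed, it is recorded as the survivor state, so later duplicates dedupe against it and promote their flags onto it. This is not the code in utils.ts. `planLinkMoves` and its types are hypothetical, and combining flags with a logical OR is an assumption about what "carried over" means.

```typescript
interface SurveyLanguageLink {
  languageId: string;
  surveyId: string;
  default: boolean;
  enabled: boolean;
}

interface Flags {
  default: boolean;
  enabled: boolean;
}

interface LinkMovePlan {
  repoint: { surveyId: string; fromLanguageId: string }[];
  remove: { surveyId: string; languageId: string }[];
  flagUpdates: { surveyId: string; default: boolean; enabled: boolean }[];
}

function planLinkMoves(survivorId: string, links: SurveyLanguageLink[]): LinkMovePlan {
  const plan: LinkMovePlan = { repoint: [], remove: [], flagUpdates: [] };
  const baseline = new Map<string, Flags>(); // flags the surviving link starts with
  const merged = new Map<string, Flags>(); // flags after absorbing duplicates

  // Links that already point at the survivor define the initial state.
  for (const link of links) {
    if (link.languageId !== survivorId) continue;
    baseline.set(link.surveyId, { default: link.default, enabled: link.enabled });
    merged.set(link.surveyId, { default: link.default, enabled: link.enabled });
  }

  for (const link of links) {
    if (link.languageId === survivorId) continue;
    const current = merged.get(link.surveyId);
    if (current === undefined) {
      // First absorbed link for this survey: repoint it and record it as the
      // survivor state so later duplicates compare against the right baseline.
      plan.repoint.push({ surveyId: link.surveyId, fromLanguageId: link.languageId });
      baseline.set(link.surveyId, { default: link.default, enabled: link.enabled });
      merged.set(link.surveyId, { default: link.default, enabled: link.enabled });
      continue;
    }
    // The survivor already has a link for this survey (composite PK), so the
    // duplicate is deleted and its flags are promoted (assumed: logical OR).
    plan.remove.push({ surveyId: link.surveyId, languageId: link.languageId });
    merged.set(link.surveyId, {
      default: current.default || link.default,
      enabled: current.enabled || link.enabled,
    });
  }

  // Emit a flag update only where the surviving link's flags actually changed.
  for (const [surveyId, flags] of merged) {
    const before = baseline.get(surveyId);
    if (
      before !== undefined &&
      (before.default !== flags.default || before.enabled !== flags.enabled)
    ) {
      plan.flagUpdates.push({ surveyId, default: flags.default, enabled: flags.enabled });
    }
  }
  return plan;
}
```

With this shape, the regression test requested for utils.test.ts would assert one repoint, one delete, and one flag update for a survey whose two absorbed links carry `default` and `enabled` separately.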

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 0f1b446f-a960-4bcd-86de-6a3311d58782

📥 Commits

Reviewing files that changed from the base of the PR and between 8d63873 and a966e7c.

⛔ Files ignored due to path filters (1)
  • pnpm-lock.yaml is excluded by !**/pnpm-lock.yaml
📒 Files selected for processing (5)
  • packages/database/migration/20260625104904_canonicalize_language_codes/migration.ts
  • packages/database/migration/20260625104904_canonicalize_language_codes/types.ts
  • packages/database/migration/20260625104904_canonicalize_language_codes/utils.test.ts
  • packages/database/migration/20260625104904_canonicalize_language_codes/utils.ts
  • packages/database/package.json

@xernobyl xernobyl left a comment

Contributor

LGTM

2 participants