January 21, 20237 min readengineeringi18n

Software Localization and Internationalization

Software Localization and Internationalization. Multi-language application development. Insights from The Proper Motion Company.

When a SaaS product decides to expand beyond its home market, the first instinct is to “just translate the strings.” This underestimates the problem by an order of magnitude. True internationalization (i18n) and localization (l10n) touch every layer of your application – from database schemas and API responses to date formatting, text direction, currency handling, and even the size of your buttons. Companies that treat localization as an afterthought end up with hardcoded strings scattered across 400 files, date formats that confuse half their users, and a codebase that resists every new locale like a foreign body.

This guide covers the engineering foundations of internationalization, the practical mechanics of localization, and the architectural decisions that determine whether adding a new language takes two days or two months.

Internationalization Architecture: Laying the Foundation

Internationalization is the engineering work that makes localization possible. It should happen once, early in the project lifecycle, and it should be invisible to end users. The goal is a codebase where no user-facing string is hardcoded, no date or number is formatted without a locale parameter, and no layout assumes left-to-right text flow.

String externalization is the first and most impactful step. Every user-facing string – labels, error messages, tooltips, email subject lines, notification text, placeholder text, button labels – must be extracted from code and stored in locale-specific resource files. The two dominant formats are JSON (used by most JavaScript frameworks) and XLIFF/PO (used by enterprise translation management systems).

Organize resource files by feature or module, not as a single monolithic file. A structure like locales/en/dashboard.json, locales/en/settings.json, locales/ja/dashboard.json keeps files manageable and allows different teams or translators to work on different modules simultaneously.

Message formatting goes beyond simple string replacement. The ICU MessageFormat standard handles pluralization, gender agreement, and variable interpolation in a single, locale-aware syntax. Consider the English string “You have 3 new messages.” In English, the plural form depends only on the count: “1 message” vs. “2 messages.” In Polish, there are four plural forms. In Arabic, there are six. ICU MessageFormat handles all of these with a single template:

{count, plural,
  =0 {You have no new messages}
  one {You have # new message}
  other {You have # new messages}
}

Libraries that implement ICU MessageFormat include FormatJS (JavaScript/React), MessageFormat (Java), and ICU4C (C/C++). Invest in this from the start – retrofitting MessageFormat into a codebase that uses simple string concatenation is painful.

Locale-aware formatting must be applied consistently to dates, times, numbers, currencies, and units of measurement. Never format these manually with string concatenation. Use the Intl API in JavaScript (Intl.DateTimeFormat, Intl.NumberFormat), java.text.DateFormat in Java, or equivalent locale-aware formatters in your language.

A date displayed as “04/05/2023” means April 5th in the US, May 4th in the UK, and is ambiguous to everyone. Use Intl.DateTimeFormat with the user’s locale, and the API handles the rest: “April 5, 2023” for en-US, “5 April 2023” for en-GB, “2023年4月5日” for ja-JP.

Related: How to Build a Booking and Scheduling System

Database and API Design for Multi-Language Content

User-generated content and CMS-managed content require a different approach than UI strings. You cannot externalize a product description into a resource file – it needs to be stored in the database with translations managed by content editors or translators.

The two common database patterns for translatable content are:

Separate translation table: The main table stores the entity (product ID, SKU, price), and a translations table stores locale-specific fields (name, description) keyed by entity ID and locale. This is the more normalized approach and works well when only some fields are translatable.

products: id, sku, price, created_at
product_translations: product_id, locale, name, description

JSON column per translatable field: Each translatable field is a JSON object keyed by locale. This is simpler for reads (no join required) but harder to query and index.

products: id, sku, price, name_translations (JSON), description_translations (JSON)

The separate translation table is generally the better choice for applications with more than 3 locales, because it scales predictably and integrates cleanly with translation management workflows.

Your API should accept a locale parameter (via query string, header, or user preference) and return content in the requested locale. When a translation is missing, define a fallback strategy: return the content in the default locale, return null, or return the string with a visual indicator that it is untranslated. The fallback strategy should be configurable per deployment – a consumer-facing application might fall back to the default locale silently, while an internal CMS should flag untranslated content prominently.

Right-to-Left (RTL) Support and Bidirectional Text

Supporting Arabic, Hebrew, Farsi, and Urdu requires right-to-left layout support, which affects far more than text alignment. Navigation bars, sidebars, breadcrumbs, icons with directional meaning (arrows, progress indicators), and even shadow directions need to be mirrored.

The most efficient approach is to use CSS logical properties instead of physical properties throughout your stylesheet. Instead of margin-left: 16px, use margin-inline-start: 16px. Instead of padding-right: 8px, use padding-inline-end: 8px. Logical properties automatically flip when the document direction changes from LTR to RTL, eliminating the need for a separate RTL stylesheet.

Set the document direction with the dir attribute on the <html> element: <html dir="rtl" lang="ar">. This triggers CSS logical property flipping and enables the browser’s built-in bidirectional text algorithm.

Bidirectional (bidi) text – where LTR and RTL text appear in the same paragraph – introduces additional complexity. An Arabic sentence containing an English brand name or a URL will have text flowing in both directions. The Unicode Bidirectional Algorithm handles most cases automatically, but edge cases (numbers next to RTL text, nested directional runs) may require explicit Unicode direction marks (LRM, RLM) or the <bdo> HTML element.

Test RTL layouts by actually reading the interface with an Arabic or Hebrew speaker, not just visually checking that elements flipped correctly. Visual flipping can look correct while creating confusing reading order or obscuring important UI elements.

See also: The AI Technology Stack: Models, Frameworks, and Infrastructure Guide

Translation Workflow and Quality Management

The engineering foundation is only half the equation. The other half is getting high-quality translations produced, reviewed, and deployed efficiently.

Translation management systems (TMS) like Crowdin, Lokalise, Phrase (formerly PhraseApp), and Transifex integrate with your code repository and provide a web interface for translators to work in context. The typical workflow is:

Developers push new strings to the repository.
The TMS detects new or changed strings via webhook or CI integration.
Translators see the new strings in the TMS interface, along with context (screenshots, developer notes, glossary terms).
Translated strings are reviewed by a language lead.
Approved translations are automatically pushed back to the repository via pull request.

Context is critical for translation quality. The English word “Post” could mean “publish” (a verb), “a blog post” (a noun), or “mail” (another noun). Without context, translators guess wrong. Provide developer notes for ambiguous strings, screenshot references showing where the string appears in the UI, and character length limits for space-constrained elements.

Translation memory (TM) stores previously translated segments and suggests matches for new strings. A 100% match (identical string) is reused automatically. A fuzzy match (similar but not identical) is presented to the translator as a starting point. Over time, TM dramatically reduces translation cost and improves consistency.

Glossary management ensures that key terms are translated consistently. If “Dashboard” is translated as “Tableau de bord” in French, every occurrence should use the same term. Define a glossary of 50 to 200 key terms per product and enforce it through TMS glossary checks.

Testing Localized Applications

Localization bugs are uniquely sneaky. They hide behind languages that the development team does not speak, on screen sizes that the team does not test, and in cultural assumptions that the team does not question.

Pseudo-localization is the most effective technique for catching i18n bugs before real translations are available. It replaces each character in your strings with an accented equivalent (e.g., “Settings” becomes “[Seeettiiinnngss]”), adds extra characters to simulate text expansion (German text is typically 30% longer than English), and wraps strings in brackets to identify concatenation issues and untranslated hardcoded strings.

Many i18n libraries support pseudo-localization out of the box. Enable it as a locale option in your development environment and use it during visual QA reviews.

Text expansion testing verifies that your layouts accommodate longer translations. German, French, and Finnish text is typically 20% to 40% longer than English. If a button label that fits perfectly in English overflows its container in German, you will hear about it from German users. Test with the longest supported language (often German or Finnish) and design components with flexible widths wherever possible.

Locale-specific functional testing covers formatting differences. Verify that dates display correctly in each locale, that number formatting (decimal separators, thousands separators) matches local conventions, that currency symbols appear in the correct position, and that sorting/collation works for non-Latin alphabets. Automated tests that render a component in each supported locale and compare output against expected patterns catch these regressions early.

Cultural appropriateness review is the final and most nuanced check. Colors, icons, imagery, and metaphors carry different meanings across cultures. A thumbs-up icon is positive in Western cultures but offensive in parts of the Middle East. Red means “danger” in the West but “luck” in China. A calendar icon showing “17” is innocuous in most countries but associated with bad luck in Italy. These issues require human review by native speakers or cultural consultants.

Internationalization is a foundational architectural decision, not a feature you bolt on later. If you are building an application that will serve users across languages and regions – or planning to internationalize an existing product – contact our team. We architect i18n-ready systems from the ground up and have guided multi-language launches for products serving audiences in 20+ countries.

// Related posts

How to Build a Booking and Scheduling System

How to Build a Booking and Scheduling System. booking system development

Privacy-First Software Development as Competitive Advantage

Privacy-First Software Development as Competitive Advantage. privacy-first development

How to Plan and Execute a Software Migration

A practical guide to migrating from legacy systems to modern platforms without losing data.