What is Translation Memory?
Translation memory (TM) is a database that stores previously translated sentence pairs—called translation units (TUs)—consisting of source text and its corresponding translation. When translators work on new content, the TM system automatically searches for matches against previously translated segments, suggesting reusable translations to accelerate the translation process and maintain consistency. Translation memory is fundamental to modern localization workflows, enabling significant cost savings and quality improvements through intelligent reuse.
A translation unit typically consists of a source segment (e.g., "Welcome to our application"), target segment (e.g., "Bienvenido a nuestra aplicación"), metadata (translator name, date, client, project), and optional context information. When new content contains text similar to stored segments, the TM system calculates match percentages and presents candidates to translators, who can accept, modify, or reject suggestions based on context appropriateness.
How Translation Memory Works
Translation Unit Structure
Basic Translation Unit:
```xml
<tu tuid="12345">
  <!-- In TMX, <prop> elements come before the <tuv> elements -->
  <prop type="x-context">Button label on checkout form</prop>
  <prop type="x-project">ecommerce-web</prop>
  <prop type="x-translator">maria@translations.com</prop>
  <prop type="x-date">2026-01-15</prop>
  <tuv xml:lang="en-US">
    <seg>Click the Submit button to continue.</seg>
  </tuv>
  <tuv xml:lang="es-ES">
    <seg>Haga clic en el botón Enviar para continuar.</seg>
  </tuv>
</tu>
```
Components:
- Source segment (tuv): Original text in source language
- Target segment (tuv): Translated text in target language
- Metadata (prop): Context, project, translator, date, client
- Unique ID (tuid): Identifier for the translation unit
Match Types and Scoring
100% Match (Exact Match): Source text is identical to a previously translated segment, including punctuation, capitalization, and spacing. These matches can typically be auto-confirmed without translator review, though context verification is recommended.
Example:
- Stored: "Save your changes"
- New: "Save your changes"
- Match: 100%
Context Match (101% Match): Exact text match plus identical surrounding context (previous and next segments). Highest confidence match type, often auto-applied in CAT tools.
Example:
- Stored: Previous: "Edit profile", Current: "Save your changes", Next: "Cancel"
- New: Previous: "Edit profile", Current: "Save your changes", Next: "Cancel"
- Match: 101% (context match)
Fuzzy Match (75-99%): Partial match where source text is similar but not identical. The match percentage is calculated with string-similarity algorithms such as Levenshtein edit distance or n-gram comparison. Translators review and adjust as needed.
Examples:
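- Stored: "Click the Submit button to continue"
- New: "Click the Save button to continue"
- Match: approximately 85% (one word changed)

- Stored: "Welcome to our application"
- New: "Welcome to our new application"
- Match: approximately 92% (one word added)

These percentages are illustrative. Exact scores vary by tool, since each applies its own algorithm and weighting.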
No Match (0-74%): Insufficient similarity for useful suggestions. Translator creates new translation from scratch. Some systems don't show matches below configurable thresholds (typically 70-75%) to avoid noise.
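The match bands above can be sketched as a small classifier. This is a minimal illustration, assuming the similarity score has already been computed as a whole number and that context is reduced to a single boolean; the function name `classifyMatch`, the constant `FUZZY_THRESHOLD`, and the returned labels are hypothetical, and real CAT tools make the threshold configurable.

```javascript
// Lower bound of the fuzzy band described above (assumed default, configurable in practice).
const FUZZY_THRESHOLD = 75;

// score: similarity percentage (0-100)
// contextMatches: true when the previous and next segments are also identical
function classifyMatch(score, contextMatches = false) {
  if (score === 100 && contextMatches) return 'context'; // the "101%" match
  if (score === 100) return 'exact';
  if (score >= FUZZY_THRESHOLD) return 'fuzzy';
  return 'none';
}

console.log(classifyMatch(100, true)); // context
console.log(classifyMatch(100));       // exact
console.log(classifyMatch(85));        // fuzzy
console.log(classifyMatch(60));        // none
```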
Match Calculation Algorithms
Levenshtein Distance: Counts minimum single-character edits (insertions, deletions, substitutions) needed to transform one string into another.
```javascript
// Levenshtein distance: minimum number of single-character edits
// (insertions, deletions, substitutions) separating the two strings.
function levenshteinDistance(str1, str2) {
  const matrix = [];

  // First column: distance between each prefix of str2 and the empty string
  for (let i = 0; i <= str2.length; i++) {
    matrix[i] = [i];
  }

  // First row: distance between each prefix of str1 and the empty string
  for (let j = 0; j <= str1.length; j++) {
    matrix[0][j] = j;
  }

  for (let i = 1; i <= str2.length; i++) {
    for (let j = 1; j <= str1.length; j++) {
      if (str2.charAt(i - 1) === str1.charAt(j - 1)) {
        matrix[i][j] = matrix[i - 1][j - 1];
      } else {
        matrix[i][j] = Math.min(
          matrix[i - 1][j - 1] + 1, // substitution
          matrix[i][j - 1] + 1,     // insertion
          matrix[i - 1][j] + 1      // deletion
        );
      }
    }
  }

  return matrix[str2.length][str1.length];
}

function matchPercentage(stored, newText) {
  const maxLength = Math.max(stored.length, newText.length);
  if (maxLength === 0) return 100; // two empty strings are identical; avoids 0 / 0
  const distance = levenshteinDistance(stored, newText);
  return Math.round((1 - distance / maxLength) * 100);
}
```
N-gram Comparison: Breaks text into overlapping character sequences (typically 3-4 characters) and compares overlap between segments.
Example:
- Text: "Submit"
- 3-grams: "Sub", "ubm", "bmi", "mit"
Higher n-gram overlap indicates higher similarity.
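As a rough sketch of n-gram comparison, the snippet below splits two strings into character trigrams and scores their overlap with the Dice coefficient. The Dice coefficient is one common choice and is an assumption here, since the text above does not name a specific overlap measure; the function names `ngrams` and `ngramSimilarity` are likewise illustrative.

```javascript
// Split text into overlapping character n-grams (trigrams by default).
function ngrams(text, n = 3) {
  const grams = [];
  for (let i = 0; i + n <= text.length; i++) {
    grams.push(text.slice(i, i + n));
  }
  return grams;
}

// Dice coefficient over n-gram multisets: 2 * shared / (countA + countB).
// Returns a value between 0 (no overlap) and 1 (identical n-gram sets).
function ngramSimilarity(a, b, n = 3) {
  const gramsA = ngrams(a, n);
  const gramsB = ngrams(b, n);

  // Strings shorter than n produce no n-grams; fall back to exact comparison.
  if (gramsA.length === 0 || gramsB.length === 0) {
    return a === b ? 1 : 0;
  }

  // Count n-grams of a, then consume them while scanning b.
  const counts = new Map();
  for (const gram of gramsA) {
    counts.set(gram, (counts.get(gram) || 0) + 1);
  }

  let shared = 0;
  for (const gram of gramsB) {
    const remaining = counts.get(gram) || 0;
    if (remaining > 0) {
      shared++;
      counts.set(gram, remaining - 1);
    }
  }

  return (2 * shared) / (gramsA.length + gramsB.length);
}

console.log(ngrams('Submit')); // [ 'Sub', 'ubm', 'bmi', 'mit' ]
console.log(ngramSimilarity('Submit', 'Submit')); // 1
```

Unlike edit distance, this measure is insensitive to where in the string the shared n-grams occur, which makes it more tolerant of reordered words.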
Translation Memory vs Glossary
Translation Memory (TM)
Purpose: Store and reuse complete sentence translations.
Structure: Full sentence pairs with context and metadata.
Use Case: Accelerate translation of similar content, maintain consistency in phrasing and style.
Match Logic: Fuzzy matching based on similarity algorithms.
Example:
- Source: "Click here to view your order history"
- Translation: "Haga clic aquí para ver el historial de pedidos"
Glossary (Term Base)
Purpose: Define approved translations for specific terms and phrases.
Structure: Single words or short phrases with approved translations, context, and usage guidelines.
Use Case: Enforce consistent terminology across translators and projects.
Match Logic: Exact term matching (though some systems support morphological variants).
Example:
- Term: "checkout"
- Translation: "pago" (not "caja" or "finalizar compra")
- Context: "E-commerce purchasing process"
- Part of Speech: Noun
When to Use Each
| Scenario | Use TM | Use Glossary |
|---|---|---|
| Reuse full sentence translations | ✓ | |
| Enforce specific term translations | | ✓ |
| Maintain consistent phrasing | ✓ | |
| Define technical terminology | | ✓ |
| Speed up translation | ✓ | |
| Quality assurance checks | ✓ | ✓ |
| Onboard new translators | ✓ | ✓ |
Best Practice: Use both in combination. A glossary defines "what" the approved translation of a term is; TM shows "how" that term is used in full sentences.
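To illustrate how a glossary complements TM during quality checks, here is a minimal sketch that flags a translation when a glossary term appears in the source but the approved target term is missing, or a disallowed one is present. The glossary shape (`term`, `approved`, `forbidden`) and the function name `checkGlossary` are assumptions for this example, and the plain substring matching ignores word boundaries and morphological variants.

```javascript
// Hypothetical glossary shape: source term, approved target, disallowed targets.
const glossary = [
  { term: 'checkout', approved: 'pago', forbidden: ['caja', 'finalizar compra'] },
];

// Returns a list of human-readable issues; an empty list means no violations found.
function checkGlossary(source, target, entries) {
  const issues = [];
  const src = source.toLowerCase();
  const tgt = target.toLowerCase();

  for (const entry of entries) {
    // Skip entries whose term does not occur in the source segment.
    if (!src.includes(entry.term.toLowerCase())) continue;

    if (!tgt.includes(entry.approved.toLowerCase())) {
      issues.push(`"${entry.term}" should be translated as "${entry.approved}"`);
    }
    for (const bad of entry.forbidden) {
      if (tgt.includes(bad.toLowerCase())) {
        issues.push(`"${bad}" is not an approved translation of "${entry.term}"`);
      }
    }
  }
  return issues;
}

console.log(checkGlossary('Proceed to checkout', 'Ir a la caja', glossary));
console.log(checkGlossary('Proceed to checkout', 'Continuar al pago', glossary)); // []
```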
Building and Maintaining Translation Memory
Initial TM Population Strategies
1. Import Historical Translations: If you have existing translations in files, databases, or previous systems, import them into TM format.
```javascript
// Example: Convert JSON translations to TM format
const sourceLang = require('./locales/en.json');
const targetLang = require('./locales/es.json');

// typeof null and typeof [] are both 'object', so check explicitly
const isPlainObject = (value) =>
  value !== null && typeof value === 'object' && !Array.isArray(value);

const buildTM = (source, target, prefix = '') => {
  const units = [];

  Object.keys(source).forEach(key => {
    const fullKey = prefix ? `${prefix}.${key}` : key;

    if (isPlainObject(source[key])) {
      // Recurse into nested namespaces; tolerate a missing target branch
      const targetBranch = isPlainObject(target[key]) ? target[key] : {};
      units.push(...buildTM(source[key], targetBranch, fullKey));
    } else if (typeof source[key] === 'string' && typeof target[key] === 'string' && target[key] !== '') {
      // Keep only keys that have a non-empty translation
      units.push({
        id: fullKey,
        source: source[key],
        target: target[key],
        context: fullKey,
        date: new Date().toISOString()
      });
    }
  });

  return units;
};

const tmUnits = buildTM(sourceLang, targetLang);
// Export to TMX format or import to TM system
```
2. Align Bilingual Documents: Use alignment tools to match source and target segments from parallel documents (e.g., an English whitepaper and its Spanish translation).
Tools:
- LF Aligner (free, open-source)
- WinAlign (SDL Trados)
- Memsource Align
3. Translate Sample Content: For new projects without existing translations, translate representative sample content (100-500 segments) to seed the TM with high-quality units before scaling to full translation.
4. Leverage Public TM Resources: Some open-source projects and organizations share TM databases:
- European Commission's Translation Memories (EU languages)
- OPUS parallel corpus (research resource)
- MyMemory public TM (limited quality)
Caution: Vet quality carefully. Public TMs may contain errors or inappropriate translations.
TM Maintenance Best Practices
1. Regular Cleaning: Remove outdated translations for discontinued products, deprecated terminology, or content that no longer represents brand voice.
2. Quality Scoring: Tag translation units with quality ratings. Prioritize high-quality matches from professional translators over machine translation or unverified suggestions.
3. Context Enrichment: Add context metadata to translation units: screenshots, UI location, product area, intended audience. Rich context improves match relevance.
4. Duplicate Resolution: Identify translation units with identical source text but different translations. Review the discrepancies, then either merge them into the best version or create glossary entries for disambiguation.
5. Version Control: Track TM changes over time. Maintain separate TM versions for different product releases or for different eras of brand guidelines.
6. Translator Feedback Loop: When translators reject TM suggestions or make significant edits, flag those units for review. Persistent rejections may indicate poor-quality TM entries.
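The duplicate-resolution step can be partly automated. The sketch below, assuming translation units are plain objects with `source` and `target` string fields (the same shape as the import example earlier in this section), groups units by source text and reports every source that has more than one distinct translation. The function name `findConflicts` is illustrative, and deciding which translation wins remains a human review task.

```javascript
// Group units by source text and report sources with more than one distinct target.
function findConflicts(units) {
  const targetsBySource = new Map();

  for (const unit of units) {
    const source = unit.source.trim();
    if (!targetsBySource.has(source)) {
      targetsBySource.set(source, new Set());
    }
    targetsBySource.get(source).add(unit.target.trim());
  }

  const conflicts = [];
  for (const [source, targets] of targetsBySource) {
    if (targets.size > 1) {
      conflicts.push({ source, targets: [...targets] });
    }
  }
  return conflicts;
}

const sampleUnits = [
  { source: 'Save', target: 'Guardar' },
  { source: 'Save', target: 'Salvar' },
  { source: 'Save', target: 'Guardar' }, // verbatim duplicate, not a conflict
  { source: 'Cancel', target: 'Cancelar' },
];

console.log(findConflicts(sampleUnits));
// [ { source: 'Save', targets: [ 'Guardar', 'Salvar' ] } ]
```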
TM Exchange Formats: TMX Standard
What is TMX?
Translation Memory eXchange (TMX) is an XML-based open standard for exchanging translation memory data between different CAT tools and TMS platforms. Created by the Localization Industry Standards Association (LISA), TMX enables vendor-neutral TM portability.
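TMX Version: Most systems support TMX 1.4b, the most widely implemented release. TMX 2.0 was circulated as a draft but never finalized, so tool support for it is rare.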
TMX File Structure
```xml
<?xml version="1.0" encoding="UTF-8"?>
<tmx version="1.4">
  <header
    creationtool="IntlPull"
    creationtoolversion="1.0"
    datatype="plaintext"
    segtype="sentence"
    adminlang="en-US"
    srclang="en-US"
    o-tmf="IntlPull TM Format"
    creationdate="20260212T100000Z">
  </header>
  <body>
    <tu tuid="1" datatype="text">
      <tuv xml:lang="en-US">
        <seg>Welcome to our platform</seg>
      </tuv>
      <tuv xml:lang="es-ES">
        <seg>Bienvenido a nuestra plataforma</seg>
      </tuv>
      <tuv xml:lang="fr-FR">
        <seg>Bienvenue sur notre plateforme</seg>
      </tuv>
    </tu>
    <tu tuid="2" datatype="text">
      <prop type="x-context">Dashboard header</prop>
      <prop type="x-project">web-app-v2</prop>
      <tuv xml:lang="en-US">
        <seg>Your account settings</seg>
      </tuv>
      <tuv xml:lang="es-ES">
        <seg>Configuración de tu cuenta</seg>
      </tuv>
    </tu>
  </body>
</tmx>
```
Exporting and Importing TMX
Export Use Cases:
- Migrate TM to different platform
- Share TM with external translation agencies
- Backup TM data
- Collaborate across tools (translator uses SDL Trados, client uses Phrase)
Import Use Cases:
- Onboard historical translations
- Consolidate TM from multiple projects
- Leverage external agency TM
- Merge TM from acquired companies
Best Practices:
- Clean TM before export (remove low-quality units)
- Include metadata (context, project) in custom properties
- Validate TMX syntax before import (malformed XML causes import failures)
- Test import with small subset before full TM migration
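For teams that script their own exports, the following is a minimal sketch of serializing translation units to TMX 1.4-style XML with proper escaping, which addresses the malformed-XML risk noted above. The unit shape (`id`, optional `context`, and a `translations` map keyed by language code), the function names, and the header values `example-exporter` and `example` are all placeholders for this illustration, not part of any particular tool. A production export should still be validated against the TMX DTD.

```javascript
// Escape characters that cannot appear literally in XML text or attribute values.
function escapeXml(text) {
  return text
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;');
}

// units: [{ id, context?, translations: { 'en-US': '...', 'es-ES': '...' } }] (assumed shape)
function toTmx(units, srcLang = 'en-US') {
  const lines = [
    '<?xml version="1.0" encoding="UTF-8"?>',
    '<tmx version="1.4">',
    '  <header creationtool="example-exporter" creationtoolversion="0.1" datatype="plaintext"',
    `          segtype="sentence" adminlang="en-US" srclang="${escapeXml(srcLang)}" o-tmf="example"/>`,
    '  <body>',
  ];

  for (const unit of units) {
    lines.push(`    <tu tuid="${escapeXml(String(unit.id))}">`);
    // TMX expects <prop> elements before the <tuv> elements inside a <tu>.
    if (unit.context) {
      lines.push(`      <prop type="x-context">${escapeXml(unit.context)}</prop>`);
    }
    for (const [lang, text] of Object.entries(unit.translations)) {
      lines.push(`      <tuv xml:lang="${escapeXml(lang)}">`);
      lines.push(`        <seg>${escapeXml(text)}</seg>`);
      lines.push('      </tuv>');
    }
    lines.push('    </tu>');
  }

  lines.push('  </body>', '</tmx>');
  return lines.join('\n');
}

console.log(toTmx([
  {
    id: 1,
    context: 'Footer link',
    translations: { 'en-US': 'Terms & Conditions', 'es-ES': 'Términos y condiciones' },
  },
]));
```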
Leveraging TM for Cost Savings
Translation Pricing Models
Per-Word Pricing: Most common model. Different rates for match types.
Typical Rate Structure (example):
- 100% match: 10-20% of full rate
- 95-99% fuzzy: 30-50% of full rate
- 85-94% fuzzy: 50-70% of full rate
- 75-84% fuzzy: 70-90% of full rate
- No match (0-74%): 100% of full rate
Example Calculation:
- Base rate: $0.15 per word
- Project: 10,000 words
- Match breakdown:
  - 100% match: 3,000 words @ $0.03 = $90
  - 95-99% fuzzy: 2,000 words @ $0.07 = $140
  - 85-94% fuzzy: 1,500 words @ $0.10 = $150
  - No match: 3,500 words @ $0.15 = $525
- Total cost: $905 (about 40% savings vs. $1,500 without TM)
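The example calculation above can be reproduced with a short script, which is handy for estimating other projects. This is a sketch under the same assumptions as the example: the word counts and per-band rates are the illustrative figures from the text, not market rates, and the names `matchBands` and `estimateCost` are hypothetical.

```javascript
const BASE_RATE = 0.15; // full per-word rate in dollars (from the example)

// Word counts and discounted per-word rates for each match band (from the example).
const matchBands = [
  { label: '100% match',   words: 3000, rate: 0.03 },
  { label: '95-99% fuzzy', words: 2000, rate: 0.07 },
  { label: '85-94% fuzzy', words: 1500, rate: 0.10 },
  { label: 'No match',     words: 3500, rate: 0.15 },
];

function estimateCost(bands, baseRate) {
  const totalWords = bands.reduce((sum, band) => sum + band.words, 0);
  const cost = bands.reduce((sum, band) => sum + band.words * band.rate, 0);
  const fullCost = totalWords * baseRate;

  // Round to avoid floating-point noise in the reported figures.
  return {
    cost: Number(cost.toFixed(2)),
    fullCost: Number(fullCost.toFixed(2)),
    savingsPercent: Number(((1 - cost / fullCost) * 100).toFixed(1)),
  };
}

console.log(estimateCost(matchBands, BASE_RATE));
// { cost: 905, fullCost: 1500, savingsPercent: 39.7 }
```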
Maximizing TM Leverage
1. Consistent Source Content: Standardize phrasing in the source language. Use style guides and content templates to increase repetition.
Before (low TM leverage):
- "Click here to submit"
- "Press the submit button"
- "Submit your response"
After (high TM leverage):
- "Click Submit to continue" (consistent pattern)
2. Content Reuse Architecture: Design content with reusable components. Shared headers, footers, error messages, and UI labels maximize TM hits.
3. Update Existing Translations: When updating content, modify existing text rather than rewriting it. Small changes yield high fuzzy matches.
Example:
- Original: "Get started with our platform"
- Update: "Get started with our new platform" (a high fuzzy match; the exact score depends on the tool's algorithm)
4. Translate High-Value Languages First: Translate into major markets first (Spanish, French, German). Use those translations to populate TM before translating into smaller languages, where professional translators may be less available.
5. Segment Granularity: Smaller segments increase match likelihood but may reduce translation quality (loss of context). Balance segment size based on content type.
Translation Memory Across Projects
Shared TM vs Project-Specific TM
Shared TM (Organization-Wide):
Pros:
- Maximize reuse across all projects
- Leverage translations from marketing, product, support content together
- Single source of truth for terminology usage
Cons:
- Domain confusion (e.g., "bank" in finance vs. river bank)
- Brand voice variations across product lines
- Quality dilution from low-quality projects
Project-Specific TM:
Pros:
- Consistent domain and context
- Isolated quality (low-quality project doesn't pollute other TMs)
- Easier maintenance and cleanup
Cons:
- Duplicate translation work across projects
- Missed reuse opportunities
- Higher overall translation costs
Hybrid Approach (Recommended)
Structure:
- Master TM: Organization-wide high-quality translations, curated and cleaned
- Project TM: Project-specific translations, promoted to Master TM only after quality review
- Client-Specific TM: For agencies managing multiple clients
Workflow:
- Translator uses Master TM + Project TM for suggestions
- New translations saved to Project TM
- Periodic review promotes high-quality units to Master TM
- Low-quality units remain isolated in Project TM
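The workflow above can be sketched with two in-memory maps standing in for the Master and Project TMs. Everything here is an illustrative assumption: the function names, the `reviewed` flag, exact-match lookup only (no fuzzy matching), and the choice to consult the Project TM before the Master TM, which the workflow itself does not prescribe.

```javascript
// In-memory stand-ins; a real TMS would back these with a database.
const masterTM = new Map();  // source text -> { target, reviewed }
const projectTM = new Map();

// Suggestions come from both memories; the Project TM is checked first here
// because it is the more specific of the two (an assumption of this sketch).
function lookup(source) {
  return projectTM.get(source) || masterTM.get(source) || null;
}

// New translations always land in the Project TM, unreviewed.
function saveTranslation(source, target) {
  projectTM.set(source, { target, reviewed: false });
}

function markReviewed(source) {
  const unit = projectTM.get(source);
  if (unit) unit.reviewed = true;
}

// Periodic promotion: copy only reviewed units into the Master TM.
function promoteReviewed() {
  let promoted = 0;
  for (const [source, unit] of projectTM) {
    if (unit.reviewed) {
      masterTM.set(source, unit);
      promoted++;
    }
  }
  return promoted;
}

saveTranslation('Save your changes', 'Guarda tus cambios');
saveTranslation('Cancel', 'Cancelar');
markReviewed('Save your changes');

console.log(promoteReviewed());               // 1
console.log(masterTM.has('Save your changes')); // true
console.log(masterTM.has('Cancel'));            // false
```

Unreviewed units stay in the Project TM, so a weak translation can still be suggested within its own project without reaching the organization-wide memory.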
Cross-Language TM Leverage
Some TMS platforms support language-independent TM, where translations in Language A can suggest translations in Language B.
Example:
- English → Spanish: "Submit" → "Enviar"
- English → French: "Submit" → ?
- System suggests: Check Spanish translation "Enviar", machine-translate to French "Envoyer"
Useful for language families (Romance languages, Slavic languages) but verify quality carefully.
IntlPull's Translation Memory Features
Modern translation management platforms like IntlPull integrate translation memory directly into the translation workflow, eliminating the need for external CAT tools or manual TMX management.
Automatic TM Population
Every translation approved in IntlPull is automatically added to the translation memory. There are no manual export/import cycles; the TM builds organically as you translate.
Intelligent Suggestions
When translators work on new keys, IntlPull suggests matches from TM with match percentages. Translators see context from original translation unit to verify appropriateness.
Example:
Translator works on key dashboard.settings.save
- TM Suggestion (95%): From key profile.edit.save → "Guardar cambios"
- Context: "Save button on profile edit form"
- Translator: Accepts suggestion or modifies for dashboard context
Cross-Project TM
IntlPull's TM spans all projects in your organization, so translations created for a marketing site can be reused in the product application. Configure a project-specific TM override when needed.
Machine Translation + TM Hybrid
When no TM match exists, IntlPull can automatically provide machine translation suggestions as fallback. Translators see both TM matches (if partial) and MT suggestions, choosing the best starting point.
TMX Export for External Agencies
Export your IntlPull TM to TMX format when working with external translation agencies. Share project-specific TM to reduce costs and maintain consistency.
Quality Scoring
IntlPull tracks which translators created each TM unit and can prioritize suggestions from senior translators or approved reviewers over junior translators.
By integrating TM directly into the platform, IntlPull eliminates friction between translation work and TM management, ensuring every translation contributes to future efficiency.
Frequently Asked Questions
How much can I realistically save using translation memory?
Savings depend on content repetition rate. Software UI localization typically achieves 40-60% TM leverage (40-60% of words are matches), documentation 30-50%, and marketing content 10-30%. At $0.15/word and 50% leverage with typical rate discounts, expect a 20-35% cost reduction. The first translation gets no TM benefit; subsequent updates provide the largest savings.
Should I use machine translation or translation memory?
Use both. TM provides high-quality suggestions from human translators for content you've previously translated. MT generates suggestions for new content without TM matches. Best workflow: TM suggestions first, MT fallback for no-match segments, human post-editing for quality. IntlPull combines both automatically.
Can I use translation memory with JSON/YAML files instead of TMX?
TMX is an interchange format for moving TM between systems. For day-to-day work, modern TMS platforms like IntlPull manage TM internally without requiring TMX manipulation. TMX is only needed when migrating between platforms or sharing with external agencies. You keep working with your native file formats (JSON, YAML) while TM operates behind the scenes.
How do I prevent low-quality translations from polluting my TM?
Implement review workflows where translations must be approved before entering TM. Tag TM units by quality level (machine translation, junior translator, senior translator, native reviewer). Configure TM to prioritize or only show high-quality matches. Regularly audit TM, removing or downgrading poor units. IntlPull's approval workflow ensures only reviewed translations become TM suggestions.
What's the difference between translation memory and machine translation?
Translation memory stores and retrieves human-created translations from previous work. Matches are exact or fuzzy based on similarity. Machine translation uses AI to generate new translations on demand. MT can generate a translation for any text, whereas TM only offers suggestions for content similar to what was previously translated. TM quality depends on the quality of the stored translations; MT quality depends on model training. Use TM first for consistency and accuracy, and MT for coverage.
Can translation memory work across different file formats?
Yes. TM operates at the segment level, independent of file format. A translation of "Save changes" from a JSON file can be suggested for the same text in XLIFF, PO, or any other format. TM systems extract text from each format into a universal segment representation, match it against the TM database, then insert the translations back into the target format.
How often should I clean and maintain my translation memory?
Perform light maintenance quarterly: review rejected suggestions, remove obsolete content, merge duplicates. Run a comprehensive cleanup annually: terminology updates, quality scoring, context enrichment. After a major product rebrand or terminology change, a dedicated TM cleanup project ensures consistency. Use TM analytics to identify high-rejection-rate units as candidates for removal.
