DEV Community

Vadym Arnaut

The 3 i18n mistakes every open-source LMS makes

TL;DR. Every open-source LMS treats internationalization as one problem. It's three. UI strings, user-generated content, and canonical artifacts each need a different mechanism. Most codebases collapse them into one — that's the bug.

We've been building an open-source Bible school LMS for about 5 months. It runs in Russian and English (Ukrainian coming), with auto-translated user content and canonical-text preservation for scripture quotes. Building this forced me to look at how Moodle, Open edX, Canvas LMS, and Chamilo handle the same problem.

The pattern is the same in all of them — and was the same in ours when we started.


Mistake #1: User-generated content treated like UI strings

Every LMS has solid gettext-style infrastructure for UI strings. "Sign in", "Course catalog", "Submit assignment": these live in PHP language packs (Moodle), gettext .po files synced through Transifex (Open edX), or YAML locale files plus i18n-js (Canvas, Rails-based). A translator translates the file once. Done.

User-generated content is a different problem entirely. When a teacher authors a course in Russian, the title — "Введение в Послание к Римлянам" — is a row in courses.title. An English-speaking student opens the catalog and sees Cyrillic. The UI is translated. The content isn't.

| | UI strings | User-generated content |
| --- | --- | --- |
| Examples | "Sign in", "Submit" | Course title, lesson body, quiz question |
| Source | Fixed catalog of values | Unbounded user input |
| Tool | gettext / Transifex / YAML | Runtime translation (Google, DeepL, Gemini) |
| When | Build / release time | Runtime (lazy or eager) |
| Storage | Locale file | Separate cache table |
| Common bug | None major | Treated like UI strings → only one language ever shown |

Moodle's workaround is the multilang filter: you wrap content in <span lang="ru" class="multilang">…</span><span lang="en" class="multilang">…</span> and the filter shows the block that matches the viewer's language. This is a 2008 solution. It puts the entire burden on the teacher, who must author every piece of content once per supported language. Most don't, and the platform falls back to "show whatever the teacher wrote first."

The shape of the right answer is a separate translation cache:

CREATE TABLE content_translations (
  entity_type  TEXT NOT NULL,    -- 'course', 'lesson', 'quiz_question'
  entity_id    UUID NOT NULL,
  field        TEXT NOT NULL,    -- 'title', 'description', 'body'
  locale       TEXT NOT NULL,    -- 'ru', 'en', 'uk', 'es'
  content      TEXT NOT NULL,
  source       TEXT NOT NULL,    -- 'human' | 'machine' | 'canonical'
  cached_at    TIMESTAMPTZ NOT NULL,
  PRIMARY KEY (entity_type, entity_id, field, locale)
);

Teacher authors in one language. A translation worker fills in the others lazily (first request) or eagerly (on publish). The course-detail endpoint joins on (entity_type, entity_id, field, viewer_locale).
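The lazy path described above can be sketched in a few lines. This is a minimal illustration, not the project's actual code: it uses SQLite from the standard library in place of Postgres (so entity_id is TEXT rather than UUID), and get_localized and the translate callable are hypothetical names standing in for the real endpoint and translation worker.

```python
import sqlite3
from datetime import datetime, timezone
from typing import Callable

# Same shape as the table above, adapted to SQLite types (assumption).
SCHEMA = """
CREATE TABLE IF NOT EXISTS content_translations (
  entity_type TEXT NOT NULL,
  entity_id   TEXT NOT NULL,
  field       TEXT NOT NULL,
  locale      TEXT NOT NULL,
  content     TEXT NOT NULL,
  source      TEXT NOT NULL,
  cached_at   TEXT NOT NULL,
  PRIMARY KEY (entity_type, entity_id, field, locale)
)
"""


def get_localized(
    db: sqlite3.Connection,
    entity_type: str,
    entity_id: str,
    field: str,
    locale: str,
    original: str,
    original_locale: str,
    translate: Callable[[str, str, str], str],
) -> str:
    """Return `field` in `locale`, translating and caching on first request."""
    if locale == original_locale:
        return original  # the authored text needs no translation

    key = (entity_type, entity_id, field, locale)
    row = db.execute(
        "SELECT content FROM content_translations "
        "WHERE entity_type = ? AND entity_id = ? AND field = ? AND locale = ?",
        key,
    ).fetchone()
    if row is not None:
        return row[0]  # cache hit: no call to the translation backend

    # Cache miss: translate once, store the result as machine-sourced.
    content = translate(original, original_locale, locale)
    db.execute(
        "INSERT OR REPLACE INTO content_translations "
        "VALUES (?, ?, ?, ?, ?, 'machine', ?)",
        (*key, content, datetime.now(timezone.utc).isoformat()),
    )
    db.commit()
    return content
```

The eager variant would call the same function for every supported locale when the teacher hits publish, so the first student never waits on the translation backend.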

The architectural decision: content is not a UI string and cannot live in the same system. If your LMS uses gettext for "Submit assignment" and the same mechanism for course titles, that's the bug.


Mistake #2: No accommodation for length variance

English is dense. Russian runs ~25–30% longer for the same meaning. German is comparable. Finnish is worse. Arabic is shorter — and right-to-left, its own category.

Most LMS UIs are designed in English. Buttons that fit "Save" don't fit "Сохранить". Nav tabs that fit "Courses" wrap to two lines for "Курсы и обучение". Mobile breaks first.

Part of the fix is CSS:

/* Reserve space against the longer-language baseline */
.action-button {
  min-width: 8rem;
  white-space: nowrap;
  overflow: hidden;
  text-overflow: ellipsis;
}

/* Tabular content: don't let translation reflow the grid */
.lesson-list-cell {
  min-height: 4.5rem;  /* fits 2-line Russian titles without shift */
}

But the real fix is testing every screen in your longest-content language, not just English. Most teams hire designers who only see the English mock and don't realize their "Continue" button is broken in Russian until a user reports it.

What helped us: a Storybook locale switcher defaulting to Russian (not English), and a Playwright snapshot suite that screenshots both locales. The first commit that breaks Russian layout is caught in CI, not by a user.
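Screenshots catch the real breakage, but a cheap lint over the locale catalogs can flag likely offenders before anything renders. The sketch below is an assumption, not part of the setup described above: find_length_outliers is a hypothetical helper, the 1.3 ratio comes from the ~25–30% figure earlier in this section, and character count is only a rough proxy for rendered width.

```python
def find_length_outliers(
    base: dict[str, str],
    translated: dict[str, str],
    max_ratio: float = 1.3,
    slack: int = 2,
) -> list[tuple[str, int, int]]:
    """Return (key, base_len, translated_len) for strings likely to overflow.

    A translation is flagged when it exceeds the base string's length
    times `max_ratio`, plus a small fixed `slack` for very short strings.
    """
    outliers = []
    for key, base_text in base.items():
        text = translated.get(key)
        if text is None:
            continue  # missing translations are a different check
        budget = len(base_text) * max_ratio + slack
        if len(text) > budget:
            outliers.append((key, len(base_text), len(text)))
    return outliers


en = {"save": "Save", "courses": "Courses", "sign_in": "Sign in"}
ru = {"save": "Сохранить", "courses": "Курсы и обучение", "sign_in": "Войти"}

for key, en_len, ru_len in find_length_outliers(en, ru):
    print(f"{key}: {en_len} -> {ru_len} characters")
```

With these sample catalogs the check flags save and courses and leaves sign_in alone. It only narrows down which screens to look at first; the snapshot suite remains the source of truth.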


Mistake #3: Canonical content forced through translation

This is the bug that motivated our entire content-translation rewrite.

A teacher in Russian writes: "Послание к Римлянам 8:28 говорит, что все содействует ко благу". The course gets auto-translated to English. Naively you send the whole string to Google Translate or DeepL, and you get back: "Romans 8:28 says that everything works together for good." That's almost a real Bible verse — but it isn't quoting any actual translation. It's a translation of a translation.

You don't want that. You want the actual KJV (or NIV, or ESV) text of Romans 8:28 spliced in.

Same problem exists everywhere there's canonical content:

  • Programming courses with code samples (don't translate the variables!)
  • Math curricula with formulas
  • Legal courses citing statutes
  • Medical courses citing clinical-trial registries
  • Literature courses with original-language quotes

Most LMSes don't separate "translatable prose" from "canonical artifact" — so when they auto-translate, the canonical content gets mangled or invented.

Our pattern: parse the source for canonical references, replace each with a placeholder token, translate the surrounding prose, then substitute the canonical text back from a separate lookup.

def translate_with_canonical_preservation(
    text: str, source_lang: str, target_lang: str
) -> str:
    # 1. Find canonical references in the source language
    refs = extract_bible_refs(text, lang=source_lang)
    # [{"raw": "Послание к Римлянам 8:28", "book": "ROM", "chapter": 8, "verses": [28]}]

    # 2. Replace each with a unique placeholder
    placeholders = {}
    for i, ref in enumerate(refs):
        # Closing delimiter is required: without it, CANON_1 is a prefix of CANON_10
        token = f"⟦CANON_{i}⟧"
        placeholders[token] = ref
        text = text.replace(ref["raw"], token, 1)

    # 3. Translate the token-bearing string
    translated = translate(text, source_lang, target_lang)

    # 4. Substitute the canonical text back, looked up in the target translation
    for token, ref in placeholders.items():
        canonical_text = lookup_canonical(ref, target_lang)  # KJV for en, Synodal for ru
        translated = translated.replace(token, canonical_text)

    return translated

How extract_bible_refs handles the cross-language book-name matrix

Detection is a regex over a normalized book-name dictionary. Each canonical book has an entry like:

BOOKS = {
    "ROM": {
        "en": ["Romans", "Rom", "Rom."],
        "ru": ["Послание к Римлянам", "Римлянам", "Рим", "Рим."],
        "uk": ["Послання до Римлян", "Римлян", "Рим"],
    },
    # ... 65 more books
}

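The regex is built per request as (book_alias_1|book_alias_2|...)\s+(\d+):(\d+)(?:[-–](\d+))? so it accepts: Romans 8:28, Послание к Римлянам 8:28, Рим 8:28, Рим. 8:28, Romans 8:28-29. For Romans 8:28a this base pattern matches only Romans 8:28; the letter suffix is one of the edge cases handled on top of it.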

Sundry edge cases: chapter-only refs (Romans 8 — whole-chapter), letter suffixes (8:28a — first half of verse), em-dash vs hyphen, non-breaking spaces. Each lives in unit tests.
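Putting the alias dictionary and the base pattern together might look roughly like this. It is a sketch under stated assumptions, not the project's extract_bible_refs: alias_index and build_ref_pattern are hypothetical helper names, and the longest-first alias ordering and the negative lookbehind are choices made for this example. Chapter-only refs and letter suffixes are deliberately left out.

```python
import re

# Trimmed copy of the dictionary above; one book is enough for the sketch.
BOOKS = {
    "ROM": {
        "en": ["Romans", "Rom", "Rom."],
        "ru": ["Послание к Римлянам", "Римлянам", "Рим", "Рим."],
    },
}


def alias_index(lang: str) -> dict[str, str]:
    """Map every alias in `lang` back to its canonical book code."""
    return {
        alias: book
        for book, names in BOOKS.items()
        for alias in names.get(lang, [])
    }


def build_ref_pattern(aliases: list[str]) -> re.Pattern[str]:
    # Longest alias first, so "Послание к Римлянам" wins over "Рим".
    longest_first = sorted(aliases, key=len, reverse=True)
    alternation = "|".join(re.escape(a) for a in longest_first)
    # (?<!\w) keeps an alias from matching in the middle of another word.
    # \s also matches non-breaking spaces in Python str patterns.
    return re.compile(rf"(?<!\w)({alternation})\s+(\d+):(\d+)(?:[-–](\d+))?")


def extract_bible_refs(text: str, lang: str) -> list[dict]:
    index = alias_index(lang)
    if not index:
        return []  # no aliases known for this language
    refs = []
    for m in build_ref_pattern(list(index)).finditer(text):
        start = int(m.group(3))
        end = int(m.group(4)) if m.group(4) else start
        refs.append({
            "raw": m.group(0),
            "book": index[m.group(1)],
            "chapter": int(m.group(2)),
            "verses": list(range(start, end + 1)),
        })
    return refs
```

Escaping each alias with re.escape matters here because abbreviations like "Rom." contain a dot that would otherwise match any character.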

lookup_canonical pulls from a canonical text table keyed by (book, chapter, verse_start, verse_end, translation). Cache the final translated string keyed by (entity_id, field, locale) per Mistake #1.
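A toy version of that lookup, with an in-memory dict standing in for the canonical text table. The key shape follows the description above; DEFAULT_TRANSLATION, CANONICAL_TEXT, and the error handling are assumptions made for illustration, and only one KJV verse is loaded.

```python
# Which published translation serves each locale (assumed mapping,
# following the "KJV for en, Synodal for ru" comment earlier).
DEFAULT_TRANSLATION = {"en": "KJV", "ru": "SYNODAL"}

# Stand-in for the canonical text table, keyed by
# (book, chapter, verse_start, verse_end, translation).
CANONICAL_TEXT = {
    ("ROM", 8, 28, 28, "KJV"): (
        "And we know that all things work together for good to them that "
        "love God, to them who are the called according to his purpose."
    ),
}


def lookup_canonical(ref: dict, target_lang: str) -> str:
    """Return the published text for `ref` in the target language."""
    translation = DEFAULT_TRANSLATION[target_lang]
    verses = ref["verses"]
    key = (ref["book"], ref["chapter"], verses[0], verses[-1], translation)
    try:
        return CANONICAL_TEXT[key]
    except KeyError:
        # Fail loudly instead of falling back to machine translation.
        raise LookupError(f"no canonical text for {key}") from None
```

Raising on a missing verse is one option among several. Its appeal is that a gap in the table never turns into an invented verse sitting in the translation cache.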

This is ~400 lines we wouldn't have written if any of the LMSes we looked at had solved this for us.


What I want to hear back

These are the three patterns I keep seeing. They're not the only ones (cache invalidation on edits, RTL layout, Slavic plural forms — separate posts), but they're the ones every general-purpose LMS skips.

If you've shipped i18n in an LMS, education platform, or any content product:

  • Do you separate UI strings from content? What's your storage shape?
  • How do you handle length variance? Do you test the long-language layout in CI?
  • Do you have canonical content that mustn't be translated? Code samples, equations, citations? How are you handling it?

Curious where teams have landed. The patterns above are our current best, not our last word.


The project that drove all of this is open source:

ArVaViT / equip

Free, open-source LMS for Bible schools, ministries, and nonprofit educational programs. React + FastAPI + Supabase.

Equip

A free, open-source learning management system built for Bible schools, church ministries, and nonprofit educational programs



Why this project?

Hundreds of small Bible schools, home churches, and missionary training programs around the world still manage courses on paper, WhatsApp, or spreadsheets. Commercial LMS platforms are expensive, overkill, or require technical expertise that volunteer-run organizations simply don't have.

Equip is designed to change that:

  • Free forever — MIT-licensed, no paywalls, no "premium" tiers.
  • Simple to deploy — one-click Vercel deploy with a free Supabase database. No Docker, no servers to manage.
  • Built for small scale — optimized for 20-100 students, not enterprise pricing models.
  • Contributor-friendly — clear docs, conventional commits, issue templates, and a welcoming community.

Features

| Area | What you get |
| --- | --- |
| Course authoring | Courses, modules, chapters, rich content blocks (TipTap editor with images, YouTube, callouts, audio) |
| Assessments | Multiple-choice, true/false, short-answer, and essay |
