Here's how I implemented a translation management system for a
static website, using GNU gettext. For the impatient, I've distilled
it to 11 instructions at the end.
Goal
This system allows block-by-block translation (string-by-string),
which is better than page-by-page because:
- Changes to non-translated parts will be applied to all
translations automatically (formatting, tags, images, maybe dates,
names, links, etc.).
- By storing the text blocks of all pages together, repeated blocks
will only have to be translated once (menu text, copyright notices,
headings, etc.).
- You won't get lost when the original changes while the translation
is still in progress.
- When you change a paragraph in the original, it's easier to see
what parts of the translations need to be updated.
For such a system, the abstract steps are:
1. Somehow mark each translatable text block in your webpage. The
   non-translatable parts will become a shared frame.
2. Extract the blocks into a database. Translate.
3. Find or write some software to merge the blocks back into the
   frame to remake the original webpage - but with the option of taking
   the text blocks from either the English database or one of the
   translated versions of the database.
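To make those three abstract steps concrete, here is a toy sketch in shell. It is not gettext, just the same idea in miniature: the "database" is a case statement, and the names lookup and render_page are made up for illustration. The French strings are the ones used later in this post.

```shell
#!/bin/sh
# Toy "database": map an English text block to its translation.
# Blocks with no translation fall back to the English text.
lookup() {
    # $1 = language code, $2 = English text block
    case "$1:$2" in
        "fr:Cow")        printf '%s' "Vache" ;;
        "fr:See also: ") printf '%s' "Voir aussi : " ;;
        "fr:FSFE")       printf '%s' "La FSFE" ;;
        *)               printf '%s' "$2" ;;
    esac
}

# The shared frame: the markup is written once, the blocks are looked up.
render_page() {
    lang=$1
    printf '<html><head>\n<title>%s</title>\n</head>\n<body>\n' \
        "$(lookup "$lang" "Cow")"
    printf '<p>%s<a href="http://fsfe.org/">%s</a></p>\n' \
        "$(lookup "$lang" "See also: ")" "$(lookup "$lang" "FSFE")"
    printf '</body></html>\n'
}

render_page fr
```

Calling render_page with a language that has no entries (for example en) prints the English page, because every lookup falls through to the default branch.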
Gettext seemed like an obvious possibility, and everything's working
perfectly now, but it took me eight hours. The difficulty was that
the existing documentation is all geared toward using gettext for
computer programs, not for websites or documents. That's when I
realised that I must document what I did:
What I did
I started by minimally turning my webpage into a computer program.
This involved five steps:
1. Write a tiny program that prints some text (a string) into a
   file.
2. Copy the webpage into the program in place of the string.
3. Insert some standard bits of code required by gettext.
4. Break the string into smaller strings, separating translatable
   from non-translatable.
5. Mark the translatable strings with gettext's tag (the format of
   the tag depends on which programming language you use, but it's
   usually something involving an underscore _ ).
Gettext works with lots of programming languages, so take your pick
from the examples that come with the package. On my computer, these
are in this folder: /usr/share/doc/gettext-doc/examples/
The choice of language isn't important. The code will be dead
simple.
Here's my original index.html:
<html>
<head>
<title>Cow</title>
</head>
<body>
<p>See also: <a href="http://fsfe.org/">FSFE</a></p>
</body>
</html>
Of the supported programming languages, I chose Scheme (a
dialect of Lisp). At first glance, the code below looks complex,
but you'll only have to modify the first and third chunks. The
first chunk defines three variables which should be
self-explanatory. All the webpage text is in the third chunk. It's
broken up into blocks and I've put gettext tags for
Scheme (_ ) around the translatable blocks. Here
it is, generate-index.scm:
#!/usr/bin/guile -s
!#

;; Chunk 1: settings to adjust for your own project.
(define output-filename "index.html")
(define project-name "ciarans-website")
(define build-directory "/home/ciaran/website-build/")

;; Chunk 2: standard gettext setup.
(use-modules (ice-9 format))
(catch #t (lambda () (setlocale LC_ALL "")) (lambda args #f))
(textdomain project-name)
(bindtextdomain project-name build-directory)
(define _ gettext)

;; Chunk 3: the webpage. Translatable blocks are wrapped in (_ ...).
(define page-text (string-append
  "<html><head>\n<title>"
  (_ "Cow")
  "</title>\n</head>\n<body>\n<p>"
  (_ "See also: ")
  "<a href=\"http://fsfe.org/\">"
  (_ "FSFE")
  "</a></p>\n"
  "</body></html>\n\n"))

;; Chunk 4: write the page to the output file and close it.
(define the-file (open-file output-filename "w"))
(display page-text the-file)
(close-port the-file)
Three of the eight strings are marked as translatable. The other
five are part of the shared frame that will be the same no matter
what language version of the page is being generated.
Remember to replace any quote marks in your HTML with
backslash-quote (\"), and to add a few line breaks
(\n) to make the output readable. Those are the quote
and the newline sequences for Scheme. They're the same in a few
other languages, but they're different in others.
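For a larger page, the quote escaping can be done mechanically. Here is a minimal sketch, assuming a POSIX sed; escape_for_scheme is a hypothetical helper name, not part of gettext or Guile. It doubles any backslashes and escapes the double quotes, but it doesn't insert the \n sequences, so those would still be placed by hand.

```shell
#!/bin/sh
# Escape HTML read from stdin so it can be pasted inside a Scheme
# string literal: double the backslashes first, then escape the quotes.
escape_for_scheme() {
    sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

# Example: escape one line from the original index.html.
printf '%s\n' '<p>See also: <a href="http://fsfe.org/">FSFE</a></p>' |
    escape_for_scheme
```

To process a whole file, redirect it into the function: escape_for_scheme < index.html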
Before you continue, you must set the "build-directory"
variable to the directory where generate-index.scm is.
If you don't, everything will seem to work but your program will
never access the translated strings.
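Since a wrong build-directory fails silently, a small check can help. The sketch below assumes the layout used in this post, where the compiled catalogue for a language sits at build-directory/LANG/LC_MESSAGES/project-name.mo; the helper names mo_path and check_catalog are hypothetical. It only becomes useful once the mo file has been created in the later steps.

```shell
#!/bin/sh
# Print the path where the compiled catalogue is expected to be.
mo_path() {
    # $1 = build directory, $2 = language code, $3 = project name
    # ${1%/} drops one trailing slash so the path has no "//".
    printf '%s/%s/LC_MESSAGES/%s.mo\n' "${1%/}" "$2" "$3"
}

# Report whether that file exists; returns non-zero if it is missing.
check_catalog() {
    path=$(mo_path "$1" "$2" "$3")
    if [ -f "$path" ]; then
        echo "found: $path"
    else
        echo "missing: $path"
        return 1
    fi
}

mo_path /home/ciaran/website-build/ fr ciarans-website
```

If check_catalog reports the file as missing for the directory named in generate-index.scm, the program would fall back to the English strings.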
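That done, you extract the translatable strings with these two
commands:
$ xgettext --language=scheme -d ciarans-website -k_ generate-index.scm
$ mv ciarans-website.po ciarans-website.pot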
And then you can create a file (a "po" file) for French
translations with this command:
$ msginit --locale=fr
One part of the gettext manual says that "msginit" is optional
(that you can do it manually instead), but this didn't work for me
at all. I spent two hours diagnosing that problem. Use msginit.
This creates fr.po which you can edit with any text
editor. There will be a line at the top like this:
"Content-Type: text/plain; charset=UTF-8\n"
If your charset is "ASCII", you should probably change it
to UTF-8. If your charset is something else and you get error
messages from other gettext tools (such as msgmerge) about invalid
characters, then changing charset to UTF-8 might also be the answer.
There'll also be a field for content-transfer-encoding. The manual
says that should always be "8bit".
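Checking and changing the charset line can be scripted. This is a rough sketch, assuming a POSIX sed and a header line shaped like the one shown above; po_charset and set_charset_utf8 are hypothetical helper names. It only rewrites the declaration, so it assumes the file's bytes really are UTF-8 (plain ASCII text already is).

```shell
#!/bin/sh
# Print the charset declared in the Content-Type header of a po file.
po_charset() {
    sed -n 's/^"Content-Type:.*charset=\([A-Za-z0-9._:-]*\).*/\1/p' "$1" |
        head -n 1
}

# Rewrite the declared charset to UTF-8 (via a temp file, no sed -i).
set_charset_utf8() {
    tmp=$(mktemp) &&
    sed 's/^\("Content-Type:.*charset=\)[A-Za-z0-9._:-]*/\1UTF-8/' "$1" > "$tmp" &&
    mv "$tmp" "$1"
}
```

For example, po_charset fr.po prints the current value, and set_charset_utf8 fr.po changes it in place.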
Emacs is particularly good for editing po files because it has a
special editing mode for them.
And then you have to convert your po file into the special mo format
and put it in the subdirectory where gettext expects it to be with
these two commands:
$ mkdir -p fr/LC_MESSAGES
$ msgfmt --output-file=fr/LC_MESSAGES/ciarans-website.mo fr.po
Make the Scheme file executable, and that's it!
ciaran@hide:~/tests/simple-page$ LANGUAGE=fr ./generate-index.scm; cat index.html
<html><head>
<title>Vache</title>
</head>
<body>
<p>Voir aussi : <a href="http://fsfe.org/">La FSFE</a></p>
</body></html>
ciaran@hide:~/tests/simple-page$ LANGUAGE=en ./generate-index.scm; cat index.html
<html><head>
<title>Cow</title>
</head>
<body>
<p>See also: <a href="http://fsfe.org/">FSFE</a></p>
</body></html>
ciaran@hide:~/tests/simple-page$
Ok, so there's your proof-of-concept. Next, I have to convert my
site to this system and maintain it (using msgmerge).
I'll try to keep notes to publish here.
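For the maintenance part, one possible update cycle is sketched below. This is an assumption about how msgmerge would slot into the steps in this post, not something tested on this site: re-extract the strings, merge them into the existing po file, then recompile. The function name update_translation is hypothetical, and it expects to be run in the directory holding generate-index.scm and the po file.

```shell
#!/bin/sh
# Refresh one language after the English text has changed.
update_translation() {
    lang=$1                     # e.g. fr
    domain=ciarans-website      # project name from generate-index.scm

    # Re-extract the strings into a fresh template, as in the steps above.
    xgettext --language=scheme -d "$domain" -k_ generate-index.scm &&
    mv "$domain.po" "$domain.pot" &&
    # Merge new and changed strings into the existing translation.
    msgmerge --update "$lang.po" "$domain.pot" &&
    # Recompile the catalogue where gettext expects it.
    mkdir -p "$lang/LC_MESSAGES" &&
    msgfmt --output-file="$lang/LC_MESSAGES/$domain.mo" "$lang.po"
}
```

After running update_translation fr, entries that msgmerge has marked as fuzzy need a human check in fr.po. By default msgfmt leaves fuzzy entries out of the mo file, so those blocks stay in English until they are reviewed.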
The instructions
1. Make an empty file generate-index.scm
2. Copy my generate-index.scm (above) into your file
3. Adjust the build-directory (3rd defined variable) in generate-index.scm to
   point to the directory where your generate-index.scm is
4. $ xgettext --language=scheme -d ciarans-website -k_ generate-index.scm
5. $ mv ciarans-website.po ciarans-website.pot
6. $ msginit --locale=fr
7. Edit fr.po to add translations of the three text strings
8. $ mkdir -p fr/LC_MESSAGES
9. $ msgfmt --output-file=fr/LC_MESSAGES/ciarans-website.mo fr.po
10. $ chmod +x generate-index.scm
11. $ LANGUAGE=fr ./generate-index.scm; cat index.html
--
Ciaran O'Riordan
Support free software: Join FSFE's
Fellowship