XEmacs and umlauts — setting the character encoding to UTF-8

XEmacs didn’t display umlauts correctly. I guessed that this had something to do with the file encoding, because the same file displayed correctly in a terminal window with nano or “xemacs -nw”. The grey horizontal bar that separates the main buffer from the minibuffer (the modeline) shows the encoding of the current buffer at the far left. For the text in question, with umlauts, it showed “raw”, but it needed to be UTF-8. I found this out by running

env

in a terminal window (where the file displayed correctly). This gives a list of all environment variables that are set, and one of them was:

LANG=en_GB.UTF-8
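
To compare what the terminal reports with what XEmacs itself sees, the same variable can be read from inside the editor. A minimal sketch, assuming XEmacs inherits its environment from wherever it was launched; evaluate it with M-: or in the *scratch* buffer:

```elisp
;; Returns the value of LANG as seen by the running XEmacs process,
;; or nil if the variable is not set in its environment.
(getenv "LANG")
```

If this returns nil or a non-UTF-8 locale while the terminal shows en_GB.UTF-8, XEmacs was probably started from a different environment (for example a desktop menu rather than the shell).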

I fixed this by adding the following three lines to ~/.xemacs/init.el (afterwards you have to load that file with M-x load-file or restart XEmacs):

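(require 'un-define)                        ; load the Unicode definitions from the Mule-UCS package
(set-coding-priority-list '(utf-8))         ; check the utf-8 category first when detecting an encoding
(set-coding-category-system 'utf-8 'utf-8)  ; use the utf-8 coding system for the utf-8 category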

I needed all three lines. The first was needed just to make UTF-8 available when setting the encoding for the current buffer (with C-x C-m f, which I found via the “Edit → Multilingual” menu). But with only the first line, the buffer’s encoding still wasn’t detected correctly, and even when I set the encoding manually to UTF-8, the umlauts weren’t displayed correctly.
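
A possible reason the manual setting had no visible effect: C-x C-m f runs set-buffer-file-coding-system, which chooses the coding system used when the buffer is saved. It does not re-decode text that has already been read in. A hedged sketch of re-reading a file with an explicit coding system, assuming the coding-system-for-read variable is honoured by this XEmacs build; the file name is hypothetical:

```elisp
;; Hypothetical file name, for illustration only.
;; Kill any buffer already visiting the file first; otherwise find-file
;; just switches to that buffer and nothing is decoded again.
(let ((coding-system-for-read 'utf-8))   ; decode as UTF-8 while this binding is active
  (find-file "~/umlauts.txt"))
```

This still relies on (require 'un-define) having made the utf-8 coding system available.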

The three lines don’t change the default encoding of new files. For me, this is “ISO8 — iso-2022-8”. Even if I manually set a new file to UTF-8 encoding, the next time I open it it opens in ISO8 unless it contains something like an umlaut.
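
One way to change the default for new files might be to set the default value of buffer-file-coding-system in init.el. This is a sketch only, untested on this XEmacs version. It assumes that new buffers pick up this default and that the three lines above have already made utf-8 available:

```elisp
;; Assumption: buffers without a coding system of their own fall back
;; to the default value of buffer-file-coding-system when saved.
(setq-default buffer-file-coding-system 'utf-8)
```

A file that contains only ASCII characters is byte-for-byte identical in ISO-8859-1 and UTF-8, so auto-detection cannot tell which of the two it was saved in. That would explain why such a file reopens as ISO8 until it contains something like an umlaut.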

It ‘works’ now, but I still have some questions:

  • Why doesn’t XEmacs pick up the default coding system that my terminal window uses, which is set to UTF-8?
  • Should I try to change the default coding system for XEmacs, either to UTF-8 or to ISO-8859-1?
  • What’s the difference between ISO-2022-8 and ISO-8859-1? Does it matter?

One Response to “XEmacs and umlauts — setting the character encoding to UTF-8”

  1. Sima Says:

    A bit too late perhaps, but here’s what I did to get UTF-8 as the default, used when needed. In .emacs add
    (set-language-environment "UTF-8")

    I think Emacs does some kind of lazy evaluation: it won’t “use” UTF-8 unless needed. Since many chars in Latin-1 (and other encodings) are the same in UTF-8, the file will show as Latin-1 but still be readable as UTF-8. UTF-8 will be used as soon as you input a special character not in the lazy char set. At least this is how it works for me.

    Source: http://www.emacswiki.org/cgi-bin/wiki/LanguageEnvironment
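
The overlap described in the comment above applies to the ASCII range: ASCII characters are encoded identically in Latin-1 and UTF-8, while characters such as ü are not. A small sketch that checks this, assuming GNU Emacs (where encode-coding-string and the iso-8859-1 and utf-8 coding systems are built in); each form can be evaluated in the *scratch* buffer:

```elisp
;; ASCII text produces the same bytes in both encodings.
(string= (encode-coding-string "abc" 'utf-8)
         (encode-coding-string "abc" 'iso-8859-1))   ; => t

;; An umlaut is one byte in Latin-1 but two bytes in UTF-8.
(length (encode-coding-string "ü" 'iso-8859-1))      ; => 1
(length (encode-coding-string "ü" 'utf-8))           ; => 2
```

So a file without any non-ASCII characters gives the editor nothing to distinguish the two encodings by, which matches the behaviour described in the comment.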