Understanding and setting up the screen driver

Unicode is everywhere

Screen Font Maps

In recent kernels (at least since 2.0.x), the screen driver is based on the 16-bit Unicode (UCS2) encoding, which means that every console font loaded should be defined using a Unicode Screen Font Map (SFM for short), which lists, for each glyph in the font, the UCS2 characters it will render. [1]

SFM Fallback tables

Starting with release 1997.11.13 of the Linux Console Tools, consolechars(8) understands SFM fallback tables. Before that, an SFM had to contain both the Unicode codes of the characters the font was primarily meant to render and any approximations the user wanted. Fallback tables allow the SFM provided with the font file to hold only the primary mappings, while a separate list says ``if no glyph for that character is available in the current font, then try to display it with the glyph for this one, or else the one for that one, or ...''. This keeps all possible fallbacks in a single place, and lets everyone choose which fallback tables to use. Have a look at data/consoletrans/*.fallback for examples.

A fallback-table file is made of fallback entries, each entry being on its own line. Empty lines, and lines beginning with the # comment character are ignored.

A fallback entry is a series of 2 or more UCS2 codes. The first one is the character for which we want a glyph; the following ones are those whose glyph we want to use when no glyph designed specially for our character is available. The order of the codes defines a priority order: the character's own glyph if available, then the second character's, then the third's, and so on.
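The file format described above can be read with a few lines of code. The following is only a sketch, not the parser used by consolechars: the function name parse_fallback_table is hypothetical, and it assumes that codes are separated by whitespace and written in hexadecimal with an optional U+ prefix, which the text above does not specify.

```python
def parse_fallback_table(text):
    """Parse fallback-table text into a list of entries.

    Each entry is a list of UCS2 codes: the wanted character first,
    then its fallbacks in priority order.
    """
    entries = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        stripped = line.strip()
        # Empty lines and comment lines are ignored.
        # Assumption: leading whitespace before '#' is tolerated.
        if not stripped or stripped.startswith("#"):
            continue
        codes = []
        for token in stripped.split():
            # Assumption: hexadecimal codes, optionally prefixed with "U+".
            if token.upper().startswith("U+"):
                token = token[2:]
            code = int(token, 16)
            if not 0 <= code <= 0xFFFF:
                raise ValueError(f"line {lineno}: {code:#x} is outside the UCS2 range")
            codes.append(code)
        if len(codes) < 2:
            raise ValueError(f"line {lineno}: a fallback entry needs at least 2 codes")
        entries.append(codes)
    return entries
```

For instance, a line reading ``U+00E9 U+0065'' would yield the entry [0xE9, 0x65], meaning ``if there is no glyph for U+00E9, use the one for U+0065''.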

If an SFM is to be loaded, the fallback mappings are added to this map before it is loaded. If there is none (i.e. a font without SFM was loaded and no --sfm option was given to consolechars, or the --force-no-sfm option was given), then the current SFM is requested from the kernel, the fallback mappings are added, and the resulting SFM is loaded back into the kernel.

Note that each fallback entry is checked against the original SFM (the one read from a file, or given by the kernel), not against the SFM obtained by adding earlier fallback entries to it; this applies even to entries in different files, and thus the order of -k options has no effect. If you want some entries to be influenced by previous ones, you will have to use different fallback files, and load them with several consecutive invocations of consolechars -k.
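The rule that entries are resolved against the original SFM only can be illustrated with a small model. This is a sketch under stated assumptions, not the consolechars implementation: the SFM is represented as a dict from UCS2 code to glyph index, and the name add_fallbacks is hypothetical. What happens when the same character appears in several entries is not covered by the text; here the later entry simply wins.

```python
def add_fallbacks(sfm, entries):
    """Return a new SFM (dict: UCS2 code -> glyph index) with fallbacks added.

    Every entry is checked against the original `sfm`, never against
    mappings added by earlier entries.
    """
    result = dict(sfm)
    for wanted, *candidates in entries:
        if wanted in sfm:
            continue  # the font already has a glyph of its own
        for candidate in candidates:
            if candidate in sfm:  # look in the original map, not in `result`
                result[wanted] = sfm[candidate]
                break
    return result


# One pass: U+00C9 cannot fall back on U+00E9, because U+00E9 only got
# its glyph from a fallback entry in the same pass.
original = {0x65: 10}
first = add_fallbacks(original, [[0xE9, 0x65], [0xC9, 0xE9]])

# A second pass over the result corresponds to a second consecutive
# invocation of consolechars -k: now U+00E9 is part of the starting map.
second = add_fallbacks(first, [[0xC9, 0xE9]])
```

After the first pass only U+00E9 has gained a mapping; U+00C9 gets one only in the second pass.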

The unicode screen-mode

There are basically 2 screen modes (byte mode and UTF mode). The simpler to explain is UTF mode, in which the bytes received from the application (i.e. written to the console screen) are interpreted as UTF8 sequences, which are converted into UCS2 characters (see the section called ``What is Unicode''), and then looked up in the SFM to determine the glyphs used to display each character.

Switching to and from UTF mode is done by sending to the screen the escape sequences <ESC>%G and <ESC>%@ respectively. You may use the unicode_start(1) and unicode_stop(1) scripts instead, as they also change the keyboard mode, and let you optionally change the screen-font.
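As an illustration, the two escape sequences can be sent from a program. This is a minimal sketch; the function name set_utf_mode is hypothetical, and the sequences only have the described effect when the output stream is a Linux console. Unlike unicode_start(1) and unicode_stop(1), it does not touch the keyboard mode or the font.

```python
import sys

UTF_MODE_ON = "\x1b%G"   # <ESC>%G : switch the screen to UTF mode
UTF_MODE_OFF = "\x1b%@"  # <ESC>%@ : switch the screen back to byte mode


def set_utf_mode(enabled, stream=None):
    """Write the escape sequence selecting UTF mode (True) or byte mode (False)."""
    stream = sys.stdout if stream is None else stream
    stream.write(UTF_MODE_ON if enabled else UTF_MODE_OFF)
    stream.flush()
```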

Use vt-is-UTF8(1) to find out whether the active VT is in UTF mode.

The byte screen-mode

The byte mode is a bit more complicated, as it uses an additional map to transform the byte-characters sent by the application into UCS2 characters, which are then treated as described above. This map I call the Application Charset Map (ACM), because it defines the encoding the application uses, but it used to be called a ``screen map'', or ``console map'' (this comes from the time when the screen driver didn't use Unicode, and there was only one map down there).
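The difference between the two modes can be sketched as two small lookup pipelines. This is a toy model, not kernel code: the ACM is represented as a 256-entry list of UCS2 codes, the SFM as a dict from UCS2 code to glyph index, a missing glyph is reported as None, and all function names are hypothetical. The latin1 ACM is modelled as the identity mapping, since latin1 byte values coincide with the first 256 Unicode code points.

```python
def latin1_acm():
    """Build a toy ACM for ISO latin1: index = byte value, item = UCS2 code."""
    return list(range(256))


def render_bytes(data, acm, sfm):
    """Byte mode: byte -> UCS2 code (via the ACM) -> glyph index (via the SFM)."""
    return [sfm.get(acm[byte]) for byte in data]


def render_utf8(data, sfm):
    """UTF mode: decode UTF8 sequences, then look each character up in the SFM."""
    return [sfm.get(ord(char)) for char in data.decode("utf-8")]


# A tiny font with glyphs for 'A' (U+0041) and 'e acute' (U+00E9) only.
sfm = {0x41: 33, 0xE9: 130}
byte_mode = render_bytes(b"A\xe9", latin1_acm(), sfm)
utf_mode = render_utf8(b"A\xc3\xa9", sfm)
```

Both calls select the same two glyphs; only the encoding of the application's output differs.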

Although there is only one ACM active at a given time, there are 4 of them at any time in the kernel; 3 of them are built-in and never change, and they define the IBM codepage 437 (the i386's default, and thus the kernel's default even on other archs), the DEC VT100 charset, and the ISO latin1 charset; the 4th is user-definable, and defaults on boot to the ``straight to font'' mapping, described below under ``Special UCS2 codes''.

The consolechars(8) command can be used to change the ACM, as well as the font and its associated SFM.

Charset slots

The Linux Console Driver has 2 slots for charsets, labeled G0 and G1. Each of these slots contains a reference to one of the 4 kernel ACMs, 3 of which are predefined to provide the cp437, iso01, and vt100 graphics charsets. The 4th one is user-definable; this is the one you can set with consolechars --acm and get with consolechars --old-acm.

Versions of the Linux Console Tools prior to 1998.08.11, as well as all versions of kbd at least until 0.96a, always assumed you wanted to use the G0 slot, pointing to the user-defined ACM. You can now use the charset utility to tune your charset slots.

You will note that, although each VT has its own slot settings, there is only one user-defined ACM for use by all the VTs. That is, whereas you can have tty1 using G0=cp437 and G1=vt100 at the same time as tty2 using G0=iso01 and G1=iso02 (user-defined), you cannot have tty1 using iso02 and tty2 using iso03 at the same time. This is a limitation of the Linux kernel.
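The limitation can be pictured with a toy model of the slot mechanism. Nothing here reflects actual kernel data structures: the class Console, its methods, and the label ``user'' for the user-defined ACM are hypothetical names chosen for illustration.

```python
BUILTIN_ACMS = ("cp437", "vt100", "iso01")


class Console:
    """Toy model: per-VT G0/G1 slot settings, one shared user-defined ACM."""

    def __init__(self):
        self.user_acm = None  # the single user-defined ACM, shared by all VTs
        self.slots = {}       # tty name -> {"G0": reference, "G1": reference}

    def load_user_acm(self, name):
        """Replace the user-defined ACM; every VT referring to it is affected."""
        self.user_acm = name

    def set_slot(self, tty, slot, ref):
        """Point slot G0 or G1 of one VT at a built-in ACM or at 'user'."""
        if slot not in ("G0", "G1"):
            raise ValueError(f"unknown slot {slot!r}")
        if ref != "user" and ref not in BUILTIN_ACMS:
            raise ValueError(f"unknown ACM reference {ref!r}")
        self.slots.setdefault(tty, {})[slot] = ref

    def effective(self, tty, slot):
        """Return the name of the charset a slot currently resolves to."""
        ref = self.slots[tty][slot]
        return self.user_acm if ref == "user" else ref


console = Console()
console.load_user_acm("iso02")
console.set_slot("tty1", "G0", "user")
console.set_slot("tty2", "G0", "user")
console.load_user_acm("iso03")  # both VTs switch: the map is shared
```

Because both slots hold a reference to the same user-defined map, loading iso03 changes what tty1 and tty2 display at once.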

Note that you can emulate such a setting using the filterm utility, with your console in UTF8-mode, by telling filterm to translate screen output on-the-fly to UTF8.

You'll find filterm in the konwert package, by Marcin Kowalczyk, which is available from his WWW site.

Special UCS2 codes

There are special UCS2 values you should care about, but the present list is probably not exhaustive:

About the old 8-bit ``screen maps''

There was a time when the kernel didn't know anything about Unicode. In this ancient time, Application Charset Maps did not exist. Instead we had Font-Charset Maps (what they called ``screen maps''), which just mapped the application's characters into font positions. The file format used for these 8-bit FCMs is still supported for backward compatibility, but should not be used any more.

The FCM mechanism didn't know about Unicode, so the FCM had to depend not only on the charset, but also on the current font. Now, as each VT chooses its own ACM (from the 4 in the kernel at a given time), and as the console font is common to all VTs, we can use a charset even if the font can't display all of its characters; it will then display the replacement character (U+FFFD).

Note that in Linux 2.2.x using framebuffer devices, you can even load a font per VT.

See also

psfaddtable(1), psfgettable(1), psfstriptable(1), showcfont(1).

Notes

[1]

SFMs were formerly called ``Unicode Map'', or ``unimap'' for short, but this term should be dropped, as what they called ``screen maps'' now uses Unicode as well: it probably confused many, many people.