Modern terminal emulators are the tip of an iceberg of legacy which may form the longest-lived legacy system in active use. As anyone who has worked with legacy systems before might predict, the accumulated hindsight of an ancient legacy system makes programming terminals a pretty miserable experience today 😅 How far back does it go, and how have the decisions of our ancestors affected the system as it appears today?
Prepare yourself.
→ See also: House of Leaves - Mark Z. Danielewski
On Unix, the terminal emulator manages a resource called a pty, or "pseudoterminal". Check out "man pty" for the comprehensive details. These pseudoterminals enable communication between a master and slave process (the terminal emulator and the programs running in it, respectively). Two APIs emerged for dealing with ptys on Unix: System V (or "UNIX 98") and BSD. Only the former is recommended for modern applications.
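To make the UNIX 98 API a little more concrete, here is a minimal sketch of how a terminal emulator might obtain a pty. The helper name open_pty_pair is made up for this illustration. The calls it makes (posix_openpt, grantpt, unlockpt, ptsname) are the standard POSIX ones, and error handling is kept to the bare minimum.

```c
#define _XOPEN_SOURCE 600 /* exposes posix_openpt, grantpt, unlockpt, ptsname */
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of any library): opens a new
 * pseudoterminal pair using the UNIX 98 API. On success, stores both
 * file descriptors and returns 0. On failure, returns -1.
 */
int open_pty_pair(int *master_fd, int *slave_fd)
{
    /* Ask the system for a new master device */
    int master = posix_openpt(O_RDWR | O_NOCTTY);
    if (master < 0)
        return -1;

    /* Set up ownership and permissions of the slave, then unlock it */
    if (grantpt(master) < 0 || unlockpt(master) < 0) {
        close(master);
        return -1;
    }

    /* Look up the path of the matching slave device and open it */
    const char *slave_path = ptsname(master);
    if (slave_path == NULL) {
        close(master);
        return -1;
    }
    int slave = open(slave_path, O_RDWR | O_NOCTTY);
    if (slave < 0) {
        close(master);
        return -1;
    }

    *master_fd = master;
    *slave_fd = slave;
    return 0;
}
```

A real terminal emulator would typically go on to fork, make the slave the controlling terminal of the child, attach it to the child's stdin, stdout, and stderr, and exec a shell, while the parent reads and writes the master.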
System V and BSD came up with the pseudoterminal to provide a pseudo- version of the real thing: a terminal. Yes, the terminal emulator, as the name might suggest, is but an emulator of a very real object, in the same sense as a GameBoy emulator is emulating the behavior of a real system that money can buy. Now obsolete, terminals were dedicated devices which provided a screen and connected to a minicomputer, mainframe, or modem, usually via a serial cable, and provided text-based input and output capabilities much like what we see in our terminal emulators today. Your terminal emulator is usually emulating, at a bare minimum, the DEC VT100 terminal, which looked like this:
→ Picture of a DEC VT100. Photo credit: Jason Scott
Your Unix system today still includes a state machine which aims to reproduce the behavior of this device, in concert with your terminal emulator. The TTY subsystem provides for the configuration of traits like baud rate, parity configuration, and other signalling concerns. Yes, your terminal emulator has a baud rate. You can find out what it is like so:
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <termios.h>
#include <unistd.h>

int main(void) {
    /* /dev/tty refers to the controlling terminal of this process */
    int fd = open("/dev/tty", O_RDWR);
    assert(fd >= 0);

    struct termios t;
    int r = tcgetattr(fd, &t);
    assert(r >= 0);

    /* speed_t values are opaque constants, so map them back to numbers */
    int rate = 0;
    switch (cfgetospeed(&t)) {
    case B0: rate = 0; break;
    case B50: rate = 50; break;
    case B75: rate = 75; break;
    case B110: rate = 110; break;
    case B134: rate = 134; break;
    case B150: rate = 150; break;
    case B200: rate = 200; break;
    case B300: rate = 300; break;
    case B600: rate = 600; break;
    case B1200: rate = 1200; break;
    case B1800: rate = 1800; break;
    case B2400: rate = 2400; break;
    case B4800: rate = 4800; break;
    case B9600: rate = 9600; break;
    case B19200: rate = 19200; break;
    case B38400: rate = 38400; break;
    default: rate = -1; break; /* a speed not listed in this table */
    }

    printf("baud rate: %d\n", rate);
    close(fd);
    return 0;
}
There's a lot of other state that's not really being used, but is nevertheless part of the system today. On Linux, "man ioctl_tty" will satisfy the curious reader, complete with an explanation of the differences between SVr4, UnixWare, Solaris, DG/UX, AIX, HP-UX, and Tru64. Pop over to "man termios" for more, like the configurations specific to terminals connected to modems, the readline-like behavior which is built in to the Linux TTY subsystem, or the fun considerations every new development like O_NONBLOCK had to make for this system. Fun fact: early versions of io_uring caused kernel lock-ups when writing to a TTY device.
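The readline-like behavior is what termios calls canonical mode, and TUI programs generally turn it off. Here is a rough sketch of how that might look. The helper name disable_line_editing is invented for this example, and it only edits a struct termios in memory, so applying it to a real terminal is left to the caller.

```c
#include <termios.h>

/*
 * Hypothetical helper (the name is made up for this sketch): edits a
 * termios structure in place so that the kernel's built-in line
 * editing and echo are turned off. The caller is expected to apply the
 * result with tcsetattr().
 */
void disable_line_editing(struct termios *t)
{
    /*
     * ICANON: deliver input a line at a time, with kernel-side editing
     * (erase, kill, and so on). ECHO: echo typed characters back.
     */
    t->c_lflag &= ~(tcflag_t)(ICANON | ECHO);

    /*
     * In non-canonical mode, read() returns as soon as at least VMIN
     * bytes are available. A VTIME of 0 disables the read timer.
     */
    t->c_cc[VMIN] = 1;
    t->c_cc[VTIME] = 0;
}
```

In practice you would fetch the current settings with tcgetattr, keep a copy, pass the structure through a function like this one, apply it with tcsetattr(fd, TCSANOW, &t), and restore the saved copy before exiting so the user's shell is not left in a strange state.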
SVr4 brings us to 1988, so our legacy counter is at 33 years. Can we push it further? We could follow the Unix lineage back, or go straight to the serial communications these configuration options are manipulating - and we will go there - but I'd like to take a detour. All of the complexity thus far serves to manage the connection between the running program and the terminal displaying it. However, the data carried over this connection also features in-band signalling to control terminal features, in the form of ANSI escape codes. You have probably at least seen a snippet like this:
printf '\e[31mRed text\e[m\n'
The \e character here is the "escape" character in ASCII, which has code point 0x1B (or 033 in octal). It tells the terminal that the characters which follow form a command that changes its behavior. A variety of commands are available, but the [ character following ESC indicates a Control Sequence Introducer, which is the most common case. These sequences can, as in this example, control the text color, but can also move the cursor around the screen to print text in a non-linear fashion. These are the building blocks of "TUI", or Text User Interface, applications such as vi. This standard was established in 1976 as ECMA-48, adding another 12 years to our dive through history (now 45 years).
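For the curious, here is roughly how a program might build these sequences itself. This is a small sketch, and the function names are invented for this example. The two sequences shown are SGR ("Select Graphic Rendition", which sets attributes like color) and CUP ("Cursor Position").

```c
#include <stdio.h>

#define ESC "\x1b"  /* the ASCII escape character */
#define CSI ESC "[" /* Control Sequence Introducer */

/*
 * Hypothetical helpers (names made up for this sketch). Each writes an
 * escape sequence into buf and returns its length, like snprintf().
 */

/* SGR: CSI <attribute> m. For example, 31 selects a red foreground. */
int format_sgr(char *buf, size_t size, int attribute)
{
    return snprintf(buf, size, CSI "%dm", attribute);
}

/* CUP: CSI <row> ; <column> H. Rows and columns are counted from 1. */
int format_cursor_position(char *buf, size_t size, int row, int column)
{
    return snprintf(buf, size, CSI "%d;%dH", row, column);
}
```

Calling format_sgr with 31 produces the same bytes as the \e[31m in the printf snippet above, and writing the output of format_cursor_position(buf, sizeof buf, 5, 10) to the terminal should move the cursor to row 5, column 10.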
ECMA-48 is itself derivative of earlier works. The ANSI standard was established to address the growing capabilities of so-called video terminals, but these terminals themselves were recent innovations in their time, and included backwards compatibility with earlier technology: the teletype, also called... wait for it... a TTY. I could just tell you when these were introduced and wrap this article up now, but I'd like to draw a direct connection from these machines to their living legacy in your terminal emulator today.
The latest models of teletypes were essentially typewriters connected to a spool of paper, onto which they could autonomously print type based on incoming electrical signals. This enforced certain limitations, one in particular: once printed, the ink could not be erased. However, interactive programs were still made under these conditions, including the famous ed, which is the standard Unix editor.
Unix was written on one such device, using a similar editor. Also note that these interfaces saw the first computer games - that is, distinct from "video" games - which were primarily text-based adventure games, the most famous of which is aptly called "Adventure".
Programs like these embraced their medium with as much enthusiasm as later programs did for later mediums. The medium even offered some advantages which video terminals lost, such as the ability to tear off the page you were working on and send it to your mate, or to write notes directly on the terminal output with a pencil. Like the ANSI escape codes that came later, the electronic signals the teletypes used also provided in-band signalling functionality to allow programs to perform complex output operations and fully utilize the capabilities of these devices, primarily through a standard called ASCII.
Like the control sequences of video terminals, ASCII provides its own set of control characters, some of which you have probably used. The "Line Feed" character, or LF (you may know it as \n), literally fed a new line into the teletype by engaging the spool motor. Carriage Return, or CR (or \r), literally returned the teletype's mechanical type carriage to its starting position on the line. Other characters in ASCII are assigned to signalling purposes, such as EOT, or end of transmission, which is assigned to character 4. Your terminal today supports all of these same features, and many programs take advantage of them - the most common modern use is perhaps the use of \r to easily move the cursor back to the start of a line.
#!/bin/sh
i=0
while [ "$i" -lt 10 ]
do
    printf '\rProgress: %d%%' "$((i*10))"
    i=$((i+1))
    sleep 1
done
printf '\rProgress: 100%%\n'
printf 'Done.\n'
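If you want to see where the control characters mentioned in this article sit in ASCII, here is a small lookup sketch. The function name is made up for this example, and the table only covers a handful of the 33 control characters.

```c
#include <stddef.h>

/*
 * Hypothetical helper (name invented for this sketch): returns the
 * conventional abbreviation for a few ASCII control characters, or
 * NULL for anything else.
 */
const char *control_char_name(unsigned char c)
{
    switch (c) {
    case 0x00: return "NUL"; /* null */
    case 0x04: return "EOT"; /* end of transmission */
    case 0x07: return "BEL"; /* bell, '\a' in C */
    case 0x0A: return "LF";  /* line feed, '\n' in C */
    case 0x0D: return "CR";  /* carriage return, '\r' in C */
    case 0x1B: return "ESC"; /* escape */
    case 0x7F: return "DEL"; /* delete */
    default:   return NULL;
    }
}
```

On any ASCII-based system, passing '\r' to this function returns "CR", which is the character the progress script above sends to move back to the start of the line.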
The ASCII character set was designed to facilitate computer communications, and was established in 1963 for this purpose: 58 years ago. This essentially brings us to the dawn of the computer age, just 4 years after the development of the MOSFET, which is the single most important invention for the explosion of computers into general use.
Early computers prior to this point did not establish much in the way of legacy standards that are still a part of modern computing, so our story ends here. The programmers of this time lived in a world where communications were governed by telephones, and, before that, the telegraph.
Hang on, though. These ASCII computers repurposed teletypes, an existing technology, so, naturally, their communication model was based on whatever those devices understood. Can we go back even further? The primary protocol for electronic signalling at this time was standardized as the International Telegraph Alphabet No. 2 (ITA2). If we look inside, we find all of the English letters, and also... null, carriage return, line feed, and bell.
All of these characters were added to ASCII for backwards compatibility with ITA2, which was introduced in 1924, and are implemented by your terminal emulator today. 97 years of backwards compatibility. But, look carefully: why the "2" in ITA2?
ITA2 is derived from the so-called "Murray Code", developed by Donald Murray in 1901. It introduced the first control characters, carriage return, line feed, and bell, which "dings" an audible bell upon receipt by the remote teleprinter, and might ding your terminal emulator if you run printf '\a'. Our legacy now extends through an entire century, and beyond living memory. The Murray Code is not ITA1.
ITA1 is the name which was ultimately given to the Baudot code, originally patented by Émile Baudot in 1872. It defines fewer control sequences: just "delete", which lives on in your terminal as ASCII character DEL (0x7F). However, it does provide us with one additional important link to the present: Baudot's name gave us the "baud" rate that our little C program printed out earlier. One continuous connection from past to present: 149 years of legacy code.
P.S. A fun fact I learned which does not have any discernible connection to modern terminal emulators is that the fax machine was invented in 1846 and was first made commercially available in 1865: 11 years before the invention of the telephone.