A guide to writing mathematics

1. Writing mathematics

2. Typesetting text

3. Whitespace

4. Typesetting mathematics

3 Whitespace

Whitespace refers to all blank space in your compiled TeX document – spaces between words, line spacing, margins, and so on. Correct use of whitespace can mean the difference between a neat, easy-to-follow document and a jumbled heap of garbage.

3.1 Use paragraphs consciously

A paragraph is a collection of sentences that logically belong together. A correct and conscious use of paragraphs helps the reader sort and digest your text: Excessively long paragraphs are tiresome to read, whereas too many short paragraphs lead to an incoherent collection of sentences.

To start a new paragraph in TeX, insert one or more blank lines in your TeX file:

This is the first paragraph. This is the first paragraph. This is the first paragraph. This is the first paragraph.

This is the second paragraph.

(An equivalent way of starting a new paragraph is the \par command. However, inserting blank lines makes for a more readable TeX file, so you should prefer blank lines over \par. There is no difference in the compiled document.)

TeX does several things when ending a paragraph: It fills the last line with glue, inserts a slightly bigger vertical space after that line, and indents the start of the new paragraph. Near the end of a page, TeX is also more likely to break the page between two paragraphs than between two lines in the same paragraph.
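Each of these steps is governed by a length parameter: \parfillskip is the glue that fills the last line, \parskip is the extra vertical space between paragraphs, and \parindent is the indentation of the new paragraph. The minimal document below simply prints their current values. It is a quick way to inspect what your document class uses; the values depend on the class and its options.

```latex
\documentclass{article}
\begin{document}
% Print the parameters TeX uses when one paragraph ends
% and the next one begins. The values depend on the document class.
Glue filling the last line (\verb|\parfillskip|): \the\parfillskip

Vertical space between paragraphs (\verb|\parskip|): \the\parskip

Indentation of a new paragraph (\verb|\parindent|): \the\parindent
\end{document}
```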

There are various other ways of starting a new paragraph. A \section, \subsection etc. always starts a new paragraph, regardless of how many blank lines are present. Likewise, a \begin{theorem} ... \end{theorem} block will always end the current paragraph and start a new paragraph after the theorem. It is still a good idea to insert one or more blank lines in your TeX file before and after these constructs, in order to make your TeX file more readable.
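A minimal sketch of this source layout, assuming the theorem environment is defined with the amsthm package's \newtheorem (the name theorem is just the usual convention):

```latex
\documentclass{article}
\usepackage{amsthm}
\newtheorem{theorem}{Theorem}

\begin{document}
Some text leading up to the theorem.

% Blank lines before and after the environment keep the source readable.
\begin{theorem}
  Every finite integral domain is a field.
\end{theorem}

% With a blank line here, LaTeX indents the next paragraph as usual;
% text placed directly after \end{theorem} would start without indentation.
Some text following the theorem.
\end{document}
```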

A related construct is the macro \\, which in text mode is shorthand for \newline:

This is the first paragraph. \\ This is the same paragraph, only on a new line. \newline Still the same paragraph.

\newline merely fills the current line with glue and starts a new line, without any of the other TeX magic associated with starting a new paragraph.

The \\ macro is shorthand for many useful things, like breaking a line in a multiline math display, or starting a new row in a table. However, you should rarely have use for \\ in text mode, and never for ending a paragraph!
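The contrast is easiest to see in a small test document (assuming amsmath for align*). The first \\ below only breaks the line, the blank line starts a genuine new paragraph, and the \\ inside align* is the kind of use \\ is actually meant for:

```latex
\documentclass{article}
\usepackage{amsmath}
\begin{document}
% Wrong: \\ only starts a new line. The "second paragraph" is not
% indented and gets no paragraph spacing.
This is the first paragraph. \\
This is meant to be the second paragraph.

% Right: a blank line ends the paragraph.
This is the first paragraph.

This is the second paragraph.

% A legitimate use of \\: separating the rows of a multiline display.
\begin{align*}
  (a+b)^2 &= a^2 + 2ab + b^2 \\
  (a-b)^2 &= a^2 - 2ab + b^2
\end{align*}
\end{document}
```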

3.2 Respect your margins

The default margins of a compiled TeX document might seem exceedingly wide, but there are good reasons for them to be wide. Don’t let formulas spill into the margin, and don’t fall for the temptation of modifying the margin widths.

3.2.1 Overfull margins

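“Overfull hbox” is perhaps the most common (and most annoying) warning message when compiling TeX documents. In running text, it means that TeX could not break a line so that it fits within the text width, so the line sticks out into the right margin. Although you can safely ignore these warnings at first, you should address them properly before finalizing the document.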
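One way to hunt these warnings down is the draft class option, which makes LaTeX mark every overfull line with a black bar in the margin. In the sketch below, the braces turn a long sum into a single unbreakable unit, a common cause of the warning; the second version lets TeX break after the + signs and is much shorter besides. The formula itself is only a placeholder.

```latex
% The draft option marks each overfull line with a black bar in the margin.
\documentclass[draft]{article}
\begin{document}
% Bad: the braces make the whole sum one unbreakable unit, so it is
% likely to stick out into the margin and cause an "Overfull \hbox" warning.
The total is ${a_1+a_2+a_3+a_4+a_5+a_6+a_7+a_8+a_9+a_{10}+a_{11}+a_{12}
+a_{13}+a_{14}+a_{15}+a_{16}+a_{17}+a_{18}+a_{19}+a_{20}+a_{21}+a_{22}
+a_{23}+a_{24}}$.

% Better: without the extra braces TeX may break the line after a +,
% and \cdots keeps the formula short in the first place.
The total is $a_1+a_2+\cdots+a_{24}$.
\end{document}
```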

3.2.2 Margin widths

Many who are new to TeX (and some experienced TeXers, too) find the margins in TeX documents too wide. Although margin widths are easily adjusted, you should resist the temptation to do so. Research has shown that text is most easily digested when each line contains between 60 and 75 characters, and that is what you get with TeX's default margin widths. (As a rule of thumb, a line should have room for two to three lowercase English alphabets: abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz.)
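To check the rule of thumb in your own setup, you can typeset a rule exactly as wide as the text block above two lowercase alphabets, and print the line width. This is a quick sketch; the actual numbers depend on the document class, font and font size.

```latex
\documentclass{article}
\begin{document}
% A rule exactly as wide as a line of text.
\noindent\rule{\textwidth}{0.4pt}%

% Two lowercase alphabets for comparison.
\noindent abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz

% Print the line width used by this document class.
\noindent The text width is \the\textwidth.
\end{document}
```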

The real issue isn’t that the default line width is too narrow – it's that A4 paper (or US letter size, for that matter) is too large! Some solutions to this mismatch are:

3.3 Insert space to increase readability

TeX provides a large number of units that are used for spacing and scaling objects on a page. Some of these, such as cm and in (equalling one centimeter and one inch, respectively, on printed paper) are immutable, whereas others, such as em and ex (corresponding roughly to the width of an “m” and the height of an “x”, respectively) are supplied by the font currently in use.
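The difference is easy to see with \rule, which draws a filled box of a given width and height: boxes measured in em and ex grow with the font size, while a box measured in cm stays the same. A minimal sketch:

```latex
\documentclass{article}
\begin{document}
% A 1em-by-1ex box scales with the current font size ...
{\small \rule{1em}{1ex}} \quad
{\normalsize \rule{1em}{1ex}} \quad
{\Large \rule{1em}{1ex}}

% ... whereas a box measured in an absolute unit does not.
{\small \rule{1cm}{1pt}} \quad
{\Large \rule{1cm}{1pt}}
\end{document}
```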

There are several scenarios where you need to insert space manually in order to improve readability, and in these cases you should almost always use macros that are defined in terms of context-dependent units such as em and ex. These include:

\,
A thin space, defined as ⅙ em. Use it to separate the integrand from the measure in an integral:
\[ \int_0^1 \sin(x)x\,dx \]
\, is synonymous with \thinspace in text mode.
\!
A negative thin space. Can be used to decrease the space between successive integral signs, which is often too large:
\[ \int_0^1\!\int_0^1 \sin(x)y\,dx\,dy \]
\␣
A control space. (The ␣ symbol stands for the blank space obtained by pressing the space bar.) Produces a space of the same size as the space between two words in a sentence. Use it for inserting normal spaces between abbreviated words in text mode, or for separating items in a list in math mode:
Am.\ Math.\ Soc.\ has several great journals.
The numbers $a,\ b,\ c$ and $d$ are positive.
\;
A thick space, defined as 5⁄18 em, is roughly equivalent to a text mode space. Use it, for instance, to separate a formula from a condition in inline math:
Consider the sequence $x_n=n^2\;(n\in\{1,2,3,\ldots\})$
\; is primarily a math mode command; in text mode, use \thickspace (with the amsmath package loaded, \; itself also works in text mode).
\quad
A quad, defined as 1 em. This space is only used in displays. Use it to separate conjoined statements in the same display, or to separate an expression from a verbal description:
\[ f(x) \to 0 \quad \text{and} \quad g(x) \to 1 \quad \text{as }x \to 0 \]
\qquad
A double quad, defined as 2 em. This space is only used in displays. Used for many of the same purposes as \quad.
We define the two functions \[ f(x) = 1+x, \qquad g(x) = e^x. \]
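Behind the math mode spaces are three parameters measured in math units (mu), where 18 mu equal 1 em of the current math font. With the usual defaults, \thinmuskip is 3 mu (hence the ⅙ em of \,) and \thickmuskip is 5 mu plus some stretch (hence the 5⁄18 em of \;); the medium space \medmuskip, which TeX puts around binary operators and which \: inserts by hand, lies in between. The sketch below prints the values in effect in your document and compares the spaces side by side:

```latex
\documentclass{article}
\usepackage{amsmath}
\begin{document}
% Print the current values of the math spacing parameters.
% Math units: 18mu = 1em of the current math font.
Thin space \verb|\,|: \texttt{\the\thinmuskip}

Medium space \verb|\:|: \texttt{\the\medmuskip}

Thick space \verb|\;|: \texttt{\the\thickmuskip}

% Visual comparison of the spacing commands between two letters.
$a\,b$ \qquad $a\;b$ \qquad $a\quad b$ \qquad $a\qquad b$
\end{document}
```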

A more thorough description of math mode spacing conventions can be found here.

3.4 Keep an eye on TeX’s automatic spacing

To increase readability, TeX produces spaces of subtly different sizes between different types of objects. Consider the following example:

\[
-1-2+\sum_{k=3}^5 -k =-15
\]

The space in front of 2 is larger than the spaces in front of 1, k and 15. This makes it clear that the minus between 1 and 2 is a binary operator (an operation on two numbers), while the minuses in front of 1, k and 15 are unary operators (they take a single argument and negate it).

The system that TeX employs to insert spacing is actually quite simple. When deciphering a formula, TeX first reads in the string of characters and lumps them into atoms. The atoms in the above formula are: –, 1, –, 2, +, Σ, –, k, =, –, 1, and 5. (The sub- and superscripts of Σ are part of that atom; they are themselves collections of the atoms k, =, 3 and 5.)

TeX then classifies each atom into one of thirteen types. The seven most important types are:
Ord: An ordinary atom, such as k, 1 or the unary operator –
Op: A large operator, such as \sum or \lim
Bin: A binary operator, such as + or the binary –
Rel: A relational operator, such as =, < or >
Open: An opening delimiter, such as (, [ or {
Close: A closing delimiter, such as ), ] or }
Punct: Punctuation, such as the comma ,

You can force TeX to interpret an atom as a specific type by using the commands \mathord, \mathop, \mathbin, etc.
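For instance, the symbol \# is an Ord by default, so TeX puts no space around it; wrapping it (or any other symbol) in one of these commands changes the spacing. In this small comparison, each formula is set separately so that neighbouring atoms cannot influence one another:

```latex
\documentclass{article}
\begin{document}
% Default: \# is an Ord, so no space is inserted around it.
$a \# b$ \qquad
% Forced Bin: a medium space on either side, like a + sign.
$a \mathbin{\#} b$ \qquad
% Forced Rel: a thick space on either side, like an = sign.
$a \mathrel{\#} b$ \qquad
% Forced Ord: the + loses its usual spacing.
$a \mathord{+} b$
\end{document}
```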

After classifying atoms into types, TeX inserts spaces between atoms based on their types. For instance, TeX inserts no space between two Ords, a thin space between an Op and an Ord, a medium space on either side of a Bin, and a thick space between a Rel and an Op.
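The effect of these rules can be checked with a few tiny formulas. Each one is set as a separate formula because spacing glue such as \qquad does not count as an atom: inside a single formula, TeX would look past it to the previous atom when deciding whether a minus is binary or unary. A minimal sketch:

```latex
\documentclass{article}
\begin{document}
% Ord Bin Ord: medium spaces around the binary minus.
$a - b$ \qquad
% A Bin at the start of a formula becomes an Ord: no space.
$-b$ \qquad
% A Bin directly after a Rel or an Open becomes an Ord: no space.
$a = -b$ \qquad $(-b)$ \qquad
% Op followed by Ord: TeX adds the space after "sin" automatically ...
$\sin x$ \qquad
% ... which is missing if "sin" is set as plain upright letters (Ords).
$\mathrm{sin}x$
\end{document}
```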

While this system works well most of the time, TeX sometimes needs a bit of help in deciding the type of an atom. Take the following three examples:

\begin{align*}
\text{Bad:} &&
|-x| && x-<x,y>y && \sum_k\Big(-\frac{k}{2}+1\Big) \\
\text{Good:} &&
|{-x}| && x-\mathopen<x,y\mathclose>y && \sum_k\Bigl(-\frac{k}{2}+1\Bigr)
\end{align*}

In the first example, TeX treats the | symbol as an Ord, and reads the formula as “| minus x times |”. To ensure that the “–” is treated as a unary operator, we can enclose –x in curly braces: inside the group the minus is the first atom, so TeX treats it as unary, and the group as a whole becomes a single Ord.

In the second example, the < and > symbols are used to denote an inner product. The formula is intended to mean “x minus the inner product of x and y times y”, but since < and > are relational operators (Rels), TeX reads the formula as “x minus is less than x, and y is greater than y”. The solution is to instruct TeX to treat < and > as opening and closing delimiters using the \mathopen and \mathclose commands. (As a side note, the correct way to denote inner products is by using the \langle and \rangle commands. These are automatically interpreted as Open and Close atoms.)

In the third example, the opening parenthesis \Big( is interpreted as an Ord and not an Open atom, with the consequence that the “–” is treated as a binary operator instead of a unary operator. The solution is to use \Bigl and \Bigr, which are synonyms for \mathopen\Big and \mathclose\Big, respectively. (See §4.3 Sizing and spacing delimiters for more on the \Big command.)

For more information on atoms and spacing, see pp. 157 and 170 of The TeXbook.

3.5 Learn when to use a tie

The tie character ~ inserts a space without ending the current word – a non-breaking space:

Two words. One~word.

Here, TeX treats “Two” and “words” as separate words, while “One word” is treated as a single word (with a blank space as its fourth character). Although the two spaces look the same, there are subtle differences.

3.5.1 Spaces between words and sentences

TeX inserts a larger space after a sentence (marked by a period) than between words in the same sentence. This can cause problems with abbreviations:

Pay attention to Thm. 5. It is due to D. E. Knuth.

Although the above is logically two sentences, TeX treats the period after “Thm” as the end of a sentence, resulting in slightly larger spaces between “Thm.” and “5.”, as well as between “5.” and “It”. (TeX interprets a single capital letter followed by a period, such as “D.” and “E.”, as an initial, and does not end the sentence after the period.)

The above should be typeset as

Pay attention to Thm.~5. It is due to D. E. Knuth.

(The above could also be typeset using the \␣ command, which inserts a regular space: Pay attention to Thm.\ 5. The difference is that TeX treats Thm.~5 as a single word, but Thm.\ 5 as two words; see below.)

As an aside, there are times when a sentence actually does end with a single capital letter. To instruct LaTeX to end the sentence, insert \@ before the period:

He has blood type A\@. He's not really my type.

3.5.2 Keeping words on the same line

When breaking a line, TeX strives to break it between two words. This means TeX might put a line (or page) break between words which semantically belong together: Examples of such words include references, names of persons, and some common abbreviations. To force TeX to treat these as a single word, insert a tie instead of a space:

\begin{theorem}\label{thm:knuth}
...
\end{theorem}
Theorem~\ref{thm:knuth} is due to Donald~Knuth. For a reference, cf.~The~\TeX{}book.

3.6 Aligning displays

When typesetting a long calculation that spans several lines, it is vital to align the lines properly. The horizontal alignment should emphasize the logical structure of your expression, not obscure it. When deciding where to align a multi-line display, follow this simple rule of thumb:

Align content with the same type of content in the line above.

If a line begins with a relational operator (=, <, >, etc.), align it with a relational operator on the line above. If the contents of a parenthesis spill over to the next line, align the start of the line with the first term after the opening parenthesis. If an integrand spills over to the next line, align it with the integral symbol. And so on.

\begin{align*}
f(x)+g(x)+h(x) &= \sin x + 2\cos x + 3\sin^2 x + 4\cos^4 x \\
&\quad + 5 \sin^5 x + 6\cos^6 x \\
&= \int_0^x \Bigl(p(y)\cos y + q(y)\sin^3 y e^y - r(y)\sin y \\
&\qquad\qquad - s(y) - e^{y^2}\Bigr)\,dy.
\end{align*}

In this example I have inserted a one-em space at the start of the second line and a four-em space at the start of the fourth line (see §3.3 Insert space to increase readability for more on spaces). As a result, semantically related content on successive lines is aligned.

The pedantic reader might have noticed that “sin” and “+” on the first and second lines, and “p” and “-” on the third and fourth lines, are not perfectly aligned. And indeed, the amount of spacing I inserted (one and four em, respectively) was chosen rather arbitrarily.

There are at least three solutions to this issue. The first (somewhat suboptimal) solution is to insert different amounts of spacing until subsequent lines are perfectly matched. The second is to use the \hphantom{...} command, which inserts a horizontal space of the same width as its argument:

\begin{align*}
f(x)+g(x)+h(x) &= \big|\sin x + 2\cos x + 3\sin^2 x + 4\cos^4 x \\
&\hphantom{{}={}} \big|+ 5 \sin^5 x + 6\cos^6 x \\
&= \int_0^x \Bigl(\big|p(y)\cos y + q(y)\sin^3 y e^y - r(y)\sin y \\
&\hphantom{{}= \int_0^x \Bigl(} \big|- s(y) - e^{y^2}\Bigr)\,dy.
\end{align*}

(I have inserted vertical bars using \big| to make the aligned positions easy to see.) The command \hphantom{{}={}} on the second line inserts a horizontal space whose width equals that of the = symbol together with the spaces around it. The empty groups {} on either side of = give it an atom on each side, so TeX treats = as a relation between two operands and includes the thick spaces it normally puts around a relational operator (see §3.4 Keep an eye on TeX’s automatic spacing and §4.3 Sizing and spacing delimiters); an alternative solution would be \mathrel{\hphantom{=}}.
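With the \mathrel variant, the first two lines of the example could be written as follows (amsmath assumed, as for the other multi-line displays). Because amsmath's align* places an empty group at the start of each right-hand cell, the invisible relation gets a thick space on both sides, just like the visible = on the line above:

```latex
\documentclass{article}
\usepackage{amsmath}
\begin{document}
\begin{align*}
f(x)+g(x)+h(x) &= \big|\sin x + 2\cos x + 3\sin^2 x + 4\cos^4 x \\
% \mathrel makes the invisible = a relation, so it gets the same
% thick spaces as the visible = on the line above.
&\mathrel{\hphantom{=}} \big|+ 5 \sin^5 x + 6\cos^6 x
\end{align*}
\end{document}
```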

The third solution to the problem is to use the aligned embedded environment (see also §4.4.5 Embedded multi-line expressions):

\[
f(x)+g(x)+h(x) = \int_0^x \Bigl(
\begin{aligned}[t]
&\big| p(y)\cos y + q(y)\sin^3 y e^y - r(y)\sin y \\
&\big| - s(y) - e^{y^2}\Bigr)\,dy.
\end{aligned}
\]

The optional argument to aligned ([t] in the above example) specifies the vertical alignment: either [t]op, [b]ottom or [c]entred.
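To compare the three options, you can place the same two-line aligned block after a short label (a minimal sketch assuming amsmath; the label text is arbitrary). The label lines up with the first row for [t], with the last row for [b], and is vertically centred on the block for [c], which is also the default:

```latex
\documentclass{article}
\usepackage{amsmath}
\begin{document}
\[
  \text{top: } \begin{aligned}[t] a &= b \\ &= c \end{aligned}
  \qquad
  \text{centre: } \begin{aligned}[c] a &= b \\ &= c \end{aligned}
  \qquad
  \text{bottom: } \begin{aligned}[b] a &= b \\ &= c \end{aligned}
\]
\end{document}
```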