Dave's Writing Guidelines

Over the years I have collected a few thoughts on the preparation of research papers, and I list them here for easy reference.

--David Kotz

English usage and style:

I usually mark paper drafts by hand, using red and blue ink; see this pdf for an example of those handwritten marks.  Here are some things that I commonly mark on student papers.  I mark them by writing the hashtag in the margin.  I tend to mark only the first one or two occurrences of any given mistake in any given paper.  I don't try to explain all of them fully here; see some of the books below for specific advice about English usage.  Day's books are particularly good for learning the conventions used in scientific writing.
#AC (define acronyms)
Always define acronyms at their first use.
#BW (B&W - black and white)
Ensure diagrams and graphs are visible when printed on B&W printers — many readers (and reviewers!) still print the papers they read, and many do not routinely use a color printer.  Check out this interactive color exploration tool and note the checkboxes for "colorblind friendly" and "print friendly" modes.
#CA (compound adjectives)
Hyphenate compound words that are used as an adjective.  For example, in "open-book exam", the phrase "open book" is used as an adjective for the noun "exam."  These hyphens are used much like parentheses; "open-book exam" is read like "((open book) exam)", whereas "open book exam" is read "(open (book exam))", not the likely meaning.  Exception: no hyphen after a word ending in "ly".
#CN (citations are not nouns)
Citation references are not nouns; they are parenthetical remarks.  Thus, it is not correct to write "In [13] they show that P=NP." You should write "A seminal paper proved P=NP [13]."  Or, "Jones and Smith show that P=NP [13]."  The reference can go at the end of the sentence (my preference) or at the end of the relevant phrase (sometimes better when multiple refs are cited in the same sentence).  See also #ET.
#CO (commas)
Commas can be tricky. Here is an interesting article about some subtle comma issues.  See also #OX.
#CQ (curly quotes)
In LaTeX, quotation marks are written in pairs using backticks (to open quote) and forward ticks (to close quote), `like this' for single quotes and ``like this'' for double quotes. The double-quote character (") should not be used in LaTeX source, as it always formats as a right-curly double quote.
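As a minimal sketch of the quoting convention above (the example sentence is made up purely for illustration):

```latex
\documentclass{article}
\begin{document}
% Opening quotes are backticks; closing quotes are apostrophes.
She called the result ``surprising'' and the method `simple'.
% Avoid the plain double-quote character in the source:
% She called the result "surprising".
\end{document}
```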
#CX (capitalize cross-references)
Capitalize the word "Figure" when referring to a specific figure, such as "see Figure 4."  Same with Section, Table, Equation, and so forth.  Do not abbreviate these words.
#D (dashes: em-dash and en-dash)
The em-dash is a long horizontal line — like this — used to set off a remark from the rest of the sentence.  There are two common approaches to formatting an em-dash — with and without spaces.  In LaTeX, I prefer to use an en-dash, which is slightly shorter than an em-dash, and also to place a non-breaking space prior to the en-dash, and a breaking space after the en-dash, like this~-- note the tilde, the double dash, and the space.  (The other method would be like this---note the triple dash, and no space before or after the triple dash.)
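The two dash styles described above might look as follows in LaTeX source; the sentence is a made-up illustration, and only the dash markup matters:

```latex
\documentclass{article}
\begin{document}
% En-dash style: tie before, double hyphen, ordinary space after.
The prototype worked~-- to our surprise~-- on the first attempt.

% Em-dash style: triple hyphen, no spaces on either side.
The prototype worked---to our surprise---on the first attempt.
\end{document}
```

Whichever style you pick, it is probably best to use it consistently throughout a paper.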
#EG (e.g. and etc.)
The abbreviation "e.g." replaces "for example," and belongs at the beginning of a list of examples. The abbreviation "etc." replaces "and so forth", and belongs at the end of a list of examples. They are never used together, because each implies that the list is just a set of examples, not a complete list of all possibilities. To use both on the same list would thus be redundant. The meaning of "e.g." is different from that of "i.e."; see #IE. Like "i.e.", "e.g." is always followed by a comma; see also #LA.
#ET (et al.)
The Latin phrase "et al." is often used to abbreviate a long author list.  Note that the first word et is a word and the second al. is an abbreviation, so only the latter has a period.  Format this phrase in LaTeX as "et~al."  The tie (tilde) prevents a line break between the words, which I feel looks awkward.  See also #LA, #TI.
#FM (footnote mark)
Place a footnote mark after punctuation, not before.  Like this.²  Not like this³.
#FR (floating references)
A floating figure or table should always appear on the same page as, or on a later page than, its first reference in the text.  LaTeX will arrange this placement properly as long as you put the {figure} or {table} environment after the first \ref to that float.  My practice is to put the environment immediately after the end of the paragraph containing the first \ref.  The forward reference is ok and is resolved by the second pass of LaTeX.
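A minimal sketch of that placement, assuming a standard article class; the label f:latency, the caption, and the boxed placeholder are hypothetical stand-ins for a real plot:

```latex
\documentclass{article}
\begin{document}

Figure~\ref{f:latency} shows the latency measured in each trial.
% ...the rest of the paragraph containing the first reference...

% The float environment follows the end of that paragraph.
\begin{figure}
  \centering
  \fbox{Placeholder for the plot}  % stand-in for \includegraphics
  \caption{Latency per trial.}
  \label{f:latency}
\end{figure}

\end{document}
```

Run LaTeX twice so that the \ref resolves to the figure number.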
#H (however)
"However" should (usually) not begin a sentence: rewrite "However, I found that the red ball had been missing for weeks" as "I found, however, that the red ball had been missing for weeks".  In that usage, note "however" is surrounded by commas.  It is ok for it to be at the end of a sentence, however.  Above, I say "usually" because this is not a strict rule.
#HH (text between headings)
Always include some text between headings, even if only one or two sentences; the goal is to provide an introduction to the following subsection, or a transition from the prior section to this section and the following subsection.
#IE (i.e.)
The abbreviation "i.e." replaces "that is," indicating an explanation of the phrase before it. It is always followed by a comma; see also #LA. Its meaning is different than "e.g."; see #EG.
#IN (Internet vs. internet)
When writing about general interconnected computer networks, call them 'internets' (not capitalized).  When writing about the specific public internet that is based on the IP protocol, call it the 'Internet'.  The Internet is one instance of an internet.  Unfortunately it is common, though technically incorrect, to equate the Internet and the World-Wide Web (which is also capitalized, please note).  The Web (or WWW) is a subset of the Internet.
#IT (its vs. it's)
The word "its" is often confused with the contraction "it's". The word "its" is the possessive form of "it", much like "his", "hers", and "theirs" are possessive forms of "he", "she", and "they", respectively; notice that all end in 's' but none include an apostrophe. On the other hand, "it's" is a contraction for "it is", and thus (see #NC) should never appear in scientific writing.
#LA (Latin abbreviations "i.e.", "e.g.", "etc.", "vs.")
The abbreviations "i.e.", "e.g.", "etc.", and "vs." are indeed abbreviations and thus should have periods as shown.  Of those, "i.e." and "e.g." should always be followed by a comma, as should "etc." when in the middle of the sentence.  (Why? Because they replace "that is", "for example", and "and so forth", which are always delimited by commas.)  At the end of a sentence I prefer to use "and so forth" rather than "etc.."  See also #ET.
#NC (no contractions)
Do not use contractions in formal writing.
#OX (Oxford comma)
In a comma-separated list, use a comma after every item except the last.  For example: "Alice, Bob, and Charlie are frequent collaborators in security research."  (You may often see lists skip the final comma, i.e., "Alice, Bob and Charlie", but please avoid that approach.) This final comma is the so-called serial comma (or "Oxford comma"); although controversial, it avoids any ambiguity, so I advocate its use in every list.  Clarity is critical in scientific writing.  See also #CO.
#P (passive)
Avoid the passive sentence structure.  It obscures the subject of the sentence, and leads to ambiguity.  For example, "The prototype was built" is a passive phrase whereas "We built a prototype" is an active phrase.  The active structure is almost always shorter, and clearer.
#PN (page numbers)
Please include page numbers in your document.
#PR (preposition)
Do not end a clause or sentence with a preposition (with, for, to, from, under, on, in, and so forth).
#RI (reduce ink)
Reduce "ink" in tables and graphs.  I often see tables with a line all the way around the table, between every column, and between every row... ugh, it looks like a spreadsheet.  All that "ink" is unnecessary and distracting.  Read Tufte's book (below); it will transform the way you think about presenting data.
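One common way to build a low-ink table in LaTeX is the booktabs package. The source does not name a package, so treat this choice as an assumption; the column names and values below are placeholders for layout illustration only:

```latex
\documentclass{article}
\usepackage{booktabs}  % provides \toprule, \midrule, \bottomrule
\begin{document}
\begin{table}
  \centering
  \caption{Placeholder values, for layout illustration only.}
  \begin{tabular}{lrr}  % no vertical rules between columns
    \toprule
    Method   & Trials & Mean time (s) \\
    \midrule
    Baseline & 10     & 12.0 \\
    Proposed & 10     & 10.4 \\
    \bottomrule
  \end{tabular}
\end{table}
\end{document}
```

The table uses only three horizontal rules: no box around it and no lines between columns or between data rows.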
#SC (Sentence case)
Use "Sentence case" for titles and headings, rather than "Title Case"; that is, capitalize only the first word of a title or heading.  This is my personal preference, but I can live with Title Case if a publisher's style (or co-authors) insist on it.  Regardless of the choice, use a consistent approach for all headings.
#SP (spelling)
Check your spelling!  The spell checker can also catch many typos.
#T (tense)
Scientific writing uses tense in specific ways.
  • It is common in scientific research papers to use the first-person plural, that is, to write "We developed a method..." rather than "I developed a method...", even on a single-authored paper.  The rationale I've heard is that the plural recognizes that science is normally a collaborative process.
  • When referring to your own experimental results, use the past tense.  Thus, "We ran ten trials and the average execution time was 10.4 seconds." Past tense to say you did the work, and past tense to describe the result.
  • When writing about related work, describe their results in the present tense.  Thus, "Jones et al. conducted a survey and found that most Dartmouth students wear green clothing."  Past tense to say they did the work, but present tense to describe their result.
  • When referring to other parts of your paper, use present tense.  That is, do not say, "This paper will discuss...", say, "This paper discusses...".  Similarly, say, "The argument above proves that..." rather than "The argument above proved that...".  Why?  Because, despite the fact that your reader is reading through the paper, over time, the paper stands complete, in the present.  The argument not only proved, but still proves....
#TH (this)
The word "this" should almost always be followed by a noun: instead of saying "this is red", you should be more specific with "this ball is red."  If you leave it out, your reader may mentally insert a different noun than you had in mind... things that are not ambiguous to you can be ambiguous to your reader.
#TI (tie)
Use a 'tie' (a non-breaking space) right before any number or citation, so it won't appear at the start of a new line, like this
[13].  Or like this: Figure
4. In LaTeX, use a tilde for a non-breaking space: "blah~\cite{jones:PNP}." or "Figure~\ref{f:pretty}."
#UN (units)
Units: The convention in computer science seems to be the following: when measuring storage, mega and kilo refer to powers of two; thus a megabyte is 2²⁰ bytes and a kilobyte is 2¹⁰ bytes.  When measuring network bandwidth, mega and kilo refer to powers of ten; thus one megabit-per-second is 10⁶ bits per second, and one kilobit-per-second is 10³ bits per second.  When abbreviating, k=10³ but K=2¹⁰, and m=10⁶ but M=2²⁰.  Furthermore, b=bits and B=bytes.  Thus, 10 MB is 10 times 2²⁰ bytes, but 10 mbps is ten million bits per second.
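Worked out under the convention above, the closing example comes to the following; this is only a typeset sketch of the arithmetic, and it assumes the amsmath package:

```latex
\documentclass{article}
\usepackage{amsmath}  % provides align* and \text
\begin{document}
\begin{align*}
  10\ \text{MB}   &= 10 \times 2^{20}\ \text{bytes} = 10{,}485{,}760\ \text{bytes} \\
  10\ \text{mbps} &= 10 \times 10^{6}\ \text{bits/s} = 10^{7}\ \text{bits/s}
\end{align*}
\end{document}
```

The two prefixes differ by about 5% at the mega scale (1,048,576 versus 1,000,000), which is why the distinction matters when comparing storage sizes with transfer rates.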
#UND (underline)
Do not underline words and phrases.  For emphasis, foreign words, etc., use italics.
If you want to mention a URL, do not place it inline, in the text.  Put it in a footnote, or a reference at the end of the paper.  In-line URLs can produce awkwardly long lines (or broken URLs), and anyway, few people actually want to read a URL.  With most conferences restricting the number of pages for a paper's body, but not the references, citing a URL (rather than placing it in a footnote) saves critical space in the paper body. It also allows you to provide further details, such as the title of the web page, and the date you visited it.  Also: avoid URLs that refer to CGI scripts or include search parameters, as these tend to have a short lifetime.
#V (verbosity)
Avoid verbosity; for example:
  • "in order to..." becomes "to..."
  • "at this point in time" becomes "at this time"
  • "more and more common..." becomes "more common..."
  • "a number of" becomes "several"
  • "utilizes" becomes "uses"
#VY (very)
It is rarely useful to use the word "very".  How much hotter than "hot" is "very hot"?  This story may be apocryphal, but Mark Twain once said that he would just replace "very" with "damn" everywhere, and then the editor would surely take them all out.
#WF (Wi-Fi)
The term "Wi-Fi" is always hyphenated and capitalized. It is a trademark of the Wi-Fi Alliance and they chose that specific spelling. "WiFi" is incorrect.
#WN (whether or not)
It is rarely appropriate to say "whether or not"; usually you should just say "whether".  If you do use "whether or not", don't spread the words across the sentence.
#WT (which vs. that)
Be careful how you use "which" and "that".  "Which" nearly always follows a comma, because it is used to add information, whereas "that" is used to qualify:
  • The ball, which is red, fell down the hole.
  • The ball that is red fell down the hole.
In the first sentence, there is only one ball involved, and we mention almost as an aside that it is a red ball.  In the second sentence, there are presumably many balls involved, but it is the red ball that fell down the hole.  The following sentence is ungrammatical:
  • The ball which is red fell down the hole.
#/ (slash)
Avoid using a slash when you mean "and" or "or".  If you write a slash, the reader may have a different interpretation than you.
#@ (\@ to end a sentence)
In LaTeX, when ending a sentence with a capital letter and a period, you need to tell LaTeX that it is the end of a sentence and not an abbreviation in mid-sentence.  For example, my name is David F. Kotz.  The "F" in that sentence is a capital letter followed by a period, and LaTeX rightly supposes that it is not the end of a sentence.  LaTeX puts a little more space between sentences than between words, so it is good to get it right.  As another example, I started a project called CRAWDAD.  That sentence ends with a capital letter and a period; in LaTeX, it should be coded like this: CRAWDAD\@.
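The two cases above can be sketched in LaTeX source as follows; the sentence after CRAWDAD is made up so that the end-of-sentence space has something to precede:

```latex
\documentclass{article}
\begin{document}
% Capital letter plus period in mid-sentence: LaTeX treats it as
% an abbreviation and uses a normal interword space. No fix needed.
My name is David F. Kotz.

% Sentence ending in a capital letter: \@ before the period
% restores the end-of-sentence spacing.
I started a project called CRAWDAD\@. The next sentence starts here.
\end{document}
```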

Style tips for bibliographies:

Although you must follow the publisher's requested style, I recommend use of my corresponding customized bibtex style, and proofreading your paper's printed "References" section for alignment with the following principles:

Structure of a typical paper:

Your paper draft should have a title, authors, date, revision number, and abstract.
A typical structure is

Books and references about writing:

See also my list of favorite papers/books about writing and research.

New, and still-relevant references:

Older references - may be harder to find.