June 29: Intro to HTML
The web
From last time:
- 1990/91: www developed by Berners-Lee (CERN) for sharing hypertext documents among scientists
- For a view of the web from 1996 onward, search the Wayback Machine
How do we get stuff on the web?
- We have to precisely specify how everything works; computers can't figure it out on their own.
- HTTP (HyperText Transfer Protocol): set of rules for how browsers and web servers communicate
- Ex: Your PC to the CS web server: "GET /~cbk/4/index.html"
CS web server to your PC: [the page]
(There are also particular error responses that are possible.)
- In class: demo of actual server response including headers (curl -i) and web server logs.
- Note that HTTP is stateless (the server doesn't remember you from one request to the next). Cookies, anyone?
- URL (Uniform Resource Locator): global identity of the page you want
- Ex: http://www.cs.dartmouth.edu/~cbk/4/index.html
- http: -- the protocol (how to obtain the document)
- www.cs.dartmouth.edu -- the hostname (which machine has it)
- ~cbk/4/ -- the path (where on that machine the document is)
- index.html -- the file name (if absent, often assumed to be index.html)
- The hostname is specified in a particular hierarchical naming structure
- Ex: www.cs.dartmouth.edu
- www -- the web server machine
- cs -- in the CS department
- dartmouth -- at Dartmouth
- edu -- which is an educational institution (compare .com, .gov, .org, etc.)
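To see the URL pieces concretely, here is a small sketch using Python's standard urllib.parse module (my choice for illustration; the lecture itself doesn't use Python). It splits the example URL into protocol, hostname, path, and file name, and then builds the two lines a browser would typically send for an HTTP/1.1 request. The variable names are made up for this example.

```python
from urllib.parse import urlparse

url = "http://www.cs.dartmouth.edu/~cbk/4/index.html"
parts = urlparse(url)

protocol = parts.scheme          # how to obtain the document
hostname = parts.netloc          # which machine has it
# Split the path into the directory part and the file name.
directory, _, filename = parts.path.rpartition("/")

# The hostname is hierarchical: most specific label first, most general last.
host_labels = hostname.split(".")

# What the browser asks the server for (HTTP/1.1 also names the host).
request_lines = [f"GET {parts.path} HTTP/1.1", f"Host: {hostname}"]

print(protocol, hostname, directory, filename)
print(host_labels)
print("\n".join(request_lines))
```

Note that the hostname never appears in the GET line itself; it is used to find the machine, and then the path tells that machine which document to send.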
Markup
How do we display/use the stuff we get on the web?
- We have to precisely specify how everything works; computers can't figure it out on their own.
- Documents have structure
- Books: chapter, section, paragraph, sentence
- Recipes: name, ingredients, steps, cooking time, servings
- Library records: title, author, subject, publisher, date
- Need to explicitly represent all the information contained in a document, including its words and also its structure / meaning.
- Problem: we only have the same medium (the text document) to represent the extra info.
- Solution: markup language: use special extra words, called tags.
- In general, these tags go in pairs, around what they're describing:
"I'm going to tell you the author's name"
J.R.R. Tolkien
"I'm finished telling you the author's name"
- We can adopt a convention of using a single-word tag instead of those sentences:
<author> J.R.R. Tolkien </author>
The plain tag goes before the text we're describing, and the slash-tag after it, like a pair of opening and closing parentheses.
- SGML (standard generalized markup language) -- developed by committee of book publishers and librarians to solve the problem of how to tell the computer about the structure of a document.
- XML (extensible markup language) -- designed to represent data in general.
- Tags can be nested:
<library>
  <book>
    <author>
      <lastname>Tolkien</lastname>
      <firstnames>J.R.R.</firstnames>
    </author>
    <title>The Hobbit</title>
  </book>
  <book>
    <author>
      <lastname>Tolkien</lastname>
      <firstnames>J.R.R.</firstnames>
    </author>
    <title>The Lord of the Rings</title>
  </book>
</library>
In the example, a library has a bunch of books, each of which has an author and title, and the author has both a lastname and one or more firstnames (for simplicity, I stuck them all together). Once we know what the tags mean, we can understand the provided data.
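"Once we know what the tags mean" applies to programs too. As a rough sketch (assuming Python and its standard xml.etree.ElementTree module, neither of which is part of the lecture), the following reads the library example and pulls out each book's title and author by following the nesting. The function name list_books is invented for this example.

```python
import xml.etree.ElementTree as ET

LIBRARY_XML = """
<library>
  <book>
    <author>
      <lastname>Tolkien</lastname>
      <firstnames>J.R.R.</firstnames>
    </author>
    <title>The Hobbit</title>
  </book>
  <book>
    <author>
      <lastname>Tolkien</lastname>
      <firstnames>J.R.R.</firstnames>
    </author>
    <title>The Lord of the Rings</title>
  </book>
</library>
"""


def list_books(xml_text):
    """Return (title, firstnames, lastname) for every book in the library."""
    root = ET.fromstring(xml_text.strip())   # root is the <library> element
    books = []
    for book in root.findall("book"):        # direct <book> children only
        author = book.find("author")
        books.append((
            book.findtext("title"),
            author.findtext("firstnames"),
            author.findtext("lastname"),
        ))
    return books


books = list_books(LIBRARY_XML)
for title, firstnames, lastname in books:
    print(f"{title} by {firstnames} {lastname}")
```

The code only works because the tags are properly nested and closed; drop one closing tag and the parser reports an error instead of guessing.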
HTML documents
"Hypertext markup language"
- Basic structure of an HTML document:
<html>
  <head>
    <title>The title</title>
  </head>
  <body>
    The contents
  </body>
</html>
- As above, in general, have pairs of "open" (<TAGNAME>) and "close" (</TAGNAME>) tags for the same name, like parentheses.
- The tag name indicates how the tagged text is to be treated, e.g., the head contains general information about the document, including the title (in the title bar and in the bookmarks); the body contains the contents of the document (suitably marked up).
- Tag names are not case-sensitive.
<HTML> = <html>
Be consistently upper or lower.
- Apart from markup tags, a web page is simply a regular old text file. You can edit these in whatever program you like (save as text). WARNING: if you save-as html in various word processors, "hidden" mark-up is added that is ugly, often unnecessary, and confusing to web browsers. Don't do it!
- White space usually doesn't matter (the <pre> tag is one exception), so use it to make your code neat and readable. (I used indentation in the above example to emphasize the nesting.) Nicely structured html helps people read it later (including you and your graders, who will count off if it is bad).
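To illustrate the points about structure and case, here is a small sketch using Python's standard html.parser module (again an assumption of mine, not something from the lecture). It walks through the basic document, records each opening tag, and collects the text inside the title. The class TitleFinder and function scan are names invented for this example. Python's parser reports tag names in lower case, so the upper-case and lower-case versions of the document look the same to it.

```python
from html.parser import HTMLParser


class TitleFinder(HTMLParser):
    """Record every opening tag and collect the text inside <title>."""

    def __init__(self):
        super().__init__()
        self.tags_seen = []
        self.in_title = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        self.tags_seen.append(tag)      # tag arrives already lower-cased
        if tag == "title":
            self.in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        if self.in_title:
            self.title += data


def scan(document):
    finder = TitleFinder()
    finder.feed(document)
    finder.close()
    return finder.title, finder.tags_seen


lower = scan("<html><head><title>The title</title></head>"
             "<body>The contents</body></html>")
upper = scan("<HTML><HEAD><TITLE>The title</TITLE></HEAD>"
             "<BODY>The contents</BODY></HTML>")

print(lower)
print(lower == upper)
```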
HW 0
A walk-through of the main steps for the zeroth homework.
Some HTML tags
- Paragraphs are surrounded by <p> and </p> tags. Historically, the closing tag wasn't used (i.e., "<p>" indicated the end of a paragraph), so you might see some old documents without them. However, you should use them.
- Within a paragraph, the text can be styled, e.g.:
- Emphasized text is surrounded by <em> and </em> tags.
- Strong text is surrounded by <strong> and </strong> tags.
- These can be combined by nesting: <strong><em>Strong &amp; emphasized</em></strong>
- Headings, subheadings, subsubheadings, and so on are surrounded by <h1> and </h1> tags, <h2> and </h2> tags, and so on. The smallest headings available are given by <h6> and </h6> tags.
Ex: <h1>heading 1</h1> ... <h6>heading 6</h6>
- While most tags need both an "open" and a "close" tag, some tags don't, e.g.,
<hr /> -- horizontal rule
<br /> -- force a line break
Notice that we put the slash after the tag name for these "solo" tags (kind of a combined open and close). While that's not required (you could just have "<hr>"), it is considered good style and is required to meet some standards.
- Character codes:
- Because they are used to begin and end tags, the symbols < and > cannot be used directly in text. To indicate them we use the codes &lt; (for <) and &gt; (for >).
- Because the & is used to indicate the beginning of an "escape" sequence, when we want an & in the text we need a code to indicate it: &amp;.
- To make a space that really matters, use the character code &nbsp; (non-breaking space).
- Other character codes are used for special characters in foreign languages, and special symbols like copyright (©, as &copy;). These codes are case sensitive. They all begin with an ampersand and end with a semicolon.
- You can put a comment in the HTML (text that is in the file but isn't displayed), in a special type of solo tag whose name starts with an exclamation point and two dashes, and ends with two dashes:
<!-- Whatever you want to say -->
- Unknown tags are ignored (allowing for future expansion).
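The character codes described above are easy to get wrong by hand, so here is a small sketch of converting in both directions. It assumes Python and its standard html module, which the lecture doesn't use; the sample strings are made up for illustration.

```python
import html

# Text that contains the three troublesome symbols.
raw = "Use <em> & </em> for emphasis"

# Replace &, <, and > with their character codes so the text is safe in a page.
escaped = html.escape(raw, quote=False)
print(escaped)

# Going the other way: turn character codes back into the characters they name.
decoded = html.unescape("&copy; 1996&nbsp;CERN &lt;hr /&gt;")
print(decoded)

# The codes are case sensitive: these two name different letters.
upper_e = html.unescape("&Eacute;")
lower_e = html.unescape("&eacute;")
print(upper_e, lower_e)
```

The ampersand is replaced first, which matters: doing it last would mangle the &lt; and &gt; codes that were just produced.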