This blog post is part of the series “Learning web development” – which teaches people who have never programmed how to create web apps with JavaScript.
To download the projects, go to the GitHub repository learning-web-dev-code and follow the instructions there.
I’m interested in feedback! If there is something you don’t understand, please write a comment at the end of this page.
In this chapter, we learn how to create web pages via HTML.
You will not need this knowledge very often but it is helpful to have a rough idea of what hexadecimal numbers are because they regularly come up in web development.
Decimal numbers use ten digits: 0123456789
Hexadecimal numbers use sixteen digits: 0123456789ABCDEF
We can even convert numbers from hexadecimal to decimal via JavaScript: If a number literal starts with 0x (x stands for heXadecimal) then it is interpreted as a hexadecimal number:
> 0x9
9
> 0xA
10
> 0xF
15
> 0x10
16
> 0xFF
255
> 0x100
256
Why are hexadecimal numbers convenient in computing? Because one hexadecimal digit corresponds to exactly four bits: One bit has a range of two – it can represent two numbers. Therefore, the numerical range of four bits is 16: 2 × 2 × 2 × 2.
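Conversions also work with strings. The following is a small sketch that only uses built-in JavaScript features; the variable names are made up for this example:

```javascript
// Hexadecimal string → number (the second argument is the base)
const fromHex = Number.parseInt('FF', 16);
console.log(fromHex); // 255

// Number → hexadecimal string
const toHex = (255).toString(16).toUpperCase();
console.log(toHex); // FF

// 0b starts a binary number literal: four bits that are all 1
const fourBits = 0b1111;
console.log(fourBits === 0xF); // true
```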
In the context of file systems, directory is another word for folder.
A file path specifies the location of a file in a file system. It consists of a series of zero or more names of parent directories followed by the name of a file or directory. These names are separated by:
Slashes (/) on Unix (which includes macOS and Linux)
Backslashes (\) on Windows
These are examples of Unix file paths:
/home/robin/site/index.html
/js/
dir/file.txt
image.jpg
In order to make it clear that a given name refers to a directory, I often end it with a slash.
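To make the structure of file paths more concrete, here is a minimal sketch in JavaScript. Both functions are hypothetical helpers written for this example, not part of any library:

```javascript
// Hypothetical helper: splits a Unix file path into its names
function splitUnixPath(filePath) {
  // filter() removes the empty strings caused by leading or trailing slashes
  return filePath.split('/').filter((name) => name.length > 0);
}

// Hypothetical helper: does the path follow the trailing-slash convention?
function looksLikeDirectory(filePath) {
  return filePath.endsWith('/');
}

console.log(splitUnixPath('/home/robin/site/index.html'));
// [ 'home', 'robin', 'site', 'index.html' ]
console.log(looksLikeDirectory('/js/')); // true
console.log(looksLikeDirectory('image.jpg')); // false
```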
How do web addresses (https:...) work?
Web addresses look like this:
https://exploringjs.com/js/
Such addresses are called URLs – Uniform Resource Locators. They identify resources (web pages etc.) on servers. “Server” is just another word for “computer we can reach via the internet”. Simple URLs have the following parts:
Protocol: https:
Host: exploringjs.com
Path: /js/
This means: The resource is accessible via the protocol HTTPS, on the server whose name is exploringjs.com where it has the path /js/.
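We can let JavaScript take a URL apart for us. This sketch uses the class URL which is built into browsers and Node.js; the variable name is an assumption of this example:

```javascript
const url = new URL('https://exploringjs.com/js/');

console.log(url.protocol); // https:
console.log(url.hostname); // exploringjs.com
console.log(url.pathname); // /js/
```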
What is a protocol? The protocol specifies how to talk to that server. The protocols used by the World-Wide Web are HTTP (Hypertext Transfer Protocol) and HTTPS (Hypertext Transfer Protocol Secure). The latter is an encrypted version of the former which ensures that no one can listen in when two parties (often a browser and a server) communicate.
file: URLs
Web pages can also be located locally, on a drive of our computer. In that case, their URLs use the file: protocol – e.g.:
File path: /home/robin/site/index.html
URL: file:///home/robin/site/index.html
Note that file: URLs have no hosts (which would be mentioned after the first two slashes and before the third slash). That makes sense because they are not located on servers.
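The built-in class URL lets us check that the host is empty. This is only a quick sketch; the variable name is made up:

```javascript
const fileUrl = new URL('file:///home/robin/site/index.html');

console.log(fileUrl.protocol); // file:
console.log(fileUrl.host === ''); // true (there is no host)
console.log(fileUrl.pathname); // /home/robin/site/index.html
```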
Files on our drives are called local files. Files on servers are called remote files.
Inside a web page, we don’t have to use full URLs to refer to other resources (via links etc.): We can use relative references. They are basically file paths that are resolved against the URL of the web page. As an example, consider a web page with the following URL:
http://example.com/book/chap/index.html
The following table shows a few relative references and the URLs they are equivalent to:
Relative reference | Equivalent URL |
---|---|
image.jpg | http://example.com/book/chap/image.jpg |
img/image.jpg | http://example.com/book/chap/img/image.jpg |
../toc.html | http://example.com/book/toc.html |
/home.html | http://example.com/home.html |
Two dots (..) mean: “go up one directory”.
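We can reproduce this kind of resolution in JavaScript: If we pass a second argument to new URL(), it is used as the base URL that the relative reference is resolved against. The function resolve() is a hypothetical helper for this sketch:

```javascript
const base = 'http://example.com/book/chap/index.html';

// Hypothetical helper: resolves a relative reference against `base`
function resolve(relativeReference) {
  return new URL(relativeReference, base).href;
}

console.log(resolve('image.jpg'));
// http://example.com/book/chap/image.jpg
console.log(resolve('img/image.jpg'));
// http://example.com/book/chap/img/image.jpg
console.log(resolve('../toc.html'));
// http://example.com/book/toc.html
console.log(resolve('/home.html'));
// http://example.com/home.html
```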
A URL reference is either a URL or a relative reference.
A URL reference can optionally be followed by a URL fragment: a hash symbol (#) and an identifier (think name). A URL fragment refers to a part of a resource – e.g., a section of a web page. Examples:
http://example.com/#footer
hello.html#intro
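In JavaScript, the fragment of a URL is available via the property .hash of the built-in class URL. In this sketch, the base URL http://example.com/book/ is an assumption that is only needed to resolve the relative reference:

```javascript
const withFragment = new URL('http://example.com/#footer');
console.log(withFragment.hash); // #footer

// Relative references can have fragments, too
const resolved = new URL('hello.html#intro', 'http://example.com/book/');
console.log(resolved.href); // http://example.com/book/hello.html#intro
console.log(resolved.hash); // #intro
```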
A web server usually makes a directory on its drive accessible to the web. It gets requests for the resources it manages. For example, if we enter a URL into a web browser, the browser makes a GET request to the server named in that URL. That is a request to download a resource at a specific path. The server uses the path to identify a file in its directory and delivers it to the browser. In this scenario, all resources are files.
We even get a file if we point the browser to a directory because most servers then deliver the file index.html in that directory (if that file exists). Therefore, the following two URLs are often equivalent:
https://example.com/
https://example.com/index.html
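The following sketch shows roughly how a server might map the path of a URL to a file. It is a simplified illustration, not how any particular server works: The function urlPathToFilePath() and the directory /home/robin/site are assumptions, and a real server would also have to check that the file exists and reject unsafe paths (such as paths that contain ..).

```javascript
// Hypothetical helper: maps the path of a URL to a file path
// inside the directory that the server makes accessible.
function urlPathToFilePath(serverDirectory, urlPath) {
  // A path that ends with a slash refers to a directory:
  // deliver the file index.html in that directory.
  const filePart = urlPath.endsWith('/') ? urlPath + 'index.html' : urlPath;
  return serverDirectory + filePart;
}

console.log(urlPathToFilePath('/home/robin/site', '/'));
// /home/robin/site/index.html
console.log(urlPathToFilePath('/home/robin/site', '/index.html'));
// /home/robin/site/index.html
```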
A web page is an HTML file: A text file whose content has a special syntax (rules for how to write the content). HTML encodes structured content that has headings, paragraphs, bullet lists, etc. and web browsers are apps that display such content.
HTML is an abbreviation of Hypertext Markup Language. Hypertext means structured text with links. What a markup language is, is explained next.
Before computers, authors used typewriters to write books in plain text (without any formatting). There were often rules for how to indicate what’s a heading, a paragraph, etc. Such rules can be considered a markup language.
With HTML, there are two language levels. Consider the following HTML content:
<h1>This is a level 1 heading</h1>
<p>This is a paragraph</p>
<p>
This is <strong>bold text</strong>.
</p>
On one hand, there is the markup level where characters inside < and > specify the structure of the text: <h1>, </h1>, etc. are part of that level.
On the other hand, there is the text level with plain text in English that is inside the structure – e.g., “This is a level 1 heading”.
HTML code is sometimes called source code: It is the original code that is then processed by a browser before it is displayed.
Let’s revisit the following HTML content:
<h1>This is a level 1 heading</h1>
This kind of structure is called an HTML element. Its start is marked via the start tag <h1>; its end via the end tag </h1>.
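The most common kinds of HTML elements are:
Normal elements have a start tag and an end tag, with content between them. h1 is a normal element.
Void elements only have a start tag. One example is <br> which represents a line break inside a paragraph.
Void elements are sometimes written with a trailing slash – <br/> – which indicates that it is both a start tag and an end tag. I have a slight preference against doing that when I write HTML by hand – it feels truer to the nature of the language.
HTML also distinguishes between block elements and inline elements:
Block elements are displayed as blocks that are stacked vertically. Examples include <h1> and <p>.
Inline elements are displayed within lines of text. One example is <strong>.
Whitespace refers to mostly invisible characters that are displayed as spaces (gaps) in text – e.g.: spaces, tabs (created via the Tab key), line endings (created via the Return key), etc.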
HTML generally “collapses” each sequence of one or more whitespace characters into a single space (with exceptions). One notable exception is that whitespace at the beginning or end of a block element is usually ignored. Therefore, we can write paragraphs like this:
<p>
This is a paragraph
with multiple
lines.
</p>
This is the text we see on screen – it does not have any leading or trailing whitespace:
This is a paragraph with multiple lines.
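To get a feeling for whitespace collapsing, we can approximate it in JavaScript. This is a rough sketch that ignores the exceptions mentioned above and is not the algorithm that browsers actually use; collapseWhitespace() is a hypothetical helper:

```javascript
// Rough approximation of how HTML treats whitespace inside a paragraph
function collapseWhitespace(text) {
  // Each run of spaces, tabs and line endings becomes a single space
  const collapsed = text.replace(/[ \t\n\r\f]+/g, ' ');
  // A space at the very beginning or end is dropped
  return collapsed.replace(/^ | $/g, '');
}

const paragraphContent = `
  This is a paragraph
  with    multiple
  lines.
`;
console.log(collapseWhitespace(paragraphContent));
// This is a paragraph with multiple lines.
```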
Attributes let us provide more information for an HTML element – e.g. this is how a hyperlink is written:
<a href="https://example.com">Examples</a>
The text we see on screen is “Examples”. If we click on that text, the browser jumps to the website at example.com.
In this case, the attribute consists of:
The key: href
The value: https://example.com
I recommend always putting values in quotes, but the quotes can be omitted in some cases.
There are also attributes that only have keys. They function as on/off switches: If the attribute key is present, some feature is switched on. If it isn’t, the feature is switched off.
A character reference is HTML syntax for displaying a specific character on screen. There are three kinds of character references. Each of the following character references represents the less-than character (<):
Named character reference: &lt; (“lt” is an abbreviation of “less than”)
Decimal numeric character reference: &#60;
Hexadecimal numeric character reference: &#x3C;
Why is that useful?
If a character (such as <) has a special meaning in HTML, typing it won’t display it on screen. We need to escape it – encode it in a manner so that it isn’t special anymore. Character references provide us with the means to do that.
Character references help with displaying invisible characters such as wider spaces because we can see explicitly what kind of character we are dealing with. That’s especially important in code editors where most characters have the same width.
In the past, before the Unicode standard for representing plain text characters became well-supported, using character references for rarer symbols such as © was more important. Now I prefer typing in those symbols directly.
Character | Named CR | Hex. numeric CR |
---|---|---|
& | &amp; (ampersand) | &#x26; |
< | &lt; (less than) | &#x3C; |
> | &gt; (greater than) | &#x3E; |
" | &quot; (quote) | &#x22; |
' | &apos; (apostrophe) | &#x27; |
Note that & is now also a special character in HTML – which is why we need a character reference for it. Named character references are only available for a limited number of characters while numeric character references can represent all Unicode characters (which all have numbers called code points).
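Escaping can be automated. The following sketch shows one possible way of doing so in JavaScript; escapeHtml() and toHexCharRef() are hypothetical helpers written for this example, not built-in functions:

```javascript
// Hypothetical helper: replaces special characters with named character references
function escapeHtml(text) {
  return text
    .replace(/&/g, '&amp;') // must come first, otherwise we would escape twice
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&apos;');
}
console.log(escapeHtml('<body>')); // &lt;body&gt;
console.log(escapeHtml('&lt;')); // &amp;lt;

// Hypothetical helper: hexadecimal numeric character reference of a character
function toHexCharRef(char) {
  const codePoint = char.codePointAt(0);
  return '&#x' + codePoint.toString(16).toUpperCase() + ';';
}
console.log(toHexCharRef('<')); // &#x3C;
```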
Now we can write about HTML in HTML:
This is a tag: &lt;body&gt;<br>
This is a character entity: &amp;lt;
Which is displayed as:
This is a tag: <body>
This is a character entity: &lt;
Width | Named CR | Hex. numeric CR |
---|---|---|
[ ] Space | — | &#x20; |
[ ] Non-breaking space | &nbsp; | &#xA0; |
[ ] Narrow non-breaking space | — | &#x202F; |
[ ] Hair space | &hairsp; | &#x200A; |
[ ] Thin space | &thinsp; | &#x2009; |
[ ] Punctuation space | &puncsp; | &#x2008; |
[ ] Figure space | &numsp; | &#x2007; |
[ ] En space | &ensp; | &#x2002; |
[ ] Em space | &emsp; | &#x2003; |
[] (compare: no space) | | |
Comments:
Most of these spaces are only relevant if you are doing very precise typesetting. In normal documents, these are most useful:
19 kg
Figure 3
<h3>1.5.2. Second subsection</h3>
In English, the prefix “meta” means “transcending”. It often refers to a level above a normal level or above another level. These are two examples:
Linguistics: “language” and “metalanguage”. Consider the following sentence: “The German word ‘Zeitgeist’ has no simple English translation.” In that sentence, German is the language that we talk about and English is the metalanguage. English is a level above German.
Computing: A file on a drive contains data. Metadata is data about such a file – e.g., the date and time when it was created.
In the next section, we’ll see that HTML documents contain metadata.
If an HTML document is empty, we only see its skeleton. That looks like this:
<!doctype html>
<html lang="en-US">
<head>
<meta charset="UTF-8">
<title>My web page</title>
</head>
<body>
</body>
</html>
Let’s explore the elements of this HTML document:
<!doctype html> is a standard marker for modern HTML documents.
<html> is the top-level element that contains all of the document.
<head> contains metadata for the document.
<meta charset="UTF-8"> specifies how the text is encoded. That helps with characters beyond simple A-Z such as characters with accents etc.
<title> specifies the title of the document.
<body> contains the actual data of the document. This is where we put headings, paragraphs, bullet lists, etc.
Quoting the WHATWG HTML standard on the attribute lang: “Authors are encouraged to specify a lang attribute on the root html element, giving the document’s language. This aids speech synthesis tools to determine what pronunciations to use, translation tools to determine what rules to use, and so forth.”
In this section, we explore HTML elements for content. To see them in use, you can open the file html/demo-content.html in a web browser. Note that the URL of that page has the protocol file: (because it’s stored in a local file).
Comments let us add notes for ourselves to HTML code that are not displayed on screen:
<!--This is a comment-->
Emphasizing text (which affects how it is read out loud):
Now <em>that</em> is a good idea.
Defining new terms:
A <dfn>bicycle</dfn> is a vehicle with two wheels.
Citing names, book titles, etc.:
I enjoyed <cite>His dark materials</cite>.
Mentioning foreign-language text (the idiomatic text element):
She said <i lang="fr">au revoir</i>.
Indicating strong importance:
<strong>Important:</strong> Close the door!
Highlighting text that does not otherwise have special meaning (the bring attention to element):
I like <b>apples</b> and <b>bananas</b>.
If we click on a hyperlink with an href attribute, the browser jumps to the location specified by that attribute:
<a href="https://example.com">Link to another website</a>
<a href="demo-controls.html">Link to page of current website</a>
<a href="#tables">Link to ID of current page</a>
In principle, each HTML element can have an ID. If it does, we can link to it via a URL fragment. Each ID must be unique within its document – we can’t use the same ID more than once.
The most common HTML elements with IDs are (we linked to #tables in the previous subsection):
<h2 id="tables">Tables</h2>
<p id="html-def">
<span id="html">HTML</span> stands for “Hypertext Markup Language”.
</p>
Explanations:
<h1>, <h2>, etc.: Headings having IDs is very useful because then we can link to them.
<p>: Sometimes paragraphs are worth linking to.
<span>: For words we want to link to. Spans are explained later.
The image element looks like this:
<img
src="demo-content/html5-logo.svg"
width="128" height="128"
alt="HTML5 logo: “HTML” above a shield with a 5"
>
What do these attributes do?
src points to the file with the image to be displayed.
width and height specify the dimensions of the image. Their values are integers without units. Note that that is rare in HTML and CSS: In general, lengths have units and can have decimal fractions (after decimal points).
alt contains a textual replacement that can be displayed when the image file isn’t found. That also helps screen readers (programs that read HTML to people with vision impairments).
Images are inline elements. For images that are more like block elements, we can use <figure>:
<figure>
<img
src="demo-content/html5-logo.svg"
width="128" height="128"
alt="HTML5 logo: “HTML” above a shield with a 5"
>
<figcaption>
Source: <a href="https://www.w3.org/html/logo/">W3C</a>
</figcaption>
</figure>
The caption is optional.
<code> and <pre>
The code of computer languages such as HTML, CSS and JavaScript is usually displayed in monospaced fonts where most characters have the same width. That has the benefit of making it easier to distinguish between English (metalanguage) and code (language).
<code>
HTML:
The <code>&lt;img&gt;</code> tag.
Displayed as:
The <img> tag.
<pre>
<pre> is the first block element in this list. A block element always introduces a horizontal break: The previous block element (e.g. a paragraph) ends and the new content starts on a new line, at the left.
HTML:
<pre>
&lt;strong&gt;important&lt;/strong&gt;
</pre>
Displayed as:
<strong>important</strong>
Rules for writing the content:
We must escape special characters such as <.
Whitespace is not collapsed and line endings break up lines.
Paragraphs are block elements that are separated from other block elements (including other paragraphs) by small vertical gaps.
HTML:
<p>
This is a line break<br>
inside the first paragraph.
</p>
<p>
Second paragraph
with two lines.
</p>
Displayed as:
This is a line break
inside the first paragraph.
Second paragraph with two lines.
<span> and <div>
<span>
Whenever we need to group inline content, we use the generic inline container <span>:
<span>Inline content goes here.</span>
We have already seen one use case for <span>: Linking to inline content. It’s also useful for styling such content via CSS as we’ll see later in this chapter.
<div>
Whenever we need to group block content, we use the generic block container <div>:
<div>
Block content goes here.
</div>
Because <div> is a block element, it introduces a horizontal break. However, in contrast to <p>, there is no vertical gap before or after it. We can use it whenever we want to display one or more single lines in an HTML document.
<div> is also useful for styling and laying out block content via CSS. More on that later in this chapter.
A block quotation is for quoting block content: Sometimes there is something someone has said or content from a source that we would like to mention. Then <blockquote> lets us do that while making clear that the quoted content is different from normal content – e.g. by indenting the former or visually highlighting it in some other way.
HTML:
<blockquote>
<p>
For myself, I am an optimist — it does not seem to be much use
being anything else.
</p>
</blockquote>
<div>
— Winston Churchill
</div>
That is rendered roughly like this:
For myself, I am an optimist — it does not seem to be much use being anything else.
— Winston Churchill
Alternatively, we can wrap the block quotation in a <figure> and put the author in the <figcaption> (without a <div>).
Headings structure content. The HTML elements for headings range from <h1> to <h6>, with the numbers in their names indicating their levels. One way (among several) of using the levels is:
<h1>: the title of the document
<h2>: sections
<h3>: subsections
Example:
<h1>Title of document</h1>
<h2>1. Section</h2>
<h3>1.1. Subsection</h3>
HTML supports two kinds of lists:
<ul>: unordered lists (bullet lists)
<ol>: ordered lists (numbered lists)
The items of lists are always <li> elements (list items). Lists can be nested, as the following HTML code demonstrates:
<ol>
<li>First take these steps:
<ul>
<li>Step 1a</li>
<li>Step 1b</li>
</ul>
</li>
<li>Then take these steps:
<ul>
<li>Step 2a</li>
<li>Step 2b</li>
</ul>
</li>
</ol>
Rendered as:
1. First take these steps:
   • Step 1a
   • Step 1b
2. Then take these steps:
   • Step 2a
   • Step 2b
In HTML, a table has the following structure:
The <table> element contains all of the tabular data.
<thead> (“table headers”) is optional and contains the headers of the table.
<tbody> (“table body”) contains the body of the table.
<tr> (“table row”) contains a single row with cells (inside <thead> or <tbody>).
<th> (“table header element”) is a header cell that is contained within a <tr>.
<td> (“table data cell”) is a normal cell that is contained within a <tr>.
HTML:
<table>
<thead>
<tr>
<th>Acronym</th>
<th>Meaning</th>
</tr>
</thead>
<tbody>
<tr>
<td>WWW</td>
<td>World Wide Web</td>
</tr>
<tr>
<td>HTML</td>
<td>Hypertext Markup Language</td>
</tr>
<tr>
<td>HTTP</td>
<td>Hypertext Transfer Protocol</td>
</tr>
</tbody>
</table>
Rendered as:
Acronym | Meaning |
---|---|
WWW | World Wide Web |
HTML | Hypertext Markup Language |
HTTP | Hypertext Transfer Protocol |
There is also a simpler way to write tables: We can omit the tags <thead> and <tbody> so that the <tr> elements are located directly inside <table>. <th> vs <td> is often enough to distinguish between header rows and body rows.
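Because the structure of tables is so regular, their HTML is often generated by code. The following sketch produces the simpler form (without <thead> and <tbody>). renderTable() is a hypothetical helper and it assumes that the cell texts contain no special characters such as <:

```javascript
// Hypothetical helper: creates the HTML of a simple table.
// Assumption: the cell texts don't need to be escaped.
function renderTable(headers, rows) {
  const headerRow =
    '<tr>' + headers.map((text) => `<th>${text}</th>`).join('') + '</tr>';
  const bodyRows = rows.map(
    (row) => '<tr>' + row.map((text) => `<td>${text}</td>`).join('') + '</tr>'
  );
  return ['<table>', headerRow, ...bodyRows, '</table>'].join('\n');
}

const tableHtml = renderTable(
  ['Acronym', 'Meaning'],
  [['WWW', 'World Wide Web']]
);
console.log(tableHtml);
// <table>
// <tr><th>Acronym</th><th>Meaning</th></tr>
// <tr><td>WWW</td><td>World Wide Web</td></tr>
// </table>
```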
First, play with html/demo-content.html: Edit it on disk, reload the browser, observe what changes.
Next, write your own HTML:
Create a new HTML file in the same folder as html/demo-content.html.
Add images (.jpg, .png, .svg, etc.) to the folder and display them in your web page via <img>.
Wrap an image in a <figure> (as described previously).
In this section, we explore HTML elements that are displayed as user interface elements (controls). Controls are mostly used to create user interfaces for apps whose interactive behavior is implemented via JavaScript.
To see controls in action, open html/demo-controls.html in a web browser.
<label>
The HTML element <label> attaches a descriptive text to a user interface element. For example, the user interface element checkbox is just a small box that can be checked. It doesn’t make much sense on its own. That’s why we give checkboxes labels:
<label>
<input type="checkbox" checked>
I agree to the terms and conditions
</label>
That is rendered as:
☑️ I agree to the terms and conditions
<input>
This HTML element lets users input a variety of data – e.g., the following HTML shows a text field for a single line of text:
<input type="text">
Among others, we can use the following values for the attribute type:
text: a single line of text
tel: a telephone number
url: a URL
email: an email address
password: a password (a single line of text that is hidden)
date: a date
number: a number
The WHATWG HTML standard has a complete list of input types.
The following types of inputs usually appear in groups:
checkbox: In a group of checkboxes, zero or more can be checked.
radio: In a group of radio buttons, at most one can be checked.
This is an example of a group of radio buttons:
<div>
<label>
<input type="radio" name="flavor" value="chocolate" checked>
Chocolate
</label>
</div>
<div>
<label>
<input type="radio" name="flavor" value="vanilla">
Vanilla
</label>
</div>
<div>
<label>
<input type="radio" name="flavor" value="strawberry">
Strawberry
</label>
</div>
What makes them a group is that all of them have the same name. Both name and value are used internally and determine the data that is generated if these input elements are used in a form (more on that later).
If the optional attribute checked is present, the checkbox or radio button is checked (“selected”).
The previous HTML is also an example of using <div> elements to display lines.
<button>
The <button> element displays a push button:
<button>Push me!</button>
<textarea>
Where <input type="text"> shows a single line of text, <textarea> shows multiple lines:
<label>
<div>
Any comments?
</div>
<div>
<textarea rows="5" cols="60">
I love
your website!
</textarea>
</div>
</label>
Note that a textarea is not a block element – which is why we have to wrap it in a <div> here: We want it to appear below its label, not next to it.
The attributes rows and cols specify the height (number of lines) and the width (number of characters per line) of the textarea. For the content, the same rules apply as for <pre>: We must escape special characters, whitespace is not collapsed and line endings break up lines.
disabled
Sometimes we want to switch off a control (usually temporarily). The attribute disabled lets us do that:
<div>
<label>
Disabled text field:
<input type="text" disabled>
</label>
</div>
<div>
Disabled button: <button disabled>Push me!</button>
</div>
Both the text field and the button are grayed out and can’t be used.
There are two ways in which we can access the data stored in controls:
We can read the state of individual controls directly via JavaScript.
We can wrap the controls in a <form> element and send the data in the form to a server or use it via JavaScript. Such data is a sequence of key-value pairs: Each key is the name of a given control, each value reflects its current state.
So far, we have only used the attribute name to group radio buttons. If we work with forms, all controls must have names.
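To get a feeling for what “a sequence of key-value pairs” means, the following sketch builds such a sequence by hand. It uses the built-in class URLSearchParams which supports one common encoding for form data. The names flavor and comment and their values are assumptions, modeled after the controls shown earlier:

```javascript
const pairs = new URLSearchParams();
// Key: the name of a control. Value: its current state.
pairs.append('flavor', 'chocolate');
pairs.append('comment', 'I love your website');

for (const [key, value] of pairs) {
  console.log(key + ' = ' + value);
}
// flavor = chocolate
// comment = I love your website

// This is how the pairs look when they are sent as text
const encoded = pairs.toString();
console.log(encoded);
// flavor=chocolate&comment=I+love+your+website
```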
For more information on forms, see these MDN pages:
“<form>: The Form element”
“FormData”
Where HTML defines the fixed structure of content, it is complemented by two technologies that build on it as a foundation:
CSS specifies how the content is presented.
JavaScript adds interactive behavior.
This is when the three web development technologies HTML, CSS and JavaScript were created:
HTML is said to be for content, CSS for presentation. What are the benefits of this kind of separation of roles?
HTML initially did not separate content and presentation (see HTML 2). We can still see that when we look at the names of some HTML elements:
HTML element | Old name | Current name |
---|---|---|
<b> | bold | bring attention to |
<i> | italic | idiomatic text |
<hr> | horizontal rule | thematic break |
The following HTML elements contain code from other computer languages:
<style> is for CSS code
<script> is for JavaScript code
This kind of integration helps with web development, because we are not forced to use multiple files if we want to create a web page with HTML, CSS and JavaScript.
<script>
In JavaScript code inside a <script> element, we should avoid the following sequences of characters and use their escaped versions instead (which work inside string literals and regular expression literals):
Avoid | Escaped |
---|---|
<!-- | \x3C!-- |
<script | \x3Cscript |
</script | \x3C/script |
However: That’s rarely, if ever, an issue in practice!
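Why do the escaped versions work? Inside a string literal, \x3C is a hexadecimal escape for the character with the code point 0x3C which is the less-than character. The following sketch checks that; the variable names are made up for this example:

```javascript
// The less-than character, created without typing it
const lessThan = String.fromCodePoint(0x3C);

const escapedScript = '\x3Cscript';
const escapedComment = '\x3C!--';

console.log(escapedScript === lessThan + 'script'); // true
console.log(escapedComment === lessThan + '!--'); // true
console.log(escapedScript.length); // 7
```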
id and class
Two HTML attributes play an important role in connecting the world of HTML with the worlds of CSS and JavaScript:
id is for selecting a single HTML element. We can’t use the same ID more than once.
class is for selecting one or more HTML elements. Usually, more than one element in an HTML document has the same class.
We can use IDs and classes to tell CSS where to apply a given style. We refer to them via the following syntax:
Attribute | CSS syntax |
---|---|
id="my-id" | #my-id |
class="my-class" | .my-class |
In JavaScript, we can use CSS syntax to retrieve HTML elements:
const singleElement = document.querySelector('#my-id');
const multipleElements = document.querySelectorAll('.my-class');
We’ll learn more about that when we explore CSS and JavaScript.
<span> and <div>
In CSS, the generic containers <span> and <div> help with laying out content – often combined with the attribute id or class:
<div id="sidebar">
...
</div>
<div id="content">
<div class="tip-box">
Useful tip: ...
</div>
</div>
Note that HTML has many non-generic containers. You should use those instead of <span> and <div> whenever you can. These are a few examples:
<main>
<header>
<footer>
<figure>
<article>
<section>
In browsers, we can inspect HTML elements. The following pages explain how to do that for:
You should also find material if you do a web search for “inspect element «name of browser»”.
What does the inspector give us?
Browsers are very forgiving when it comes to displaying HTML files: They never complain about incorrect syntax. For example, a browser will still display a web page if we omit most of the tags of the HTML skeleton:
<!doctype html>
<meta charset="UTF-8">
<title>My web page</title>
However, that is not recommended!
The WHATWG HTML standard is very well written and should even be interesting for beginners: Simply skip the parts with obscure notations and lists and only read the English prose and the HTML examples.
MDN has “Structuring content with HTML”.
Google’s web.dev site has an HTML course.