HTML and Web Technology Notes

Lesson1.pdf Lesson 1 Slides
Lesson1all.pdf Lesson 1 Slides (with builds)

Becoming fluent in HTML

There are various tools that you can use to create HTML files. Some people like to use WYSIWYG ("What You See Is What You Get") tools, like word processors or editors made specifically for use with web pages. One of the big disadvantages of using a word processor for HTML is that some word processors produce HTML that is very long and complicated. You can see that for yourself if you save a word processor document as HTML and then look at the file in a plain-text editor.

For this class it's very important to be fluent in HTML and not rely on WYSIWYG tools. We will be writing programs that analyze and generate HTML, so if you don't understand HTML you will have a difficult time writing those programs. Make sure that you are familiar with the basic structure of an HTML document and the most commonly used tags.

Here's a list of free editors that might be useful:
Atom Cross platform
Notepad++ Windows
TextWrangler Mac
Brackets Cross platform
Komodo Edit Windows, Mac, Linux
Bluefish Primarily for Linux, but also available for Mac and Windows

It's okay to use an IDE. Just make sure that you type in HTML tags directly.

HTML tutorials

Our textbook, Fundamentals of Web Development, has several chapters about HTML and you should read those chapters to help you learn HTML. In addition, many web sites have good tutorials and reference materials about HTML. Here are some sites that you might find helpful:

Important HTML tags and concepts

There's a lot to learn about HTML, and we have relatively little time before we have to move on to CSS and then JavaScript. Make sure you understand all of the topics and tags listed below. If you come across something you don't understand, you can look it up on the Web or ask about it in the discussions on Canvas. You can also ask the teacher by email or in office hours.

Note that only opening tags are listed here even though most of these tags are used in opening-closing pairs.

Browsers

Unfortunately, testing web applications in several different browsers is a necessity because of the many browser dependencies you will encounter: Internet Explorer vs. Firefox, old versions vs. new versions, and so on.

Some Web developers simply aim for the biggest market share and ignore other browsers. In this class I discourage that approach and encourage you to make your Web applications standards-compliant and as browser-independent and system-independent as possible.

Browser dependencies are even more prevalent in JavaScript than they are in HTML and CSS. Fortunately there are a number of libraries and frameworks that can hide browser differences. (You should not use frameworks like jQuery for assignments in this class, but in general they are very important in web programming.)

Web technology

In this class we won't study the network technologies that make the Web work, but it's a good idea to know the basic concepts and terminology.

There are two international organizations that are important for Web technology, the IETF and the W3C:
• The IETF (Internet Engineering Task Force) defines standards for various aspects of Internet technology. Their mission, as stated on their web site at http://www.ietf.org, is "to make the Internet work better by producing high quality, relevant technical documents that influence the way people design, use, and manage the Internet."
• The W3C (World Wide Web Consortium) defines standards for Web technology. As stated on their web site (http://www.w3.org/Consortium/), "The World Wide Web Consortium (W3C) is an international community where Member organizations, a full-time staff, and the public work together to develop Web standards. Led by Web inventor Tim Berners-Lee and CEO Jeffrey Jaffe, W3C's mission is to lead the Web to its full potential." Two of the main technologies that we use in this class, HTML and CSS (Cascading Style Sheets), are defined by W3C standards.

The IETF and the W3C together defined HTTP (Hypertext Transfer Protocol), which specifies how web browsers and web servers communicate.

When you surf the Web, you use a web browser such as Chrome, Firefox, or Internet Explorer (to name a few) to download and view web pages from all over the world. Let's look at what happens when you look at a web site in your browser.

The computer that the browser is running on is called the local computer or the client computer. The computer that the web server software is running on is called the remote computer or the server computer. In some cases, the browser and the server program are on the same computer, but usually they are running on two different computers.

  1. The browser gets a URL (Uniform Resource Locator) for a web site. You might type in the URL directly, or the browser might get the URL from a bookmark or a link that you click. Here's an example of a URL:
          http://universe.tc.uvu.edu/cs2550/notes/index.html
  2. The browser gets the domain name of the server from the URL. Examples of domain names are www.uvu.edu, google.com, and www.w3.org
    In the sample URL above, the domain name of the server is universe.tc.uvu.edu
  3. Next, the browser sends an HTTP request message to the server. HTTP stands for HyperText Transfer Protocol. The request message specifies the resource (often a file containing an HTML document or an image) from the URL.
    In this example URL:
          http://universe.tc.uvu.edu/cs2550/notes/index.html
    the requested resource is cs2550/notes/index.html. In this case, the resource is a file named index.html, which can be found in the directory cs2550/notes.
    The request message also includes headers which give additional information about the browser and the request.
  4. When the server receives the request message, it finds the requested resource and sends it back to the browser, along with headers that describe the resource. For example, the content-length header tells how many bytes of data are in the resource.
    Requested resources are often files stored on a disk on the server computer, but sometimes resources are created by running a program. PHP is one programming language that is commonly used with web servers, but there are many others.
    In some cases, the requested resource doesn't exist on the server or there are other problems. The server sends an HTTP status code back to the browser to indicate whether the request was successfully handled. Examples of status codes are 200 when the request was successfully processed, and 404 when the resource was not found.
  5. When the browser receives the server's response, it formats the data for you to view on the screen. In some cases, the response might be just a few characters, and formatting and displaying the response are trivial. In other cases, the response can be many megabytes long, or even gigabytes, and formatting and displaying the response data can be very complicated and take a long time.
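Steps 1 through 3 above can be sketched in JavaScript. This is only an illustration of what the request message looks like, not how a real browser is written. The function name buildRequest and the User-Agent value are made up for this example; the URL class is built into browsers and Node.js.

```javascript
// Sketch of steps 1-3: start with a URL, pull out the server's domain name,
// and build the text of a simple HTTP GET request message.
function buildRequest(urlText) {
  const url = new URL(urlText);                // step 1: the browser has a URL
  const domain = url.hostname;                 // step 2: domain name of the server
  const resource = url.pathname + url.search;  // step 3: resource named in the request
  const message =
    "GET " + resource + " HTTP/1.1\r\n" +      // request line
    "Host: " + url.host + "\r\n" +             // header: which server we want
    "User-Agent: ExampleBrowser/1.0\r\n" +     // header: hypothetical browser name
    "\r\n";                                    // a blank line ends the headers
  return { domain: domain, resource: resource, message: message };
}

const exampleRequest = buildRequest("http://universe.tc.uvu.edu/cs2550/notes/index.html");
console.log(exampleRequest.domain);    // universe.tc.uvu.edu
console.log(exampleRequest.resource);  // /cs2550/notes/index.html
console.log(exampleRequest.message);
```

Notice that the resource in the request line begins with a slash, and that the headers are plain text lines of the form Name: value.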

A browser's request, together with the server's response, makes up an HTTP transaction. HTTP transactions are always initiated by the client, which makes it difficult to write some kinds of applications, like text chat, using only HTTP.
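To see the other half of a transaction, here is a minimal sketch that reads the beginning of a server's response: the status line and the headers. The sample response text and the function names parseResponseHead and describeStatus are assumptions made for this example, and only the two status codes mentioned above are handled.

```javascript
// Split the start of an HTTP response into a status code and headers.
function parseResponseHead(headText) {
  const lines = headText.split("\r\n");
  const statusLine = lines[0];                       // e.g. "HTTP/1.1 200 OK"
  const statusCode = Number(statusLine.split(" ")[1]);
  const headers = {};
  for (const line of lines.slice(1)) {
    if (line === "") break;                          // a blank line ends the headers
    const colon = line.indexOf(":");
    if (colon === -1) continue;                      // skip anything that isn't a header
    const name = line.slice(0, colon).trim().toLowerCase();
    headers[name] = line.slice(colon + 1).trim();
  }
  return { statusCode: statusCode, headers: headers };
}

// Describe the two status codes mentioned in these notes.
function describeStatus(code) {
  if (code === 200) return "request was successfully processed";
  if (code === 404) return "resource was not found";
  return "some other status";
}

// A made-up response head for illustration.
const sampleHead =
  "HTTP/1.1 200 OK\r\n" +
  "Content-Type: text/html\r\n" +
  "Content-Length: 1234\r\n" +
  "\r\n";

const parsedHead = parseResponseHead(sampleHead);
console.log(parsedHead.statusCode);                  // 200
console.log(describeStatus(parsedHead.statusCode));
console.log(parsedHead.headers["content-length"]);   // 1234 (as a string)
```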

URLs

A URL (Uniform Resource Locator) specifies how to access a file or other resource. According to the Wikipedia article for URL, the syntax for a URL is:
scheme://domain:port/path?query_string#fragment_id

For example: http://universe.tc.uvu.edu/cs2550/notes/index.html

The scheme is also called the protocol and determines what the format of the remainder of the URL will be. The two protocols we will use most in this class are http, for web addresses, and file, for local files.

The domain is the domain name of a web server. An IP address can be used instead of a domain name.

The port is useful when there is more than one web server program running on the same computer, or when a server is not using the default port (which is port 80 for http).

The path tells where the resource is located on the server, if the resource is a file. For local files, with a file: scheme (or protocol), there is no domain or port, so the path is the most important part of the URL.

The query string is mostly used by server-side programs, but JavaScript programs can also make use of it. The information from HTML forms is sometimes sent to the server in a query string.
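As a small illustration of JavaScript using a query string, the sketch below reads the name=value pairs out of a URL. The page search.html and the parameter names topic and week are made up for this example; the URL class and its searchParams property are built into browsers and Node.js.

```javascript
// Turn the query string of a URL into a plain object of name/value pairs.
function readQuery(urlText) {
  const url = new URL(urlText);
  return Object.fromEntries(url.searchParams);
}

// Hypothetical URL with two query parameters.
const query = readQuery("http://universe.tc.uvu.edu/cs2550/search.html?topic=html&week=1");
console.log(query.topic);  // html
console.log(query.week);   // 1 (values are always strings)
```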

The fragment_id tells the browser what part of the document should be displayed.
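The parts of a URL described above can be pulled apart with the URL class that is built into browsers and Node.js. This is a sketch for experimenting, and the function name describeUrl is my own. In the first example, the port, query string, and fragment are invented so that every part has a value.

```javascript
// Break a URL into the parts named in the syntax above.
function describeUrl(urlText) {
  const url = new URL(urlText);
  return {
    scheme: url.protocol.replace(":", ""),     // "http:" becomes "http"
    domain: url.hostname,
    port: url.port,                            // "" when the default port is used
    path: url.pathname,
    queryString: url.search.replace("?", ""),
    fragmentId: url.hash.replace("#", "")
  };
}

// Hypothetical URL that uses every part.
const full = describeUrl("http://universe.tc.uvu.edu:8080/cs2550/notes/index.html?week=1#urls");
console.log(full);

// The URL from these notes: no port, query string, or fragment.
const simple = describeUrl("http://universe.tc.uvu.edu/cs2550/notes/index.html");
console.log(simple.port === "");               // true

// A local file (the path is hypothetical): no domain or port.
const local = describeUrl("file:///home/student/cs2550/index.html");
console.log(local.scheme, local.path);
```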

IP addresses

In order to communicate with the server, the browser (or network software on the client computer) needs the IP address associated with the domain name. IP stands for Internet Protocol. Network protocols, like IP, are formal specifications of the messages that are sent from one computer to another across a network link.

There are two kinds of IP addresses in use: IPv4 (version 4 of the Internet Protocol) and IPv6 (version 6 of the Internet Protocol). This is an example of an IPv4 address: 161.28.117.185
This is an example of an IPv6 address (in hexadecimal, or base 16 notation): fe80::250:56ff:fe87:3b1d
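To make the difference between the two notations concrete, here is a rough sketch that guesses which kind of address a string is. It is deliberately simplified and is not a complete validator (the version 6 check in particular is loose); the function name ipVersion is made up for this example.

```javascript
// Return 4 or 6 for something that looks like an IP address, otherwise 0.
function ipVersion(address) {
  // Version 4: four decimal numbers from 0 to 255, separated by dots.
  const parts = address.split(".");
  if (parts.length === 4 &&
      parts.every((p) => /^\d{1,3}$/.test(p) && Number(p) <= 255)) {
    return 4;
  }
  // Version 6 (simplified): groups of up to four hexadecimal digits
  // separated by colons, where "::" may stand in once for groups of zeros.
  if (address.includes(":") && /^[0-9a-fA-F:]+$/.test(address)) {
    const groups = address.split(":");
    const compressions = address.split("::").length - 1;
    const groupsOk = groups.every((g) => g.length <= 4);
    if (groupsOk && compressions === 1 && groups.length <= 8) return 6;
    if (groupsOk && compressions === 0 && groups.length === 8) return 6;
  }
  return 0;
}

console.log(ipVersion("161.28.117.185"));            // 4
console.log(ipVersion("fe80::250:56ff:fe87:3b1d"));  // 6
console.log(ipVersion("www.uvu.edu"));               // 0 (a domain name, not an address)
```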

Name servers are network programs that translate domain names to IP addresses.

Working with local files

In this class, we will be working with the client side of web programming. Rather than putting our HTML, CSS, and JavaScript files on a web server, and accessing them over the Internet, we will open them as local files. Web browsers are made to work with local files as well as files loaded from a server. Some browsers have two menu items for opening web pages, Open File..., which will load a local file, and Open Location..., which will load a web page from another network location.

Because we are working with the client side of web programming, we will work with local files. For your assignments you will use a text editor, like Notepad++, to write HTML files and save them on your computer. Then you will open them in your browser using Open File... and browsing your local file system to find the HTML document.

If you use Linux or Mac OS X, you can (relatively) easily start a web server running on your computer. However, that is not necessary for doing your assignments and project for this class. You can also run web servers on Windows computers, but as far as I know, a web server isn't installed by default on Windows.

Web 2.0 and other buzzwords

A few years ago someone came up with the term "Web 2.0". According to Wikipedia the term "became notable after the first O'Reilly Media Web 2.0 conference in 2004." Although there are various definitions of Web 2.0, the definition that I will use here is the one of "using the Web as a platform".

One example of using the Web as a platform is Google Documents. To use Google Documents, you don't install anything on your computer (other than a web browser, in the very unlikely event that there wasn't one installed already). You go to a Web site, and then you can create word processor documents, spreadsheets, and presentations. From my experience with Google Documents it seems pretty clear that there aren't as many features as you would find in the Microsoft Office or Open Office equivalents, but there's no question that you can do a lot, and everything you do happens inside a web browser.

Another buzzword that has appeared in recent years is "Rich Internet Application" (RIA). The RIA concept is similar to the "Web as a platform", except that it often refers more to the user interface (UI) than to the underlying implementation. Silverlight and Flash are two products often associated with making RIAs.

As we look at various aspects of Internet software development in this class, I encourage you to think in terms of Web 2.0, using the Web as a platform and RIAs. Instead of thinking only of writing HTML and JavaScript, think in terms of applications and what you need to know to create Web applications that can do things that desktop applications have done, but have the advantages of being Web applications.

This is not a class in graphic design...

...but it is important for you to be aware of some design issues so that you can work well with the people who have taken classes in graphic design. It's also possible that you will be in a situation where you have to do everything for a company or group's Web site: graphic design, writing, software design and implementation, maintenance, etc.

As you use various sites on the Web, think about their design: What works well? What doesn't work so well? How would you change the site?

Here's a Web design site that you might find interesting and helpful: http://webdesignfromscratch.com

The <div> tag

The div tag (division) wasn't used much in the early days of HTML, but has become much more important than it once was. The class attribute is often used with a div element so that a style can be assigned to the division. We'll learn more about that when we get to Cascading Style Sheets (CSS). In addition to its use with styles, the div tag is important when using JavaScript. Later in this course we will see how JavaScript code can change parts of a Web page. Virtually any element of a page can be modified by JavaScript code, but div elements are used in this way more often than most other HTML elements. The id attribute is important for using JavaScript to modify a div element.
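Here is a small preview of that idea, as a sketch rather than the way we will necessarily write it later in the course. The id "status", the class "notice", and both function names are made up for this example. In a web page, the doc parameter would be the browser's document object.

```javascript
// Build the HTML for a div that has both an id and a class attribute.
function divMarkup(id, className, text) {
  return '<div id="' + id + '" class="' + className + '">' + text + "</div>";
}

// Find the element with the given id and replace its text.
// Returns false if no element with that id exists.
function setDivText(doc, id, newText) {
  const element = doc.getElementById(id);
  if (element === null) {
    return false;
  }
  element.textContent = newText;
  return true;
}

console.log(divMarkup("status", "notice", "Loading..."));
// <div id="status" class="notice">Loading...</div>

// This part only runs inside a browser, where document exists.
if (typeof document !== "undefined") {
  setDivText(document, "status", "Ready");
}
```

The class attribute is what a style sheet would use to style the division, and the id attribute is what the JavaScript code uses to find it.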

HTML5

We will be learning and using HTML5, the most recent version of HTML. HTML5 includes some major changes from earlier versions, including audio and video tags, offline applications, and a graphics library.

A good resource for learning HTML5 is the online book Dive Into HTML5 (diveintohtml5.info). I don't expect you to read the online book for this lesson, but I do recommend that you look through the book and pay particular attention to the history chapter at the beginning.

One of the requirements for the project is to incorporate one feature of HTML5 into your web application, so start thinking about what you can use from HTML5.

HTML for JavaScript Applications

Here is a link to a video about several aspects of HTML that are particularly important for JavaScript web applications: