Optimizing Web Page Documents
Web pages, also known as HTML documents or files, are the individual files stored on the server that site visitors view. HTML stands for Hypertext Markup Language. Put simply, it is a markup language (most programmers do not consider it a proper programming language) that is read by a web browser, a computer program such as Microsoft’s Internet Explorer or Netscape’s Navigator, which then carries out the commands in the code. Most of the commands only influence how and what you see on the page, but keep in mind that there are also commands whose effects you do not see in the output.
Unlike most computer languages, which are first written and then compiled into a binary form, HTML is plain text (ASCII). This means that you can create and edit HTML documents with just about any text editor, such as Notepad. Most of the time you can see the raw HTML of a web page by clicking "View Source" in your browser while viewing that page.
The different Parts of an HTML document:
HTML documents have two main parts: the HEAD and the BODY. Each of these in turn contains various elements.
The HEAD tag:
The web site visitor does not see most of the HTML tags inside the head section of the document. Instead, these are mainly different commands that the author of the web page inserts to set the properties of the page. Among the different properties are:
- The Character Set for various languages
- Base Target and HREF
- Refresh or Redirect
- And much more…
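As a rough sketch, the properties listed above might appear in the HEAD as follows. The title, character set, base URL, target name, and redirect URL are placeholder values chosen for illustration, not recommendations.

```html
<head>
<title>Placeholder title</title>
<!-- Character set: tells the browser how the text of the page is encoded -->
<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">
<!-- Base HREF and target: relative links resolve against this URL and open in this window -->
<base href="http://www.example.com/" target="_top">
<!-- Refresh or redirect: after 5 seconds, load the page named in url= -->
<meta http-equiv="refresh" content="5; url=http://www.example.com/new-page.html">
</head>
```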
The most common tag that is in the HEAD and that is, at least partially, visible to the visitor is the TITLE tag. The text inside of the TITLE is what you see at the very top of your browser. The TITLE is also very important to the search engines, and hence Search Engine Optimization. We will go into this later on.
Other "invisible" tags inside of the HEAD which are important to S.E.O. are Meta Tags which include:
- Description
- Keywords
- Robots
Examples:
<meta name="description" content="A short description here">
This lets the spiders know what you think should be the description of your page. Some spiders use it as is, some ignore it entirely, and some use parts of it along with the content of your page. In general you can put up to 250 characters in the description, but it is suggested to keep it at 150 or fewer. This helps avoid having your description cut off by the search engines.
<meta name="keywords" content="your keywords,here">
Put your keywords and phrases here, comma separated. This lets the spiders know which keywords you think are relevant to your page. Some spiders use them as is, some ignore them entirely, and some use parts of them along with the content of your page. This can be up to 1024 characters, but it is suggested to keep it under 800.
Both of these MUST reflect the content of your page; otherwise you are taking a chance of being either penalized or banned from the search engines.
Robot Meta Tag Options
The robots meta tag lets spiders know whether to index your page and whether to follow the links in it. If you have a page that you DO NOT want indexed by a search engine, you can add the following tag.
<META NAME="ROBOTS" CONTENT="NOINDEX">
INDEX – tells the robot it is ok to index the page.
FOLLOW – tells the robot it is ok to follow the links found on this page.
NOINDEX and NOFOLLOW tell the robot the opposite. The possible combinations are:
<META NAME="ROBOTS" CONTENT="INDEX,FOLLOW">
<META NAME="ROBOTS" CONTENT="NOINDEX,FOLLOW">
<META NAME="ROBOTS" CONTENT="INDEX,NOFOLLOW">
<META NAME="ROBOTS" CONTENT="NOINDEX,NOFOLLOW">
The BODY tag:
The BODY tag is where all the visible elements of your web page are located. This is what the visitor actually sees.
Heading tags (H1-H6) should be as close to the top of your visible page as possible. Search engines look at the contents of these tags for relevant keywords, and they are usually given more weight in the algorithms than normal text in your page.
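A minimal sketch of this placement is shown below. The heading and paragraph text are hypothetical; substitute the keywords that actually describe your page.

```html
<body>
<!-- Main heading comes first in the visible page and contains the key phrase -->
<h1>Handmade Oak Furniture</h1>
<p>Introductory text about the topic of the page.</p>
<!-- Lower-level headings introduce the subsections -->
<h2>Oak Dining Tables</h2>
<p>Text about this subtopic.</p>
</body>
```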
The COMMENTS tag is used to place invisible comments inside of your web page. A simple example would be
<!-- Start Left Navigation Bar here -->
It is used for internal comments. You should never use this tag to help your position in the search engines, as you risk getting penalized by them.
The image (IMG) tag is often forgotten when optimizing your site. The first thing you need to do is make sure to include the height and width of the image. This has to do with the validation of the HTML document; some search engines will not be able to process your page unless its syntax is correct. The second thing is the ALT attribute of the IMG tag. This is the alternative text that is shown in place of the image until it loads and, in some browsers, when the mouse is over the image. It is a great place to increase the keyword density in your document as well as to place common misspellings.
<img src="your_image.gif" height="10" width="50" alt="more Keywords and misspellings">
Links (the A HREF tag): as with the IMG tag, the alternative text (shown when the mouse is over the link) is often forgotten when optimizing your site. Links have no ALT attribute, so use the TITLE attribute instead.
<a href="your_link.html" title="more Keywords and misspellings">more Keywords</a>
Frames: quite simply, avoid them whenever possible. While search engines can follow the links to the various framed pages, they do not credit the content of any of these pages to the main page. The only things that they see on the main page are the HEAD and the NOFRAMES tag. You have to treat the search engines like an old browser that is not capable of displaying frames.
The HTML code for a simple, non-frames, HTML document will look something like:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<html>
<head>
<title>This is my Title</title>
<meta name="description" content="A short description here">
<meta name="keywords" content="your keywords,here">
<META NAME="ROBOTS" CONTENT="NOINDEX">
</head>
<body>
<H1>HEADING TAG</H1>
<!-- COMMENTS TAG -->
<img src="http://www.alon-cohen.com/ac-logo.gif" width="457" height="75" alt="more Keywords and misspellings">
Regular text
<a href="your_link.html" title="more Keywords and misspellings">MY Keywords</a>
Some more regular text
</body>
</html>
In the browser, the preceding document will look like:

The HTML code for a simple HTML document that uses frames will look something like:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Frameset//EN">
<html>
<head>
<title>This is my Title</title>
<meta name="description" content="A short description here">
<meta name="keywords" content="your keywords,here">
<META NAME="ROBOTS" CONTENT="NOINDEX">
</head>
<frameset rows="40%,*">
<frame src="G:\Documents and Settings\alon\Desktop\temp\index.html">
<frame src="http://www.alon-cohen.com">
<noframes>
Anyone, person or spider, who does not have frames support will see the contents of this sentence instead of the frames you intended.
</noframes>
</frameset>
</html>
In the browser, the preceding document will look like:

Note: Any browser or spider that does not support frames will only see the following (only the HEAD and NOFRAMES tag are visible):

Validation:
As stated earlier, correct HTML syntax can be very important to the optimization of your page. To check that your page’s HTML syntax is correct, you can use the HTML validator located at: http://validator.w3.org
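To illustrate the kind of problem a validator reports, the hypothetical fragment below shows two common errors followed by a corrected version. The file name and text are made up, and the exact error messages depend on the validator and on the DOCTYPE in use.

```html
<!-- Invalid: the IMG has no ALT attribute, and the B and I tags are closed in the wrong order -->
<img src="logo.gif" width="50" height="10">
<b><i>Welcome</b></i>

<!-- Corrected: ALT is present and the tags are properly nested -->
<img src="logo.gif" width="50" height="10" alt="Company logo">
<b><i>Welcome</i></b>
```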