How to build a search engine friendly sitemap?

One of the concerns facing most website owners today is how to make search engines find all the pages of their website. Search engine spiders are fairly smart and can crawl through your entire website, extracting all its links. However, on large websites with several hundred or a few thousand pages, they might miss some deeper-level pages, especially those that are linked only from inner pages and do not appear in your main navigation menu.

Hence, it is always a good idea to present the complete list of your links to search engines in one place, where they are easy to find. Of course, there is more to a sitemap than a mere list of links, as you will learn as you read on.


What is a sitemap?

A sitemap for a website is analogous to the index of a book. Normally, when you build your website, you provide an easy-to-navigate multi-level menu bar at the top, so that visitors can quickly find what they are looking for and jump to that page by clicking the appropriate link in your menu tree.

So you may ask: if I have created a nice multi-level menu tree for ease of navigation, why do I need a second index in the form of a sitemap? The answer is that while your menu tree is useful for your human visitors, a sitemap file is more meaningful to search engine crawlers.

Normally, a sitemap is a single file containing your entire list of links along with other meaningful information for the crawler. Naturally, this file must be written in a program-friendly format, and that format is XML. By convention, the file is named sitemap.xml (all lower case). Nearly all search engine crawlers today support the XML sitemap format, so one file serves all search engines.

Note that providing a sitemap.xml file does not guarantee that search engines will index all your pages. Ultimately, it is the prerogative of the crawler to decide which pages to ignore, based on several other factors that belong to the wider subject of SEO.


Purposes that a sitemap file serves:

  1. Lists all the links of your website, providing the crawler with the absolute URLs of all your pages.
  2. Tells the crawler when each page was last updated.
  3. Tells the crawler how frequently the page content is likely to change.
  4. Tells the crawler how important or relevant each link is relative to the other listed links.

Sitemap Format

Before we delve into this further, let us first take a look at a typical sitemap file. Check the sitemap.xml file for this website to get an idea of a real sitemap file.

Below is an example of a very basic sitemap file with just 3 links. Note that the file should contain characters in UTF-8 encoding.

[Image: typical_sitemap.jpg]
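For readers who prefer text to an image, here is a minimal Python sketch that generates a sitemap of this kind with 3 links. It is an illustration under assumptions, not the exact file shown above: the page URLs, dates and values are hypothetical, and `cdata` and `build_sitemap` are helper names made up for this example. The `<loc>` values are wrapped in CDATA sections, as discussed in the points to note further down.

```python
from xml.etree import ElementTree

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def cdata(text):
    """Wrap text in a CDATA section, splitting any ']]>' it contains."""
    return "<![CDATA[" + text.replace("]]>", "]]]]><![CDATA[>") + "]]>"


def build_sitemap(entries):
    """Return sitemap XML for (url, lastmod, changefreq, priority) tuples."""
    lines = ['<?xml version="1.0" encoding="UTF-8"?>',
             f'<urlset xmlns="{SITEMAP_NS}">']
    for url, lastmod, changefreq, priority in entries:
        lines += [
            "  <url>",
            f"    <loc>{cdata(url)}</loc>",
            f"    <lastmod>{lastmod}</lastmod>",
            f"    <changefreq>{changefreq}</changefreq>",
            f"    <priority>{priority:.1f}</priority>",
            "  </url>",
        ]
    lines.append("</urlset>")
    return "\n".join(lines) + "\n"


# Hypothetical pages on a placeholder domain.
EXAMPLE_ENTRIES = [
    ("http://www.yoursite.tld/", "2024-01-15", "weekly", 1.0),
    ("http://www.yoursite.tld/about.html", "2023-11-02", "yearly", 0.5),
    ("http://www.yoursite.tld/list.php?cat=1&page=2", "2024-01-10", "daily", 0.8),
]

sitemap_xml = build_sitemap(EXAMPLE_ENTRIES)

# Parsing the result back confirms that it is well-formed XML.
root = ElementTree.fromstring(sitemap_xml.encode("utf-8"))
locs = [element.text for element in root.iter(f"{{{SITEMAP_NS}}}loc")]
```

To produce the actual file, the string in `sitemap_xml` can be written to sitemap.xml with UTF-8 encoding.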


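Now let us explain the tags:

  1. <urlset> is the root element. It encloses all the entries and declares the sitemap protocol namespace.
  2. <url> encloses the details of one page.
  3. <loc> holds the absolute URL of the page. It is the only mandatory tag within <url>.
  4. <lastmod> holds the date on which the page was last modified, for example in YYYY-MM-DD format.
  5. <changefreq> is a hint on how often the page is likely to change. The permitted values are always, hourly, daily, weekly, monthly, yearly and never.
  6. <priority> is the importance of the page relative to your other pages, from 0.0 to 1.0. The default is 0.5.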
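The values these tags accept can also be expressed as a small check. The sketch below is only an illustration: `check_entry` is a hypothetical helper, not part of any library, and the rules it encodes (absolute URL shorter than 2,048 characters, the seven changefreq keywords, priority between 0.0 and 1.0) are taken from the sitemaps.org protocol. The protocol also allows a full date and time in lastmod, whereas this sketch only accepts a plain date.

```python
from datetime import date

# Values the sitemaps.org protocol allows for <changefreq>.
CHANGEFREQ_VALUES = {
    "always", "hourly", "daily", "weekly", "monthly", "yearly", "never",
}


def check_entry(loc, lastmod=None, changefreq=None, priority=None):
    """Return a list of problems found in one <url> entry (empty if none)."""
    problems = []
    # <loc> is the only mandatory tag and must be an absolute URL.
    if not loc.startswith(("http://", "https://")):
        problems.append("loc must be an absolute URL")
    if len(loc) >= 2048:
        problems.append("loc must be shorter than 2,048 characters")
    # <lastmod> is optional; here it must parse as an ISO date.
    if lastmod is not None:
        try:
            date.fromisoformat(lastmod)
        except ValueError:
            problems.append("lastmod should be a date such as 2024-01-15")
    # <changefreq> is optional and limited to a fixed set of keywords.
    if changefreq is not None and changefreq not in CHANGEFREQ_VALUES:
        problems.append("changefreq is not one of the permitted values")
    # <priority> is optional and ranges from 0.0 to 1.0.
    if priority is not None and not 0.0 <= priority <= 1.0:
        problems.append("priority must be between 0.0 and 1.0")
    return problems


# A well-formed entry and a faulty one, both on a placeholder domain.
good = check_entry("http://www.yoursite.tld/", "2024-01-15", "weekly", 0.8)
bad = check_entry("http://www.yoursite.tld/", "15/01/2024", "sometimes", 1.5)
```

Here `good` is an empty list, while `bad` reports one problem each for the date, the frequency and the priority.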


Where should sitemap.xml reside and how to tell the search engine where it is?

The sitemap.xml file should reside in the document root of your website, that is, the directory from which your home page is served. On a Linux hosting account this is usually the public_html directory, and on a Windows hosting account it is usually the httpdocs directory.

Tell all search engines the location of your xml sitemap by placing an entry into your robots.txt file as below:

Sitemap: http://www.yoursite.tld/sitemap.xml

Here is a typical example of a robots.txt with the sitemap entry. The robots.txt file must also reside in the home directory.
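As a hedged illustration, the Python sketch below holds a small robots.txt in a string and reads the sitemap entry back with the standard library's `urllib.robotparser` module. The Disallow rules are hypothetical and only make the file look realistic. Note that the `site_maps()` method needs Python 3.8 or later.

```python
from urllib.robotparser import RobotFileParser

# A typical robots.txt; the Disallow paths are made up for this example.
ROBOTS_TXT = """\
User-agent: *
Disallow: /cgi-bin/
Disallow: /tmp/

Sitemap: http://www.yoursite.tld/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# site_maps() returns the list of Sitemap entries, or None if there are none.
sitemap_urls = parser.site_maps()
print(sitemap_urls)
```

A check like this is a quick way to confirm that the Sitemap line is spelled correctly and can be read by a program.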



Points to Note

  1. You must have noticed that for the <loc> tag we have enclosed the URL string within a CDATA section. A CDATA section starts with <![CDATA[ and ends with ]]>. This is done to escape certain special characters that your URL may contain, such as & (ampersand), >, <, ' (single quote) and " (double quote). Hence, it is safer to enclose all URL strings within a CDATA section, so you won’t have to worry about special characters in your URLs.
  2. Make sure that your sitemap.xml file does not exceed 50 MB (uncompressed) or 50,000 URLs, which are the limits set by the sitemap protocol. For very large websites, where this may be unavoidable, there is a provision to create multiple sitemap files.
  3. If you have SSL implemented on your website and some URLs begin with http:// while others begin with https://, you should not include both versions in the same sitemap file. Use only one of the two versions, whichever is suitable for all your website pages.
  4. The relative order of URLs in your sitemap.xml file is immaterial. You can place them in any order.
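The provision for multiple sitemap files mentioned in point 2 can be sketched as follows. This is a minimal illustration under assumptions: the page URLs and the child file names (sitemap1.xml, sitemap2.xml and so on) are hypothetical, as are the helper names `chunk_urls` and `build_sitemap_index`. The 50,000 URL limit per file and the `<sitemapindex>` format come from the sitemaps.org protocol. The index uses entity escaping, which is an alternative to the CDATA approach from point 1.

```python
from xml.sax.saxutils import escape

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS_PER_FILE = 50_000  # per-file URL limit in the sitemaps.org protocol


def chunk_urls(urls, size=MAX_URLS_PER_FILE):
    """Split a list of URLs into consecutive groups of at most `size`."""
    return [urls[i:i + size] for i in range(0, len(urls), size)]


def build_sitemap_index(sitemap_urls):
    """Return the XML of an index file that points at each child sitemap."""
    lines = ['<?xml version="1.0" encoding="UTF-8"?>',
             f'<sitemapindex xmlns="{SITEMAP_NS}">']
    for url in sitemap_urls:
        # escape() turns &, < and > into XML entities.
        lines.append(f"  <sitemap><loc>{escape(url)}</loc></sitemap>")
    lines.append("</sitemapindex>")
    return "\n".join(lines) + "\n"


# Hypothetical site with 120,000 pages on a placeholder domain.
all_urls = [f"http://www.yoursite.tld/page{i}.html" for i in range(120_000)]
chunks = chunk_urls(all_urls)

# One child sitemap per chunk; the file names are an assumption.
child_names = [
    f"http://www.yoursite.tld/sitemap{n}.xml"
    for n in range(1, len(chunks) + 1)
]
index_xml = build_sitemap_index(child_names)
```

Each chunk would be written to its own sitemap file, and the index file is then the one you point search engines to.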


About the Author

Rajeev Kumar
CEO, Computer Solutions
Jamshedpur, India

Rajeev Kumar is the primary author of How2Lab. He holds a B.Tech. from IIT Kanpur and has several years of experience in IT education and software development. He has taught a wide spectrum of people, including fresh young talents, students of XLRI, industry professionals, and govt. officials.

Rajeev founded Computer Solutions & WebServicesWorldwide.com, and has hands-on experience of building a variety of web applications and portals, including SaaS-based ERP & e-commerce systems, independent B2B, B2C, matrimonial & job portals, and many more.



Copyright © How2Lab.com. All rights reserved.
