Sitemap:
- A sitemap is a list of a website's pages collected in a single file that is available to both search engines and users.
- XML is used to write the sitemap for a website.
- A sitemap tells a search engine how frequently the content of a page changes and what priority to give each page while indexing.
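To make the points above concrete, here is a minimal sketch that builds a one-entry sitemap with Python's standard library. The page URL, the date, and the helper name build_sitemap are made up for illustration; the element names (urlset, url, loc, lastmod, changefreq, priority) come from the sitemaps.org protocol.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(entries):
    """Return sitemap XML (as a string) for a list of entry dicts."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for entry in entries:
        url = ET.SubElement(urlset, "url")
        # Each key maps directly onto a child element of <url>.
        for tag in ("loc", "lastmod", "changefreq", "priority"):
            if tag in entry:
                ET.SubElement(url, tag).text = str(entry[tag])
    return ET.tostring(urlset, encoding="unicode")


# Hypothetical page, used only for illustration.
xml_text = build_sitemap([
    {
        "loc": "https://example.com/contact-us/",
        "lastmod": "2024-01-15",
        "changefreq": "monthly",
        "priority": 0.5,
    }
])
print(xml_text)
```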
Rules of a sitemap:
- A sitemap can have at most 50,000 URLs.
- A sitemap file should not be larger than 50 MB uncompressed (older versions of the protocol set the limit at 10 MB).
- If you have more than 50,000 URLs, you can categorize the URLs and create a sub-sitemap for each category.
- Use a sitemap index to hold all the sub-sitemaps of a website.
- Submit the sitemap index to the search engines. Their bots will then automatically crawl all the web pages listed for the website.
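The splitting rule above can be sketched in plain Python as follows. This is only an illustration of the idea, not how any particular tool does it: the helper names split_urls and build_sitemap_index, the example.com URLs, and the sitemap-N.xml naming scheme are all assumptions.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS_PER_SITEMAP = 50_000


def split_urls(urls, limit=MAX_URLS_PER_SITEMAP):
    """Split a list of URLs into chunks of at most `limit` items."""
    return [urls[i:i + limit] for i in range(0, len(urls), limit)]


def build_sitemap_index(sitemap_urls):
    """Return sitemap index XML listing the location of each sub-sitemap."""
    index = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    for sitemap_url in sitemap_urls:
        sitemap = ET.SubElement(index, "sitemap")
        ET.SubElement(sitemap, "loc").text = sitemap_url
    return ET.tostring(index, encoding="unicode")


# 120,000 hypothetical URLs do not fit in one sitemap file.
all_urls = [f"https://example.com/item/{n}/" for n in range(120_000)]
chunks = split_urls(all_urls)

# One sub-sitemap per chunk; the file naming scheme is an assumption.
sub_sitemaps = [
    f"https://example.com/sitemap-{n}.xml" for n in range(1, len(chunks) + 1)
]
index_xml = build_sitemap_index(sub_sitemaps)
```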
Sitemaps framework in Django
- It automates the creation of the sitemap XML.
- We can classify the URLs of a website as static or dynamic.
Configurations of sitemaps framework
- Add 'django.contrib.sitemaps' to the INSTALLED_APPS setting.
- Make sure the TEMPLATES setting contains a DjangoTemplates backend with 'APP_DIRS': True.
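As a rough sketch, the two settings above could look like this in settings.py. Only the lines relevant to sitemaps are shown; the rest of a real settings file is omitted.

```python
# settings.py (fragment)
INSTALLED_APPS = [
    # ... your other apps ...
    "django.contrib.sitemaps",
]

TEMPLATES = [
    {
        "BACKEND": "django.template.backends.django.DjangoTemplates",
        "DIRS": [],
        # Lets Django find the sitemap templates shipped with the app.
        "APP_DIRS": True,
        "OPTIONS": {},
    },
]
```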
Sitemap Class:
- It has the following attributes or methods that will help in creating the sitemap.
items:
- It is a method that returns a list (or other sequence) of objects.
- Every object returned from this method is passed to the location(), lastmod(), changefreq() and priority() methods.
location:
- It is either an attribute or a method.
- If it is a method, it takes an item/object as an argument.
- It is responsible for returning the absolute path.
- The absolute path should not contain the protocol (http, https) or the domain name.
- Example: "/contact-us/", "/country/india/"
- If you do not override it, it calls "obj.get_absolute_url()" by default.
- If the object does not have a get_absolute_url() method, an error is raised.
lastmod:
- It is either an attribute or a method.
- It is responsible for returning a date or datetime object.
- It is None by default.
- It tells a search engine when the page was last modified.
changefreq:
- It is either an attribute or a method.
- If it is a method, it takes an item/object as an argument.
- It tells search engines how frequently a web page changes.
- Possible values are "always", "hourly", "daily", "weekly", "monthly", "yearly", "never"
priority:
- The priority of this URL relative to other URLs on your site.
- Its valid values range from 0.0 to 1.0.
- It only tells search engines which of your pages you consider most important for crawlers.
- The default priority of a page is 0.5.
protocol:
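- It is an attribute. It specifies which protocol to use when generating the full URL.
- Possible values are "http" and "https".
- If it is not set, the protocol with which the sitemap was requested is used. For sitemaps built outside of a request, the fallback is "http" in older Django versions and "https" from Django 5.0.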
i18n:
- It is a boolean attribute.
- If it is True, the URLs of the sitemap are generated for every language in your LANGUAGES setting, each with its language prefix.
- By default it is set to False.
GenericSitemap:
- It simply inherits from the Sitemap class, so it gets all the attributes and methods described above.
- It takes the arguments "info_dict", "priority", "changefreq" and "protocol".
- info_dict contains two keys: 1. "queryset", 2. "date_field"
- We don't need to write extra Sitemap classes for simple cases; we can use GenericSitemap instead.
Use the sitemap view for sites with fewer than 50,000 URLs, and the sitemap index view for sites with more than 50,000 URLs.