Rah sitemap - XML sitemap generator for Textpattern CMS

Overview

rah_sitemap

Packagist | Issues | Donate

Sitemap plugin for Textpattern CMS. Generates Sitemaps.org XML sitemaps for your site, which help Google and other search engines index your content. Rah_sitemap maps your categories, sections, articles and even custom URLs of your choosing, and best of all, none of it requires diving into code. All configuration is done from a clean graphical user interface.

Install

Using Composer:

$ composer require rah/rah_sitemap

Or download an installer package.

Basics

Rah_sitemap generates a sitemap for your Textpattern website, listing all of its section, article category and article pages. The generated sitemap follows the XML-based Sitemap protocol format and is aimed at search engines rather than your visitors. The sitemap is meant to help search engines index your site as it grows and gains more nested pages.

The sitemap can be configured directly from Textpattern’s Preferences panel, making rah_sitemap very easy to set up and use. The plugin itself takes care of the rest.
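To give a sense of what a Sitemap protocol document looks like, the following PHP sketch builds a minimal one by hand. It is not rah_sitemap's own code: the function name build_sitemap and the sample URL are hypothetical, and it assumes PHP's XMLWriter extension is available.

```php
<?php
// Hypothetical helper, not part of rah_sitemap: builds a minimal
// Sitemaps.org document from an array of URL => last-modified pairs.
function build_sitemap(array $urls): string
{
    $xml = new XMLWriter();
    $xml->openMemory();
    $xml->setIndent(true);
    $xml->startDocument('1.0', 'UTF-8');

    // The urlset root element carries the Sitemap protocol namespace.
    $xml->startElement('urlset');
    $xml->writeAttribute('xmlns', 'http://www.sitemaps.org/schemas/sitemap/0.9');

    foreach ($urls as $loc => $lastmod) {
        $xml->startElement('url');
        $xml->writeElement('loc', $loc);         // XMLWriter escapes special characters
        $xml->writeElement('lastmod', $lastmod); // W3C datetime string
        $xml->endElement();
    }

    $xml->endElement();
    $xml->endDocument();

    return $xml->outputMemory();
}

echo build_sitemap([
    'https://example.com/articles/hello' => '2013-03-04T10:06:30+00:00',
]);
```

Each url entry holds the page address in loc and, optionally, its last modification time in lastmod.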

Accessing the sitemap

The generated sitemap is publicly accessible from the site’s root. It can be accessed from two URLs, depending on the site’s permanent link mode. If the site is configured to use clean URLs, the sitemap can be accessed using a clean path like http://example.com/sitemap.xml, where example.com is the site’s URL. Additionally, a query version, http://example.com/?rah_sitemap=sitemap, is available in both the messy and the clean URL modes, and can be used if the other one isn’t available.
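As a small illustration of the two addresses described above, this sketch derives both from a site URL. The helper name sitemap_urls is hypothetical and only mirrors the URL patterns given in the text.

```php
<?php
// Hypothetical helper: returns the two sitemap addresses for a site URL.
function sitemap_urls(string $siteUrl): array
{
    $base = rtrim($siteUrl, '/');

    return [
        // Only available when the site uses clean URLs.
        'clean' => $base . '/sitemap.xml',
        // Available in both the messy and the clean URL modes.
        'query' => $base . '/?rah_sitemap=sitemap',
    ];
}

print_r(sitemap_urls('http://example.com/'));
```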

Immediate boost to search engine visibility?

Not exactly. A sitemap helps crawlers find pages on your site that could otherwise be hard to discover. Sitemaps themselves do not boost your content’s visibility.

For a simple, almost static website with just a few easily discoverable pages, a sitemap may not be necessary at all. If search engines can already get to your pages, you do not really need rah_sitemap, or sitemaps in general. Once a page is indexed, the sitemap’s work ends.

Rah_sitemap is particularly useful when your site has pages that are hard to discover because they are loaded using Ajax, because the content is isolated with few links pointing to it, or because the pages are nested deep in a complex page structure. Rah_sitemap can also help a new site get started when it has very few external links pointing to its pages.

It’s a common misconception that sitemaps guarantee that pages will be indexed. This is not the case. A sitemap is a map. That a place is marked on a map doesn’t mean someone will actually go there, or when. Like any map, it is used to find and learn, to increase future knowledge. A sitemap’s update interval also helps search engines estimate when your site is updated next and when it should be crawled again.

Normally, you will benefit from submitting a sitemap, but it’s just one piece of the puzzle. You will not be penalized for having a sitemap or for including the wrong content in it.

Configuration

After rah_sitemap is installed, you may want to configure it to fit your site. For instance, you may want to exclude certain irrelevant articles or sections. The plugin’s settings can be configured from Textpattern’s Preferences panel, organized under its own Sitemap section. Sections and Categories can be excluded from the sitemap from their respective editors.

Sending the sitemap to search engines

Once you have a sitemap up and running, you may want to inform search engines about its existence. There are a few ways to do it: Google’s Webmaster Tools, a robots.txt directive, or search vendor specific pinging.

The recommended way is to use a robots.txt file. To set one up, add a robots.txt file at the root of your site’s domain so that it is accessible from https://example.com/robots.txt. If your Textpattern site has fully functional clean URLs, is installed at the root and you do not already have a robots.txt file, rah_sitemap will automatically create the file for you, or rather, serve it dynamically.

If not, you will need to create or edit a file named robots.txt at the root of the domain. In that file you would add a Sitemap directive containing an absolute URL to your sitemap:

Sitemap: https://example.com/?rah_sitemap=sitemap

Here https://example.com/ is your site’s location as defined in Textpattern’s Preferences panel. The directive should be placed on its own line.

Preferences

Rah_sitemap comes with a number of preferences, all of which can be found in your Preferences panel, organized under a Sitemap section. Rah_sitemap allows excluding sections, categories and articles from the XML sitemap. The following options are available.

Exclude articles based on fields

The field can be used to exclude articles from the sitemap based on any article field and its value. The option takes a comma-separated list of field: value pairs, where the field is the database field and the value is the field’s value that will be excluded. Available fields include Title, AuthorID, Body, Excerpt, Category1, Category2, Section, Keywords, url_title, custom_1 to custom_10 and Image.

Values used in the option support two wildcard characters. An underscore (_) matches exactly one character, and a percent sign (%) matches zero or more characters.

If you wanted to exclude articles posted to the sections notes and private, or written by the user mailer, you could use the following in the field:

Section: notes, Section: private, AuthorID: mailer
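The wildcard rules described above can be illustrated with a short PHP sketch that emulates them with a regular expression. This is not how the plugin evaluates the option; the function name matches_wildcard is hypothetical, and the case-sensitive comparison is an assumption made for the example.

```php
<?php
// Hypothetical helper, not part of rah_sitemap: emulates the two wildcards.
// Assumption: matching here is case-sensitive.
function matches_wildcard(string $pattern, string $value): bool
{
    // Escape regex metacharacters first, then translate the wildcards:
    // "%" matches zero or more characters, "_" matches exactly one.
    $regex = strtr(preg_quote($pattern, '/'), ['%' => '.*', '_' => '.']);

    return preg_match('/^' . $regex . '$/su', $value) === 1;
}

var_dump(matches_wildcard('note_', 'notes'));           // bool(true)
var_dump(matches_wildcard('note_', 'note'));            // bool(false)
var_dump(matches_wildcard('private%', 'private-area')); // bool(true)
```

With these rules, a value such as Section: private% would cover every section whose name starts with private.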

Additional URLs

Comma-separated list of additional local site URLs added to the sitemap. Note that a sitemap only allows local URLs, meaning that any URL used needs to be on the same domain as the website itself. If a URL is relative and doesn’t start with an HTTP or HTTPS protocol, the site’s URL is prepended to it.
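The prepending rule can be sketched as follows. This is an illustration rather than the plugin's code: the function name additional_urls is hypothetical, and the way slashes are joined between the site URL and a relative path is an assumption.

```php
<?php
// Hypothetical helper: turns the comma-separated preference value into
// a list of absolute URLs.
function additional_urls(string $preference, string $siteUrl): array
{
    $urls = [];

    foreach (explode(',', $preference) as $url) {
        $url = trim($url);

        if ($url === '') {
            continue; // Skip empty entries, such as a trailing comma.
        }

        // Relative URLs get the site's URL prepended.
        // Assumption: exactly one slash joins the two parts.
        if (preg_match('#^https?://#i', $url) !== 1) {
            $url = rtrim($siteUrl, '/') . '/' . ltrim($url, '/');
        }

        $urls[] = $url;
    }

    return $urls;
}

print_r(additional_urls('/about, contact, https://example.com/landing', 'https://example.com/'));
```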

Include future articles?

If set to Yes, articles with a future publishing date are visible in the sitemap. Please note that by default the article tag doesn’t display future articles unless its time attribute is explicitly set to future or any.

Include published articles?

If set to Yes, published articles are visible in the sitemap. If both this option and Include future articles? are set to No, no articles will be visible in the sitemap.

Include expired articles?

If set to No, expired articles are not visible in the sitemap.

Exclude sticky articles?

If set to Yes, sticky articles are not visible in the sitemap.
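How the four article options combine can be sketched as a single predicate. This is a reading of the descriptions above, not the plugin's implementation: the function name is_listed, the article keys (posted, expires, sticky) and the preference keys are all hypothetical, and "published" is assumed to mean an article whose publishing date has already passed.

```php
<?php
// Hypothetical sketch of how the article options might combine.
// $article: ['posted' => int timestamp, 'expires' => int|null, 'sticky' => bool]
// $prefs:   ['future' => bool, 'published' => bool, 'expired' => bool, 'exclude_sticky' => bool]
function is_listed(array $article, array $prefs, int $now): bool
{
    $isFuture = $article['posted'] > $now;
    $isExpired = $article['expires'] !== null && $article['expires'] <= $now;

    if ($isFuture && !$prefs['future']) {
        return false; // "Include future articles?" is No.
    }

    if (!$isFuture && !$prefs['published']) {
        return false; // "Include published articles?" is No.
    }

    if ($isExpired && !$prefs['expired']) {
        return false; // "Include expired articles?" is No.
    }

    if ($article['sticky'] && $prefs['exclude_sticky']) {
        return false; // "Exclude sticky articles?" is Yes.
    }

    return true;
}
```

With both future and published set to false, every article fails one of the first two checks, which matches the note that no articles will then be visible.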

For developers

Rah_sitemap offers a small set of tools for developers. These tools allow other Textpattern plugins to extend rah_sitemap’s functionality by adding new URLs to the sitemap. The plugin is packaged in a class structure that can be extended if needed, and introduces new Textpattern callback events.

Callback

Rah_sitemap introduces a new public-facing callback event named rah_sitemap.urlset to Textpattern’s event library. The event is fired before a sitemap is printed out. The callback event can be used with the API to add new URLs to the sitemap.

As with other callback events in Textpattern, hooking into rah_sitemap’s event is done with Textpattern’s callback handling functions, mainly register_callback.

// Hook into the event fired before the sitemap is printed.
register_callback('abc_function', 'rah_sitemap.urlset', 0, $urls);

// Add a URL to the sitemap together with a timestamp.
function abc_function($event, $step, $void, $urls) {
    $urls['http://example.com/foo/bar'] = '2013-03-04 10:06:30';
}

Custom URL functions

If you are supplying a custom URL function for Textpattern, please note that the URLs the function generates need to conform to RFC 3986 and RFC 3987. Special syntax characters in all URLs should also be entity-escaped using Textpattern’s txpspecialchars function. All URLs Textpattern itself generates follow these specifications, and so should your custom URL plugin.

As rah_sitemap integrates closely with Textpattern’s core, it uses the same URL functions as Textpattern. If a URL given to the sitemap doesn’t meet those specifications, the sitemap will become invalid.
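To show why the escaping matters, the sketch below escapes a URL that contains an ampersand. So that it runs outside Textpattern, it uses PHP's htmlspecialchars as a stand-in for txpspecialchars; that the two behave alike with these flags is an assumption, and the function name escape_url is hypothetical.

```php
<?php
// Stand-in for Textpattern's txpspecialchars(), which is assumed here to
// behave like htmlspecialchars() with ENT_QUOTES and UTF-8.
function escape_url(string $url): string
{
    return htmlspecialchars($url, ENT_QUOTES, 'UTF-8');
}

// A raw "&" would make the sitemap XML invalid, so it becomes "&amp;".
echo escape_url('https://example.com/?s=news&page=2');
// prints https://example.com/?s=news&amp;page=2
```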

Comments
  • Bugfixing: sitemap_include_in isn't shown correct on section and category page

    In the admin panels for section and category details, the value of the property sitemap_include_in isn't shown correctly. This is because the PHP function empty() returns false when the property isn't empty, and false is mapped to an empty string. The function yesnoradio() only supports zero or one as values, not empty strings.

    Bug 
    opened by sebastiansIT 3
  • Problems with 4.7.3

    This plugin doesn't work at all in 4.7.3. It's unable to generate/output XML. It's quite an important plugin, so please gocom or someone else, consider updating the code.

    Needs clarification Invalid 
    opened by kuopassa 3
  • Sitemap <lastmod> value possibly incorrect

    In version 3.0.0 the code safe_strftime('iso8601', $modifiedAt); creates a timestamp that ends with 4 digits, like 0000, but I think the correct format is to have a colon like this: 00:00.

    Bug Investigate 
    opened by kuopassa 2
  • Handle large sitemaps

    This pull request implements support for large sitemaps. Instead of constructing a single huge sitemap dynamically, we generate a sitemap index file, which links to smaller paginated sitemaps. Each sitemap fetches and returns at most 50,000 records, lowering the maximum memory usage.

    Fixes #7

    opened by gocom 1
  • If Textpattern has many articles, sitemap not generated

    I have a Textpattern installation with more than 170k articles. I'm trying to include published articles to the sitemap output, but the output is this:

    Fatal error: Allowed memory size of 134217728 bytes exhausted (tried to allocate 16384 bytes) in textpattern\lib\txplib_db.php on line 437

    opened by kuopassa 1
  • Wrong URL generated if permlinks include post date

    The URLs generated for permlink mode "/year/month/day/title" are wrong. E.g. for the article http://example.com/2014/01/12/foo you get the URL http://example.com/1970/01/01/foo. The reason is that Textpattern's permlinkurl() expects a unix timestamp in the posted field rather than an actual date. The following change fixes the issue:

             $rs = safe_rows_start(
    -            '*, unix_timestamp(Posted) as uPosted, unix_timestamp(LastMod) as uLastMod',
    +            '*, unix_timestamp(Posted) as uPosted, unix_timestamp(LastMod) as uLastMod, unix_timestamp(textpattern.Posted) as posted',
                 'textpattern',
    

    This query matches the one used in discuss_list() function in txp_discuss.php.

    opened by palant 1
  • Bugfix: Wrong operator for NULL-check

    Use Operator "IS" instead of "=" to check for NULL. The clause with "=" is never true. No article URLs are added to the sitemap if preference "Include expired articles?" is set to "no".

    Seen this Bug with:

    • Textpattern version: 4.7.3 (7c46d1f4c8ac79e62a7d5e54a9ddac53)
    • PHP version: 7.2.24-he.0
    • MySQL: 5.6.45-86.1-log (Percona Server (GPL), Release 86.1, Revision 5bc37b1)
    opened by sebastiansIT 0
Releases(4.0.0)
Owner
Jukka Svahn