sitemap

Generates sitemap files.

Inspect the resource graph and generate a sitemap.

Supported formats are xml, text, json and html.

Enable this plugin for the emit phase when writing the text, xml and json formats. For the html format, the transform phase is recommended so that the modified sitemap template is optimized.

The xml and text formats are designed to be exposed to robots to indicate how they should crawl the site. The json format allows programmatic processing of the sitemap, for example to fetch resources after invalidating a CDN cache.

The html format is designed to be exposed to humans on your website to allow them to find relevant information. It generates an unordered list of the sitemap tree hierarchy. When using the html format you should also use the template option to specify the template that the list will be injected into.

The template file must have an HTML AST available and declare an element with an id attribute of sitemap; the generated list will be injected as a child of that element. This allows you to work on the sitemap markup and styles as you would work with any other document and ensures the sitemap data is always consistent with the website structure.

new SiteMap(context, options)

Create a SiteMap plugin.

The base URL used to make links absolute is taken from the base option when given, otherwise from a url configuration option, and finally from the homepage field of a package.json file in the current working directory.
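
The lookup order for the base URL can be sketched as follows. This is a minimal illustration and not the plugin source: resolveBase is a hypothetical helper, and the shapes of the config and pkg arguments are assumptions.

```javascript
// Hypothetical helper, not part of the plugin API.
// `options` are the plugin options, `config` is the site configuration
// and `pkg` is the parsed package.json (or null when none was found).
function resolveBase(options, config, pkg) {
  // 1. An explicit base option wins.
  if (options && options.base) {
    return options.base;
  }
  // 2. Otherwise fall back to the url configuration option.
  if (config && config.url) {
    return config.url;
  }
  // 3. Finally try the homepage field of package.json.
  if (pkg && pkg.homepage) {
    return pkg.homepage;
  }
  // Nothing was found; the caller decides how to handle this.
  return undefined;
}
```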

When no formats are given the xml format is used.

The name option should not include a file extension.
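
As a rough illustration, an options object combining these settings might look like the one below. The formats key name is inferred from the description above and the values are placeholders, so treat both as assumptions.

```javascript
// Illustrative options only; the `formats` key name and all values
// are assumptions made for this example.
const sitemapOptions = {
  // Write both robot-facing formats (xml alone is used when omitted).
  formats: ['xml', 'text'],
  // No file extension: each format supplies its own.
  name: 'sitemap',
  // Used to make links absolute.
  base: 'https://example.com'
};
```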

The rules option allows you to set fields for the xml output based on regular expression test patterns, for example:

{
  rules: [
    {
      test: /docs\//,
      changefreq: 'weekly',
      priority: 0.8
    }
  ]
}

If changefreq is invalid it is ignored; if priority is outside the zero to one range it is clamped.
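
The validation described here might look something like the sketch below. applyRules is a hypothetical function, the list of valid changefreq values comes from the sitemaps.org protocol, and letting later matching rules override earlier ones is an assumption.

```javascript
// Values permitted by the sitemaps.org protocol for <changefreq>.
const CHANGE_FREQUENCIES = [
  'always', 'hourly', 'daily', 'weekly', 'monthly', 'yearly', 'never'
];

// Hypothetical helper: collect the xml fields for a document id.
// Assumption: later matching rules override earlier ones.
function applyRules(id, rules) {
  const fields = {};
  for (const rule of rules) {
    if (!rule.test.test(id)) {
      continue;
    }
    // An invalid changefreq is ignored.
    if (CHANGE_FREQUENCIES.includes(rule.changefreq)) {
      fields.changefreq = rule.changefreq;
    }
    // A priority outside the zero to one range is clamped.
    if (typeof rule.priority === 'number' && !Number.isNaN(rule.priority)) {
      fields.priority = Math.min(1, Math.max(0, rule.priority));
    }
  }
  return fields;
}
```

For example, a rule with changefreq set to 'sometimes' and priority set to 3 would yield only a priority of 1 for matching documents.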

When the image option is set the xml output format will include image resources in documents. The image:loc node is always set to an absolute URL using the src attribute of the img element.

The meta data to add to the sitemap is extracted from attributes on the element so you can declare sitemap meta data in the HTML document. The attribute to XML node name map for img elements:

Image elements in sitemap URLs are limited to 1000; if the number of images in a page exceeds this limit the sitemap will only include the first 1000.

When the video option is set the xml output format will include video resources in documents. The video:content_loc node is always set to an absolute URL using the src attribute of the video element.

Video sitemap meta data is extracted from the element attributes. The attribute to XML node name map for video elements:

For the data-tag attribute you can separate multiple tags with a comma and they are expanded to multiple video:tag elements in the xml.
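
A minimal sketch of the tag expansion, assuming whitespace around each tag is trimmed and empty entries are dropped (neither is confirmed by the plugin); expandTags and escapeXml are hypothetical names.

```javascript
// Escape the characters that are significant in XML text content.
function escapeXml(text) {
  return text
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;');
}

// Hypothetical helper: turn a data-tag attribute value into a list of
// video:tag elements, one per comma separated tag.
function expandTags(value) {
  return String(value || '')
    .split(',')
    .map((tag) => tag.trim())           // assumption: trim whitespace
    .filter((tag) => tag.length > 0)    // assumption: skip empty entries
    .map((tag) => '<video:tag>' + escapeXml(tag) + '</video:tag>');
}
```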

If the video element contains child embed elements a video:player_loc xml element is created for each child embed element with a src attribute.

Videos are also limited to 1000 per page.

No validation is performed on the video attributes; you should read the corresponding documentation to verify that attribute values are correct.

You can pass options specific to a format using the renderer option and the format key, for example:

{
  renderer: {
    html: {
      builder: CustomHtmlBuilder
    }
  }
}

When the robots option is set you should have a robots.txt file being processed and have enabled the parse-robots plugin, otherwise the option will have no effect. When configured correctly this option adds Sitemap entries to the robots.txt file for the text and xml formats.

If you are creating sitemaps in both text and xml formats two Sitemap entries will be created.
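
To illustrate, the entries for a site might be produced along these lines. sitemapEntries is a hypothetical helper and the txt and xml file extensions are assumptions; the Sitemap: directive itself is the standard robots.txt form.

```javascript
// Hypothetical helper: build robots.txt Sitemap lines for the formats
// that robots understand. The extension map is an assumption.
function sitemapEntries(base, name, formats) {
  const extensions = {text: 'txt', xml: 'xml'};
  return formats
    // Only the text and xml formats produce an entry.
    .filter((format) => Object.prototype.hasOwnProperty.call(extensions, format))
    // Sitemap locations in robots.txt are absolute URLs.
    .map((format) => 'Sitemap: ' + new URL(name + '.' + extensions[format], base).href);
}
```

Calling sitemapEntries with the formats text, xml and json returns two lines, one each for the text and xml files.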

SiteMap.prototype.before(context, options)

Generate the sitemap.

When the exclude option is given each entry should be a regular expression pattern. If a pattern matches an HTML document id in the resource graph it is not included in the sitemap.
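
The exclusion test amounts to something like the following sketch, where filterExcluded is a hypothetical helper operating on a plain array of document ids:

```javascript
// Hypothetical helper: drop every id matched by at least one pattern.
function filterExcluded(ids, exclude) {
  const patterns = exclude || [];
  return ids.filter((id) => !patterns.some((pattern) => pattern.test(id)));
}
```

With an exclude list of [/^drafts\//], the id drafts/a.html is removed while index.html and docs/index.html are kept. Avoid the g flag on these patterns because it makes repeated test calls stateful.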

Default implementation for generating a DOM structure of the sitemap within the sitemap template file.

It is recommended that you use this default implementation and style the lists but if you really want to use different elements for the sitemap you can supply an alternative builder class as a renderer option.

new HtmlBuilder(context, sitemap, ast, template, options)

Create an HtmlBuilder.

HtmlBuilder.prototype.getHref(node)

Get an href attribute value.

Returns a string href.

HtmlBuilder.prototype.getTitle(node)

Get the title for a node.

This value is used for the link text and the link title attribute.

Returns a string title.

HtmlBuilder.prototype.getLinkText(node)

Get the text for a link node.

Prefers a title when available otherwise uses the page name.

Returns a string for the link text.

HtmlBuilder.prototype.getDescription(node)

Get the description for a node.

Extract the content attribute from a meta element with name set to description.

Returns a string description.

HtmlBuilder.prototype.getRootElement(node)

Create the root element to append as a child of the element with an id of sitemap in the template file.

This implementation returns a ul element.

Returns an element.

HtmlBuilder.prototype.getItemElement(node)

Get an element for each tree node item.

This implementation returns a li element.

Returns an element.

onEnter(node)

Invoked when a node is entered.

onExit(node)

Invoked when a node is exited.

HtmlBuilder.prototype.build(parent)

Main DOM builder function, generates unordered lists representing the sitemap.
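
The general shape of such a builder is a depth-first walk that calls onEnter and onExit around each node. The sketch below emits strings rather than DOM elements to stay self-contained; the node shape (href, title, children) and the function names are assumptions, and values are not escaped.

```javascript
// Depth-first traversal invoking the visitor around each node.
function walk(node, visitor) {
  visitor.onEnter(node);
  for (const child of node.children || []) {
    walk(child, visitor);
  }
  visitor.onExit(node);
}

// Hypothetical renderer: nested unordered lists as an HTML string.
// Note: href and title are inserted verbatim, without escaping.
function renderList(root) {
  let html = '<ul>';
  walk(root, {
    onEnter(node) {
      html += '<li><a href="' + node.href + '">' + node.title + '</a>';
      // Open a nested list only when the node has children.
      if (node.children && node.children.length) {
        html += '<ul>';
      }
    },
    onExit(node) {
      if (node.children && node.children.length) {
        html += '</ul>';
      }
      html += '</li>';
    }
  });
  return html + '</ul>';
}
```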

Renders the sitemap as plain text.

static render(context, sitemap)

Render the plain text format.

Returns an object with the text file content.

static extension

Get the file extension for the text format.

Renders the sitemap as HTML.

static render(context, sitemap, options)

Render the HTML format.

Unlike other formats, this renderer does not return a content string because it modifies the template AST and marks the AST as dirty before updating the file content.

When no builder option is given the default HtmlBuilder class is used.

When the strategy option is given it should be one of root, absolute or relative. The default strategy, root, builds links with a leading slash; the absolute strategy uses the base URL so that links include the domain name; and the relative strategy resolves links relative to the sitemap template file.

If an unsupported strategy is given the default is used.
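
The three strategies can be pictured with the sketch below. buildHref is a hypothetical function, the options.base and options.template names are assumptions, and the absolute case assumes the base URL is the site origin without a path prefix.

```javascript
const path = require('path');

// Hypothetical helper: build a link for a document id such as
// 'docs/index.html' using one of the three strategies.
function buildHref(id, strategy, options) {
  // Normalize to exactly one leading slash.
  const rooted = '/' + id.replace(/^\/+/, '');
  switch (strategy) {
    case 'absolute':
      // Assumption: options.base is the site origin.
      return new URL(rooted, options.base).href;
    case 'relative': {
      // Resolve against the directory containing the template file.
      const template = '/' + options.template.replace(/^\/+/, '');
      return path.posix.relative(path.posix.dirname(template), rooted);
    }
    case 'root':
    default:
      // Unsupported strategies fall back to the default.
      return rooted;
  }
}
```

For a template at about/sitemap.html, the relative strategy turns docs/index.html into ../docs/index.html.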

Returns an object with the sitemap ast.

Renders the sitemap as JSON and generates an AST of the sitemap structure.

static extension

Get the file extension for the json format.

Renders the sitemap as XML.

static render(context, sitemap)

Render the XML format.

Returns an object with the xml file content.

static extension

Get the file extension for the xml format.