Generating Sitemaps with Spring Boot & SitemapGen4j


A sitemap is an SEO aid that tells search engines and other web crawlers which pages exist on your website. The most common form is an XML file served from your web server, which you can also submit to search engines through their site administration portals, such as the Google Search Console. A more in-depth explanation of a sitemap can be found here.

This post demonstrates generating a sitemap using SitemapGen4j and serving it on a website backed by Spring Boot.

Want to skip the tutorial? Check out the full source code for this project on GitHub.

Modeling Our Site

In this post we will use a blogging platform as an example website for which we wish to generate a sitemap. We can begin to model a simple blog entry as a POJO:

public final class BlogEntry {
	private final String title;
	private final String content;
	private final LocalDate date;

	// constructor and accessors omitted
}
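
The omitted constructor and accessors might look like the following minimal sketch. Only getTitle() is actually called later in this post; the null checks and the other getter names are assumptions made for illustration.

```java
import java.time.LocalDate;
import java.util.Objects;

// Immutable value object for a single blog entry.
// Declared package-private here so the snippet compiles in any file.
final class BlogEntry {
	private final String title;
	private final String content;
	private final LocalDate date;

	BlogEntry(String title, String content, LocalDate date) {
		// Assumption: none of the fields may be null.
		this.title = Objects.requireNonNull(title);
		this.content = Objects.requireNonNull(content);
		this.date = Objects.requireNonNull(date);
	}

	// The title doubles as the URL slug in this example.
	public String getTitle() {
		return title;
	}

	public String getContent() {
		return content;
	}

	public LocalDate getDate() {
		return date;
	}
}
```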

Blog entries will typically be stored in a database or on a file system, and the mechanism for accessing and retrieving them is referred to as a repository. Spring provides a @Repository annotation that stereotypes the specified component as being a mechanism for encapsulating storage, retrieval, and search behavior which emulates a collection of objects.

Our repository implementation will therefore be annotated with @Repository and should implement the necessary business logic for querying and storing our blog entries. As the sitemap will contain a list of URLs to our blog entries, the repository must facilitate querying all of the entries and returning them as a list.

For this demonstration we simply return a pre-populated list of blog entries. In a real application, the findAll() method is where you would perform an actual look-up, e.g. a database query or file system traversal.

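import java.time.LocalDate;
import java.time.Month;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

import org.springframework.stereotype.Repository;

@Repository
public final class BlogRepository {
	// In-memory stand-in for a real data store
	private final List<BlogEntry> blogEntries = new ArrayList<>();

	public BlogRepository() {
		save(new BlogEntry("an-old-blog-post", "Some content...", LocalDate.of(2014, Month.DECEMBER, 1)));
		save(new BlogEntry("a-newer-blog-post", "Some content...", LocalDate.of(2016, Month.MAY, 22)));
	}

	public void save(BlogEntry entry) {
		blogEntries.add(entry);
	}

	// Returns a read-only view so callers cannot modify the stored entries
	public List<BlogEntry> findAll() {
		return Collections.unmodifiableList(blogEntries);
	}
}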

Generating the Sitemap

SitemapGen4j is a Java library for generating XML sitemaps. It can populate a sitemap with modification dates, change frequencies, and individual page priorities. In this demonstration we use it only to populate a list of links, but you could extend the example to provide last-modified dates for your blog entries or to rank entries by importance.
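
To make those optional fields concrete, the sketch below renders a single fully populated entry by hand in plain Java. It is not SitemapGen4j's API (consult the library's documentation for that); the class and method names are hypothetical and exist only to show what the lastmod, changefreq, and priority elements of the sitemaps.org protocol look like.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

// Hypothetical helper, for illustration only.
final class UrlEntrySketch {
	// Change frequencies defined by the sitemaps.org protocol.
	enum ChangeFreq { ALWAYS, HOURLY, DAILY, WEEKLY, MONTHLY, YEARLY, NEVER }

	static String render(String loc, LocalDate lastMod, ChangeFreq freq, double priority) {
		// The protocol restricts priority to the range 0.0 to 1.0.
		if (priority < 0.0 || priority > 1.0) {
			throw new IllegalArgumentException("priority must be between 0.0 and 1.0");
		}
		return "<url>"
			+ "<loc>" + escape(loc) + "</loc>"
			+ "<lastmod>" + lastMod.format(DateTimeFormatter.ISO_LOCAL_DATE) + "</lastmod>"
			+ "<changefreq>" + freq.name().toLowerCase(Locale.ROOT) + "</changefreq>"
			+ "<priority>" + String.format(Locale.ROOT, "%.1f", priority) + "</priority>"
			+ "</url>";
	}

	// Escapes the five characters that XML reserves.
	static String escape(String s) {
		return s.replace("&", "&amp;")
			.replace("<", "&lt;")
			.replace(">", "&gt;")
			.replace("\"", "&quot;")
			.replace("'", "&apos;");
	}
}
```

In this blog example, the date field of BlogEntry would be a natural source for lastmod.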

The implementation of our sitemap generation fits the @Service component stereotype: it simply passes the data from our repository to SitemapGen4j and holds no state of its own. As the service requires access to the repository of blog entries, we can utilise Spring’s dependency injection. With the help of JSR 330 (which requires the javax.inject dependency on the classpath) we can annotate the service’s constructor with @Inject and have its dependencies resolved by Spring’s container. Spring’s own @Autowired annotation works equally well.

import java.net.MalformedURLException;
import java.util.Objects;

import javax.inject.Inject;

import org.springframework.stereotype.Service;

import com.redfin.sitemapgenerator.WebSitemapGenerator;

@Service
public final class SitemapService {
	private static final String BASE_URL = "https://example.com"; // (1)

	private final BlogRepository blogRepository;

	@Inject
	public SitemapService(BlogRepository blogRepository) {
		this.blogRepository = Objects.requireNonNull(blogRepository);
	}

	public String createSitemap() throws MalformedURLException {
		WebSitemapGenerator sitemap = new WebSitemapGenerator(BASE_URL);

		// One <url> element per blog entry; the title acts as the URL slug
		for (BlogEntry entry : blogRepository.findAll()) {
			sitemap.addUrl(BASE_URL + "/blog/" + entry.getTitle());
		}

		return String.join("", sitemap.writeAsStrings()); // (2)
	}
}
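  1. The BASE_URL should be replaced with your own domain, or ideally passed in as an application property via application.properties.
  2. writeAsStrings() returns a list of strings, one per generated sitemap file. A site this small fits in a single file (the sitemap protocol allows up to 50,000 URLs per file), so joining the list yields one complete XML document. A larger site would need to serve the files separately behind a sitemap index.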
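
The service concatenates the title straight into the URL, which works because the example titles are already URL-safe slugs. If your titles may contain spaces or non-ASCII characters, something like the following sketch could build the link instead. BlogUrls and entryUrl are hypothetical names, and the sketch assumes the base URL has no path component of its own.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical helper, for illustration only.
final class BlogUrls {
	// Builds <base>/blog/<title>, percent-encoding characters that are illegal in a URL path.
	static String entryUrl(String baseUrl, String title) {
		URI base = URI.create(baseUrl);
		try {
			// The multi-argument URI constructor quotes illegal characters for us;
			// toASCIIString() additionally encodes non-ASCII characters.
			return new URI(base.getScheme(), base.getAuthority(), "/blog/" + title, null, null)
				.toASCIIString();
		} catch (URISyntaxException e) {
			throw new IllegalArgumentException("Cannot build a URL for title: " + title, e);
		}
	}
}
```

With this in place, the loop body would become sitemap.addUrl(BlogUrls.entryUrl(BASE_URL, entry.getTitle())).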

Rendering the Sitemap

Now that we have a service that can provide us with an XML sitemap of our application’s model, we must provide a way to render it to whoever requests it, whether that is a user in a browser or a headless web crawler accessing the resource directly. In the MVC pattern this component is referred to as the view, and is represented in the Spring framework by the View interface.

Simply put, our view implementation needs to invoke the service to generate a sitemap and write the result of its generation (the XML file) to the HTTP response. As the view is dependent on access to the generation service we can utilise Spring’s dependency injection again in a similar fashion:

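import java.io.IOException;
import java.io.Writer;
import java.util.Map;
import java.util.Objects;

import javax.inject.Inject;
// javax.servlet matches Spring Boot 1.x and 2.x; Spring Boot 3 uses jakarta.servlet
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

import org.springframework.http.MediaType;
import org.springframework.stereotype.Component;
import org.springframework.web.servlet.view.AbstractView;

@Component
public final class SitemapView extends AbstractView {
	private final SitemapService service;

	@Inject
	public SitemapView(SitemapService service) {
		this.service = Objects.requireNonNull(service);
	}

	@Override
	protected void renderMergedOutputModel(Map<String, Object> model, HttpServletRequest request, HttpServletResponse response) throws IOException {
		response.setContentType(MediaType.APPLICATION_XML_VALUE); // (1)

		try (Writer writer = response.getWriter()) {
			writer.append(service.createSitemap()); // (2)
		}
	}
}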
  1. Don’t forget to set the response content type to application/xml.
  2. Here we generate the sitemap and add it to the HttpServletResponse.

Providing Access to the Sitemap

With the components to generate and render our repository of blog entries in place, we must now allow the world to access the resource that we have produced. We can provide a mapping to the /sitemap.xml endpoint within a @Controller that is composed solely of our SitemapView:

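import static org.springframework.http.MediaType.APPLICATION_XML_VALUE;

import java.util.Objects;

import javax.inject.Inject;

import org.springframework.stereotype.Controller;
import org.springframework.web.bind.annotation.RequestMapping;

@Controller
public final class SitemapController {
	private final SitemapView view;

	@Inject
	public SitemapController(SitemapView view) {
		this.view = Objects.requireNonNull(view);
	}

	// Returning a View instance tells Spring MVC to render the response with it
	@RequestMapping(path = "/sitemap.xml", produces = APPLICATION_XML_VALUE)
	public SitemapView create() {
		return view;
	}
}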

Result

When visiting http://localhost:8080/sitemap.xml in a browser we should now be met with the following XML file:

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" >
  <url>
    <loc>https://example.com/blog/an-old-blog-post</loc>
  </url>
  <url>
    <loc>https://example.com/blog/a-newer-blog-post</loc>
  </url>
</urlset>
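
If you would rather check this output in a test than by eye, one option is to parse the XML and compare the <loc> values. The sketch below uses only the JDK's built-in DOM parser; SitemapCheck and locations are hypothetical names, and the parser is left on its default settings, so it should only be fed XML that you generated yourself.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import javax.xml.parsers.DocumentBuilderFactory;

import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper, for illustration only.
final class SitemapCheck {
	// Parses the sitemap XML and returns the text of every <loc> element in document order.
	static List<String> locations(String sitemapXml) {
		try {
			NodeList nodes = DocumentBuilderFactory.newInstance()
				.newDocumentBuilder()
				.parse(new InputSource(new StringReader(sitemapXml)))
				.getElementsByTagName("loc");
			List<String> result = new ArrayList<>();
			for (int i = 0; i < nodes.getLength(); i++) {
				result.add(nodes.item(i).getTextContent().trim());
			}
			return result;
		} catch (Exception e) {
			// Covers parser configuration, I/O, and malformed XML errors.
			throw new IllegalArgumentException("Not a well-formed sitemap", e);
		}
	}
}
```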

As the number of pages on your site grows, constructing the sitemap may become more resource intensive, and currently we construct it every time it is requested. At that point it may be a good idea to start caching the result. A guide to caching with Spring can be found here.
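
To illustrate the idea independently of Spring's cache abstraction, here is a minimal time-based cache in plain Java. CachedSitemap is a hypothetical class, not part of Spring or SitemapGen4j, and the injectable clock exists only to make the expiry behaviour easy to test.

```java
import java.time.Duration;
import java.util.Objects;
import java.util.function.LongSupplier;
import java.util.function.Supplier;

// Hypothetical helper, for illustration only.
final class CachedSitemap {
	private final Supplier<String> generator;
	private final long ttlNanos;
	private final LongSupplier clock;

	private String cached;
	private long loadedAt;

	CachedSitemap(Supplier<String> generator, Duration ttl) {
		this(generator, ttl, System::nanoTime);
	}

	CachedSitemap(Supplier<String> generator, Duration ttl, LongSupplier clock) {
		this.generator = Objects.requireNonNull(generator);
		this.ttlNanos = ttl.toNanos();
		this.clock = Objects.requireNonNull(clock);
	}

	// Regenerates the sitemap only when nothing is cached yet or the cached copy has expired.
	synchronized String get() {
		long now = clock.getAsLong();
		if (cached == null || now - loadedAt >= ttlNanos) {
			cached = generator.get();
			loadedAt = now;
		}
		return cached;
	}
}
```

Because createSitemap() declares a checked MalformedURLException, it cannot be passed as a Supplier directly; the supplier would need to catch the exception and rethrow it as an unchecked one.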

You may now inform search engines of the existence of your sitemap to assist them in indexing your website. A simple way to do this without submitting your site via control panels is to add the following to your robots.txt:

Sitemap: https://example.com/sitemap.xml