
Your Website Content Migration Checklist: 3 Crucial Steps

Posted by Zsolt Nagy on May 30, 2019

What is Content Migration?

How To Tackle Content Migration

The Content Migration Checklist

How To Test Your Content Migration

Using Portable Formats

Data Formats Used with Content Migration

Have you ever been responsible for migrating content from one CMS to another? Most of us, myself included, have stumbled over a non-obvious step during the process and risked losing some of the benefits associated with the original content, regardless of the migration method.

I still remember my first attempt at migrating my hobby blog after purchasing the domain and installing a CMS. Even though I used one of the easiest possible migration methods, problems started arising. Syntax highlighting did not work for source code. Disqus comments did not work. Some internal references stopped working due to using absolute URLs instead of relative ones. It was a mess.

This article is my attempt at guiding you in your quest to successfully migrate your content.

What is Content Migration?

Content migration is the act of moving content and metadata associated with it from a source where the content exists, to a destination. It is worth emphasizing metadata because there are many connections between the original content and its surroundings.

Common Challenges with Content Migration

The easiest and most common form of migration is replacing a content management system (CMS) while keeping the domain and all internal URLs intact. Suppose you read the article comparing WordPress and ButterCMS, and decide to migrate your WordPress site to ButterCMS. This is typically a fast and easy process, as you keep the structure, tags, categories, user base, and everything else intact. Keep in mind that technical debt in your WordPress blog may pose challenges that require custom modifications, but overall this process is still the most straightforward.

Technical Debt

One easy example of WordPress tech debt is the use of absolute URLs in the content area to link to content within the same domain. Once you change the domain, you have to locate and update these links one by one. The same holds for subscription forms generated by plugins such as Ninja Popups. It is tempting to place a [ninja-inline=ID] tag inside the blog post to display a subscription form, but you have to update these declarations one by one when you migrate your content.
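Locating absolute internal links does not have to be fully manual. The following is a minimal sketch, assuming the content is available as HTML strings and that example.com stands in for your own domain. The function name relativize_internal_links is hypothetical, and the regular expression only covers plain href and src attributes, so review the result before publishing.

```python
import re


def relativize_internal_links(html: str, old_domain: str) -> str:
    """Rewrite href/src attributes that point at old_domain into root-relative URLs."""
    # Matches http:// or https://, an optional "www.", the domain, and an optional path,
    # all inside a quoted href or src attribute.
    pattern = re.compile(
        r'(?P<attr>\b(?:href|src)=)(?P<quote>["\'])https?://(?:www\.)?'
        + re.escape(old_domain)
        + r'(?P<path>/[^"\']*)?(?P=quote)',
        re.IGNORECASE,
    )

    def replace(match: re.Match) -> str:
        path = match.group("path") or "/"  # a bare domain becomes the site root
        quote = match.group("quote")
        return f'{match.group("attr")}{quote}{path}{quote}'

    return pattern.sub(replace, html)


if __name__ == "__main__":
    sample = '<a href="http://www.example.com/posts/a/">A</a> <a href="https://other.org/x">B</a>'
    print(relativize_internal_links(sample, "example.com"))
```

Links to other domains are left untouched, so the same function can be run over the whole content inventory.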

Changing Domains

When migrating a blog from one domain to another, you may wonder whether it makes sense to keep the original content or at least the original domain. There are several benefits and drawbacks.

On the benefits side, external links keep pointing at the original content. This helps because we can still monitor traffic using analytics tools, and we can choose either to display the original post or to redirect the user to our new blog. You can also tell Google to credit the SEO benefits to the migrated content, because it is the same content.

On the drawbacks side, the old source has to be managed and financed. If you chose to duplicate content and not just make a URL redirection, some users may stick to the original site, which implies that they won’t see your updates or your new offers.
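If you opt for URL redirection, a redirect map from every old URL to its new location is a useful artifact to prepare up front. Here is a minimal sketch, assuming the paths stay the same and only the domain changes. The function name build_redirect_map and the example.com domains are hypothetical, and how the map is turned into actual redirects depends on your hosting setup.

```python
from urllib.parse import urlsplit, urlunsplit


def build_redirect_map(old_urls, new_domain, scheme="https"):
    """Map each old URL to the same path and query string on the new domain."""
    redirects = {}
    for old_url in old_urls:
        parts = urlsplit(old_url)
        # Keep path and query, drop the fragment, swap scheme and domain.
        new_url = urlunsplit((scheme, new_domain, parts.path or "/", parts.query, ""))
        redirects[old_url] = new_url
    return redirects


if __name__ == "__main__":
    old = ["http://old.example.com/post-1/?ref=x", "http://old.example.com"]
    for source, target in build_redirect_map(old, "new.example.com").items():
        print(source, "->", target)
```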

How to Tackle Content Migration

Every successful content migration process has three steps that need to be taken care of:

  1. Planning the migration
  2. Executing the migration
  3. Verification and monitoring

In step 1, our objective is to properly plan what needs to be migrated, how the migration will be performed, and how the success of the migration will be verified in the form of acceptance criteria and performance metrics. This includes the identification of metadata associated with the content, links, domains, backups, performance, hosting, staging and production environment, testing plan, and best effort to fall back to alternative content in case migration was not 100% successful.

Step 2 is all about sticking to the plan and executing it. The first thing we have to do in step 2 is a backup just in case something goes wrong. It is also worth making sure that our site is not crawl-able during the migration process. Then we perform the carefully planned steps of the content migration.

Finally, in step 3, we verify the results of the migration by going through the previously determined acceptance criteria. This step also includes continuous monitoring of metrics associated with the site, including search engine rankings.

The Content Migration Checklist

The content migration checklist follows the three crucial steps. First, we plan the migration and formulate acceptance criteria:

  1. Define the goals of the migration
  2. Back up the original site
  3. Check deployment and rollback of your staging and production environment
  4. Prepare analytics and tracking related metadata for the new site
  5. Complete list of URLs and internal links
  6. Links referring to other articles from the same domain (absolute and relative)
  7. Asset files and references
  8. CMS-specific plugins that affect the content
  9. Review custom styling, content annotation
  10. Interactive content such as an interactive text editor or a calendar
  11. Popups, signup forms
  12. Migration plan of comments associated with the post
  13. Banner spaces that may or may not be migrated
  14. Take character encoding differences into consideration
  15. Original content HTML tags
  16. Categories and tags associated with the content
  17. SEO keywords associated with the content
  18. Table of contents when migrating a series
  19. Author information
  20. User data if there are registered users

Never migrate on production first. It is always a good idea to have a development environment or a preview before you publish migrated content. Otherwise, many unwanted errors could go live.

If you do have a test site, you can verify most of the points of the above checklist.

Then comes the second step, execution of the migration:

  1. Execute and verify the backup
  2. Disable crawlability of the site
  3. Execute a test migration on a staging environment and verify all points of your acceptance criteria on it
  4. In case of a domain change, update DNS and perform all accompanying administration and automatic redirection
  5. Perform the real migration of entire content inventory
  6. Enable the new site, restore crawlability
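The crawlability steps in the list above can be checked with a small script. This sketch assumes that crawling is controlled through robots.txt, which is only one of several possible mechanisms. It uses Python's standard urllib.robotparser module, and the is_crawlable helper and the example URL are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Two common robots.txt bodies: one blocks every crawler, one allows everything.
BLOCK_ALL = "User-agent: *\nDisallow: /\n"
ALLOW_ALL = "User-agent: *\nDisallow:\n"


def is_crawlable(robots_txt: str, url: str, user_agent: str = "*") -> bool:
    """Return True if the given robots.txt content lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "https://staging.example.com/post/"
    print("during migration:", is_crawlable(BLOCK_ALL, url))
    print("after go-live:", is_crawlable(ALLOW_ALL, url))
```

Running the check once before the migration and once after go-live helps catch the case where the blocking rule is accidentally left in place.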

The last step is about verification and monitoring.

  1. Go through your acceptance criteria
  2. Manually test your content in the unlikely case you overlooked something
  3. Check the performance of the new site, measure loading times
  4. In the case of different domains, retire the old content or mark the original content source to avoid getting penalized for duplicate web content
  5. In the long run, track ranking and indexing

The list is not exhaustive. It is important to note that your content migration checklist may be different from mine. The purpose of a checklist is to cover most of the common cases and to prevent you from making the same mistake twice. This is why adding items to your checklist based on your past mistakes is of paramount importance.

Checklists may also differ from migration to migration. In each case, two lists of items have to be determined, ideally, during the first step of the process (the planning step):

  1. What is to be migrated?
  2. What is not to be migrated?

Having a whitelist and a blacklist at the same time is counter-intuitive because, in theory, everything that is not on the whitelist should be on the blacklist. However, practice trumps theory: an explicit blacklist records the decision, so you do not have to reconsider whether its elements should be whitelisted after all.

For instance, when converting blog posts to a book format, it makes perfect sense to blacklist migrating all comments associated with the posts. It also makes perfect sense to whitelist the code articles as chapters, the sections inside the chapters, the code examples, etc.
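Keeping both lists explicit also makes it easy to spot items nobody has decided on yet. The following is a minimal sketch, assuming the content inventory is a list of item names. The function classify_inventory and the item names are hypothetical.

```python
def classify_inventory(items, whitelist, blacklist):
    """Split inventory items into migrate / skip / undecided buckets."""
    overlap = set(whitelist) & set(blacklist)
    if overlap:
        raise ValueError(f"Items on both lists: {sorted(overlap)}")

    plan = {"migrate": [], "skip": [], "undecided": []}
    for item in items:
        if item in whitelist:
            plan["migrate"].append(item)
        elif item in blacklist:
            plan["skip"].append(item)
        else:
            # On neither list: a decision is still missing.
            plan["undecided"].append(item)
    return plan


if __name__ == "__main__":
    inventory = ["chapters", "sections", "code examples", "comments", "banners"]
    plan = classify_inventory(
        inventory,
        whitelist={"chapters", "sections", "code examples"},
        blacklist={"comments"},
    )
    print(plan)
```

Anything that lands in the undecided bucket is a prompt to extend one of the two lists during the planning step.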

In the planning phase, we mentioned that we need to come up with acceptance criteria. We will now continue with the elaboration of what could be placed in the acceptance criteria.


How to Test Your Content Migration


As in most areas of software development, defining acceptance criteria before performing the actual migration helps you control the quality of the process.

Although the criteria, same as the checklist before, might be unique for each case, here are some examples of common points to consider:

  • Completeness. Go through the content inventory and check each section. Make sure everything you whitelisted got migrated.
  • Quality scan. Scroll through the migrated content and identify anything that stands out. Mark these errors and correct them before going live. Include aesthetic errors and fix them before the SEO check in a later step.
  • No broken or unwanted links. Once the migration is completed, display the content, and click each link in the content. Manually check if the result that appears is desirable. There are four possible outcomes.
    • The only acceptable outcome leading to no further tasks on your end is if the click leads to the content you wanted to link to.
    • The second best outcome is if the click leads to a broken reference because you are referencing content that has not been migrated. Each broken link has to be noted and corrected before the results of the migration go live.
    • There is a third outcome that may be acceptable in the short run, assuming that it will be fixed eventually. You may find that some links point at the source of the content. This is acceptable as long as you make a note of these TO-DO items, and perform the migration at a later stage.
    • The fourth possibility is if the link is completely broken or unwanted, and this error cannot be fixed by migrating other content using the same process. These errors have to be fixed before going live.
  • No unwanted characters. Unwanted characters can come from character encoding errors, or alternative dialects only understood by the source CMS. For instance, it was an embarrassing moment for me to conclude that in an earlier version of my JavaScript book, there was a WordPress tag describing an inline signup form of my Ninja Popups plugin. The Leanpub software creating my book from the markdown source displayed the [ninja-inline id=1] tag as plain text.
  • Acceptable syntax highlighting. In case you are displaying source code, make sure your syntax highlighter works as expected. If, for example, you are using TypeScript or JSX, the regular JavaScript highlighter won’t work for you. This step requires domain-specific expertise.
  • Legal Considerations. In the European Union, GDPR compliance is a hot topic, because companies can be fined up to 4% of their annual worldwide turnover for violating data protection regulations. During the migration process, you may have to check whether you are using data for any purpose that violates compliance. You also have to make sure that you ask for the user's consent for your cookies, and for opt-in consent for email signups.
  • No SEO penalty for crawlable content. This step requires some SEO expertise. If you are duplicating content, make sure you visit the source and check if there is a canonical link in the head tag. For instance, in all blog posts of zsolt-nagy.github.io, a canonical link can be found.

On this page:

http://zsolt-nagy.github.io/Handling-Time-in-Javascript-Inspired-by-the-Bug-Report-of-the-Century/, the canonical link is:

<link rel="canonical" href="http://www.zsoltnagy.eu/Handling-Time-in-Javascript-Inspired-by-the-Bug-Report-of-the-Century/">

Make sure the result is structured in an SEO-friendly way. The more structural changes made between the original content and HTML, the more you need to pay attention to this step.

  • Go through your migration whitelist and identify all metadata and interactive functionality that was not migrated. This is an obvious but time-consuming step in case of popular content. Address these steps one by one based on your priorities.
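The four link outcomes described in the list above can be pre-sorted by a script before the manual click-through. This is a sketch under stated assumptions: internal links are root-relative or absolute, and you have a set of already migrated paths and a set of all source paths. The function classify_link, the category names, and the example.com domains are hypothetical. Links to third-party sites are reported separately for manual review.

```python
from urllib.parse import urlsplit


def classify_link(href, new_domain, source_domain, migrated_paths, source_paths):
    """Sort a link into one of the outcome categories of the link check."""
    parts = urlsplit(href)
    host = parts.netloc.lower()
    path = parts.path or "/"

    if host == source_domain:
        return "points_to_source"        # third outcome: note as a TO-DO
    if host in ("", new_domain):
        if path in migrated_paths:
            return "ok"                  # first outcome: nothing to do
        if path in source_paths:
            return "not_yet_migrated"    # second outcome: migrate the target too
        return "broken"                  # fourth outcome: fix before going live
    return "external_manual_check"       # other domains: verify by hand


if __name__ == "__main__":
    migrated = {"/a/"}
    source = {"/a/", "/b/"}
    for link in ["/a/", "/b/", "http://old.example.com/a/", "/missing/", "https://third.party/x"]:
        print(link, "->", classify_link(link, "new.example.com", "old.example.com", migrated, source))
```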

Content migration may be a time-consuming process. If you are doing this for the first time, you have to have a thorough checklist. You can let go of some points by setting up your environment in such a way that some points will be taken care of automatically. For instance, you don’t have to worry about character encoding on each site if you know that you have already migrated a hundred pages.
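One check that lends itself to automation is the canonical link from the SEO criterion above. The sketch below uses Python's standard html.parser module to read the canonical URL from a page's HTML. The class and function names are hypothetical, and fetching the page itself is left out.

```python
from html.parser import HTMLParser


class CanonicalLinkParser(HTMLParser):
    """Remembers the href of the first <link rel="canonical"> tag it sees."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        rel_values = (attributes.get("rel") or "").lower().split()
        if tag == "link" and "canonical" in rel_values and self.canonical is None:
            self.canonical = attributes.get("href")


def find_canonical(html: str):
    """Return the canonical URL declared in the HTML, or None if there is none."""
    parser = CanonicalLinkParser()
    parser.feed(html)
    return parser.canonical


if __name__ == "__main__":
    page = (
        '<html><head><link rel="canonical" '
        'href="http://www.zsoltnagy.eu/Handling-Time-in-Javascript-Inspired-by-the-Bug-Report-of-the-Century/">'
        "</head><body></body></html>"
    )
    print(find_canonical(page))
```

A result of None on a duplicated page is a signal that the duplicate content issue still needs attention.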

Using Portable Formats

The single most important precondition for performing content migration is the use of portable formats. Although it is very easy to agree with this statement, we can go a bit further by defining what counts as a portable format from the perspective of migrating content.

Automate with Portable Formats

To me, a format is portable if content can be ported from the source to the destination automatically, so that most of the items defined in the three steps above do not have to be checked manually.

Google Analytics data is not in a portable format, because we do not port traffic from the source URL to the destination. HTML may or may not be a portable format from the perspective of your migration. Portability depends on whether the data meets your quality standards.

Your content may still be considered portable even if you are required to use a converter from one format to another during migration, for example a Markdown to HTML converter. With a clean resulting HTML structure, the content can be used without further editing. Besides, it is always important to maintain the format you consider portable.

A web scraper may automate some aspects of content migration. For two years I had to scrape some financial data from some Australian sites once a day because we had no API access to some data. The scraper script ensured that our data was up to date, and there was no further need for an API as an intermediate format.

CMS Exports

CMS exports provide a fast opportunity for content migration. Your content may be exported, and as part of the process, CMS plugin-specific notation might be removed. In this section, we will consider ButterCMS, Wordpress, Joomla, and Drupal content exporters.

When it comes to CMS exporters, there is a difference between

  • exporting the contents of the whole CMS and importing it into the new CMS platform,
  • exporting a subset of the articles and the associated metadata.

The unit price of the former export gets significantly cheaper as the number of articles increases because the content can be viewed as an integral unit.

Most CMS solutions have an export feature to export their database content, while others have import features. The process is as follows:

  1. Export the contents of the database into an intermediate format such as SQL or JSON
  2. Transform the format into the format of the other CMS (if needed)
  3. Populate the destination CMS with the transformed data
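The three steps above can be sketched in a few lines, assuming the export is a JSON array of records. The field names on both sides, the FIELD_MAP table, and the in-memory dictionary standing in for the destination CMS are all hypothetical. A real destination would be populated through its own import feature or API.

```python
import json

# Hypothetical mapping from exported field names to destination field names.
FIELD_MAP = {"post_title": "title", "post_name": "slug", "post_content": "body"}


def transform(exported_json: str):
    """Step 2: rename the exported fields and drop everything not in FIELD_MAP."""
    records = json.loads(exported_json)
    return [{new: record[old] for old, new in FIELD_MAP.items()} for record in records]


def populate(destination: dict, records) -> int:
    """Step 3: store each record in the destination, keyed by slug."""
    for record in records:
        destination[record["slug"]] = record
    return len(records)


if __name__ == "__main__":
    # Step 1 stand-in: the exported data as a JSON string.
    exported = json.dumps([
        {"post_title": "Hello", "post_name": "hello",
         "post_content": "<p>Hi</p>", "post_status": "publish"},
    ])
    destination = {}
    count = populate(destination, transform(exported))
    print(count, destination)
```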

There are some tools such as https://cms2cms.com/ that automate the content migration process for you in exchange for a low price tag. However, executing an automated migration process does not guarantee that you are done with the migration, because small errors may appear.

When it comes to migrating to https://buttercms.com/, check out the article https://buttercms.com/blog/buttercms-vs-wordpress-why-choose-butter, which describes how easy it is to export from WordPress to ButterCMS.

Data Formats Used with Content Migration

Choosing the right data formats can drastically increase or decrease the amount of time needed to migrate content.  Markdown, HTML, JSON, XML, and rendered HTML are common formats that work with varying effort.

Some formats are easier to migrate than others. Also, note that the most popular content management systems have an export feature that helps you export your content in multiple formats.

Markdown

Markdown is arguably the most convenient format to use when migrating content. Most CMS solutions offer Markdown as a format to describe your content. Even if your CMS does not support Markdown, you can still save the Markdown source and convert it to a format your CMS accepts using a Markdown converter (e.g. the Markdown-to-HTML plugin of Atom.io). This is the one step that pays off the most when it comes to migrating content, because Markdown is human readable, compact, and it also defines most elements of your content.

This language comes with dialects though. One I regularly use when writing my books is Markua. Your CMS may also have some special extra content in your Markdown file. For instance, in my WordPress sites, there is a special notation for plugins such as a signup form. Using Ninja Popups, this notation is [ninja-inline id="1"], where 1 is the identifier of the inline form. This content is understood by the CMS. However, when repurposing the content in the form of a Leanpub book, the above notation loses its meaning. The same holds for migrating from one CMS to another CMS platform where our plugins are not available.

The less extra content you have beyond the generic Markdown definition, the better. Each piece of extra content has to be migrated, and this may be a tedious manual process for you.
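Leftover plugin notation can be located with a simple scan before it ends up as plain text in the migrated result. Here is a minimal sketch, assuming shortcodes look like [name key=value] or [name=value] as in the Ninja Popups examples. The regular expression and the function find_shortcodes are hypothetical and may miss other notations, so treat the report as a starting point.

```python
import re

# A bracketed tag that starts with a name and contains "=", not followed by "("
# (which would make it a Markdown link).
SHORTCODE = re.compile(r"\[([A-Za-z][\w-]*)(?:\s[^\]\n]*)?=[^\]\n]*\](?!\()")


def find_shortcodes(markdown: str):
    """Return (line_number, shortcode) pairs for every suspected shortcode."""
    findings = []
    for line_number, line in enumerate(markdown.splitlines(), start=1):
        for match in SHORTCODE.finditer(line):
            findings.append((line_number, match.group(0)))
    return findings


if __name__ == "__main__":
    text = "# Title\n\nSome [link](http://example.com/?a=1) text.\n\n[ninja-inline id=1]\n"
    for line_number, shortcode in find_shortcodes(text):
        print(f"line {line_number}: {shortcode}")
```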

HTML

Markdown can be directly converted to HTML. Any HTML content can be placed inside a Markdown file. Check out the online conversion tool https://markdowntohtml.com/ to see a visual conversion between formats. Markdown to HTML conversion is an easy task, therefore, it would be surprising if your CMS didn’t support it.

If you migrate content to a website, the result of your migration will be displayed in HTML. Therefore, HTML gives you the highest degree of control, at the expense of the readability that Markdown offers. Because of this, HTML can be used as a portable format, giving you the benefit of migrating content in the same format as the rendered version of itself.

The problem with this migration type is that it is hard if not impossible to separate what comes from rendering the original content and what is code generated by the CMS such as headers, footers, banners. Avoid this migration option whenever possible.

JSON/XML

It is also possible to define content using XML (eXtensible Markup Language) or JSON (JavaScript Object Notation). JSON to XML is like Markdown to HTML. JSON is more compact than XML, it is more intuitive to read, and therefore, it has become the most widely used way to describe payloads of API endpoints.

Human readable JSON:

{
   "name": "Zsolt Nagy",
   "websites": [ "zsoltnagy.eu", "devcareermastery.com" ]
}

XML:

<?xml version="1.0" encoding="UTF-8"?>
<root>
   <name>Zsolt Nagy</name>
   <websites>
       <element>zsoltnagy.eu</element>
       <element>devcareermastery.com</element>
   </websites>
</root>

XML is significantly more verbose, and it does not offer many benefits in this particular context, as it represents the same data as the JSON string.
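To illustrate that both notations carry the same data, here is a minimal sketch that converts the JSON example into the XML structure shown above using Python's standard json and xml.etree.ElementTree modules. The function json_to_xml is hypothetical, it assumes that every JSON key is a valid XML tag name, and it omits the XML declaration and indentation.

```python
import json
import xml.etree.ElementTree as ET


def json_to_xml(json_text: str, root_tag: str = "root", item_tag: str = "element") -> str:
    """Convert a JSON document into an equivalent XML string."""

    def build(parent, value):
        if isinstance(value, dict):
            # Each key becomes a child tag.
            for key, child_value in value.items():
                build(ET.SubElement(parent, key), child_value)
        elif isinstance(value, list):
            # Each list item becomes a generic item tag.
            for item in value:
                build(ET.SubElement(parent, item_tag), item)
        else:
            parent.text = str(value)

    root = ET.Element(root_tag)
    build(root, json.loads(json_text))
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    source = '{"name": "Zsolt Nagy", "websites": ["zsoltnagy.eu", "devcareermastery.com"]}'
    print(json_to_xml(source))
```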

Content as a Service

Similarly, it is also possible to create your content or purchase a content provider service. In this article, I will call this content as a service. Content as a service is all about querying content from an API and presenting it in your CMS.

If you create your content, the original content is stored in a version-controlled way, either in a git repository or in a database. Content export can happen both on an API level and a database level. For instance, when exporting from Drupal or Joomla, the most common way of exporting content is accessing the database of the CMS and extracting the web content using a plugin.

Throughout my career, the worst migration type I have ever seen was the “save as HTML” feature of Microsoft Word. Around eight years ago, I worked with a client whose content was in a custom CMS. This client could only use Microsoft Word to create HTML content, and unfortunately, Word was very verbose in terms of HTML tags created. This HTML is a horrible format when it comes to defining SEO friendly content, and the maintainability of the content was not too good.

How Much Does Content Migration Typically Cost?

Content migration costs vary. The cost of migrating content from a non-portable format may easily exceed the cost of hiring writers to create brand new content on the same topics. To reduce the cost of content migration, the largest ROI is on designing your content to be portable. If you achieve that, your migration costs drop to a negligible two- to three-digit amount in terms of migration tools and manual labor costs.

If your content is not portable, worst case, it may pay off to rewrite your content instead of addressing case-by-case maintainability issues. The alternative is to request help from sources like PeoplePerHour, UpWork, or TopTal. The hourly rate of developers varies between $30 and $100, and the number of hours you need depends on the complexity of your problems, possibly adding up to a total of several thousand dollars.
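As a rough worked example of this arithmetic, the sketch below multiplies an estimated number of hours by the low and high hourly rates mentioned above. The 40-hour figure is a purely hypothetical assumption and not a measured value.

```python
def estimate_cost_range(hours: float, low_rate: float = 30, high_rate: float = 100):
    """Return the (lowest, highest) expected cost in dollars for the given hours."""
    return hours * low_rate, hours * high_rate


if __name__ == "__main__":
    low, high = estimate_cost_range(40)  # hypothetical: one week of freelancer time
    print(f"${low:,.0f} to ${high:,.0f}")
```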

If your content is portable, a simple migration service like https://cms2cms.com/ can give you an estimate of the costs of automated migration. Costs depend on the amount of content you have and your special needs, and they typically don’t exceed $1,000. A simple blog may cost a few hundred dollars to migrate.

Conclusion: Keep Your Content Portable

The quality of content migration depends on the quality of the migration process. Developing strict requirements that define the acceptance criteria of migration and what needs to be migrated is essential. To encourage fast and smooth migration, make sure you use a portable format to write your content and the metadata associated with it. If the format is not portable, you may face high costs to clean up the content. Those potential costs may even justify creating new content altogether.

Define the elements that need to be migrated so that you will be able to check them one by one after the migration. Also, define the acceptance criteria for your migration in advance so that you can have access to a checklist before going live.

Considering the per-article unit price, migrating the complete content of a CMS to another CMS is significantly cheaper than migrating a subset of it. Whichever option you choose, you may use converters and CMS exporters to make your content portable. Remember that you have to maintain the source when using a portable format, instead of editing the converted results. Make sure you automate everything that you can.

If you get stuck with the migration process, you may seek external help. When calculating costs, make sure you factor in the skill level of the freelancer and the associated costs of recruitment. Make sure you define your tasks well, and you spend money on tasks that increase the overall quality of your content migration process so that you can either operate it with your team, or you can outsource the cheaper and repetitive work at a later stage.

In a nutshell, you could say that the most important observation when migrating content is that the use of portable formats coupled with well-defined checklists describing your acceptance criteria saves most of the costs associated with the process.
