Managing a complex SEO migration of an outdated site
Recently, I was faced with a rather daunting SEO challenge: the migration of a very outdated website built with a little-known CMS called Zope. The previous developer, unfortunately, was not available to provide us with the database for data extraction, which made the process even more complex, since the site contained tens of thousands of pages.
The scrape and import strategy
Given the lack of cooperation in accessing the data, after consulting with the client, I had to resort to good old-fashioned scraping methodology to retrieve essential information from the site.
For this work, I chose the Screaming Frog SEO Spider. Although it is usually used to crawl a site for errors, it also has very powerful features for extracting data from a website.
The feature in question is called Custom Extraction.
Screaming Frog’s “Custom Extraction” is an advanced feature of the SEO Spider that lets you extract specific data from each web page while the site is being crawled. This can include any HTML elements, attributes or even inline scripts, selected with XPath expressions, CSSPath selectors or Regex (regular expressions).
The data I needed were:
- Previous URL
- Title
- H1
- Post content
All of these are crucial for preserving SEO value during the migration.
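To make the idea concrete, here is a minimal Python sketch of the same kind of extraction done outside Screaming Frog, using only the standard library. It is not what the crawler does internally. The `PageExtractor` class, the `extract_page` helper, the sample HTML and the `content` id of the post container are all assumptions made for illustration; the real site's markup would dictate the actual selectors.

```python
from html.parser import HTMLParser


class PageExtractor(HTMLParser):
    """Collect the <title>, the first <h1>, and the text of one content container."""

    def __init__(self, content_id="content"):
        super().__init__()
        self.content_id = content_id  # hypothetical id of the post body <div>
        self.title = ""
        self.h1 = ""
        self.content_parts = []
        self._in_title = False
        self._in_h1 = False
        self._h1_done = False
        self._content_depth = 0  # greater than 0 while inside the content <div>

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "h1" and not self._h1_done:
            self._in_h1 = True
        if self._content_depth > 0:
            # Count nested <div> tags so we know when the container closes.
            if tag == "div":
                self._content_depth += 1
        elif tag == "div" and dict(attrs).get("id") == self.content_id:
            self._content_depth = 1

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag == "h1" and self._in_h1:
            self._in_h1 = False
            self._h1_done = True
        elif tag == "div" and self._content_depth > 0:
            self._content_depth -= 1

    def handle_data(self, data):
        if self._in_title:
            self.title += data
        if self._in_h1:
            self.h1 += data
        if self._content_depth > 0:
            text = data.strip()
            if text:
                self.content_parts.append(text)


def extract_page(url, html):
    """Return the four fields needed for the migration as one dictionary."""
    parser = PageExtractor()
    parser.feed(html)
    parser.close()
    return {
        "url": url,
        "title": parser.title.strip(),
        "h1": parser.h1.strip(),
        "content": " ".join(parser.content_parts),
    }


# Made-up sample page, only to show the shape of the output.
sample_html = """<html><head><title>Old page title</title></head>
<body><h1>Old heading</h1>
<div id="content"><p>First paragraph.</p><div class="box"><p>Nested text.</p></div></div>
<div id="footer">Footer text</div></body></html>"""

row = extract_page("https://example.com/old-page", sample_html)
print(row)
```

Each page becomes one row with the old URL, title, H1 and post content, which is the shape an import script for the new CMS would typically expect.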
The crawl took several days and brought up hundreds of thousands of URLs. Too many! I knew the genuinely useful content would be far smaller.
I did an initial skim to remove URLs that I was definitely not interested in: there were thousands and thousands of search pages that were linked but not indexed.
I then took the remaining URL patterns, for which I couldn’t tell whether they brought in traffic or not, and entered them into SEOZoom’s “URL Analysis” tool.
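A first pass like this can be scripted once the crawl is exported as a list of URLs. The sketch below is only an illustration of the idea: the `search` path segment and the query parameter names are hypothetical patterns, and the sample URLs are made up. The real patterns would have to be read off the crawl itself.

```python
from urllib.parse import urlsplit, parse_qs

# Hypothetical patterns; replace them with whatever the crawl actually shows.
SEARCH_PATH_SEGMENTS = {"search"}
SEARCH_QUERY_KEYS = {"q", "query", "SearchableText"}


def is_search_url(url):
    """Return True if the URL looks like an internal search results page."""
    parts = urlsplit(url)
    segments = [s for s in parts.path.lower().split("/") if s]
    if SEARCH_PATH_SEGMENTS.intersection(segments):
        return True
    query_keys = parse_qs(parts.query, keep_blank_values=True).keys()
    return bool(SEARCH_QUERY_KEYS.intersection(query_keys))


def drop_search_urls(urls):
    """Keep only the URLs that do not match the search patterns."""
    return [u for u in urls if not is_search_url(u)]


crawled = [
    "https://example.com/news/article-1",
    "https://example.com/search?q=shoes",
    "https://example.com/news?SearchableText=shoes",
    "https://example.com/research/paper",
]

kept = drop_search_urls(crawled)
print(kept)
```

Matching on whole path segments rather than substrings keeps a page such as `/research/paper` from being discarded just because its path contains the letters "search".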