I recently had to take a copy of a client’s website before they transferred from another provider. It was running an old copy of Joomla, and getting backend access proved difficult. So we opted to grab a static copy of the site and keep that live until we had their new WordPress website ready.
There are plenty of apps out there that will download whole websites for you, but the simplest way is to use wget. If you don’t have a copy, you can install wget on a Mac without using MacPorts or Homebrew by following this guide from OS X Daily.
Once it’s installed, open Terminal and type:
wget --help
You’ll see there are a ton of options. At its simplest, you can just type:
wget example.com
That will download a copy of the index page of example.com to whichever directory you’re calling wget from in Terminal. But I wanted a copy of the whole website that works locally, i.e. with links converted to relative URLs rather than referring back to example.com live on the web.
So here’s the code:
wget --recursive --no-clobber --page-requisites --html-extension --convert-links --restrict-file-names=windows --random-wait --domains example.com --no-parent www.example.com
Let’s step through the options used:
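--recursive: Recursively download the directories, up to wget’s default maximum of 5 levels deep.
--no-clobber: Can also be written as “-nc”. Skips files that already exist locally, so the same file on a server isn’t downloaded more than once.
--page-requisites: Causes wget to download all the files needed to properly display a given HTML page, including inlined images, sounds and referenced stylesheets.
--html-extension: Saves HTML files with a .html extension. Handy for converting PHP-based sites, such as the Joomla one I needed to copy. Newer versions of wget call this option --adjust-extension.
--convert-links: After the download is complete, converts the links in each document to make them suitable for local viewing.
--restrict-file-names=windows: Escapes characters in file names to make them safe on your local system.
--random-wait: Varies the delay between requests, so it looks less like we’re downloading the whole site. According to the wget manual, the delay is a random multiple (0.5 to 1.5) of the --wait value, so it is worth adding something like --wait=1 as well.
--domains example.com: The domain you want to download the whole site from. Links to other domains are not followed.
--no-parent: Never ascend to the parent directory when retrieving recursively.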
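Most of these options also have short forms, which makes the command easier to retype. The sketch below is my own condensed version, not something from the original command: example.com is still a placeholder, and the --wait=1 is an assumption I’ve added because the wget manual describes --random-wait as scaling the --wait value. It only prints the command, so nothing is downloaded until you remove the echo.

```shell
#!/usr/bin/env bash
# Short-flag version of the long command (example.com is a placeholder).
opts=(
  -r      # --recursive
  -nc     # --no-clobber
  -p      # --page-requisites
  -E      # --html-extension (named --adjust-extension in newer wget)
  -k      # --convert-links
  --restrict-file-names=windows
  --wait=1 --random-wait   # assumption: 1-second base delay, not in the original command
  -D example.com           # --domains
  -np     # --no-parent
)

# Print the command for review; remove "echo" to actually run the download.
echo wget "${opts[@]}" www.example.com
```

Keeping the flags in an array means each one can carry its own comment, which is handy when you come back to the command months later.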
After all that, you’re left with a folder, named after the host you downloaded from (www.example.com here), that should be a complete copy of the domain you’ve targeted. Very handy.
However, typing all that is a bit of a pain. A bash script that takes the domain as an input would save the effort, and it could even be wrapped up into an app using Appify. Hmm, one for the to-do list.
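As a starting point for that to-do item, here is a minimal sketch of what such a script might look like. The file name mirror-site.sh, the function name mirror_site and the DRY_RUN variable are all hypothetical names of my own, and the script assumes the site is served at www.<domain>, as in the example above. It has not been tested against a live site.

```shell
#!/usr/bin/env bash
# mirror-site.sh (hypothetical name): wraps the wget command from this post.
# Usage:   ./mirror-site.sh example.com
# Preview: DRY_RUN=1 ./mirror-site.sh example.com

mirror_site() {
  local domain="${1:-}"
  if [ -z "$domain" ]; then
    echo "usage: mirror_site <domain>" >&2
    return 1
  fi

  # Same options as the long command, with the domain filled in.
  # Assumption: the site lives at www.<domain>.
  local cmd=(wget
    --recursive --no-clobber --page-requisites --html-extension
    --convert-links --restrict-file-names=windows --random-wait
    --domains "$domain" --no-parent "www.$domain")

  if [ -n "${DRY_RUN:-}" ]; then
    # Print the command instead of running it.
    echo "${cmd[*]}"
  else
    "${cmd[@]}"
  fi
}

# Only run when a domain is passed on the command line.
if [ "$#" -gt 0 ]; then
  mirror_site "$1"
fi
```

Running it with DRY_RUN=1 first is a cheap way to check the command before letting wget loose on a client’s server.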