The Internet Archive’s web archive, commonly known as the Wayback Machine, allows users to visit archived versions of websites. The Internet Archive has been archiving sites since 1996 and, at the time of writing, has 514 billion archived web pages!
If you are wondering how you can use the Internet Archive in your OSINT research, you’ve come to the right place. There are many ways to extract important information from the Wayback Machine to further your OSINT investigations. If you need to see historical versions of a website because the site has been deleted or replaced with new content, the Wayback Machine can help. You may need to verify that a target previously worked at a company, but the current version of the site no longer lists the target’s information. Sometimes a target may intentionally remove information from their present website, and looking at older captures of the site may reveal it. You can also gather relevant data such as names, phone numbers, email addresses, and even metadata from older versions of a website. Let’s explore search methods…
Quick Search Methods:
- The quickest way to see the archived captures of a particular site is to visit the URL https://web.archive.org/web/*/www.example.com, replacing www.example.com with the site of your interest. Example: https://web.archive.org/web/*/www.osinttechniques.com
If the site has been archived, a calendar view will appear with colour-coded dots, each colour having a different meaning. The blue dots are what you’ll want to click on, as they indicate a capture of the web page. Green dots indicate a redirect, orange dots indicate the crawler received a client error, and red dots mean there was a server error. Navigating the timeline will display the dates on which the site was archived.
- If you want to view all the archived URLs of a particular domain, use the link https://web.archive.org/web/*/www.example.com/* and replace www.example.com with the site of your interest. At the time of writing, 117 URLs had been captured for www.osinttechniques.com. Example: https://web.archive.org/web/*/www.osinttechniques.com/*
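If you repeat these lookups for many domains, the two URL patterns above are easy to generate in a script. The sketch below is only an illustration: the function names are made up for this post, and the last URL builder assumes the Wayback Machine's CDX endpoint (`/cdx/search/cdx`) with its `url`, `output`, `fl` and `limit` parameters, which is not covered in the text above. It only builds URLs and does not contact the archive.

```python
from urllib.parse import urlencode

WAYBACK = "https://web.archive.org"


def calendar_url(site: str) -> str:
    """Calendar view of the captures for a single URL."""
    return f"{WAYBACK}/web/*/{site}"


def all_captures_url(site: str) -> str:
    """Listing of every captured URL under the site (note the trailing /*)."""
    return f"{WAYBACK}/web/*/{site}/*"


def cdx_query_url(site: str, limit: int = 50) -> str:
    """Assumed CDX query asking for captures under the site as JSON rows."""
    params = {
        "url": f"{site}/*",
        "output": "json",
        "fl": "timestamp,original,statuscode",
        "limit": limit,
    }
    return f"{WAYBACK}/cdx/search/cdx?{urlencode(params)}"


def dot_colour(status_code: int) -> str:
    """Map an HTTP status code to the calendar dot colour described above."""
    if 200 <= status_code < 300:
        return "blue"    # successful capture
    if 300 <= status_code < 400:
        return "green"   # redirect
    if 400 <= status_code < 500:
        return "orange"  # client error
    if 500 <= status_code < 600:
        return "red"     # server error
    return "unknown"


if __name__ == "__main__":
    site = "www.osinttechniques.com"
    print(calendar_url(site))
    print(all_captures_url(site))
    print(cdx_query_url(site))
```

Pasting the printed URLs into a browser should show the same views described above; the CDX URL, if it behaves as assumed, returns the capture list in a form that is easier to filter by status code.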
Other Search Methods:
- When you have a URL of interest, you can search here https://archive.org/web.
Example: search www.myspace.com to see how the site has changed over time.
- Conduct keyword searches here https://web.archive.org
Example: search for “Osama bin Laden” to see what results are revealed, or search for social media users such as the Facebook profile of Mark Zuckerberg: https://web.archive.org/web/*/www.facebook.com/zuck
- Use the advanced search feature here https://archive.org or by directly visiting https://archive.org/advancedsearch.php to perform more targeted searches and sometimes find the email address associated with a user who uploaded a file.
Some files require you to log in to gain access. This is where you can create a fake research account to investigate further: https://archive.org/account/signup
- Use the steps below to find the email address associated with an uploaded file. In OSINT research, an email address is another data point you can leverage by searching for it in other places, such as search engines or social media sites.
- Scroll down to find “download options”
- Click on “show all” to display all files.
- Click on the file that ends with “meta.xml”
- Press Ctrl+F and search for the word “uploader” to see the uploader’s email address.
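The manual steps above can also be sketched in a few lines of Python. This is a minimal sketch under some assumptions: that the metadata file is plain XML containing an `<uploader>` element, and that it lives at `https://archive.org/download/<identifier>/<identifier>_meta.xml`. The identifier, the sample XML and the email address below are made up for illustration, and not every item will expose an uploader field.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def meta_xml_url(identifier: str) -> str:
    """Assumed location of an item's metadata file on archive.org."""
    return f"https://archive.org/download/{identifier}/{identifier}_meta.xml"


def find_uploader(meta_xml: str) -> Optional[str]:
    """Return the text of the first <uploader> element, or None if absent."""
    root = ET.fromstring(meta_xml)
    node = root.find(".//uploader")
    if node is None or node.text is None:
        return None
    return node.text.strip()


# Made-up sample resembling a *_meta.xml file (not taken from a real item).
SAMPLE_META = """<metadata>
  <identifier>example-item</identifier>
  <title>Example upload</title>
  <uploader>researcher@example.com</uploader>
</metadata>"""

if __name__ == "__main__":
    print(meta_xml_url("example-item"))
    print(find_uploader(SAMPLE_META))
```

To use it on a real item, download the file that ends with “meta.xml” as described in the steps and pass its contents to `find_uploader`.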
Use Collections and Changes (beta):
- Collections are a way to learn why a URL has been archived into the Wayback Machine.
- Changes allows users to select two different versions of a URL and compare them side by side.
Learn more about Collections and Changes here: https://blog.archive.org/2019/10/18/the-wayback-machine-fighting-digital-extinction-in-new-ways
- Use https://archive.org/web/ to request that a page be archived. The save button is visible at the bottom right of the screen, or you can go directly to https://web.archive.org/save. This “Save Page Now” option captures only that particular page, not the entire website, and only works for sites that allow crawlers. The screenshot below shows an article from OSINT Curious saving to the archive.
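If you need to save several pages, the request can be scripted. The sketch below assumes that fetching `https://web.archive.org/save/` followed by the target URL triggers a capture in the same way as the web form; that behaviour, the function names and the example target are assumptions for illustration. The archive may also rate-limit or reject automated requests, so treat this as a starting point only.

```python
import urllib.request


def save_page_now_url(target: str) -> str:
    """Build the assumed Save Page Now URL for a single page."""
    return f"https://web.archive.org/save/{target}"


def request_capture(target: str, timeout: float = 60.0) -> str:
    """Ask the archive to capture one page and return the final URL reached.

    Makes a real network request, so it is not called automatically here.
    """
    with urllib.request.urlopen(save_page_now_url(target), timeout=timeout) as resp:
        return resp.geturl()


if __name__ == "__main__":
    # Hypothetical target; replace it with the page you want archived.
    print(save_page_now_url("https://www.example.com/article"))
```

As with the manual button, this covers one page per request rather than a whole website.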
For sourcing purposes it may be important to understand when something was saved by the Internet Archive. Let’s look at the link below:
The number in the middle of the URL follows the format yyyymmddhhmmss, so this site was crawled on February 14, 2018 at 03:43:36.
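When you have many archived links to cite, the timestamp can be pulled out programmatically. The sketch below is one way to do it; the example URL is hypothetical (only its timestamp matches the date discussed above), and treating the timestamp as UTC is an assumption you should confirm for your own sourcing notes.

```python
import re
from datetime import datetime, timezone

# Matches the 14-digit yyyymmddhhmmss block that follows /web/ in a capture URL.
TIMESTAMP_RE = re.compile(r"/web/(\d{14})")


def capture_time(wayback_url: str) -> datetime:
    """Extract the capture timestamp from a Wayback Machine URL."""
    match = TIMESTAMP_RE.search(wayback_url)
    if match is None:
        raise ValueError(f"no 14-digit timestamp found in {wayback_url!r}")
    parsed = datetime.strptime(match.group(1), "%Y%m%d%H%M%S")
    return parsed.replace(tzinfo=timezone.utc)  # assumption: timestamps are UTC


if __name__ == "__main__":
    # Hypothetical capture URL used only to demonstrate the format.
    url = "https://web.archive.org/web/20180214034336/http://www.example.com/"
    print(capture_time(url).isoformat())
```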
What if the site you are investigating isn’t on the Internet Archive? Some sites will not be on Archive.org because of their robots.txt files or because the website owner has requested that the site not be archived.
However, you have other options, such as searching for cached content as described in this blog post https://osintcurio.us/2019/02/12/osint-on-deleted-content or checking other online archives such as archive.today.
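Before moving on to those alternatives, you may want to confirm quickly whether the Wayback Machine holds any capture at all. The sketch below assumes the archive's availability endpoint (`https://archive.org/wayback/available?url=...`) and the JSON shape shown in the sample strings, neither of which is covered earlier in this post. The function names and sample responses are illustrative, and no network request is made.

```python
import json
from typing import Optional
from urllib.parse import urlencode


def availability_url(site: str) -> str:
    """Build the assumed availability lookup URL for a site."""
    return "https://archive.org/wayback/available?" + urlencode({"url": site})


def closest_snapshot(response_text: str) -> Optional[str]:
    """Return the closest snapshot URL from a response, or None if not archived."""
    data = json.loads(response_text)
    closest = data.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest.get("url")
    return None


# Made-up responses in the assumed format, for demonstration only.
ARCHIVED = json.dumps({
    "url": "www.example.com",
    "archived_snapshots": {
        "closest": {
            "available": True,
            "status": "200",
            "timestamp": "20180214034336",
            "url": "http://web.archive.org/web/20180214034336/http://www.example.com/",
        }
    },
})
NOT_ARCHIVED = json.dumps({"url": "www.example.com", "archived_snapshots": {}})

if __name__ == "__main__":
    print(availability_url("www.example.com"))
    print(closest_snapshot(ARCHIVED))
    print(closest_snapshot(NOT_ARCHIVED))
```

A `None` result is the cue to try cached copies or another archive such as archive.today.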