The Math Factor Podcast was wondering what the lowest number is that Google has not archived.
I figured a PowerShell script would be a fun way to solve this problem. Here is what I have written so far:
```powershell
# Drive Internet Explorer through its COM interface to run the Google searches.
$ie = New-Object -ComObject "InternetExplorer.Application"
$notFound = $true
$i = 1000   # number to start searching from

$ie.Navigate("http://www.google.com/search?hl=en&q=" + $i + "&btnG=Search")
Start-Sleep -s 3   # give Google time to reply

while ($notFound)
{
    "Searching number: " + $i
    $anchors = $ie.Document.getElementsByTagName("a")
    $hasResults = $false

    foreach ($link in $anchors)
    {
        # Assumes the results page only contains a link starting with
        # http://www.google.com/swr?q= when the search found something.
        if ($link.href -and $link.href.StartsWith("http://www.google.com/swr?q="))
        {
            $hasResults = $true
            break   # one match is enough; stop scanning this page
        }
    }

    if ($hasResults)
    {
        # This number is archived, so move on to the next one.
        $i = $i + 1
        $ie.Navigate("http://www.google.com/search?hl=en&q=" + $i + "&btnG=Search")
        Start-Sleep -s 3
    }
    else
    {
        $notFound = $false
        "Number not found: " + $i
    }
}

$ie.Quit()   # close the hidden Internet Explorer instance
```
I ran into the problem of my program running too fast, which is why I had to add "Start-Sleep -s 3". This makes the program wait 3 seconds for Google to reply before it reads the page.
The problem with a fixed time delay is speed. If the number turned out to be around 1 million, checking one number every 3 seconds would take me about 35 days to reach it.
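The estimate is simply the number of searches times the delay per search. Assuming the script starts at 1000, as above, and that each search costs only the 3-second sleep (page load time is ignored, so this is a lower bound):

$$
T \approx (10^6 - 1000) \times 3\,\text{s} = 2\,997\,000\,\text{s} = \frac{2\,997\,000}{86\,400}\,\text{days} \approx 34.7\,\text{days}
$$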
I guess if I had a separate server, I could let the script run for a year and solve this.