How to Find Unlisted Web Pages

by Carlos Mano

Like an iceberg, most of the Web is hidden. The spiders that popular search engines use find websites mostly by following links, so they favor sites that have a lot of links pointing to them. There are some great sites that have few links pointing to them, and you may never find them with the popular search engines.

Learn how a Web server responds to a URL. If a URL ends in .html or .htm, the server looks for a Web page with that name, and if it does not find it, it returns a 404 (Not Found) error. If the URL ends with a directory name (no dot) or a slash, the server looks for a default page, usually index.html. If the default page exists, it is opened. If not, many servers list the files in the directory instead (others return a 403 Forbidden error), and you can usually open any of the listed files. Start with the ones that end in .html.
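If you want to check a directory with a script rather than by eye, here is a minimal Python sketch. It assumes the server's auto-generated listing uses the "Index of /..." page title that Apache and nginx produce by default (other servers may differ), and the function names `fetch`, `looks_like_directory_listing` and `html_pages_in_listing` are made up for this example. It fetches a URL, reports a 404 instead of crashing, and picks out the linked files that end in .html or .htm.

```python
from html.parser import HTMLParser
from urllib.error import HTTPError
from urllib.parse import urljoin
from urllib.request import urlopen


def fetch(url, timeout=10):
    """Return (status_code, body_text). A missing page comes back as (404, "")."""
    try:
        with urlopen(url, timeout=timeout) as resp:
            return resp.status, resp.read().decode("utf-8", errors="replace")
    except HTTPError as err:  # raised for error responses such as 404 or 403
        return err.code, ""


class _LinkCollector(HTMLParser):
    """Collect the href of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def looks_like_directory_listing(html):
    """Heuristic (assumption): Apache and nginx listings are titled 'Index of /...'."""
    return "<title>index of /" in html.lower()


def html_pages_in_listing(base_url, html):
    """Absolute URLs of the .html/.htm files linked from a listing page."""
    parser = _LinkCollector()
    parser.feed(html)
    pages = []
    for href in parser.hrefs:
        # Ignore ?query and #fragment parts (e.g. Apache's "?C=N;O=D" sort links).
        path = href.split("?", 1)[0].split("#", 1)[0].lower()
        if path.endswith((".html", ".htm")):
            pages.append(urljoin(base_url, href))
    return pages
```

A typical session would call `status, body = fetch("http://abc/def/")` and, if `status == 200` and `looks_like_directory_listing(body)` is true, open the URLs returned by `html_pages_in_listing("http://abc/def/", body)`. The host abc is only a placeholder.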

Learn to back up the path name. When you are at a Web page and the URL that is displayed is http://abc/def/ghi/xyz/, you are looking at a Web page in directory xyz, which is in directory ghi, which is in directory def, etc. There are often interesting Web pages in some of these parent directories, so take a look in http://abc/def/ghi/ or http://abc/def/ or http://abc.
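Backing up the path is mechanical enough to script. This Python sketch lists the parent directories of a URL from the deepest up to the site root. The name `parent_urls` is made up for the example, and it simply ignores any ?query or #fragment in the URL.

```python
from urllib.parse import urlsplit, urlunsplit


def parent_urls(url):
    """List the parent-directory URLs of `url`, deepest first, ending at the site root."""
    parts = urlsplit(url)
    # "/def/ghi/xyz/" -> ["def", "ghi", "xyz"]; empty pieces between slashes are dropped.
    segments = [s for s in parts.path.split("/") if s]
    parents = []
    # Drop one segment at a time; depth 0 is the root directory "/".
    for depth in range(len(segments) - 1, -1, -1):
        path = "/" + "/".join(segments[:depth])
        if depth:
            path += "/"
        parents.append(urlunsplit((parts.scheme, parts.netloc, path, "", "")))
    return parents


for parent in parent_urls("http://abc/def/ghi/xyz/"):
    print(parent)
# http://abc/def/ghi/
# http://abc/def/
# http://abc/
```

The same function works when the URL names a file: for http://abc/def/page.html it returns http://abc/def/ and http://abc/.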

Follow the links from libraries. They often have the best links into the deep Web. Don't forget university libraries and government libraries.

Check out links from pages about the "hidden web," "deep web" or "invisible net." Exploring them is becoming a popular Internet activity, and there are even groups and services devoted to finding these hidden gems. Two of these are StumbleUpon and OAIster (pronounced "oyster"; motto: find the pearls).

Tip

  • The more you know about search engines, the more you know about how to find the pages they miss. There are hundreds of search engines besides the ones everyone knows about. They use different search algorithms, so they find different Web pages. Some megasites (metasearch engines) will run your search on many search engines at the same time. Search for these megasites, and they will help you find the invisible net.
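A megasite's basic trick, sending one query to several engines at once and pooling the answers, can be sketched in a few lines of Python. The "engines" here are stand-in functions you would have to supply yourself (no real search engine API is assumed), and `metasearch` and `found_by_only_one` are hypothetical names. The pages that only one engine returns are the ones the others missed.

```python
from concurrent.futures import ThreadPoolExecutor


def metasearch(query, engines):
    """Run `query` on every engine concurrently and pool the results.

    `engines` maps an engine name to a function(query) -> list of URLs
    (hypothetical stand-ins, not a real search API).
    Returns {url: [names of engines that returned it]}, in first-seen order.
    """
    with ThreadPoolExecutor(max_workers=max(1, len(engines))) as pool:
        futures = {name: pool.submit(search, query) for name, search in engines.items()}
    merged = {}
    for name, future in futures.items():
        try:
            urls = future.result()
        except Exception:  # one broken engine should not sink the whole search
            urls = []
        for url in urls:
            names = merged.setdefault(url, [])
            if name not in names:
                names.append(name)
    return merged


def found_by_only_one(merged):
    """URLs that exactly one engine returned: the ones the others missed."""
    return [url for url, names in merged.items() if len(names) == 1]


# Example with two fake engines.
engines = {
    "engine_a": lambda q: ["http://abc/one.html", "http://abc/two.html"],
    "engine_b": lambda q: ["http://abc/two.html", "http://abc/three.html"],
}
print(found_by_only_one(metasearch("deep web", engines)))
```

Running the engines in threads means one slow engine does not hold up the others while they are fetching; the results are only merged once every engine has answered.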

Warning

  • Just as you are looking at the hidden net, spiders, hackers and lurkers are looking at your files. You might want to take some basic precautions, like putting blank files named index.html in directories with sensitive files, so that anyone who opens the directory sees an empty page instead of a list of its files.
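If you manage your own site, a short Python script can do this for every directory at once. This is a sketch under assumptions: that your server treats one of the names in `DEFAULT_PAGES` (a guessed list; check your server's configuration) as the default page, and that `add_blank_indexes` is a made-up helper for your own setup. It only reports what it would do unless you pass `dry_run=False`. Keep in mind that a blank index hides the listing only; anyone who guesses a file's name can still open it.

```python
import os
from pathlib import Path

# Assumed default-page names; servers differ, so check your own configuration.
DEFAULT_PAGES = ("index.html", "index.htm", "index.php")


def add_blank_indexes(site_root, dry_run=True):
    """Put an empty index.html in every directory under site_root that has no default page.

    Returns the index files created (or, in a dry run, the ones that would be created).
    """
    created = []
    for dirpath, _dirnames, filenames in os.walk(site_root):
        if any(name in filenames for name in DEFAULT_PAGES):
            continue  # already has a default page, so the server shows that, not a listing
        target = Path(dirpath) / "index.html"
        if not dry_run:
            target.write_text("", encoding="utf-8")
        created.append(target)
    return created
```

Run it once as a dry run, for example `add_blank_indexes("/path/to/your/site")` with your own folder in place of that placeholder, look over the list it returns, then run it again with `dry_run=False`.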
