How to Find Thousands of Potential EDU Backlinks with GNU/Linux


If you use GNU/Linux or Mac OS X (or even Cygwin on Windows), you can use a little one-line script to quickly build lists of thousands of possible backlinks. It can build lists from any kind of Google query, but this one specifically finds pages on EDU sites.

The script below visits a page of Google SERPs with a text browser and then formats the data for you. The example below uses the Links Browser, but you can also use Lynx.

links -dump 'http://www.google.com/search?q=site%3Aedu+keyword&num=100&start=0' | egrep -o "http:.*" | grep -v "google.com" | egrep -v '[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}' | awk '{ print "<li><a href=\""$1"\">"$1"</a></li>" }' >> backlink_hunter.html

The output is an HTML page with a bulleted list of links. You can then use Firefox to middle-click on each link and quickly scan them to see if they might be a candidate for a backlink request. Useful Firefox keyboard shortcuts for this are Ctrl-w (close tab), Ctrl-Tab and Ctrl-Shift-Tab (move between tabs). Here is a screenshot of part of the output:

[Screenshot: EDU Backlinks]

A quick explanation of the script:

  1. links -dump (or lynx -dump) tells your text browser to dump the contents of the page into the terminal. Both text browsers create a list of URLs from the page at the bottom of the output. We are dumping a Google query for your desired keyword, with 100 results per page, starting at the beginning (&start=0). You can run the script multiple times, increasing the start parameter by 100 each time (100, 200, 300, up to 900); the parameter is a zero-based offset, so start=100 begins at result 101. 100 results is probably enough to start with though.
  2. egrep -o searches each line for text starting with http:. The -o option means to only return the matching portion of the line — in this case, just the URL.
  3. grep with the -v option removes lines that match the regular expression, in this case any URL that links internally to Google.
  4. The next command, egrep -v, removes all URLs that contain an IP address, because Google links back to itself in the SERPs with an IP address. This leaves us with only the outbound links from the SERPs.
  5. awk takes each remaining URL and wraps it in a link inside <li> tags, so the results are formatted as a list.
  6. The final part of the script, >> backlink_hunter.html, appends the output to the file backlink_hunter.html. If the file doesn't exist, it will be created. If it does exist, the output is added to the end of the file. That way you can change the &start= parameter in the Google query and append the next set of results to the file, creating a list of up to 1000 URLs.
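Re-running the one-liner by hand for each page gets tedious, so here is a rough sketch of how the steps above could be wrapped in a loop. The function names extract_links and hunt_backlinks, the optional output-file argument, and the 5-second pause are my own assumptions for illustration and are not part of the original script. It also assumes that start is a zero-based offset, so the pages begin at 0, 100, 200 and so on.

```shell
#!/bin/sh
# Sketch only: the function names and the pause are assumptions,
# not part of the original one-liner.

# Filter a text-browser dump on stdin down to <li> link items.
# grep -E is the modern spelling of egrep; the patterns are unchanged.
extract_links() {
    grep -Eo 'http:.*' |
        grep -v 'google.com' |
        grep -Ev '[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}' |
        awk '{ print "<li><a href=\"" $1 "\">" $1 "</a></li>" }'
}

# Fetch ten pages of 100 results for one keyword and append them all.
# $1 = keyword (already URL-encoded), $2 = output file (optional)
hunt_backlinks() {
    keyword=$1
    outfile=${2:-backlink_hunter.html}
    for start in 0 100 200 300 400 500 600 700 800 900; do
        links -dump "http://www.google.com/search?q=site%3Aedu+${keyword}&num=100&start=${start}" |
            extract_links >> "$outfile"
        sleep 5   # pause between requests (my addition, adjust to taste)
    done
}
```

Save it to a file, source that file in your shell, and run hunt_backlinks keyword to fill backlink_hunter.html in one go. If you prefer Lynx, replacing links with lynx in hunt_backlinks should be the only change needed, since both browsers accept -dump followed by a URL.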

I recommend experimenting with different Google operators and seeing what happens. Using this one-liner is much faster than some other methods of hunting for backlinks. The script can be expanded to do more — this is just a basic introduction to the concept.
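As one example of expanding the script, the sketch below cleans up the accumulated file. Because the one-liner only ever appends <li> lines, the same URL can show up more than once across runs, and the file has no surrounding <ul> or <html> tags. The helper name finish_page and the output file name in the usage note are hypothetical; treat this as one possible approach rather than part of the original method.

```shell
#!/bin/sh
# Sketch only: finish_page is a hypothetical helper name.
# $1 = file of <li> lines produced by the one-liner
finish_page() {
    printf '<html>\n<body>\n<ul>\n'
    # Print each line only the first time it appears, keeping the
    # original order (sort -u would also dedupe, but would reorder).
    awk '!seen[$0]++' "$1"
    printf '</ul>\n</body>\n</html>\n'
}
```

For example, finish_page backlink_hunter.html > backlink_hunter_clean.html writes a deduplicated page and leaves the original file untouched, so you can keep appending to it.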
