How to Turn Proxy Hijacking Into Inbound Links to Your Web Site


Proxy hijacking is a problem that has been in the SEO news lately. Dan Thies has a good post about it on SEO Fast Start.

Basically, proxy hijacking occurs when search engines index copies of your Web pages served through someone else’s proxy. Here is an example of a proxy site whose duplicate of wunderground.com’s About page has been indexed:

[Screenshot: proxy hijacking on oddproxy.com and wunderground.com]

Dan Thies mentioned a few solutions to the problem. One is:

…a PHP script that implements the “reverse cloaking” defense, putting a “noindex, nofollow” robots meta tag into your pages unless it’s a spider that you have configured the script to recognize.

That solution may remove your proxied page from Google’s index (unless the proxy strips your <head> element), but it does nothing to benefit your site.
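For comparison, the reverse-cloaking defense quoted above might look something like the minimal PHP sketch below. It is not Dan Thies’s script. The function names, the example host suffixes, and the use of forward-confirmed reverse DNS to recognize spiders (look up the visitor’s hostname, check its domain, then confirm that the hostname resolves back to the same IP) are all assumptions for illustration.

```php
<?php
// Sketch of "reverse cloaking": emit a noindex, nofollow robots meta tag
// unless the visitor is a spider you have chosen to recognize.
// Assumption: spiders are recognized by forward-confirmed reverse DNS.
// The suffix list is an example only, not an official or complete list.

const SPIDER_HOST_SUFFIXES = ['.googlebot.com', '.google.com'];

// True if $host ends with any of the given suffixes.
function host_has_suffix(string $host, array $suffixes): bool
{
    foreach ($suffixes as $suffix) {
        if ($suffix !== '' && substr($host, -strlen($suffix)) === $suffix) {
            return true;
        }
    }
    return false;
}

// Reverse lookup, domain check, then forward lookup back to the same IP.
function is_recognized_spider(string $ip, array $suffixes): bool
{
    $host = gethostbyaddr($ip);
    if ($host === false || $host === $ip) {
        return false; // malformed input or no reverse DNS record
    }
    if (!host_has_suffix(strtolower($host), $suffixes)) {
        return false;
    }
    $forward = gethostbynamel($host); // IPv4 addresses only
    return $forward !== false && in_array($ip, $forward, true);
}

// The tag to print inside <head>: empty for recognized spiders.
function robots_meta(bool $recognized): string
{
    return $recognized ? '' : '<meta name="robots" content="noindex, nofollow">' . "\n";
}
```

In a page’s <head> you would call something like echo robots_meta(is_recognized_spider($_SERVER['REMOTE_ADDR'], SPIDER_HOST_SUFFIXES));. Two DNS lookups per request are slow, so a real version would probably cache the result for each IP.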

Another alternative is to turn that indexed proxy page from a dangerous duplicate of your Web page into an inbound link.

Getting the IBL

To find the IP address of the proxy, use the Show IP Firefox Extension. Load the proxy’s copy of your page, and the extension will show the IP address(es) of the server that delivered it.

[Screenshot: Show IP Firefox Extension]

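The IP address the extension shows for the proxy is usually the same one the proxy uses to fetch your content. If not, the fetching address is probably on the same C-block (addresses that share the first three numbers, such as 192.0.2.0 through 192.0.2.255), and you can grep it out of your logs (see below).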
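If you would rather work out the C-block in code, here is a small PHP sketch (the function names are hypothetical) that turns one IPv4 address into its C-block prefix and into an anchored pattern you could pass to grep -E.

```php
<?php
// Sketch: derive the C-block prefix ("a.b.c.") of an IPv4 address,
// and an anchored, dot-escaped pattern for that block for grep -E.

function c_block_prefix(string $ip): ?string
{
    if (filter_var($ip, FILTER_VALIDATE_IP, FILTER_FLAG_IPV4) === false) {
        return null; // not a valid IPv4 address
    }
    $octets = explode('.', $ip);
    return $octets[0] . '.' . $octets[1] . '.' . $octets[2] . '.';
}

function c_block_regex(string $prefix): string
{
    // Escape each dot so it matches a literal period; anchor at line start.
    return '^' . str_replace('.', '\.', $prefix);
}
```

For example, c_block_prefix(gethostbyname('proxy.example.com')) would give the C-block of a proxy’s hostname, where proxy.example.com stands in for the real proxy domain (gethostbyname() returns the hostname unchanged on failure, so the function then returns null).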

The first example assumes that the proxy fetches your content from the same IP address that the Show IP Firefox Extension reports. Place the code at the very top of your PHP page, before any other output, so the proxy receives only the alternate page. The code has not been fully tested; it is only intended as an example of the technique.

<?php
$ip = $_SERVER['REMOTE_ADDR'];
// Replace the nn.nnn.nnn.nnn placeholders with the proxy's IP address(es),
// adding more entries to the array as necessary
$cloaked_ips = array("nn.nnn.nnn.nnn","nnn.nnn.nnn.nn");
if (in_array($ip, $cloaked_ips, true)) {
    echo <<<EOF
<html>
<head>
<title>Your Main Keyword</title>
</head>
<body>
<h1>Name of Site or Some Keywords</h1>
<p>Please visit the site directly:
<a href="http://example.com/">name of site or keywords</a>.</p>
</body>
</html>
EOF;
    exit;
}
// everything above is for the proxy,
// and everything below this code is for direct visitors
?>

If you load your page through the proxy and see the alternate content, then you know that the proxy fetches your pages from the same IP address that the Show IP Firefox Extension reports.

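If the proxy still shows your normal page, it is probably fetching your content from a different address. You can try matching the whole C-block by checking only the first three numbers of the IP address; because in_array() only matches complete strings, that means changing the test to a prefix comparison (for example with strncmp()). If that doesn’t work, I recommend grepping (searching) your log files for IP addresses on the same C-block.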
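One way to make the check cover a whole C-block is sketched below. It is a hedged variation on the script above, written as an include file (a hypothetical proxy-check.php pulled in with include at the top of each page, which is why it has no closing ?> tag). Entries ending in a dot are treated as C-block prefixes; anything else must match exactly.

```php
<?php
// Sketch: serve a link-back page when the visitor's IP matches either an
// exact address or a C-block prefix such as "nn.nnn.nnn." (the trailing
// dot keeps "10.1.1." from also matching 10.1.10.x addresses).

function ip_matches_any(string $ip, array $patterns): bool
{
    foreach ($patterns as $pattern) {
        if (substr($pattern, -1) === '.') {
            // Prefix entry: compare only the leading characters.
            if (strncmp($ip, $pattern, strlen($pattern)) === 0) {
                return true;
            }
        } elseif ($ip === $pattern) {
            // Full-address entry: require an exact match.
            return true;
        }
    }
    return false;
}

// Placeholders: replace with the proxy's C-block prefix(es) or address(es).
$cloaked = array("nn.nnn.nnn.", "nnn.nnn.nnn.nn");

$ip = $_SERVER['REMOTE_ADDR'] ?? '';
if (ip_matches_any($ip, $cloaked)) {
    echo <<<EOF
<html>
<head><title>Your Main Keyword</title></head>
<body>
<h1>Name of Site or Some Keywords</h1>
<p>Please visit the site directly:
<a href="http://example.com/">name of site or keywords</a>.</p>
</body>
</html>
EOF;
    exit;
}
// Direct visitors fall through to the rest of the including page.
```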

IP Hunting With Grep

Grep is a Unix command that you can use to search large files (such as logs) for information. If you haven’t used grep before, I recommend working through a grep tutorial or a general command-line tutorial first.

To find IP addresses on the same C-block, run a command similar to this in a Linux, Mac OS X, or Cygwin (Windows) terminal, replacing nnn.nnn.nnn with the first three numbers of the proxy’s IP address:

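grep -E "^nnn\.nnn\.nnn\." access_log > ip_addresses.txt

That will give you a file named ip_addresses.txt containing every hit (log line) on your server from the same C-block as the proxy. The backslashes make each dot match a literal period, and the ^ limits the match to the start of the line, where Apache’s common and combined log formats record the client IP, so referrer URLs that happen to contain the same numbers are skipped. (grep -E is equivalent to egrep.) You will probably find the correct IP to block in that file.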
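With a large log, a short PHP sketch like the one below can tally hits per address in the C-block so the busiest one stands out. It assumes an Apache-style log in which the client IP is the first space-separated field of each line; the script name count_ips.php is hypothetical.

```php
<?php
// Sketch: tally hits per client IP within one C-block of an access log.
// Assumption: Apache common/combined log format, where the client IP is
// the first space-separated field on each line.

function count_ips_in_block(iterable $lines, string $prefix): array
{
    $counts = [];
    foreach ($lines as $line) {
        // Take everything before the first space as the client IP.
        $ip = explode(' ', trim($line), 2)[0];
        if ($ip !== '' && strncmp($ip, $prefix, strlen($prefix)) === 0) {
            $counts[$ip] = ($counts[$ip] ?? 0) + 1;
        }
    }
    arsort($counts); // most frequent addresses first
    return $counts;
}

// Command-line use: php count_ips.php access_log nnn.nnn.nnn.
if (isset($argv[2])) {
    foreach (count_ips_in_block(new SplFileObject($argv[1]), $argv[2]) as $ip => $hits) {
        printf("%6d  %s\n", $hits, $ip);
    }
}
```

The address at the top of the output is a good first candidate to add to the cloaking script.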

Tracking Proxy Indexing

This is just an idea that I haven’t fully tested yet, but you may be able to keep track of duplicate content on proxies with Google Alerts.

Set up a comprehensive alert for a search such as intitle:"site name", or, even better, intitle:"site name" inurl:"cgi" (many web proxy scripts put "cgi" in their URLs). Here is an example screenshot:

[Screenshot: Google Alerts for proxy content hijacking]

If you have additional tips for combating proxy duplication, please leave a comment below.
