Proxy hijacking is a problem that has been in the SEO news lately. Dan Thies has a good post about it on SEO Fast Start.
Basically, proxy hijacking occurs when search engines index your Web pages through someone else’s proxy. Here is an example from a proxy site that has indexed a duplicate of wunderground.com’s about page:

Dan Thies mentioned a few solutions to the problem. One is
…a PHP script that implements the “reverse cloaking” defense, putting a “noindex, nofollow” robots meta tag into your pages unless it’s a spider that you have configured the script to recognize.
That solution may remove your proxied page from Google’s index (unless the proxy strips your <head> element), but it doesn’t provide any benefit for your site.
Another alternative is to turn that indexed proxy page from a dangerous duplicate of your Web page into an inbound link.
Getting the IBL
To find the IP address of the proxy, use the Show IP Firefox Extension. It will show the IP address(es) of the Web page that is in your browser.

The IP of the proxy is usually the same as the IP that grabs your content. If not, it’s probably on the same C-block of IP addresses, and you can grep it out of your logs (see below).
The first example assumes that the proxy is grabbing your content from the same IP address that is seen with the Show IP Firefox Extension. This PHP code has not been fully tested and is intended only as an example of the technique.
<?php
$ip = $_SERVER['REMOTE_ADDR'];

// change the following nn.nnn.nnn.nnn to IP addresses
// and add others to the array as necessary
$cloaked_ips = array("nn.nnn.nnn.nnn", "nnn.nnn.nnn.nn");

if (in_array($ip, $cloaked_ips)) {
    echo <<<EOF
<html>
<head>
<title>Your Main Keyword</title>
</head>
<body>
<h1>Name of Site or Some Keywords</h1>
<p>Please visit the site directly: <a href="http://example.com/">name of site or keywords</a>.</p>
</body>
</html>
EOF;
    exit;
}
// everything above is for the proxy,
// and everything below this code is for direct visitors
?>
If the above code shows the alternate content when you view your site through the proxy in your browser, then you know that the proxy is using the same IP address that is showing in the Show IP Firefox Extension.
If it is not showing the alternate content to the proxy, you can test by matching on only the first three numbers of the IP address (the C-block). Note that in_array() compares whole strings, so a truncated address needs a prefix comparison such as strpos($ip, "nnn.nnn.nnn.") === 0 instead. If that doesn’t work, I recommend grepping (searching) your log files for IP addresses on the same C-block.
IP Hunting With Grep
Grep is a Unix command that you can use to search large files (such as logs) for information. If you haven’t used grep before, I recommend working through a grep tutorial or a full command-line tutorial first.
To find IP addresses on the same C-block, use a line similar to this in a Linux, Mac OS X, or Cygwin (Windows) terminal:
egrep "nnn\.nnn\.nnn\." access_log > ip_addresses.txt
That will give you a file named ip_addresses.txt that contains a list of all the hits on your server from the same C-block as the proxy. You will probably find the correct IP to block in that file.
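If the file of matching hits is long, it may help to boil it down to one line per address. The following is only a rough sketch, not a tested tool: count_block_ips is a hypothetical helper name, and it assumes your access log is in the common or combined format, where the client IP is the first field on each line. Unlike the egrep line above, it anchors the pattern to the start of the line so that the same digits appearing elsewhere (in a URL or referrer, for example) are not counted.

```shell
#!/bin/sh
# Hypothetical helper: list each client IP in one C-block with its hit count.
# Assumes the client IP is the first whitespace-separated field of every line.
# Usage: count_block_ips "nnn.nnn.nnn." access_log
count_block_ips() {
    prefix="$1"   # first three numbers of the proxy's IP, with trailing dot
    log="$2"      # path to the access log

    # Escape the dots so they match literal dots, and anchor to line start.
    pattern="^$(printf '%s' "$prefix" | sed 's/\./\\./g')"

    # Keep matching hits, take the IP field, then count and rank the IPs.
    grep -E "$pattern" "$log" | awk '{print $1}' | sort | uniq -c | sort -rn
}
```

Calling count_block_ips "nnn.nnn.nnn." access_log prints each address in the block preceded by its number of hits, busiest first. An address that requested the same pages the proxy has indexed is a likely candidate for the $cloaked_ips array.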
Tracking Proxy Indexing
This is just an idea that I haven’t fully tested yet, but you may be able to keep track of duplicate content on proxies with Google Alerts.
Set up a comprehensive search for something like intitle:"site name", or even better, intitle:"site name" inurl:"cgi". Here is an example screenshot:

If you have additional tips for combating proxy duplication, please leave a comment below.
