Blocking Bad Bots and IP Addresses on Apache Hosting Server | Scam Alert Network
| Together we can fight Crime, Scams and Spam
Thursday February 23rd 2012
Multilayer Website Security Solution

Blocking Bad Bots and IP Addresses on an Apache Hosting Server

The best way to combat spam, malicious file injections and many other criminal activities against a website is to protect the website with several security layers.  Any attempt at protecting a website is, however, only as strong as the weakest spot in any of the defence layers.  The following is probably the very least that should be done by webmasters in charge of even the most basic websites:

  1. Ensure that website scripts or applications do not have any weak security spots,
  2. Keep bad bots and IPs with a history of bad events away from the website,
  3. Use an anti-virus application to check the hosting space regularly, and
  4. Back up the site frequently.

Installing a firewall, using Secure Sockets Layer (SSL), caching a website, using automated site spawning options and hosting on cloud servers are just a few of the more advanced options that may be considered for high-traffic and high-value websites.  In this article, however, we focus only on keeping bad bots and IPs away from basic websites by using manual processes and the .htaccess file in the public root folder of the popular Apache web server.

ACCESS TO WEB PAGES

Web browsers such as Internet Explorer, Firefox and Safari, along with other web-browsing agents, use the Hypertext Transfer Protocol (HTTP) to request HTM or HTML files from websites.  Hypertext Markup Language (HTML) is the markup language that allows text to be formatted, effects like shadows to be created and media to be inserted in web pages.  HTML files are inherently static, so hosting servers use various applications (PHP, for example) to generate HTML pages dynamically.  To control access to static or dynamic pages by web-browsing agents, Apache servers use a file called .htaccess (hypertext access).

Note:  On a Microsoft server, applications like Active Server Pages (ASP) may be used to create dynamic HTML pages.

USING A HONEYPOT

Many webmasters use a honeypot to attract bad bots, which are basically automated web-browsing agents, away from their main content.  The aim of a honeypot is to lure bad bots to useless bait where they are kept busy while their actions are analysed, thus keeping most of them away from actual content.  It is almost like having a flytrap outside a kitchen.

There are several honeypot projects; some use fully functional open proxies to monitor all activities passing through the server (spying on proxy users), while others are focussed on websites and function almost like a “flytrap”.  Webmasters and postmasters who are serious about curbing abusive behaviour from their own domains can join Project Honeypot in order to monitor bad events from their own IP addresses, and to obtain a free honeypot for their own sites.  Installing a honeypot is a relatively simple process, and the data collected can be very handy for improving defences against the most urgent threats.

SENDING BAD BOTS TO YOUR HONEYPOT

After installing the honeypot, the following code can be included in the .htaccess file located in the public directory of an Apache server.  Always create a copy or backup of a file before changing anything:


# Redirecting offline browsers and ‘bad bots’ to a honeypot
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} ^AbachoBOT [OR]
RewriteCond %{HTTP_USER_AGENT} ^anarchie [OR]
RewriteCond %{HTTP_USER_AGENT} ^antibot [OR]
RewriteCond %{HTTP_USER_AGENT} ^appie [OR]
RewriteCond %{HTTP_USER_AGENT} ^ASPSeek [OR]
RewriteCond %{HTTP_USER_AGENT} ^asterias [OR]
RewriteCond %{HTTP_USER_AGENT} ^attach [OR]
RewriteCond %{HTTP_USER_AGENT} ^autoemailspider [OR]
RewriteCond %{HTTP_USER_AGENT} ^B2w [OR]
RewriteCond %{HTTP_USER_AGENT} ^BackDoorBot [OR]
RewriteCond %{HTTP_USER_AGENT} ^BackWeb [OR]
RewriteCond %{HTTP_USER_AGENT} ^Baidu [OR]
RewriteCond %{HTTP_USER_AGENT} ^Bandit [OR]
RewriteCond %{HTTP_USER_AGENT} ^BatchFTP [OR]
RewriteCond %{HTTP_USER_AGENT} ^Black\ Hole [OR]
RewriteCond %{HTTP_USER_AGENT} ^BlackWidow [OR]
RewriteCond %{HTTP_USER_AGENT} ^BlowFish [OR]
RewriteCond %{HTTP_USER_AGENT} ^Bot\ mailto [OR]
RewriteCond %{HTTP_USER_AGENT} ^Bot\ mailto:craftbot@yahoo.com [OR]
RewriteCond %{HTTP_USER_AGENT} ^BotALot [OR]
RewriteCond %{HTTP_USER_AGENT} ^Buddy [OR]
RewriteCond %{HTTP_USER_AGENT} ^BuiltBotTough [OR]
RewriteCond %{HTTP_USER_AGENT} ^Bullseye [OR]
RewriteCond %{HTTP_USER_AGENT} ^bumblebee [OR]
RewriteCond %{HTTP_USER_AGENT} ^BunnySlippers [OR]
RewriteCond %{HTTP_USER_AGENT} ^CheeseBot [OR]
RewriteCond %{HTTP_USER_AGENT} ^CherryPicker [OR]
RewriteCond %{HTTP_USER_AGENT} ^CherryPickerElite [OR]
RewriteCond %{HTTP_USER_AGENT} ^CherryPickerSE [OR]
RewriteCond %{HTTP_USER_AGENT} ^ChinaClaw [OR]
RewriteCond %{HTTP_USER_AGENT} ^ClariaBot [OR]
RewriteCond %{HTTP_USER_AGENT} ^clsHTTP [OR]
RewriteCond %{HTTP_USER_AGENT} ^COAST\ WebMaster [OR]
RewriteCond %{HTTP_USER_AGENT} ^ColdFusion [OR]
RewriteCond %{HTTP_USER_AGENT} ^Collector [OR]
RewriteCond %{HTTP_USER_AGENT} ^Copier [OR]
RewriteCond %{HTTP_USER_AGENT} ^CopyRightCheck [OR]
RewriteCond %{HTTP_USER_AGENT} ^cosmos [OR]
RewriteCond %{HTTP_USER_AGENT} ^crawl [OR]
RewriteCond %{HTTP_USER_AGENT} ^Crescent [OR]
RewriteCond %{HTTP_USER_AGENT} ^curl [OR]
RewriteCond %{HTTP_USER_AGENT} ^Custo [OR]
RewriteCond %{HTTP_USER_AGENT} ^DA [OR]
RewriteCond %{HTTP_USER_AGENT} ^Diamond [OR]
RewriteCond %{HTTP_USER_AGENT} ^DISCo [OR]
RewriteCond %{HTTP_USER_AGENT} ^DISCo\ Pump [OR]
RewriteCond %{HTTP_USER_AGENT} ^DittoSpyder [OR]
RewriteCond %{HTTP_USER_AGENT} ^dloader [OR]
RewriteCond %{HTTP_USER_AGENT} ^Download\ Demon [OR]
RewriteCond %{HTTP_USER_AGENT} ^Download\ Wonder [OR]
RewriteCond %{HTTP_USER_AGENT} ^Downloader [OR]
RewriteCond %{HTTP_USER_AGENT} ^Drip [OR]
RewriteCond %{HTTP_USER_AGENT} ^DTS\ Agent [OR]
RewriteCond %{HTTP_USER_AGENT} ^EasyDL [OR]
RewriteCond %{HTTP_USER_AGENT} ^eCatch [OR]
RewriteCond %{HTTP_USER_AGENT} ^EirGrabber [OR]
RewriteCond %{HTTP_USER_AGENT} ^EmailCollector [OR]
RewriteCond %{HTTP_USER_AGENT} ^EmailSiphon [OR]
RewriteCond %{HTTP_USER_AGENT} ^EmailWolf [OR]
RewriteCond %{HTTP_USER_AGENT} ^EroCrawler [OR]
RewriteCond %{HTTP_USER_AGENT} ^Express\ WebPictures [OR]
RewriteCond %{HTTP_USER_AGENT} ^ExtractorPro [OR]
RewriteCond %{HTTP_USER_AGENT} ^Extreme\ Picture\ Finder [OR]
RewriteCond %{HTTP_USER_AGENT} ^EyeNetIE [OR]
RewriteCond %{HTTP_USER_AGENT} ^FAST\ WebCrawler [OR]
RewriteCond %{HTTP_USER_AGENT} ^Fetch\ API\ Request [OR]
RewriteCond %{HTTP_USER_AGENT} ^FileHound [OR]
RewriteCond %{HTTP_USER_AGENT} ^FlashGet [OR]
RewriteCond %{HTTP_USER_AGENT} ^FlickBot [OR]
RewriteCond %{HTTP_USER_AGENT} ^FreeFind.com [OR]
RewriteCond %{HTTP_USER_AGENT} ^FrontPage [OR]
RewriteCond %{HTTP_USER_AGENT} ^Generic [OR]
RewriteCond %{HTTP_USER_AGENT} ^GetRight [OR]
RewriteCond %{HTTP_USER_AGENT} ^GetSmart [OR]
RewriteCond %{HTTP_USER_AGENT} ^GetWeb! [OR]
RewriteCond %{HTTP_USER_AGENT} ^Go!Zilla [OR]
RewriteCond %{HTTP_USER_AGENT} ^Go-Ahead-Got-It [OR]
RewriteCond %{HTTP_USER_AGENT} ^gotit [OR]
RewriteCond %{HTTP_USER_AGENT} ^Grabber [OR]
RewriteCond %{HTTP_USER_AGENT} ^GrabNet [OR]
RewriteCond %{HTTP_USER_AGENT} ^Grafula [OR]
RewriteCond %{HTTP_USER_AGENT} ^Gulliver [OR]
RewriteCond %{HTTP_USER_AGENT} ^Harvest [OR]
RewriteCond %{HTTP_USER_AGENT} ^Heretrix [OR]
RewriteCond %{HTTP_USER_AGENT} ^HitboxDoctor [OR]
RewriteCond %{HTTP_USER_AGENT} ^hloader [OR]
RewriteCond %{HTTP_USER_AGENT} ^HMView [OR]
RewriteCond %{HTTP_USER_AGENT} ^HTTPapp [OR]
RewriteCond %{HTTP_USER_AGENT} ^httpfetcher [OR]
RewriteCond %{HTTP_USER_AGENT} ^httplib [OR]
RewriteCond %{HTTP_USER_AGENT} ^httpscraper [OR]
RewriteCond %{HTTP_USER_AGENT} ^HTTPTrack [OR]
RewriteCond %{HTTP_USER_AGENT} ^HTTPviewer [OR]
RewriteCond %{HTTP_USER_AGENT} ^HTTrack [OR]
RewriteCond %{HTTP_USER_AGENT} ^humanlinks [OR]
RewriteCond %{HTTP_USER_AGENT} ^ia_archiver [OR]
RewriteCond %{HTTP_USER_AGENT} ^Image\ Stripper [OR]
RewriteCond %{HTTP_USER_AGENT} ^Image\ Sucker [OR]
RewriteCond %{HTTP_USER_AGENT} ^Indy\ Library [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^InfoNaviRobot [OR]
RewriteCond %{HTTP_USER_AGENT} ^InterGET [OR]
RewriteCond %{HTTP_USER_AGENT} ^Internet\ Ninja [OR]
RewriteCond %{HTTP_USER_AGENT} ^InternetSeer.com [OR]
RewriteCond %{HTTP_USER_AGENT} ^Iria [OR]
RewriteCond %{HTTP_USER_AGENT} ^IRLbot [OR]
RewriteCond %{HTTP_USER_AGENT} ^Java [OR]
RewriteCond %{HTTP_USER_AGENT} ^JennyBot [OR]
RewriteCond %{HTTP_USER_AGENT} ^JetCar [OR]
RewriteCond %{HTTP_USER_AGENT} ^JoBo [OR]
RewriteCond %{HTTP_USER_AGENT} ^JOC [OR]
RewriteCond %{HTTP_USER_AGENT} ^JOC\ Web\ Spider [OR]
RewriteCond %{HTTP_USER_AGENT} ^Jonzilla [OR]
RewriteCond %{HTTP_USER_AGENT} ^JustView [OR]
RewriteCond %{HTTP_USER_AGENT} ^Kenjin\ Spider [OR]
RewriteCond %{HTTP_USER_AGENT} ^Keyword\ Density [OR]
RewriteCond %{HTTP_USER_AGENT} ^Lachesis [OR]
RewriteCond %{HTTP_USER_AGENT} ^larbin [OR]
RewriteCond %{HTTP_USER_AGENT} ^LeechFTP [OR]
RewriteCond %{HTTP_USER_AGENT} ^LexiBot [OR]
RewriteCond %{HTTP_USER_AGENT} ^lftp [OR]
RewriteCond %{HTTP_USER_AGENT} ^Libby_ [OR]
RewriteCond %{HTTP_USER_AGENT} ^libWeb [OR]
RewriteCond %{HTTP_USER_AGENT} ^libwwwperl [OR]
RewriteCond %{HTTP_USER_AGENT} ^libwww-perl [OR]
RewriteCond %{HTTP_USER_AGENT} ^likse [OR]
RewriteCond %{HTTP_USER_AGENT} ^Link [OR]
RewriteCond %{HTTP_USER_AGENT} ^LinkextractorPro [OR]
RewriteCond %{HTTP_USER_AGENT} ^LinkScan [OR]
RewriteCond %{HTTP_USER_AGENT} ^LinkWalker [OR]
RewriteCond %{HTTP_USER_AGENT} ^lwp\ request [OR]
RewriteCond %{HTTP_USER_AGENT} ^lwp-trivial [OR]
RewriteCond %{HTTP_USER_AGENT} ^Magnet [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mag-Net [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mass\ Downloader [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mata\ Hari [OR]
RewriteCond %{HTTP_USER_AGENT} ^Memo [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mercator [OR]
RewriteCond %{HTTP_USER_AGENT} ^Metacarta [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mewsoft\ Search\ Engine [OR]
RewriteCond %{HTTP_USER_AGENT} ^MFC_Tear_Sample [OR]
RewriteCond %{HTTP_USER_AGENT} ^Microsoft\ URL\ Control [OR]
RewriteCond %{HTTP_USER_AGENT} ^MicrosoftURL [OR]
RewriteCond %{HTTP_USER_AGENT} ^MIDown\ tool [OR]
RewriteCond %{HTTP_USER_AGENT} ^MIIxpc [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mirror [OR]
RewriteCond %{HTTP_USER_AGENT} ^Missigua [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mister\ PiX [OR]
RewriteCond %{HTTP_USER_AGENT} ^moget [OR]
RewriteCond %{HTTP_USER_AGENT} ^MSFrontPage [OR]
RewriteCond %{HTTP_USER_AGENT} ^MSIECrawler [OR]
RewriteCond %{HTTP_USER_AGENT} ^NationalDirectory\ WebSpider [OR]
RewriteCond %{HTTP_USER_AGENT} ^Navroad [OR]
RewriteCond %{HTTP_USER_AGENT} ^NearSite [OR]
RewriteCond %{HTTP_USER_AGENT} ^Net\ Probe [OR]
RewriteCond %{HTTP_USER_AGENT} ^Net\ Vampire [OR]
RewriteCond %{HTTP_USER_AGENT} ^NetAnts [OR]
RewriteCond %{HTTP_USER_AGENT} ^NetMechanic [OR]
RewriteCond %{HTTP_USER_AGENT} ^NetResearchServer [OR]
RewriteCond %{HTTP_USER_AGENT} ^NetSpider [OR]
RewriteCond %{HTTP_USER_AGENT} ^NetZIP [OR]
RewriteCond %{HTTP_USER_AGENT} ^nexuscache [OR]
RewriteCond %{HTTP_USER_AGENT} ^NICErsPRO [OR]
RewriteCond %{HTTP_USER_AGENT} ^Nikto [OR]
RewriteCond %{HTTP_USER_AGENT} ^Ninja [OR]
RewriteCond %{HTTP_USER_AGENT} ^NPBot [OR]
RewriteCond %{HTTP_USER_AGENT} ^oBot [OR]
RewriteCond %{HTTP_USER_AGENT} ^Octopus [OR]
RewriteCond %{HTTP_USER_AGENT} ^Offline\ Explorer [OR]
RewriteCond %{HTTP_USER_AGENT} ^Offline\ Navigator [OR]
RewriteCond %{HTTP_USER_AGENT} ^onestop [OR]
RewriteCond %{HTTP_USER_AGENT} ^Openfind [OR]
RewriteCond %{HTTP_USER_AGENT} ^Openfind\ data\ gatherer [OR]
RewriteCond %{HTTP_USER_AGENT} ^OrangeBot [OR]
RewriteCond %{HTTP_USER_AGENT} ^our\ agent [OR]
RewriteCond %{HTTP_USER_AGENT} ^PageGrabber [OR]
RewriteCond %{HTTP_USER_AGENT} ^Papa\ Foto [OR]
RewriteCond %{HTTP_USER_AGENT} ^pavuk [OR]
RewriteCond %{HTTP_USER_AGENT} ^pcBrowser [OR]
RewriteCond %{HTTP_USER_AGENT} ^Perl [OR]
RewriteCond %{HTTP_USER_AGENT} ^PHP [OR]
RewriteCond %{HTTP_USER_AGENT} ^PHP\ version [OR]
RewriteCond %{HTTP_USER_AGENT} ^PHPot [OR]
RewriteCond %{HTTP_USER_AGENT} ^Ping [OR]
RewriteCond %{HTTP_USER_AGENT} ^PingALink\ Monitoring\ Services [OR]
RewriteCond %{HTTP_USER_AGENT} ^Pockey [OR]
RewriteCond %{HTTP_USER_AGENT} ^Pompos [OR]
RewriteCond %{HTTP_USER_AGENT} ^ProPowerBot [OR]
RewriteCond %{HTTP_USER_AGENT} ^ProWebWalker [OR]
RewriteCond %{HTTP_USER_AGENT} ^psbot [OR]
RewriteCond %{HTTP_USER_AGENT} ^psycheclone [OR]
RewriteCond %{HTTP_USER_AGENT} ^Pump [OR]
RewriteCond %{HTTP_USER_AGENT} ^Python\ urllib [OR]
RewriteCond %{HTTP_USER_AGENT} ^Python-urllib [OR]
RewriteCond %{HTTP_USER_AGENT} ^QueryN [OR]
RewriteCond %{HTTP_USER_AGENT} ^RealDownload [OR]
RewriteCond %{HTTP_USER_AGENT} ^Reaper [OR]
RewriteCond %{HTTP_USER_AGENT} ^Recorder [OR]
RewriteCond %{HTTP_USER_AGENT} ^ReGet [OR]
RewriteCond %{HTTP_USER_AGENT} ^RepoMonkey [OR]
RewriteCond %{HTTP_USER_AGENT} ^Rico [OR]
RewriteCond %{HTTP_USER_AGENT} ^RMA [OR]
RewriteCond %{HTTP_USER_AGENT} ^Robot [OR]
RewriteCond %{HTTP_USER_AGENT} ^Robozilla [OR]
RewriteCond %{HTTP_USER_AGENT} ^Scooter [OR]
RewriteCond %{HTTP_USER_AGENT} ^ScoutAbout [OR]
RewriteCond %{HTTP_USER_AGENT} ^Siphon [OR]
RewriteCond %{HTTP_USER_AGENT} ^sitecheck.internetseer.com [OR]
RewriteCond %{HTTP_USER_AGENT} ^SiteSnagger [OR]
RewriteCond %{HTTP_USER_AGENT} ^slysearch [OR]
RewriteCond %{HTTP_USER_AGENT} ^SmartDownload [OR]
RewriteCond %{HTTP_USER_AGENT} ^Snake [OR]
RewriteCond %{HTTP_USER_AGENT} ^Snapbot [OR]
RewriteCond %{HTTP_USER_AGENT} ^Snoopy [OR]
RewriteCond %{HTTP_USER_AGENT} ^SpaceBison [OR]
RewriteCond %{HTTP_USER_AGENT} ^SpankBot [OR]
RewriteCond %{HTTP_USER_AGENT} ^spanner [OR]
RewriteCond %{HTTP_USER_AGENT} ^spider [OR]
RewriteCond %{HTTP_USER_AGENT} ^Spinne [OR]
RewriteCond %{HTTP_USER_AGENT} ^Sqworm [OR]
RewriteCond %{HTTP_USER_AGENT} ^Stealer [OR]
RewriteCond %{HTTP_USER_AGENT} ^Stripper [OR]
RewriteCond %{HTTP_USER_AGENT} ^Sucker [OR]
RewriteCond %{HTTP_USER_AGENT} ^SuperBot [OR]
RewriteCond %{HTTP_USER_AGENT} ^SuperHTTP [OR]
RewriteCond %{HTTP_USER_AGENT} ^Surfbot [OR]
RewriteCond %{HTTP_USER_AGENT} ^suzuran [OR]
RewriteCond %{HTTP_USER_AGENT} ^Szukacz [OR]
RewriteCond %{HTTP_USER_AGENT} ^tAkeOut [OR]
RewriteCond %{HTTP_USER_AGENT} ^Teleport\ Pro [OR]
RewriteCond %{HTTP_USER_AGENT} ^Telesoft [OR]
RewriteCond %{HTTP_USER_AGENT} ^The\ Intraformant [OR]
RewriteCond %{HTTP_USER_AGENT} ^TheNomad [OR]
RewriteCond %{HTTP_USER_AGENT} ^TightTwatBot [OR]
RewriteCond %{HTTP_USER_AGENT} ^Titan [OR]
RewriteCond %{HTTP_USER_AGENT} ^toCrawl [OR]
RewriteCond %{HTTP_USER_AGENT} ^True_Robot [OR]
RewriteCond %{HTTP_USER_AGENT} ^turingos [OR]
RewriteCond %{HTTP_USER_AGENT} ^TurnitinBot [OR]
RewriteCond %{HTTP_USER_AGENT} ^UrlDispatcher [OR]
RewriteCond %{HTTP_USER_AGENT} ^URLy\ Warning [OR]
RewriteCond %{HTTP_USER_AGENT} ^Vacuum [OR]
RewriteCond %{HTTP_USER_AGENT} ^Vagabondo [OR]
RewriteCond %{HTTP_USER_AGENT} ^Vayala [OR]
RewriteCond %{HTTP_USER_AGENT} ^VCI [OR]
RewriteCond %{HTTP_USER_AGENT} ^Vintage [OR]
RewriteCond %{HTTP_USER_AGENT} ^VoidEYE [OR]
RewriteCond %{HTTP_USER_AGENT} ^W3C_Validator [OR]
RewriteCond %{HTTP_USER_AGENT} ^Web\ Downloader [OR]
RewriteCond %{HTTP_USER_AGENT} ^Web\ Image\ Collector [OR]
RewriteCond %{HTTP_USER_AGENT} ^Web\ Sucker [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebAuto [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebCopier [OR]
RewriteCond %{HTTP_USER_AGENT} ^Webdownloader [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebEnhancer [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebFetch [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebGo\ IS [OR]
RewriteCond %{HTTP_USER_AGENT} ^Webhook [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebLeacher [OR]
RewriteCond %{HTTP_USER_AGENT} ^Webminer [OR]
RewriteCond %{HTTP_USER_AGENT} ^Webmirror [OR]
RewriteCond %{HTTP_USER_AGENT} ^Webmole [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebReaper [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebSauger [OR]
RewriteCond %{HTTP_USER_AGENT} ^Website [OR]
RewriteCond %{HTTP_USER_AGENT} ^Website\ eXtractor [OR]
RewriteCond %{HTTP_USER_AGENT} ^Website\ Quester [OR]
RewriteCond %{HTTP_USER_AGENT} ^Websites [OR]
RewriteCond %{HTTP_USER_AGENT} ^Webster [OR]
RewriteCond %{HTTP_USER_AGENT} ^Webster\ Pro [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebStripper [OR]
RewriteCond %{HTTP_USER_AGENT} ^Websucker [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebViewer [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebWhacker [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebZIP [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebZip [OR]
RewriteCond %{HTTP_USER_AGENT} ^Wells [OR]
RewriteCond %{HTTP_USER_AGENT} ^Wget [OR]
RewriteCond %{HTTP_USER_AGENT} ^Whacker [OR]
RewriteCond %{HTTP_USER_AGENT} ^Widow [OR]
RewriteCond %{HTTP_USER_AGENT} ^Wildsoft\ Surfer [OR]
RewriteCond %{HTTP_USER_AGENT} ^WinHttp [OR]
RewriteCond %{HTTP_USER_AGENT} ^WinHttpRequest [OR]
RewriteCond %{HTTP_USER_AGENT} ^WWW-Collector-E [OR]
RewriteCond %{HTTP_USER_AGENT} ^WWWOFFLE [OR]
RewriteCond %{HTTP_USER_AGENT} ^Xaldon [OR]
RewriteCond %{HTTP_USER_AGENT} ^Xaldon\ WebSpider [OR]
RewriteCond %{HTTP_USER_AGENT} ^Xara [OR]
RewriteCond %{HTTP_USER_AGENT} ^Xenu [OR]
RewriteCond %{HTTP_USER_AGENT} ^Y!TunnelPro [OR]
RewriteCond %{HTTP_USER_AGENT} ^YahooYSMcm [OR]
RewriteCond %{HTTP_USER_AGENT} ^Zade [OR]
RewriteCond %{HTTP_USER_AGENT} ^ZBot [OR]
RewriteCond %{HTTP_USER_AGENT} ^zerxbot [OR]
RewriteCond %{HTTP_USER_AGENT} ^Zeus
RewriteRule ^(.*)$ /honeypotdirectory/honeypot.php [L]

Replace “honeypotdirectory/honeypot.php” in the last line with the actual directory path and honeypot script name.  Obviously neither should include the word “honeypot”, as that would be like marking a speed trap with a road sign and red ribbons, scaring bad bots back towards the actual website content.  Rather use stealth and choose something like “email”, “subscribers” or “contacts”.

The list of bad bots above is just the tip of the iceberg, as new bots arrive on the Internet regularly and the more advanced bots may even change their own user agent names randomly.  It is therefore advisable to monitor site logs and AWStats regularly for new bots to include.  If the website should be listed in search engines, avoid including well-behaved search engine bots (also known as crawlers or spiders) such as the Google bot, Ask.com, MSN and others.
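Because the list needs regular additions, it may be easier to maintain when related names are grouped into a single condition.  The following is only a minimal sketch of that idea, not a replacement for the full list: it uses a handful of names from the list above, the [NC] flag to ignore upper and lower case, and a hypothetical trap location (/contacts/index.php) that stands in for the real honeypot script.

```apache
# Sketch only: a few agents from the list above, grouped with alternation.
# /contacts/index.php is a hypothetical stand-in for the honeypot script.
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} ^(EmailCollector|EmailSiphon|EmailWolf) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(HTTrack|Teleport\ Pro|WebZIP|WebCopier) [NC]
RewriteRule ^ /contacts/index.php [L]
```

A new agent found in the logs is then added by extending one of the bracketed groups with another “|name” entry, rather than by adding a whole new line.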

BLOCKING NAUGHTY IPs

The .htaccess file located in the public directory of an Apache server can also be used to block undesired IP addresses, as in the examples below; however, the webmaster’s cPanel may also include the “IP Deny Manager”, which allows easy blocking of single IPs and IP address ranges.

deny from 95.211.21.91
deny from 94.0.0.0/8
deny from 159.226.0.0/16
deny from 202.111.175.0/24
deny from 218.7.0.0/16

The IP Deny Manager and other function icons in the cPanel of an Apache server are actually just shortcuts to user-friendly interfaces that adjust the files controlling the Apache server, including the .htaccess file.  In the first example above, a single IP address is blocked, while the other examples each block a range of IP addresses.  Also note that the .htaccess file on an Apache server is just a plain text file that can be edited with any text editor like Notepad, WordPad, TextPad or the code editing tools provided in the cPanel File Manager.  The file name must be exact and, by default, must include the leading dot (.) to function correctly; the default file name may be changed in rare server configurations.  It is also advisable NOT to replace an existing .htaccess file, but rather to edit it by inserting the additional code, preferably at the bottom if unsure.
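The “deny from” lines above use the access control syntax of Apache 2.2 and earlier.  On Apache 2.4 the same rules are normally written with the Require directive, while the older directives only keep working if the mod_access_compat module is loaded.  The following is a sketch of the 2.4 form using the same example addresses, assuming the host allows authorization directives in .htaccess files (AllowOverride AuthConfig):

```apache
# Apache 2.4 sketch of the "deny from" examples above.
# Everyone is granted access except the listed address and ranges.
<RequireAll>
    Require all granted
    Require not ip 95.211.21.91
    Require not ip 94.0.0.0/8
    Require not ip 159.226.0.0/16
    Require not ip 202.111.175.0/24
    Require not ip 218.7.0.0/16
</RequireAll>
```

It is best to use one syntax or the other in a given .htaccess file, as mixing the old and new directives can produce confusing results.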

Denying access to IP addresses that display a history of bad events can help to curb similar events at a website.  The .htaccess file can also be used to block all public access to a specific sub-directory, by placing a .htaccess file in that sub-directory with the following line in the file:

deny from all

That directory will then become unavailable to web-browsing agents from all IP addresses and can only be reached by scripts running on the server itself or through other supported protocols like File Transfer Protocol (FTP), and obviously through the webmaster’s cPanel, which is just a server user interface.  If the line is placed in the main public folder, nobody will be able to access the website using web-browsing agents, which may be handy during website security adjustments.  It is, however, advisable to rather redirect visitors to a notification page during any maintenance.
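A maintenance redirect of that kind could be sketched as follows.  The page name (/maintenance.html) and the webmaster’s IP address (203.0.113.10, taken from a range reserved for documentation) are assumptions for illustration only and must be replaced with real values:

```apache
# Sketch only: send all visitors except the webmaster to a notice page.
RewriteEngine On
# Hypothetical webmaster address; this visitor still sees the real site.
RewriteCond %{REMOTE_ADDR} !^203\.0\.113\.10$
# Do not redirect the notice page itself, or the redirect would loop.
RewriteCond %{REQUEST_URI} !^/maintenance\.html$
RewriteRule ^ /maintenance.html [R=302,L]
```

A temporary (302) redirect is used here so that browsers and search engines do not treat the notice page as a permanent replacement for the site.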

DISADVANTAGES

The biggest disadvantage of blocking IP addresses semi-permanently, as in the above examples, is that the Internet is in constant flux and never static.  Constant adjustments may therefore be required.  Furthermore, Internet access providers share their allocated IP address ranges amongst all their customers, so the IP address allocated to a spammer may be allocated to someone else a few minutes later.  It is therefore often best to block an entire IP range, with the knowledge that nobody in that IP range will be able to access the website.

Manual processes as described above are labour-intensive and of limited effectiveness; however, manual methods can still be worthwhile for blocking some of the most annoying spammers.  For some websites, it may be worthwhile to automate the above processes or to use a firewall.

AUTOMATION

Many free or open source scripts are available to protect specific web applications against spam.  Some scripts are specifically designed to curb email spam, others to avoid spam at discussion forums or to protect blogs against spam comments, to mention a few.  Although some of these scripts do an excellent job according to their design objectives, there are a few basic drawbacks:

  • Blocking spam emails mostly addresses the symptom and not the cause.  Preventing the harvesting of email addresses from websites should be the first step.
  • Most of the free scripts only provide protection against spam at the web application level, and not for the entire website.  When compared to a computer, it would be like trying to protect each application separately with a purpose made anti-virus and mini firewall.
  • Furthermore being open source, it is easy for spammers and crackers to search the code for weak spots or loopholes to circumvent the protection offered by it.  Unless an open source script is exceptionally well developed, it would probably not provide adequate protection for an entire website against bots, harvesters, scrapers, remote file injections, attacks and other threats.

Webmasters have the option to try protecting websites manually and to use a few additional scripts for added protection, or else they can opt for commercial firewall solutions and add security layers as desired; however such solutions may be rather expensive for entry-level websites.

SpamTrawler is a recent commercial firewall market entry, aimed specifically at entry-level to intermediate websites.  Unlike many other security solutions that have monthly maintenance fees, it can be licensed for an indefinite period with a single payment; however, future upgrades will be payable if required.  SpamTrawler also makes provision for incorporating other scripts and free tools like the Bad Behavior plug-in for WordPress blogs and ClamAV to scan hosting space for spam and malicious script injections.  Furthermore, it can be set to check website visitors against other DNSBLs and to use a combination of a local database and cookies to counter attacks and script injections.  SpamTrawler functions as a platform that brings various independently developed anti-spam tools together as a suite of applications along with its own firewall, to combat various threats at the firewall before granting entry to a web-space.  It also monitors changes behind the firewall and can be configured with various traps for undesired content and behaviour.

Bad Behavior complements other link spam solutions by acting as a gatekeeper, preventing spammers from ever delivering their junk and, in many cases, from ever accessing site content in the first place.  This keeps the server’s load down, results in cleaner site logs, and can help prevent denial of service conditions caused by spammers.

Clam AntiVirus is an open source (GPL) anti-virus toolkit for UNIX servers, designed especially for e-mail scanning on mail gateways, although it is also effective at detecting malicious scripts and other threats.  It provides a number of utilities, including a flexible and scalable multi-threaded daemon, a command line scanner and an advanced tool for automatic database updates.  The core of the package is an anti-virus engine available in the form of a shared library.

FINAL REMARKS

Cyber criminals require websites or “home bases” where they can earn income or coordinate their operations.  Without these spamvertised websites, there is actually very little incentive to engage in spam.  Botnets, spam and even computer viruses are therefore symptoms of malicious advertising practices by corrupt businesses and criminals, who inevitably depend on their websites to remain financially viable.  So why do webhosting companies tolerate and accommodate criminals, and what can be done about it?

Some webhosting companies pretend to be ignorant of the criminal activities operated from their facilities, perhaps due to financial incentives and kickbacks.  Perhaps the real battle against cybercrime is fought on the wrong front, and should actually be directed at webhosts who “ignorantly” continue to facilitate crime.